
Thuki v0.15: a Built-In Local AI Engine for macOS

Thuki v0.15 ships its own built-in AI engine. Install the app, pick a local model in-app, and start asking. No Ollama required, no API keys, no cloud.

Written by Logan Nguyen

For its first months, Thuki had one hard dependency. Before it could answer anything, you had to install Ollama, pull a model from the command line, and leave a server running in the background. The overlay was the easy part. The setup was the wall.

v0.15 takes that wall down. Thuki now ships its own AI engine. You install the app, pick a model inside it, and start asking. There is nothing to install alongside it, no second tool to keep alive, no Terminal.

Thuki (thư kí, Vietnamese for secretary) is a floating AI secretary for macOS. Double-tap Control to summon it over any app, including fullscreen ones, ask, and dismiss. It runs entirely on your Mac, with no cloud and no API keys. As of v0.15, it runs the model too.


The shift: Thuki runs the model now

Until now, Thuki was the interface and Ollama was the engine. You supplied the engine. That split kept the app small, but it pushed the hardest part of getting started onto you: install a separate runtime, learn ollama pull, keep a server alive.

v0.15 folds the engine in. Thuki bundles a llama.cpp llama-server and manages it for you. It spawns the process, supervises it, can keep it warm between messages, and shuts it down when you quit. The server binds to 127.0.0.1 only, with its web UI disabled, so nothing outside your Mac can reach it. The llama.cpp release Thuki ships is pinned and SHA-256 verified at build time, and every model you download is checked against a pinned Hugging Face revision before it installs.
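To make the loopback-only launch concrete, here is a minimal sketch of how an app could build a llama-server command line and refuse any host that is not loopback. The `--model`, `--host`, `--port`, and `--no-webui` flags come from llama.cpp's server documentation; the binary path, the port, and the function name are hypothetical, and this is not Thuki's actual launch code.

```python
import ipaddress


def build_server_argv(binary: str, model_path: str,
                      host: str = "127.0.0.1", port: int = 8080) -> list[str]:
    """Return an argv for llama-server, rejecting any non-loopback host."""
    # ip_address() raises ValueError for anything that is not a literal IP,
    # so hostnames are rejected too.
    if not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"refusing to bind to non-loopback address {host}")
    return [
        binary,
        "--model", model_path,
        "--host", host,          # loopback only: unreachable from other machines
        "--port", str(port),     # hypothetical port, chosen for illustration
        "--no-webui",            # disable the bundled web UI
    ]


argv = build_server_argv("./llama-server", "model.gguf")
print(" ".join(argv))
```

Checking the address before spawning means a bad setting fails loudly instead of quietly exposing the server.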

The result is the thing that was missing: install Thuki, pick a model in-app, and start asking. That is the whole onboarding now.

Ollama is not gone. It is one provider among others, optional rather than required. If you already run Ollama and want Thuki to use it, switch to it in Settings. But you no longer need it, or anything else, to get an answer.


Why this matters

The old first run was a checklist: install Ollama, pull a model in the terminal, confirm it was serving, then install Thuki. Every step was a place to get stuck, and the people most likely to want a private, local assistant were often the least interested in babysitting a command-line server to get one.

Now the model lives behind the same app you already opened. You download it from a picker, sized to your Mac, and Thuki runs it. The command line is optional, not the path in.

This also tightens the privacy story, because privacy here is not a promise, it is the shape of the system. Your prompt goes to a model running on your own machine. There is no Thuki server to send it to, no account, and the app sends no telemetry. With a model downloaded, turn Wi-Fi off and everyday chat still works. The honest caveat: anything that reaches past your Mac is opt-in and obvious. /search fetches the public web when you choose to run it, and the app checks GitHub for updates. Inference, the part you do all day, stays local.


What about LM Studio and Jan?

Bundling a llama.cpp engine and a one-click model library is not new. LM Studio and Jan have done it well for years, and on model management v0.15 puts Thuki at parity with them, not ahead. That is the honest read, and it is exactly the point.

Once setup is a solved problem for everyone, the only thing left that genuinely differs is how you reach the model. LM Studio, Jan, Msty: each is a window you switch to, paste into, read, and switch back from. Thuki is the one that comes to you. Double-tap Control and it is over whatever app you are already in, holding your highlighted text, and gone again when you dismiss it. v0.15 did not invent a new way to run models; it removed the last reason to choose a windowed app over the summon-anywhere overlay. For the full landscape, see how Thuki compares to other macOS local AI tools.


What's new in v0.15

A built-in engine, and a model library to feed it

The engine is a bundled llama.cpp llama-server that Thuki spawns, supervises, and shuts down for you. To feed it, Thuki has an in-app model library: download GGUF models such as Llama, Gemma, and Qwen straight from the Hugging Face Hub without leaving the app. Onboarding offers a curated set of Staff Picks sized for different Macs, and Settings > Models > Discover lets you browse and pull anything else. Downloads are content-addressed, resumable, and SHA-256 verified. No accounts, no API keys, no cost per query.
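If "content-addressed and SHA-256 verified" sounds abstract, the idea can be sketched in a few lines: hash the downloaded file, compare it to a pinned digest, and store it under its own hash. The function names and store layout below are hypothetical, and resuming partial downloads is left out. Treat it as an illustration of the technique, not Thuki's downloader.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte models never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified(download: Path, expected_sha256: str, store: Path) -> Path:
    """Move a download into the store only if its hash matches the pinned one."""
    actual = sha256_of(download)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    store.mkdir(parents=True, exist_ok=True)
    target = store / actual  # content-addressed: the file name is the digest
    if not target.exists():
        shutil.move(str(download), str(target))
    return target
```

Because the file name is the digest, the same model downloaded twice lands in the same place, and a corrupted or tampered file never reaches the store.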

Instant follow-ups

Loading a model into memory takes a moment. Doing it on every single message would make Thuki feel slow exactly when you want it to feel quick. So Thuki can keep your model warm between messages: the first ask loads it, and every follow-up after that starts right away instead of stalling to reload.
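One common way to get this behavior is a lazy loader with an idle timeout: load on first use, reuse while the model is fresh, and drop it after a quiet period. The sketch below assumes that design. The class name, the timeout, and the injected clock are all hypothetical, and Thuki's real policy may differ.

```python
import time
from typing import Callable, Optional


class WarmModel:
    """Load a model on first use and reuse it until it has been idle too long."""

    def __init__(self, loader: Callable[[], object], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0  # how many times the slow load actually ran

    def get(self) -> object:
        now = self._clock()
        # Idle for longer than the timeout: drop the model to free memory.
        if self._model is not None and now - self._last_used > self._idle_seconds:
            self._model = None
        # Cold start (first ask, or after an idle drop): pay the load cost once.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        self._last_used = now
        return self._model
```

With a five-minute timeout, a burst of follow-ups triggers exactly one load, while a model left untouched over lunch gives its memory back.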

A memory-fit hint before you download

Bigger models are smarter and hungrier, and a parameter count tells you nothing about whether one will actually run well on your machine. So the model library shows a memory-fit verdict for each model against your Mac: Comfortable, Tight, or Heavy. It is a quick read before you spend the download, so you pick a model that fits instead of guessing.
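A verdict like this can come from a simple ratio of what the model needs to what the Mac has. The sketch below is one plausible version. The 2 GiB overhead and the 50% and 75% thresholds are invented for illustration and are not Thuki's actual numbers.

```python
GIB = 1024 ** 3


def fit_verdict(model_bytes: int, total_ram_bytes: int,
                overhead_bytes: int = 2 * GIB) -> str:
    """Classify a model as Comfortable, Tight, or Heavy for a given Mac.

    overhead_bytes is a rough, assumed allowance for context and runtime
    buffers on top of the model file itself.
    """
    share = (model_bytes + overhead_bytes) / total_ram_bytes
    if share <= 0.50:   # assumed threshold
        return "Comfortable"
    if share <= 0.75:   # assumed threshold
        return "Tight"
    return "Heavy"


# On a 16 GiB Mac, with the assumed numbers above:
for size_gib in (4, 8, 12):
    print(size_gib, "GiB model:", fit_verdict(size_gib * GIB, 16 * GIB))
```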

Step-by-step reasoning, on demand

Some models can reason through a problem step by step before answering, which helps on harder questions and costs time on easy ones. Thuki leaves that off by default so everyday asks stay fast, and lets you turn it on per message: add /think to a message and a reasoning-capable model will work through it first. It only applies to models that support reasoning, since not every model does.
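The per-message switch can be pictured as a small parsing step: strip /think from the text and turn reasoning on only when the model supports it. The names below are hypothetical, and how Thuki really parses commands is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str       # the message with the /think command removed
    reasoning: bool   # whether to ask the model to reason step by step


def parse_message(text: str, model_supports_reasoning: bool) -> Request:
    """Strip /think from a message and decide whether reasoning applies."""
    words = text.split()
    wants_thinking = "/think" in words
    # Rejoining on single spaces also collapses any extra whitespace.
    prompt = " ".join(w for w in words if w != "/think")
    # Reasoning is opt-in per message and ignored for models that lack it.
    return Request(prompt=prompt,
                   reasoning=wants_thinking and model_supports_reasoning)


print(parse_message("/think why is the sky blue?", model_supports_reasoning=True))
```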

Providers: built-in by default, Ollama if you want it

End users get two providers. The built-in engine is the default and needs no setup. Ollama is there for people who already run it and prefer to. Switching between them frees the other one's memory, so only one model is resident at a time and you are never paying for two. Support for pointing Thuki at your own OpenAI-compatible server is on the roadmap, not in this release.
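The "only one model resident" rule amounts to unloading the outgoing provider before loading the incoming one. Here is a minimal sketch under that assumption. The class and method names are hypothetical stand-ins for whatever Thuki does internally.

```python
class Provider:
    """Stand-in for an inference backend that holds a model in memory."""

    def __init__(self, name: str):
        self.name = name
        self.loaded = False

    def load(self) -> None:
        self.loaded = True

    def unload(self) -> None:
        self.loaded = False


class ProviderSwitch:
    """Keep at most one provider's model resident at a time."""

    def __init__(self, providers: list[Provider]):
        self._providers = {p.name: p for p in providers}
        self.active = None

    def use(self, name: str) -> Provider:
        target = self._providers[name]
        # Free the outgoing provider's memory before the new one loads.
        if self.active is not None and self.active is not target:
            self.active.unload()
        target.load()
        self.active = target
        return target
```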

Everything that already made Thuki Thuki

The engine is new. The reason you reach for Thuki is not. The overlay still floats above every app, fullscreen included, on a double-tap of Control. Highlight text anywhere and summon Thuki to open with that selection pre-filled as a quote. Use /screen or the screenshot button to attach your screen as context, and paste or drag images straight into the chat. Commands like /extract, /explain, /tldr, and /translate pull the text out of an image with on-device macOS Vision OCR, so they work on screenshots and documents even when the model has no vision of its own. They read text, not scenes, so a textless photo still needs a vision model. Swap models mid-conversation and Thuki carries the history across. Conversations live in a local SQLite database on your machine. Agentic /search is still available too, though it is not yet bundled: enabling it currently means cloning the repo to run two local Docker services, and a zero-setup version is on the roadmap.


Getting Thuki onto your Mac

Installing is now one line:

curl -fsSL https://thuki.app/install.sh | sh

This downloads the latest Thuki.dmg over HTTPS, verifies its RSA-4096 signature with the openssl already on your Mac, installs it to /Applications, and launches Thuki for you.
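For a sense of what the signature check involves, the sketch below wraps a standard `openssl dgst -verify` call. The SHA-256 digest, the file names, and the function names are assumptions, since the post does not say which ones the real installer uses.

```python
import subprocess


def openssl_verify_argv(public_key_pem: str, signature: str, artifact: str) -> list[str]:
    """Build an openssl command that checks a detached RSA signature."""
    return [
        "openssl", "dgst", "-sha256",   # digest choice is an assumption
        "-verify", public_key_pem,      # publisher's public key (hypothetical path)
        "-signature", signature,        # detached signature file (hypothetical path)
        artifact,                       # the downloaded file being checked
    ]


def signature_is_valid(public_key_pem: str, signature: str, artifact: str) -> bool:
    """Return True only if openssl exits 0, which it does on a valid signature."""
    result = subprocess.run(
        openssl_verify_argv(public_key_pem, signature, artifact),
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

A call would look like `signature_is_valid("thuki_pub.pem", "Thuki.dmg.sig", "Thuki.dmg")`, with all three paths standing in for whatever the installer ships.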

Prefer to install by hand, or want the first-launch permission details? The Installation Guide has the manual DMG path and the rest. Either way you finish in the same place: open Thuki, pick a starter model when onboarding offers one, and double-tap Control to ask your first question.


What's next

v0.15 is a big step, and Thuki is still early, beta software being built in the open. A few things are coming, listed here so you know the direction, with no claim about timing:

  • Connect your tools via MCP: draft a reply, summarize a thread, or schedule a meeting from where you already are.
  • Type with your voice: speak, get clean text in any app.
  • Notes from any meeting: live transcripts and summaries.
  • Automate the routine: teach Thuki a multi-step task and run it on a word.
  • Built-in /search with no Docker to run yourself.
  • More providers: bring your own OpenAI-compatible server alongside the built-in engine and Ollama.

Whatever comes next, the aim stays the same: a local-first secretary that runs open models on your own machine. Anything that reaches beyond your Mac will always be opt-in.


Founder note

Hey, Logan here. Thanks a lot for reading up to this point!

Anywho, I'm building Thuki in the open and around how people actually use it, so if you have feedback, an idea, or just want to say hi, reach out on X. Or leave your email and I'll reach out personally. I'd love to work with you to make Thuki genuinely useful for you.


Frequently Asked Questions

Do I still need Ollama to use Thuki?

No. As of v0.15, Thuki ships its own built-in AI engine and runs models itself, so a fresh install can answer questions with nothing else installed. Ollama is now optional. If you already run it and prefer to, switch to it in Settings; otherwise you can ignore it entirely.

Is Thuki really free?

Thuki is free and open source under Apache 2.0. Local inference costs you nothing, with no per-query fees, because the work happens on your own hardware rather than a billed cloud. There is no account and no subscription. The open-source models it runs are free to download too.

What models can I run?

Any GGUF model from the Hugging Face Hub: Llama, Gemma, Qwen, and many more. Onboarding suggests a few Staff Picks sized for different Macs, and Settings > Models > Discover lets you browse and download the rest. Model files are large, roughly 2 to 12 GB each, so the first download takes a few minutes.

Does Thuki send my data anywhere?

Inference stays on your machine. Your prompt goes to a model running locally, with no Thuki server in the path, no account, and no telemetry from the app. You can verify it by turning off Wi-Fi and using everyday chat. The exceptions are opt-in and obvious: /search fetches the public web when you run it, and the app checks GitHub for updates.

What Mac do I need?

Thuki is a macOS app, built and tuned for Apple Silicon, where models run on the built-in GPU. Larger models want more memory, so the in-app memory-fit hint flags whether a given model is Comfortable, Tight, or Heavy on your machine. Check the GitHub releases page for the builds currently published.

Why does the built-in engine feel faster on follow-ups?

Thuki can keep your model warm in memory between messages. The first ask loads the model; every follow-up reuses it instead of reloading, so replies start right away. Switching to a different model or provider frees that memory, keeping only one model resident at a time.

Can I still install the old way?

Yes. The one-line installer is the easy path, but the manual DMG route still works: download Thuki.dmg from the releases page, drag it to Applications, and clear the quarantine flag once with xattr -rd com.apple.quarantine /Applications/Thuki.app. Both end at the same app.

Is this a finished, stable release?

No, and the badge says so: Thuki is beta, a work in progress built in the open. The built-in engine makes v0.15 a real milestone, but expect rough edges and frequent updates. Bug reports and stars on GitHub genuinely help shape what ships next.


The Short Version

Before v0.15, Thuki was a great overlay bolted to a setup wall: install Ollama, pull a model in the terminal, then install the app. Now Thuki runs its own llama.cpp engine, downloads models for you inside the app, and keeps them warm between messages. You install Thuki, pick a model, and ask. That is it. Ollama drops from required to optional, and getting Thuki onto your Mac is now a single line.

It is still local-first and private by construction: inference runs on your machine, history lives in a local database, the app sends no telemetry, and anything that reaches the network is something you chose. And it is still early software, so it will keep changing.
