What does Thuki even mean?

Thuki (Thư ký) is Vietnamese for secretary: a personal assistant who handles the details so you can focus on the work. The product is built for quick, throwaway conversations powered by local AI models: ask something, get an answer, move on. No history you didn't ask for, no cloud logging everything you type. Long term, Thuki is being built toward a fully agentic assistant that can actually do work on your behalf (Gmail, Calendar, Slack, and more via MCP integrations). The kind of assistant that handles the task, not just the answer.

Which Mac does Thuki run on?

Apple Silicon Macs (M1 or later) running macOS 13 Ventura or newer. Ventura is the floor because two capabilities Thuki depends on (the NSPanel overlay that floats above every app including fullscreen, and the HID-level event tap that intercepts your shortcut system-wide) only behave reliably from that release onward. Intel Mac support is on the roadmap.

Do I need an internet connection?

No. Thuki uses Ollama to run language models directly on your hardware. After the initial model download, everything is fully offline.

Which models can I use?

Any model Ollama supports: Llama 3, Mistral, Phi-3, Qwen, Gemma, and more. As Ollama adds support for new models, they become available in Thuki without a Thuki update.

Where is my data stored?

A single SQLite file on your disk. There is no backend, no server, no log you cannot cat. Delete the file and the data is gone.

Is there a subscription or per-query charge?

No. Once you download Thuki there is no cloud service billing you. Your GPU does the work.

Does Thuki work with every app?

Yes. Highlight text in any app, summon Thuki with your keyboard shortcut, and type a slash command. The system-wide overlay works wherever your cursor is.

Thuki Is Not an AI Model. Here Is What It Actually Does

A confusion comes up almost every time Thuki gets introduced to someone new: "Wait, isn't this just a wrapper around an open-source model?" The implied criticism is that Thuki is somehow cheating. It does not make the AI, it just uses the AI, so why is it a product at all?

The confusion is fair, because the AI industry talks about "AI apps" as if they are monolithic things. They are not. Every AI product is three distinct layers, and which layer a product actually owns is the single most useful question to ask when evaluating it.

This post is the answer: a mental model that clarifies what Thuki is, what it is not, and why the separation is the point.

The Three Layers of Any AI Product

Every AI product, without exception, has three layers:

Interface. What the user actually touches: the chat window, the keyboard shortcut, the screen on the phone, the developer's API client. This is the experience layer.
Runtime. What loads the model into memory and runs inference: the GPU drivers, the inference engine, the network infrastructure for cloud products, the OS-level acceleration for local products. This is the execution layer.
Model. The actual neural network weights produced by training: the billions of parameters that decide what tokens come next. This is the intelligence layer.

These three layers are independent. The same model can power many interfaces. The same interface can run many models. The same runtime can host many models from many providers. Treating them as a single thing is the mistake that makes the wrapper question feel like a gotcha when it is actually a category error.
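
One way to picture that independence is as a record with three separately replaceable fields. The sketch below is purely illustrative: the `Stack` type and the example values are made up for this post and are not part of Thuki's code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stack:
    """One AI product described by who fills each layer (illustrative only)."""
    interface: str
    runtime: str
    model: str


thuki = Stack(interface="Thuki overlay", runtime="Ollama", model="llama3.2")

# Swap the model: interface and runtime are untouched.
new_model = replace(thuki, model="qwen2.5")

# Swap the interface: the same runtime and model sit underneath.
other_ui = replace(thuki, interface="terminal chat")
```

Each `replace` call changes exactly one field, which is the whole point: no layer needs to know that another one changed.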

Concrete example: ChatGPT

| Layer | What it is |
| --- | --- |
| Interface | `chatgpt.com` web app, ChatGPT macOS/iOS/Android apps |
| Runtime | OpenAI's GPU infrastructure (private datacenters) |
| Model | GPT-4o, GPT-5, o3, and other OpenAI-trained models |

OpenAI happens to own all three layers, which makes ChatGPT feel like one indivisible product. But the layers are still there. If OpenAI replaced GPT-4o with a new model tomorrow, the interface would not change. If they rewrote the web app in a different programming language, the model would not care.

Concrete example: Thuki

| Layer | What it is |
| --- | --- |
| Interface | Thuki overlay, hotkey summon, AX context capture, screen attach, conversation UI |
| Runtime | Ollama running on `localhost:11434` (independent open-source project) |
| Model | Whatever you pulled via `ollama pull`: Llama 3.2, Gemma 3, Qwen 2.5, Mistral, others |

Thuki owns the interface layer. The runtime is Ollama (a project Thuki did not build and does not maintain). The model is whatever the user chose to install. Three different layers, three different owners.

This is not a flaw. This is how Thuki is supposed to work, and the reasons matter.

What Thuki Actually Does

If Thuki is the interface layer, what does that layer actually include? More than the wrapper framing suggests.

Window management on macOS. The overlay is an NSPanel, configured to float above fullscreen applications. This is not a flag on a standard window; it requires tauri-nspanel bridging into AppKit internals. Most cross-platform desktop frameworks do not expose this behavior out of the box.

System-wide hotkey detection. The double-tap Control shortcut runs on a CGEventTap at the HID layer, the earliest tap point Quartz Event Services offers. This survives focus changes, secure input mode, and the macOS 15 Sequoia event-routing changes that break easier approaches.
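
The event tap only delivers raw key events; recognizing a double-tap is a small piece of timing logic on top. Here is a minimal sketch of that logic, assuming each Control tap arrives with a timestamp in seconds. The `DoubleTapDetector` class and the 0.4 second window are assumptions for illustration, not Thuki's actual values.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed threshold for illustration; not Thuki's real setting.
DOUBLE_TAP_WINDOW_S = 0.4


@dataclass
class DoubleTapDetector:
    window_s: float = DOUBLE_TAP_WINDOW_S
    _last_tap: Optional[float] = field(default=None, init=False, repr=False)

    def on_tap(self, timestamp: float) -> bool:
        """Return True when this tap completes a double-tap."""
        last = self._last_tap
        if last is not None and 0 <= timestamp - last <= self.window_s:
            # Consume both taps so a third quick tap starts a fresh sequence.
            self._last_tap = None
            return True
        self._last_tap = timestamp
        return False
```

Resetting the stored timestamp after a match keeps three quick taps from registering as two overlapping double-taps.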

Context capture from the focused app. When the overlay activates, Thuki queries the macOS Accessibility API (AXUIElement) to read whatever text is selected in the focused application. If the focused app does not expose AX cleanly (most Electron apps), Thuki falls back to a clipboard-preserving Cmd+C simulation with exponential backoff.
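
The AX-first, clipboard-fallback flow can be sketched as follows. Everything here is a stand-in: the `Clipboard` protocol, `read_ax`, `send_copy`, and the delay values are hypothetical names and numbers chosen for the example, and the real app talks to macOS APIs rather than Python callables.

```python
import time
from typing import Callable, Iterator, Optional, Protocol


class Clipboard(Protocol):
    """Hypothetical clipboard abstraction standing in for the system pasteboard."""
    def read(self) -> Optional[str]: ...
    def write(self, text: Optional[str]) -> None: ...
    def change_count(self) -> int: ...


def backoff_delays(base_s: float = 0.01, factor: float = 2.0,
                   attempts: int = 5) -> Iterator[float]:
    """Yield base_s, base_s * factor, ... for a fixed number of attempts."""
    delay = base_s
    for _ in range(attempts):
        yield delay
        delay *= factor


def capture_selection(
    read_ax: Callable[[], Optional[str]],
    clipboard: Clipboard,
    send_copy: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    text = read_ax()
    if text:
        return text  # fast path: the app exposed its selection via Accessibility

    saved = clipboard.read()            # remember what the user had copied
    before = clipboard.change_count()
    send_copy()                         # simulate Cmd+C in the focused app
    captured: Optional[str] = None
    try:
        for delay in backoff_delays():
            sleep(delay)
            if clipboard.change_count() != before:
                captured = clipboard.read()
                break
    finally:
        clipboard.write(saved)          # always restore the user's clipboard
    return captured
```

Waiting with growing delays lets fast apps return almost immediately while still giving slow ones time to respond, and the `finally` block is what makes the fallback clipboard-preserving.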

Screen-aware context. The /screen command captures the current display and either passes it to a vision-capable model or runs on-device OCR via Apple's Vision framework. The screenshot never leaves your machine.

Streaming token rendering with cancellation. When the model is responding, Thuki streams tokens to the chat UI as they arrive from Ollama, with markdown rendering and code-block syntax highlighting computed incrementally. The user can cancel mid-stream.
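
Ollama's chat endpoint streams newline-delimited JSON, where each line carries a fragment in `message.content` and the final line has `done` set to true. A minimal sketch of consuming that stream with cancellation might look like this; `stream_tokens` and the use of a `threading.Event` as the cancel signal are assumptions for the example, not Thuki's implementation.

```python
import json
import threading
from typing import Iterable, Iterator


def stream_tokens(lines: Iterable[bytes], cancel: threading.Event) -> Iterator[str]:
    """Yield text fragments from an NDJSON chat stream until done or cancelled."""
    for raw in lines:
        if cancel.is_set():
            return                      # user cancelled: stop reading the stream
        if not raw.strip():
            continue                    # skip blank keep-alive lines
        chunk = json.loads(raw)
        content = chunk.get("message", {}).get("content", "")
        if content:
            yield content
        if chunk.get("done"):
            return                      # the runtime signalled the end of the reply
```

In a real client, `lines` would be the HTTP response body iterated line by line, and the UI's cancel button would set the event.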

Conversation persistence. Conversations are stored locally in a bundled SQLite database. No cloud sync, no account, no leak path.
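
To make "a local SQLite database" concrete, here is a small sketch of what storing and reloading a conversation could look like. The `messages` table and its columns are a hypothetical schema invented for this example; Thuki's actual schema may differ.

```python
import sqlite3
from typing import Dict, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database file. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               conversation_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def append_message(conn: sqlite3.Connection, conversation_id: str,
                   role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
            (conversation_id, role, content),
        )


def load_conversation(conn: sqlite3.Connection, conversation_id: str) -> List[Dict[str, str]]:
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]
```

Because the store is one file, deleting that file deletes the history; there is no second copy anywhere to clean up.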

Permission orchestration. First-launch onboarding walks the user through the Accessibility permission grant with polling, so the app activates the moment the toggle flips, without an app restart.
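
The polling idea is simple enough to show in a few lines. In this sketch `is_trusted` stands in for the macOS check (on a real system that would be something like `AXIsProcessTrusted`), and the half-second interval and poll limit are assumed values, not Thuki's.

```python
import time
from typing import Callable


def wait_until_trusted(
    is_trusted: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 0.5,   # assumed polling interval
    max_polls: int = 120,      # assumed upper bound so the loop cannot spin forever
) -> bool:
    """Poll until the permission is granted; return False if we give up."""
    for _ in range(max_polls):
        if is_trusted():
            return True
        sleep(interval_s)
    return False
```

The caller can activate the overlay as soon as this returns true, which is what removes the need for a restart after the toggle flips.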

Model selection and inventory. The model picker queries Ollama's /api/tags endpoint as the authoritative source, falls back gracefully if the previously-selected model was uninstalled, and refuses to invent a default the user did not pick.
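
The `/api/tags` response is a JSON object with a `models` array whose entries each have a `name`. A sketch of the selection rule described above, with hypothetical function names, could be:

```python
from typing import List, Optional


def installed_models(tags_response: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def resolve_model(saved: Optional[str], installed: List[str]) -> Optional[str]:
    """Keep the saved choice only if it is still installed.

    Returning None means "ask the user to pick"; no default is invented.
    """
    if saved is not None and saved in installed:
        return saved
    return None
```

Returning `None` instead of the first installed model is the design choice that matters here: the picker asks rather than guesses.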

The workflow itself. Stitching all of the above into a single seamless experience: hotkey fires → context is captured → window appears → user types → tokens stream → user dismisses → window vanishes → context resets. The composition is the product.
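
That sequence reads naturally as a small state machine. The sketch below is one possible way to model it; the state names, event names, and the `Overlay` class are invented for illustration and do not mirror Thuki's source.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class State(Enum):
    HIDDEN = auto()
    CAPTURING = auto()
    VISIBLE = auto()
    STREAMING = auto()


# (current state, event) -> next state; anything not listed is ignored.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.HIDDEN, "hotkey"): State.CAPTURING,
    (State.CAPTURING, "context_ready"): State.VISIBLE,
    (State.VISIBLE, "submit"): State.STREAMING,
    (State.STREAMING, "stream_done"): State.VISIBLE,
    (State.STREAMING, "cancel"): State.VISIBLE,
    (State.VISIBLE, "dismiss"): State.HIDDEN,
    (State.STREAMING, "dismiss"): State.HIDDEN,
}


class Overlay:
    def __init__(self) -> None:
        self.state = State.HIDDEN
        self.context: Optional[str] = None

    def handle(self, event: str, payload: Optional[str] = None) -> State:
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            return self.state           # event makes no sense in this state
        if event == "context_ready":
            self.context = payload      # captured selection for this session
        if nxt is State.HIDDEN:
            self.context = None         # context resets when the window vanishes
        self.state = nxt
        return nxt
```

Keeping the transitions in one table makes it easy to see which events are legal in which state, for example that a submit while hidden is simply ignored.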

If that is "just a wrapper," every web browser is "just a wrapper around HTTP."

What Thuki Does Not Do

Honesty matters here too. Thuki is the interface layer specifically because it explicitly does not do certain things.

Thuki does not generate text. Every token in every response comes from a model running in Ollama. Thuki forwards the request, streams the bytes back, and renders them. The intelligence is the model's.

Thuki does not decide what answers are good. Quality, factuality, tone, and reasoning depth are all properties of the model. Switching from Llama 3.2 to Qwen 2.5 changes the answers; switching from Thuki to a different interface (with the same model running underneath) does not.

Thuki does not run a backend. There is no Thuki server. No analytics, no telemetry, no Thuki account database. The interface runs entirely on your machine. Requests to Ollama stay on localhost, and the only traffic Thuki itself sends off your machine is fetching software updates from GitHub.

Thuki does not bundle a model. Models are downloaded and managed by Ollama, not Thuki. The Thuki app binary is under 100 MB; the model files (2-8 GB each) live in ~/.ollama/models.

Thuki does not train, fine-tune, or modify models. It accepts whatever the model returns and renders it. Beyond formatting the output for display (markdown, syntax highlighting), it does not rewrite responses, apply guardrails, or filter content.

This is deliberate scoping. A product that tries to own all three layers is a much larger project (OpenAI, Anthropic, and Google have thousands of employees between them). A product that owns one layer well can be built and maintained by a small team.

Why the Separation Is the Point

The three-layer separation is not a compromise Thuki makes because building all three is hard. It is a deliberate design choice with real benefits to the user.

You can swap models freely. When a better open-source model is released next month, you can be running it the same day. Thuki does not need a release, an update, or a feature flag. The interface keeps working; the brain changes.

You are not locked into anyone's model quality curve. A ChatGPT user gets exactly the models OpenAI ships, on OpenAI's schedule. A Thuki user gets whatever the open-source community has produced, on the community's schedule. For the small, locally runnable models that matter for daily use, that schedule has arguably moved faster than OpenAI's over the last 18 months.

Privacy is enforced at the runtime layer, not promised at the interface layer. Most "private" AI products promise that the company will not misuse your data. Thuki's privacy is structural: the data never leaves your machine because the runtime (Ollama) is on your machine. There is no Thuki server to promise anything about. When bring-your-own-key (BYOK) support ships, the trust boundary moves to whichever provider you chose, on your terms.

Interface improvements do not depend on model releases. Thuki can ship a better window animation, a better hotkey config, a better OCR pipeline, without waiting on any model. Conversely, when a new model is released, Thuki gets the benefit without shipping a thing.

When BYOK arrives, frontier models work without Thuki "supporting" each one specifically. Any OpenAI-compatible API works. Thuki does not need to write a plugin for each new cloud provider; the interface speaks one protocol and the rest is configuration.
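
Since BYOK has not shipped, the following is only a sketch of what "one protocol, the rest is configuration" could mean in practice. It builds (but does not send) a request in the OpenAI chat-completions shape; the `Provider` type and `build_chat_request` are hypothetical names, and the example values are placeholders.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Provider:
    """Hypothetical per-provider configuration: the only thing that varies."""
    base_url: str
    api_key: str
    model: str


def build_chat_request(provider: Provider,
                       messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request without sending it."""
    body = json.dumps(
        {"model": provider.model, "messages": messages, "stream": True}
    ).encode("utf-8")
    return urllib.request.Request(
        url=provider.base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {provider.api_key}",
        },
        method="POST",
    )
```

Pointing the same function at a different provider means changing three strings in `Provider`, not writing a new integration.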

The shorthand: Thuki competes on the interface layer, and only on the interface layer. Whether the answers are good is a model question, not a Thuki question.

How to Compare AI Products Honestly

Once the three layers are visible, the right comparisons stop looking like cross-category fights and start looking like layer-by-layer choices.

| Comparison | What you are actually comparing |
| --- | --- |
| ChatGPT vs Claude | Model layer (GPT-5 vs Claude Opus) |
| ChatGPT app vs ChatGPT API | Interface layer (web/native vs raw API) |
| Ollama vs OpenAI's infrastructure | Runtime layer (your Mac vs OpenAI's GPUs) |
| Thuki vs LM Studio | Interface layer (ambient overlay vs windowed chat app) |
| Thuki vs Raycast AI | Interface layer (local-first overlay vs cloud-only launcher) |
| Llama 3.2 vs Qwen 2.5 | Model layer (two open-source models) |

"Thuki vs ChatGPT" is not a real comparison because they sit on different layers. The honest comparison is Thuki + Llama 3.2 + Ollama vs ChatGPT app + GPT-5 + OpenAI infrastructure: full stack against full stack. And when you compare them that way, the answer depends on what the user actually values (privacy, cost, flexibility, frontier capability), not on which app is "better."

Frequently Asked Questions

So Thuki is just a wrapper?

Thuki is the interface and workflow layer for a local-AI stack. Calling it "just a wrapper" is like calling a code editor "just a wrapper" because it does not implement the compiler. The interface layer is where the day-to-day product experience actually lives, and building a real macOS overlay with NSPanel + CGEventTap + AX is non-trivial engineering. The runtime is delegated to Ollama, and the model to whatever you pull, on purpose, not by accident.

Why doesn't Thuki bundle a model?

Bundling a model would lock every user to whatever was current at install time, balloon the download to 5+ GB, and force a Thuki release every time a better open-source model dropped. Delegating to Ollama means model updates are independent of Thuki updates, and the user picks. Both are features.

Can Thuki improve without a model upgrade?

Yes, often. Better hotkey behavior, better window animations, better OCR pipelines, better context capture, BYOK support, better permission flows, conversation history features: all of these live entirely in the interface layer and can ship without touching the model layer.

What happens if Ollama changes its API?

The Ollama HTTP API is small (essentially /api/generate, /api/chat, /api/tags, and /api/show) and stable, but if it ever changes, Thuki adapts in the model-client layer. Nothing else in the stack is affected. This is one of the reasons the layers are kept independent.

What about apps that bundle everything together?

Some products try to own all three layers (interface + runtime + model). OpenAI does. Anthropic does. Local-only apps like LM Studio and Jan own the first two and let you pick the model. The trade-off is always the same: tighter integration vs. more user flexibility. Thuki picked the flexibility side intentionally.

If Thuki only owns the interface, what is the product?

The product is the workflow: NSPanel + CGEventTap + AX capture + screen attach + streaming render + conversation persistence, composed into one ambient macOS experience. The model is the engine; Thuki is the car. Engines are commodities; cars are not.

The Short Version

Thuki is not an AI model. It is not a runtime. It is the interface and workflow layer of a local AI stack, the part that decides how you reach for AI, what context the AI sees by default, and how the experience fits into the rest of your day on a Mac.

The three layers (interface, runtime, model) are independent for good reasons. The model can change without the interface changing. The interface can improve without the model improving. Privacy is enforced at the runtime layer, not promised at the interface. And when you compare AI products, the honest comparison is layer-by-layer, not "this app vs that app."

Next steps:

New to Thuki? Read the introduction post
The philosophy: why Thuki is local-first
The landscape: how Thuki compares to other macOS local AI tools
The engineering: building Thuki on macOS
The code: github.com/quiet-node/thuki