Thuki Is Not an AI Model. Here Is What It Actually Does

Every AI product is three layers: interface, runtime, model. Thuki owns the interface and ships its own runtime; only the model is yours to pick. This changes how you evaluate any AI app.

Written by Logan Nguyen

A confusion comes up almost every time Thuki gets introduced to someone new: "Wait, isn't this just a wrapper around an open-source model?" The implied criticism is that Thuki is somehow cheating: it does not make the AI, it just uses the AI, so why is it a product at all?

The confusion is fair, because the AI industry talks about "AI apps" as if they were monolithic things. They are not. Every AI product is three distinct layers, and which layer a product actually owns is the single most useful question to ask when evaluating it.

This post is the answer: a mental model that clarifies what Thuki is, what it is not, and why the separation is the point.


The Three Layers of Any AI Product

Every AI product, without exception, has three layers:

  1. Interface. What the user actually touches: the chat window, the keyboard shortcut, the screen on the phone, the developer's API client. This is the experience layer.
  2. Runtime. What loads the model into memory and runs inference: the GPU drivers, the inference engine, the network infrastructure for cloud products, the OS-level acceleration for local products. This is the execution layer.
  3. Model. The actual neural network weights produced by training: the billions of parameters that decide what tokens come next. This is the intelligence layer.

These three layers are independent. The same model can power many interfaces. The same interface can run many models. The same runtime can host many models from many providers. Treating them as a single thing is the mistake that makes the wrapper question feel like a gotcha when it is actually a category error.
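To make the independence concrete, here is a minimal sketch of the three layers as separate objects. Every name in it (EchoModel, Runtime, Interface) is hypothetical and exists only for illustration; it is not how Thuki or any real product is structured internally.

```python
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    """Intelligence layer: turns a prompt into text."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Hypothetical stand-in for real weights; it just labels its output."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Runtime:
    """Execution layer: holds whichever model is currently loaded."""

    model: Model

    def infer(self, prompt: str) -> str:
        return self.model.complete(prompt)


@dataclass
class Interface:
    """Experience layer: decides how the answer is presented."""

    runtime: Runtime

    def ask(self, prompt: str) -> str:
        return "> " + self.runtime.infer(prompt)


app = Interface(Runtime(EchoModel("model-a")))
first = app.ask("hello")

# Swap the model. The interface and runtime objects are untouched.
app.runtime.model = EchoModel("model-b")
second = app.ask("hello")
```

The only line that changes the answer is the one that swaps the model; nothing in Interface or Runtime had to be edited.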

Concrete example: ChatGPT

| Layer | What it is |
| --- | --- |
| Interface | chat.openai.com web app, ChatGPT macOS/iOS/Android apps |
| Runtime | OpenAI's GPU infrastructure (private datacenters) |
| Model | GPT-4o, GPT-5, o3, and other OpenAI-trained models |

OpenAI happens to own all three layers, which makes ChatGPT feel like one indivisible product. But the layers are still there. If OpenAI replaced GPT-4o with a new model tomorrow, the interface would not change. If they ported the web app to a different language, the model would not care.

Concrete example: Thuki

| Layer | What it is |
| --- | --- |
| Interface | Thuki overlay, hotkey summon, AX context capture, screen attach, conversation UI |
| Runtime | Thuki's own built-in engine: a bundled llama.cpp llama-server it starts and supervises (Ollama optional) |
| Model | An open model you download: Llama, Gemma, Qwen, Mistral, and others |

Here is the part people get wrong: Thuki owns two of the three layers. It owns the interface, and as of v0.15 it owns the runtime, because it ships its own inference engine and runs the model for you. The one layer Thuki does not own is the model itself: those are third-party open weights you choose and download. Two layers Thuki, one layer yours.

This is not a flaw. This is how Thuki is supposed to work, and the reasons matter.


What Thuki Actually Does

Thuki owns the interface and the runtime. Here is what those two layers actually include. It is more than the wrapper framing suggests.

Window management on macOS. The overlay is an NSPanel, configured to float above fullscreen applications. This is not a flag on a standard window; it requires tauri-nspanel bridging into AppKit internals. Most cross-platform desktop frameworks do not offer this out of the box.

System-wide hotkey detection. The double-tap Control shortcut runs on a CGEventTap placed at the HID level, the point where input events enter the window server and the earliest location an event tap can observe them. This survives focus changes, secure input mode, and the macOS 15 Sequoia event-routing changes that break simpler approaches.
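The timing half of a double-tap shortcut is independent of where the events come from, so it can be sketched without any macOS code. The sketch below is an illustration only: the DoubleTapDetector name and the 0.4 second window are assumptions, not Thuki's actual values, and the event tap that would feed it timestamps is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoubleTapDetector:
    """Reports a double tap when two taps land within max_gap_s of each other."""

    max_gap_s: float = 0.4  # assumed window, not a value taken from Thuki
    _last_tap_s: Optional[float] = None

    def on_tap(self, timestamp_s: float) -> bool:
        """Feed one tap; returns True if it completes a double tap."""
        last = self._last_tap_s
        if last is not None and 0 <= timestamp_s - last <= self.max_gap_s:
            # Consume both taps so a third quick tap does not retrigger.
            self._last_tap_s = None
            return True
        self._last_tap_s = timestamp_s
        return False
```

Feeding it taps at 0.0 s and 0.3 s fires on the second tap; a third tap at 0.5 s starts a fresh sequence instead of firing again.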

Context capture from the focused app. When the overlay activates, Thuki queries the macOS Accessibility API (AXUIElement) to read whatever text is selected in the focused application. If the focused app does not expose AX cleanly (most Electron apps), Thuki falls back to a clipboard-preserving Cmd+C simulation with exponential backoff.
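The try-Accessibility-first, fall-back-to-copy flow can be sketched as a small function. This is a sketch under stated assumptions: the two reader callables are hypothetical placeholders for the real AXUIElement query and the Cmd+C simulation (which is also where saving and restoring the clipboard would live), and the attempt count and delays are invented for illustration.

```python
import time
from typing import Callable, Optional


def capture_selection(
    read_via_accessibility: Callable[[], Optional[str]],
    read_via_copy: Callable[[], Optional[str]],
    attempts: int = 4,           # assumed retry budget
    base_delay_s: float = 0.01,  # assumed first wait
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the selected text, or None if neither path produced any."""
    text = read_via_accessibility()
    if text:
        return text

    # Fallback path: retry with exponential backoff, doubling the wait
    # after each empty read.
    delay = base_delay_s
    for attempt in range(attempts):
        text = read_via_copy()
        if text:
            return text
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return None
```

Passing sleep as a parameter keeps the function testable without real waiting; a caller in production would leave the default.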

Screen-aware context. The /screen command captures the current display and either passes it to a vision-capable model or runs on-device OCR via Apple's Vision framework. The screenshot never leaves your machine.

The inference engine. As of v0.15 this is Thuki's own. It bundles llama.cpp's llama-server and manages the whole lifecycle: it starts the engine on demand when you send a message, keeps the model loaded between messages so follow-ups skip the load step, and stops it to free memory when you are done. The server binds to loopback only and speaks an OpenAI-compatible API. There is nothing for you to install and no separate runtime to wire up.
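The start-on-demand, keep-warm, stop-when-done lifecycle can be sketched as a tiny supervisor. Treat everything here as an assumption: the EngineSupervisor name is hypothetical, the start and stop callables stand in for launching and terminating the llama-server process, and the idle timeout is one plausible reading of "when you are done", not a documented Thuki setting.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EngineSupervisor:
    """Starts the engine lazily and stops it after a period of inactivity."""

    start: Callable[[], None]      # placeholder for spawning the server
    stop: Callable[[], None]       # placeholder for terminating it
    idle_timeout_s: float = 300.0  # assumed value
    running: bool = False
    last_used_s: Optional[float] = None

    def on_message(self, now_s: float) -> None:
        """Called when the user sends a message."""
        if not self.running:
            self.start()
            self.running = True
        # Every message refreshes the idle clock, which keeps the model warm.
        self.last_used_s = now_s

    def tick(self, now_s: float) -> None:
        """Called periodically; stops the engine once it has sat idle."""
        if (
            self.running
            and self.last_used_s is not None
            and now_s - self.last_used_s >= self.idle_timeout_s
        ):
            self.stop()
            self.running = False
```

Two messages ten seconds apart trigger a single start; the stop only happens once a tick arrives after the timeout has elapsed since the last message.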

Streaming token rendering with cancellation. When the model is responding, Thuki streams tokens to the chat UI as they arrive from the engine, with markdown rendering and code-block syntax highlighting computed incrementally. The user can cancel mid-stream.
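OpenAI-compatible servers stream chat completions as server-sent events: lines of the form "data: {json}", ending with "data: [DONE]". The sketch below shows one way to turn such lines into tokens with cooperative cancellation. It is not Thuki's code: the stream_tokens name is hypothetical, and the lines are assumed to arrive already split from the HTTP response.

```python
import json
import threading
from typing import Iterable, Iterator


def stream_tokens(lines: Iterable[str], cancel: threading.Event) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE stream until done or cancelled."""
    for line in lines:
        if cancel.is_set():
            return  # user cancelled mid-stream; stop consuming
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

Because this is a generator, the UI can render each fragment as it arrives, and setting the event from another thread (for example, a Cancel button handler) ends the stream at the next line boundary.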

Conversation persistence. Conversations are stored locally in a bundled SQLite database. No cloud sync, no account, no leak path.
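Local persistence of this kind needs very little machinery. Here is a minimal sketch using Python's built-in sqlite3 module; the schema, table names, and helper functions are all assumptions made for illustration and do not reflect Thuki's actual database layout.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per conversation, one row per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL
                    REFERENCES conversations(id) ON DELETE CASCADE,
    role            TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content         TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database file and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript(SCHEMA)
    return conn


def new_conversation(conn: sqlite3.Connection, title: str) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO conversations (title) VALUES (?)", (title,))
    return cur.lastrowid


def add_message(conn: sqlite3.Connection, conversation_id: int, role: str, content: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
            (conversation_id, role, content),
        )


def history(conn: sqlite3.Connection, conversation_id: int) -> List[Tuple[str, str]]:
    """Return (role, content) pairs in the order they were written."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return [(role, content) for role, content in rows]
```

Everything lives in a single file on disk (or in memory, as in the default here), which is what makes "no cloud sync, no account" a property of the design instead of a policy.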

Permission orchestration. First-launch onboarding walks the user through the Accessibility permission grant with polling, so the app activates the moment the toggle flips, without an app restart.
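The polling pattern behind that onboarding step is simple to sketch. The function below is illustrative only: is_trusted is a hypothetical callable standing in for whatever check the app performs (on macOS, AXIsProcessTrusted is the usual candidate, though the post does not say which call Thuki uses), and the interval and timeout are assumed values.

```python
import time
from typing import Callable


def wait_for_permission(
    is_trusted: Callable[[], bool],
    interval_s: float = 0.5,   # assumed polling interval
    timeout_s: float = 120.0,  # assumed upper bound on waiting
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the permission is granted; return False if the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        if is_trusted():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The moment is_trusted flips to True the function returns, so the caller can activate the app immediately instead of asking the user to relaunch.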

Model selection and inventory. Thuki ships a small, vetted catalog (Staff Picks) and downloads models from the Hugging Face Hub, verifying each file against a pinned hash before it installs. A built-in fit hint flags whether a given model is Comfortable, Tight, or Heavy on your Mac. If you already run Ollama and prefer it, it is available as an optional provider.
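Both ideas in that paragraph, checking a download against a pinned hash and classifying how well a model fits, are easy to sketch. The code below makes several assumptions that the post does not confirm: SHA-256 as the hash algorithm, a simple size-to-memory ratio as the fit heuristic, and the 0.4 and 0.7 thresholds. All function names are hypothetical.

```python
import hashlib
from pathlib import Path

GIB = 1024 ** 3


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path, pinned_sha256: str) -> bool:
    """True only if the file on disk matches the hash pinned in the catalog."""
    return sha256_of(path) == pinned_sha256.strip().lower()


def fit_hint(model_bytes: int, memory_bytes: int) -> str:
    """Classify a model by the share of memory its weights would occupy.

    The thresholds are invented for illustration, not Thuki's real rules.
    """
    share = model_bytes / memory_bytes
    if share <= 0.4:
        return "Comfortable"
    if share <= 0.7:
        return "Tight"
    return "Heavy"
```

Under these assumed thresholds, a 4 GiB model on a 16 GiB machine comes out Comfortable, while a 12 GiB model on the same machine comes out Heavy. A real heuristic would also need to account for context length and whatever else is using memory.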

The workflow itself. Stitching all of the above into a single seamless experience: hotkey fires, context is captured, window appears, user types, tokens stream, user dismisses, window vanishes, context resets. The composition is the product.
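One way to picture that composition is as a small state machine. This is a sketch, not Thuki's implementation: the state names, event names, and transition table are assumptions chosen to mirror the sequence described above.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    HIDDEN = auto()     # overlay not visible, no context held
    READY = auto()      # window visible, context captured, waiting for input
    STREAMING = auto()  # tokens arriving from the engine


# (current state, event) -> next state. Unknown pairs leave the state unchanged.
TRANSITIONS = {
    (State.HIDDEN, "hotkey"): State.READY,
    (State.READY, "submit"): State.STREAMING,
    (State.STREAMING, "done"): State.READY,
    (State.STREAMING, "cancel"): State.READY,
    (State.READY, "dismiss"): State.HIDDEN,
    (State.STREAMING, "dismiss"): State.HIDDEN,
}


def step(state: State, event: str) -> State:
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], state: State = State.HIDDEN) -> State:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Running the sequence hotkey, submit, done, dismiss ends back in HIDDEN, and a stray submit while hidden is ignored because no transition is defined for it.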

If that is "just a wrapper," every web browser is "just a wrapper around HTTP."


What Thuki Does Not Do

Honesty matters here too. Thuki owns the interface and the runtime, but it specifically does not do certain things.

Thuki does not generate text. It runs the model that does. Every token comes from the open model's weights; Thuki's engine executes them, streams the output, and renders it. Thuki did not train the model, so the intelligence is the model's.

Thuki does not decide what answers are good. Quality, factuality, tone, and reasoning depth are all properties of the model. Switching from one open model to another changes the answers; switching from Thuki to a different interface (with the same model running underneath) does not.

Thuki does not run a backend. There is no Thuki server: no analytics, no telemetry, no Thuki account database. Inference happens entirely on your machine. The only network traffic Thuki itself generates is the one-time model download from the Hugging Face Hub, software updates from GitHub, and any /search query you choose to run.

Thuki bundles the engine, not the model. This is the subtle part of v0.15, and it is why the title of this post is still true. Thuki now ships the runtime (the llama.cpp engine), but it still does not ship the model. The weights are third-party open models you download once (roughly 2 to 12 GB), and Thuki keeps them in its own local blob store. The engine is Thuki's; the intelligence is not.

Thuki does not train, fine-tune, or modify models. It accepts whatever the model returns and renders it. It does not post-process responses, apply guardrails, or filter content.

This is deliberate scoping. A product that tries to own all three layers, the model included, is a much larger project: OpenAI, Anthropic, and Google each staff large research organizations to do it. Thuki owns two layers well: the interface and the runtime. The third, the part that needs a research lab and a fortune in compute, it leaves to the open-source community.


Why the Separation Is the Point

Keeping the model layer separate is not a compromise Thuki makes because training a model is hard. It is a deliberate design choice with real benefits to the user.

You can swap models freely. When a better open-source model is released next month, you can be running it the same day. Thuki does not need a release, an update, or a feature flag. The interface and the engine keep working; the brain changes.

You are not locked into anyone's model quality curve. A ChatGPT user gets exactly the models OpenAI ships, on OpenAI's schedule. A Thuki user gets whatever the open-source community has produced, on the community's schedule. That schedule has been fast for the kinds of models that matter for daily local use.

Privacy is structural, not a promise. Most "private" AI products promise that the company will not misuse your data. Thuki's privacy comes from where the work happens: the engine runs on your Mac, the model runs on your Mac, and there is no Thuki server to promise anything about. The honest caveats are the one-time model download, software updates, and any /search query you choose to run; everyday chat stays on the machine.

Interface improvements do not depend on model releases. Thuki can ship a better window animation, a better hotkey config, a better OCR pipeline, without waiting on any model. Conversely, when a new model is released, Thuki gets the benefit without shipping a thing.

The shorthand: Thuki competes on the interface and the workflow, not on the model. It ships an engine so there is nothing to set up, but bundling an engine is table stakes; every local-AI app can run a model. What most of them do not offer is an overlay you can summon over any app, including fullscreen ones. Whether the answers are good is a model question, not a Thuki question.


How to Compare AI Products Honestly

Once the three layers are visible, the right comparisons stop looking like cross-category fights and start looking like layer-by-layer choices.

| Comparison | What you are actually comparing |
| --- | --- |
| ChatGPT vs Claude | Model layer (GPT-5 vs Claude Opus) |
| ChatGPT app vs ChatGPT API | Interface layer (web/native vs raw API) |
| Thuki's engine vs OpenAI's infrastructure | Runtime layer (your Mac vs OpenAI's GPUs) |
| Thuki vs LM Studio | Interface layer (floating overlay vs windowed chat app) |
| Thuki vs Raycast AI | Interface layer (local-first overlay vs cloud-only launcher) |
| Llama vs Qwen | Model layer (two open-source models) |

"Thuki vs ChatGPT" is not a real comparison because the two cover different sets of layers: ChatGPT spans all three, while Thuki stops short of the model. The honest comparison is full stack against full stack: Thuki (interface + built-in engine) plus an open model, versus the ChatGPT app plus GPT-5 plus OpenAI's infrastructure. Compared that way, the answer depends on what the user actually values (privacy, cost, flexibility, frontier capability), not on which app is "better."


Frequently Asked Questions

So Thuki is just a wrapper?

Thuki is the interface and runtime layer for a local-AI stack. Calling it "just a wrapper" is like calling a code editor "just a wrapper" because it does not implement the compiler. The interface is where the day-to-day experience lives, and the engine that runs the model is Thuki's own. The model layer is delegated to the open-source community on purpose, not by accident.

Why doesn't Thuki bundle a specific model?

Bundling specific weights would lock every user to whatever model was current at install, balloon the download, and force a Thuki release every time a better open model dropped. Instead Thuki bundles the engine and lets you download the model you want, once. Model updates stay independent of app updates, and you pick.

Do I still need Ollama?

No. As of v0.15 Thuki ships its own engine and runs models itself, so a fresh install needs nothing else. Ollama is still supported as an optional provider if you already run it and prefer it, but it is no longer required. The built-in engine is the default.

Can Thuki improve without a model upgrade?

Yes, often. Better hotkey behavior, better window animations, better OCR pipelines, better context capture, better permission flows, faster engine lifecycle, conversation history features: all of these live in the interface and runtime layers and can ship without touching the model.

What about apps that bundle everything together?

Some companies own all three layers: OpenAI does, Anthropic does. Local apps like LM Studio and Jan own the interface and runtime and let you pick the model, which is exactly where Thuki now sits too. Among that group the differentiator is not layer ownership, it is the interface: a floating, summon-from-anywhere overlay versus a windowed app you switch to.

If Thuki owns the interface and runtime, what is the product?

The product is the workflow: NSPanel + CGEventTap + AX capture + screen attach + streaming render + conversation persistence, composed into one floating macOS experience, with an engine bundled so it runs the moment you install it. The model is the intelligence you bring; everything around it is Thuki.


The Short Version

Thuki is not an AI model. It ships an engine to run one, but the intelligence, the weights, is a third-party open model you choose. Thuki is the interface and the runtime of a local AI stack: the part that decides how you reach for AI, what context the AI sees by default, and how the experience fits into the rest of your day on a Mac.

The three layers (interface, runtime, model) are independent for good reasons. The model can change without the interface changing. The interface can improve without the model improving. Privacy comes from where the work happens, on your machine, not from a promise. And when you compare AI products, the honest comparison is layer-by-layer, not "this app vs that app."

Next steps: