Most AI assistants have the same three problems: they cost money every month, they require you to open a browser tab to use them, and they send everything you type to a server you do not control.
Thuki is built to fix all three at once.
Thuki is a floating AI assistant for macOS. It sits above every app, including full-screen ones. You summon it with a hotkey, ask your question, and dismiss it without ever leaving what you were doing. No subscription, no API keys, and answers come from a model on your own machine, not a cloud server.
Why Thuki Exists
The workflow most people use for AI today looks like this: stop what you are doing, switch to a browser, open ChatGPT, type your question, read the answer, switch back to what you were doing. Every single time.
That context switch is the problem. It takes you out of flow. And you are paying $20 a month for the privilege.
Thuki was built around a different idea: your AI assistant should come to you, not the other way around. It should appear when you need it and disappear when you do not. It should never touch your data. And it should not cost anything.
The name comes from the Vietnamese word thư kí, meaning secretary. That is exactly what it is: a quiet, capable assistant that is always available but never in the way.
What Makes Thuki Different
It Floats Above Everything
Thuki is a macOS overlay, not a standard app window. It renders above every other window, including full-screen apps, video calls, and games. You never need to minimize or switch workspaces to use it.
One Hotkey From Anywhere
Double-tap Control from any app, any context, any moment. Thuki appears. Ask your question. Double-tap again to dismiss it. Your original app is exactly where you left it.
Your Highlighted Text Becomes the Question
Highlight any text in any app, then double-tap Control. Thuki opens with that text pre-filled as context. No copying, no pasting, no switching windows. You are halfway done asking your question before Thuki finishes appearing.
You can also use /screen to capture your entire screen and attach it as context. Ask Thuki to explain an error message, summarize a document, or describe what it sees.
Completely Local and Private
Thuki has no backend. There is no Thuki server that sees your queries, no analytics, no telemetry, no account. Your prompts go to a model running on your own machine, in Thuki's built-in engine, never to a cloud service. With a model already downloaded, turn off Wi-Fi and everyday chat still works exactly the same.
Free and Open Source
Thuki is free and open source under Apache 2.0, and the open-source models it works with are free too. There is no metered usage, because the work happens on your own hardware, not a billed cloud. If premium features arrive later, they would be additive, not a paywall on what is free today.
How Thuki Works Under the Hood
The overlay layer: NSPanel via Tauri
Thuki is built with Tauri, a Rust-based framework for building native desktop apps with a web frontend. On macOS, the overlay window is not a standard NSWindow; it is an NSPanel, a subclass of NSWindow that macOS treats differently at the window manager level.
The distinction matters because a standard NSWindow, even when set to the highest window level, cannot appear above native fullscreen applications. The macOS window server simply blocks it. NSPanel with the floating style mask bypasses this restriction, which is why Thuki can appear on top of a fullscreen video, a game, or a fullscreen editor without you needing to leave that space.
Thuki uses the tauri-nspanel crate to convert the Tauri window into a floating NSPanel at startup. The panel is configured with is_floating_panel: true and can_become_key_window: true, so it can receive keyboard input the moment it appears without any explicit focus management on your part.
The hotkey: CGEventTap at the HID layer
The double-tap Control shortcut works differently from a standard global hotkey. Thuki registers a CGEventTap at CGEventTapLocation::HID (the Human Interface Device layer), which is the lowest point in the macOS input stack, below the application layer. This means Thuki sees your keystrokes before they reach any other application, regardless of what is focused.
When you tap Control twice within 400 milliseconds, the activator fires. A 600-millisecond cooldown prevents accidental double-toggles. The tap runs on a dedicated background thread with its own CFRunLoop, so event handling stays off Thuki's main thread and does not hold up the rest of the app.
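The timing rule above can be sketched as a small state machine. This is a minimal illustration, not Thuki's actual code: the DoubleTapDetector type and its on_tap method are hypothetical names, and only the 400 ms window and 600 ms cooldown come from the description. The sketch also assumes taps that arrive during the cooldown are dropped entirely rather than counted as the first half of a new double-tap.

```rust
use std::time::{Duration, Instant};

/// Two Control taps this close together count as a double-tap (from the text).
const DOUBLE_TAP_WINDOW: Duration = Duration::from_millis(400);
/// After firing, taps are ignored for this long (from the text).
const COOLDOWN: Duration = Duration::from_millis(600);

/// Hypothetical detector; the name and fields are illustrative only.
#[derive(Debug, Default)]
pub struct DoubleTapDetector {
    last_tap: Option<Instant>,
    last_fire: Option<Instant>,
}

impl DoubleTapDetector {
    /// Feed one Control tap. Returns true when the overlay should toggle.
    pub fn on_tap(&mut self, now: Instant) -> bool {
        // Assumption: taps inside the cooldown are dropped, not remembered.
        if let Some(fired) = self.last_fire {
            if now.duration_since(fired) < COOLDOWN {
                return false;
            }
        }
        match self.last_tap {
            // Second tap arrived inside the window: fire and reset.
            Some(prev) if now.duration_since(prev) <= DOUBLE_TAP_WINDOW => {
                self.last_tap = None;
                self.last_fire = Some(now);
                true
            }
            // First tap, or the previous one was too long ago: start over.
            _ => {
                self.last_tap = Some(now);
                false
            }
        }
    }
}
```

Passing the timestamp in, rather than reading the clock inside on_tap, keeps the logic independent of the event tap and easy to exercise with synthetic times.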
This approach requires macOS Accessibility permission, which Thuki requests on first launch.
The engine: a bundled llama.cpp server
As of v0.15, Thuki ships its own inference engine, so there is nothing separate to install. It bundles llama.cpp's llama-server, the widely used open-source engine that also powers Ollama and LM Studio, and manages its whole lifecycle: it starts the server on demand when you send a message, keeps the model warm between messages so follow-ups are instant, and stops it to free memory when you are idle. The server binds to loopback only and exposes an OpenAI-compatible /v1 API that Thuki talks to. Inference runs on your CPU and GPU via Apple's Metal framework, and the entire round trip stays on your machine. The model itself is an open GGUF model you download once; if you already run Ollama, it stays available as an optional provider.
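The start-on-demand, keep-warm, stop-when-idle lifecycle can be pictured as a two-state machine. The sketch below is an assumption-heavy illustration rather than Thuki's implementation: EngineLifecycle, Action, and the five-minute IDLE_TIMEOUT are all hypothetical, since the real idle threshold is not stated here. It only decides what should happen; actually spawning or stopping llama-server is left to the caller.

```rust
use std::time::{Duration, Instant};

/// Assumed idle timeout; the real value is not given in this article.
const IDLE_TIMEOUT: Duration = Duration::from_secs(300);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineState {
    Stopped,
    Warm,
}

/// What the caller should do with the server process.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartServer,
    StopServer,
    Nothing,
}

/// Hypothetical lifecycle tracker; names are illustrative only.
#[derive(Debug)]
pub struct EngineLifecycle {
    state: EngineState,
    last_used: Option<Instant>,
}

impl EngineLifecycle {
    pub fn new() -> Self {
        Self { state: EngineState::Stopped, last_used: None }
    }

    pub fn state(&self) -> EngineState {
        self.state
    }

    /// Called when the user sends a message: start the server if needed.
    pub fn on_message(&mut self, now: Instant) -> Action {
        self.last_used = Some(now);
        match self.state {
            EngineState::Stopped => {
                self.state = EngineState::Warm;
                Action::StartServer
            }
            // Already warm: the follow-up reuses the loaded model.
            EngineState::Warm => Action::Nothing,
        }
    }

    /// Called periodically: stop the server once it has been idle long enough.
    pub fn on_tick(&mut self, now: Instant) -> Action {
        match (self.state, self.last_used) {
            (EngineState::Warm, Some(last))
                if now.duration_since(last) >= IDLE_TIMEOUT =>
            {
                self.state = EngineState::Stopped;
                Action::StopServer
            }
            _ => Action::Nothing,
        }
    }
}
```

Every message resets the idle clock, which is what keeps the model loaded across a conversation and lets memory be reclaimed only after a quiet stretch.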
Then why not just use LM Studio, Jan, or Ollama?
All of them run local models well, and now so does Thuki. But they share the same shape: they are destinations. To use them you stop what you are doing, switch to their window, type your question, and switch back. That window has no idea what you were working on, cannot read your highlighted text, and cannot see your screen. Every question starts from a blank slate.
Thuki is not a destination. You never switch to it. Double-tap Control and it appears on top of whatever you are already doing, pre-filled with whatever you had highlighted. When you are done, it disappears. Your original app, cursor position, and mental context are exactly where you left them. They all run the same open models; the difference is the summon-anywhere workflow around them.
How to Set It Up
Thuki ships its own engine, so there is no separate runtime to install. Two steps: install the app, then pick a model on first launch. You should not need to leave this page.
What You Need
- A Mac running macOS (Thuki is built and tuned for Apple Silicon)
- Disk space for a model: roughly 2 to 12 GB each
- A few minutes for the one-time model download
Step 1: Install Thuki
- Download Thuki.dmg from the latest release, or the auto-built nightly channel.
- Double-click the .dmg, then drag Thuki onto the Applications shortcut.
- Eject the disk image.
- Before opening Thuki the first time, clear the quarantine flag in Terminal:
  xattr -rd com.apple.quarantine /Applications/Thuki.app
  Thuki is free, open source, and distributed directly rather than through the Mac App Store, so macOS Gatekeeper blocks it until you run this one-time command. It is safe and officially documented by Apple.
- Open Thuki. It appears in your menu bar.
Step 2: Pick a Model
On first launch, Thuki walks you through a short setup and offers a few Staff Picks: open models sized for different Macs. Choose one and Thuki downloads it once from the Hugging Face Hub and runs it on your own machine with its built-in engine. A fit hint flags whether a model is Comfortable, Tight, or Heavy on your hardware, so you do not have to guess. Model files are large (roughly 2 to 12 GB), so the first download takes a few minutes; you only do it once.
macOS also asks for two permissions on first launch. Accessibility powers the global shortcut that summons Thuki from any app. Screen Recording powers the /screen command. Grant both once; they persist.
Already run Ollama and prefer it? Thuki supports it as an optional provider: switch to it in Settings, Providers.
You Are Ready
Double-tap Control to confirm Thuki appears. Type anything; you should see a response in a few seconds.
Prefer to build from source? You need Bun and Rust: clone the repo, run bun install, then bun run dev.
Optional: Enable /search
The /search command runs a local agentic pipeline backed by two Docker services (SearXNG plus a Trafilatura reader). It is not bundled with the app and currently requires cloning the repo to run those services; every other feature works without it. Setup steps are in docs/agentic-search.md.
That is the whole setup. The always-current reference is the Getting Started guide on GitHub if you want the full detail.
Frequently Asked Questions
Is Thuki really free?
Yes. The app is free and open source today, with no paid tier and no usage limits, because all usage happens on your own hardware. Any future premium features would be additive, not a paywall on what is free now.
Do I need to create an account?
No. Thuki requires no account, no email, and no sign-up. Download and use it immediately.
What Mac do I need?
Thuki is a macOS app, built and tuned for Apple Silicon Macs, where the model runs on the built-in GPU. Larger models need more memory; smaller models run on more modest machines, and a built-in fit hint tells you which is which. Check the GitHub releases page for the builds currently published.
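To give a feel for what a fit hint could compute, here is a rough sketch that compares a model's file size with the Mac's memory. The Comfortable, Tight, and Heavy labels are Thuki's; the fit_hint function and the 50% and 75% thresholds are purely hypothetical, and Thuki's real heuristic may weigh other factors.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FitHint {
    Comfortable,
    Tight,
    Heavy,
}

/// Hypothetical classifier: the thresholds are assumptions for illustration,
/// not values taken from Thuki.
pub fn fit_hint(model_gb: f64, memory_gb: f64) -> FitHint {
    // Share of the machine's memory the model file alone would occupy.
    let ratio = model_gb / memory_gb;
    if ratio <= 0.5 {
        FitHint::Comfortable
    } else if ratio <= 0.75 {
        FitHint::Tight
    } else {
        // Also covers non-finite ratios from a zero memory size.
        FitHint::Heavy
    }
}
```

Under these assumed thresholds, a 4 GB model on a 16 GB Mac would be Comfortable, while a 12 GB model on an 8 GB Mac would be Heavy.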
Is local AI as capable as ChatGPT?
For everyday tasks like summarizing text, drafting emails, explaining code, and answering quick questions, modern 7-8B open-source models handle the work well. The largest frontier cloud models still lead on long-context reasoning and complex multi-step tasks. The trade-off is privacy, cost, and latency: local models skip the network round trip, never bill you, and never send your data off your machine.
Which model is best for coding on a Mac?
Code-tuned models in the Qwen and DeepSeek families are strong for programming, and a solid general model like Gemma or Mistral handles code well as an all-rounder. Larger models are smarter but need more memory, so pick a size the fit hint marks Comfortable. You can browse and download options in Settings, Discover.
What macOS version do I need?
Thuki requires macOS 13.4 (Ventura) or later on Apple Silicon (M1 or later). Intel is not supported; Thuki ships no Intel build. Thuki uses native AppKit overlay primitives (NSPanel) plus a system-wide event tap, so it asks for Accessibility permission on first launch. See the GitHub releases page for the builds currently published.
How much disk space do I need?
The Thuki app itself is small. Models are the bulk of the storage: each takes roughly 2 to 12 GB. Plan for several GB if you keep one or two installed. Models are managed in-app, and you can remove any of them from the Library at any time.
What happens to my data?
During inference, nothing leaves your machine: Thuki runs the model locally on loopback, with no network request for the model's answer. You can verify this by disconnecting from Wi-Fi and using everyday chat. Web tools like /search reach the internet only when you choose to use them.
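If you want to check a listening address yourself, "loopback only" has a precise meaning that the Rust standard library can test. The helper below is an illustrative sketch; is_loopback_only is a hypothetical name, and the port you pass in would be whatever your own system reports, since the port Thuki uses is not stated here.

```rust
use std::net::SocketAddr;

/// Returns true only if `bind_addr` parses as an IP:port pair whose IP is a
/// loopback address (127.0.0.0/8 or ::1). Anything else, including text that
/// does not parse, is treated as not loopback.
pub fn is_loopback_only(bind_addr: &str) -> bool {
    bind_addr
        .parse::<SocketAddr>()
        .map(|addr| addr.ip().is_loopback())
        .unwrap_or(false)
}
```

An address such as 127.0.0.1 or [::1] is reachable only from the same machine, whereas 0.0.0.0 would accept connections from the network.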
Can I use Thuki offline?
Yes, after the initial model download, for everyday chat. This makes Thuki useful on flights, in restricted networks, or anywhere connectivity is unreliable. Web tools like /search reach the internet when you use them.
The Short Version
Thuki is the AI assistant that should have existed from the start: free, private, and built around not interrupting you. It lives on your Mac, runs open-source models locally on its own engine, and stays out of your way until you need it.
If you have been paying for a cloud AI subscription and wondering whether there is a better way, this is it.
Next steps:
- See what else Thuki does on the features overview
- Read the step-by-step setup guide on GitHub
- Star the project on GitHub if you find it useful