Most AI assistants have the same three problems: they cost money every month, they require you to open a browser tab to use them, and they send everything you type to a server you do not control.
Thuki is built to fix all three at once.
Thuki is a floating AI assistant for macOS. It sits above every app, including full-screen ones. You summon it with a hotkey, ask your question, and dismiss it without ever leaving what you were doing. No subscription, no API keys, and answers come from a model on your own machine, not a cloud server.
Why Thuki Exists
The workflow most people use for AI today looks like this: stop what you are doing, switch to a browser, open ChatGPT, type your question, read the answer, switch back to what you were doing. Every single time.
That context switch is the problem. It takes you out of flow. And you are paying $20 a month for the privilege.
Thuki was built around a different idea: your AI assistant should come to you, not the other way around. It should appear when you need it and disappear when you do not. It should never touch your data. And it should not cost anything.
The name comes from the Vietnamese word thư kí, meaning secretary. That is exactly what it is: a quiet, capable assistant that is always available but never in the way.
What Makes Thuki Different
It Floats Above Everything
Thuki is a macOS overlay, not a standard app window. It renders above every other window, including full-screen apps, video calls, and games. You never need to minimize or switch workspaces to use it.
One Hotkey From Anywhere
Double-tap Control from any app, any context, any moment. Thuki appears. Ask your question. Double-tap again to dismiss it. Your original app is exactly where you left it.
Your Highlighted Text Becomes the Question
Highlight any text in any app, then double-tap Control. Thuki opens with that text already pre-filled as context. No copying, no pasting, no switching windows. You are already halfway done asking your question before Thuki finishes appearing.
You can also use /screen to capture your entire screen and attach it as context. Ask Thuki to explain an error message, summarize a document, or describe what it sees.
Completely Local and Private
Thuki has no backend. There is no Thuki server that sees your queries, no analytics, no telemetry, no account. Your prompts go to a model running locally on your own machine, never to a cloud service. With a model already downloaded, turn off Wi-Fi and everyday chat still works exactly the same.
Free Forever
There is no free tier with limits and a paid tier without them. Thuki is free and open source under Apache 2.0. The open-source models it works with are free too. There is no metered usage, because the work happens on your own hardware, not a billed cloud.
How Thuki Works Under the Hood
The overlay layer: NSPanel via Tauri
Thuki is built with Tauri, a Rust-based framework for building native desktop apps with a web frontend. On macOS, the overlay window is not a standard NSWindow; it is an NSPanel, a subclass of NSWindow that macOS treats differently at the window manager level.
The distinction matters because a standard NSWindow, even when set to the highest window level, cannot appear above native fullscreen applications. The macOS window server simply blocks it. NSPanel with the floating style mask bypasses this restriction, which is why Thuki can appear on top of a fullscreen video, a game, or a fullscreen editor without you needing to leave that space.
Thuki uses the tauri-nspanel crate to convert the Tauri window into a floating NSPanel at startup. The panel is configured with is_floating_panel: true and can_become_key_window: true, so it can receive keyboard input the moment it appears without any explicit focus management on your part.
The hotkey: CGEventTap at the HID layer
The double-tap Control shortcut works differently from a standard global hotkey. Thuki registers a CGEventTap at CGEventTapLocation::HID (the Human Interface Device layer), which is the lowest point in the macOS input stack, below the application layer. This means Thuki sees your keystrokes before they reach any other application, regardless of what is focused.
When you tap Control twice within 400 milliseconds, the activator fires. A 600-millisecond cooldown then prevents accidental double-toggles. The tap runs on a dedicated background thread with its own CFRunLoop, so listening for the shortcut adds no work to Thuki's UI thread or to the rest of your system.
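The timing rules above are simple enough to sketch. The Python below is an illustrative model, not Thuki's actual Rust implementation; the constants mirror the 400 ms double-tap window and 600 ms cooldown described above.

```python
class DoubleTapDetector:
    """Illustrative model of double-tap detection: fire when two taps
    land within `window` seconds, then ignore taps for `cooldown` seconds."""

    def __init__(self, window: float = 0.4, cooldown: float = 0.6):
        self.window = window
        self.cooldown = cooldown
        self.last_tap = None        # timestamp of the previous tap, if any
        self.cooldown_until = -1.0  # taps before this time are ignored

    def on_tap(self, now: float) -> bool:
        """Called once per Control tap; returns True when the panel should toggle."""
        if now < self.cooldown_until:
            return False  # still cooling down after the last activation
        if self.last_tap is not None and now - self.last_tap <= self.window:
            # Second tap arrived in time: fire and start the cooldown.
            self.last_tap = None
            self.cooldown_until = now + self.cooldown
            return True
        self.last_tap = now  # first tap: remember it and wait
        return False
```

For example, taps at 0.0 s and 0.3 s fire the activator, and a stray third tap at 0.7 s is swallowed by the cooldown rather than toggling the panel closed again.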
This approach requires macOS Accessibility permission, which Thuki requests on first launch.
The model layer: Ollama
Under all of this sits Ollama, the open-source runtime that downloads and runs large language models on your Mac. Ollama handles model quantization and GPU acceleration via Apple's Metal framework, and exposes a local HTTP API on localhost that Thuki connects to. When you send a message, Thuki forwards it to Ollama, which runs inference on your CPU and GPU and streams the response back. The entire inference round trip stays on your machine.
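As a sketch of that round trip: Ollama's HTTP API accepts a JSON request and streams the reply back as newline-delimited JSON chunks, each carrying a fragment of the response. The Python below is illustrative, not Thuki's actual code; the endpoint path and chunk shape follow Ollama's public `/api/generate` API, and the model name is just an example.

```python
import json

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's local endpoint

def build_request(model: str, prompt: str) -> dict:
    # Minimal streaming request body for Ollama's /api/generate.
    return {"model": model, "prompt": prompt, "stream": True}

def assemble(stream_lines) -> str:
    """Join the `response` fragments from Ollama's newline-delimited
    JSON stream until a chunk reports `done: true`."""
    parts = []
    for line in stream_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

A client POSTs `build_request(...)` to `OLLAMA_URL` and feeds the response lines to `assemble`; because the tokens arrive as they are generated, the UI can render the answer incrementally instead of waiting for the full reply.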
Then why not just use Ollama directly?
Ollama ships with a native macOS chat UI and you can run it through the terminal too. Both work. But both share the same fundamental problem: they are destinations. To use them, you stop what you are doing, switch to the Ollama window, type your question, and switch back. The chat window has no idea what you were working on, cannot read your highlighted text, and cannot see your screen. Every question starts from a blank slate.
Thuki is not a destination. You never switch to it. Double-tap Control and it appears on top of whatever you are already doing, pre-filled with whatever you had highlighted. When you are done, it disappears. Your original app, cursor position, and mental context are exactly where you left them.
Ollama handles the model. Thuki handles the workflow.
How to Set It Up
Two steps: set up your AI engine, then install Thuki. The full walkthrough is below; you should not need to leave this page.
What You Need
- A Mac (Thuki is built and tuned for Apple Silicon)
- Disk space for a model: roughly 2 to 8 GB each
- An AI engine: a local Ollama install, or the optional Docker sandbox (Step 1)
Step 1: Set Up Your AI Engine
Pick one of the two options below before installing Thuki.
Option A: Local Ollama (recommended for most people)
Ollama runs the model directly on your Mac. It is free, open source, and takes about five minutes.
1A. Install Ollama. Download it from ollama.com, or use Homebrew:
brew install ollama
1B. Pull a model.
ollama pull gemma4:e2b
Model files are large (typically 2 to 8 GB) and the download can take several minutes depending on your connection. You only do this once. Any model in the Ollama library works; gemma4:e2b is a good starting point, and you can pull more and switch between them from Thuki's ask bar.
1C. Verify the model is ready.
ollama list
Once your model appears, Ollama is ready. Thuki connects to it automatically at http://127.0.0.1:11434.
Option B: Docker Sandbox (for the security-conscious)
Want the strongest isolation between the model and your system? Run the model in a hardened container instead. It needs Docker Desktop, then:
bun run sandbox:start
The first run pulls the model inside the container, which can take a few minutes; later starts are instant. When you are done, stop and wipe all model data:
bun run sandbox:stop
This is opt-in and not bundled with the app. The container cannot reach the internet, cannot write to your filesystem, and leaves no trace when stopped.
Step 2: Install Thuki
- Download Thuki.dmg from the latest release, or the auto-built nightly channel.
- Double-click the .dmg, then drag Thuki onto the Applications shortcut.
- Eject the disk image.
- Before opening Thuki the first time, clear the quarantine flag in Terminal:

xattr -rd com.apple.quarantine /Applications/Thuki.app

Thuki is free, open source, and distributed directly rather than through the Mac App Store, so macOS Gatekeeper blocks it until you run this one-time command. It is safe and officially documented by Apple.

- Open Thuki. It appears in your menu bar.
On first launch macOS asks for two permissions. Accessibility powers the global shortcut that summons Thuki from any app. Screen Recording powers the /screen command. Grant both once; they persist.
Prefer to build from source? You need Bun and Rust: clone the repo, run bun install, then bun run dev.
You Are Ready
Double-tap Control to confirm Thuki appears. Type anything; you should see a response in a few seconds. Thuki auto-detects your engine and the model you pulled.
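One plausible way that auto-detection works, sketched against Ollama's documented `/api/tags` endpoint (which lists every model you have pulled). This is an illustration, not Thuki's actual code; the helper names are invented for the example.

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

TAGS_URL = "http://127.0.0.1:11434/api/tags"  # lists locally pulled models

def parse_models(payload: str) -> list:
    """Extract model names from an /api/tags JSON response."""
    return [m["name"] for m in json.loads(payload).get("models", [])]

def detect_models(url: str = TAGS_URL) -> list:
    """Return the pulled model names, or [] if the engine is not running."""
    try:
        with urlopen(url, timeout=2) as resp:
            return parse_models(resp.read().decode())
    except (URLError, OSError):
        return []
```

If `detect_models()` comes back empty, the engine is not reachable, which maps to the obvious fix: start Ollama (or the sandbox) and pull a model.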
Optional: Enable /search
The /search command runs a local agentic pipeline backed by two Docker services (SearXNG plus a Trafilatura reader). It is not bundled with the app and currently requires cloning the repo to run those services; every other feature works without it. Setup steps are in docs/agentic-search.md.
That is the whole setup. The always-current reference is the Getting Started guide on GitHub if you want the full detail.
Frequently Asked Questions
Is Thuki really free?
Yes. Thuki has no paid tier. The app is free, the models are free, and there are no usage limits because all usage happens on your own hardware.
Do I need to create an account?
No. Thuki requires no account, no email, and no sign-up. Download and use it immediately.
What Mac do I need?
Thuki is a macOS app, built and tuned for Apple Silicon Macs, where the model runs on the built-in GPU. Larger models need more memory; smaller models run on more modest machines. Check the GitHub releases page for the builds currently published.
Is local AI as capable as ChatGPT?
For everyday tasks like summarizing text, drafting emails, explaining code, and answering quick questions, modern 7-8B open-source models handle the work well. The largest frontier cloud models (GPT-4 class and above) still lead on long-context reasoning and complex multi-step tasks. The trade-off is privacy, cost, and latency: local models respond without a network round trip, never bill you, and never see your data.
Which model is best for coding on a Mac?
For code-specific tasks, try qwen2.5-coder:7b or deepseek-coder-v2. Both are tuned for programming and understand most languages. For a single general-purpose model that also handles code well, mistral is a strong default. Larger models need more memory, so pick a size that fits your machine.
What macOS version do I need?
Thuki is macOS-only. It uses native AppKit overlay primitives (NSPanel) plus a system-wide event tap, so it asks for Accessibility permission on first launch. See the GitHub releases page for the builds currently published.
How much disk space do I need?
The Thuki app itself is small. Models are the bulk of the storage: each takes roughly 2 to 8 GB. Plan for several GB if you keep one or two installed. Models are managed by Ollama and can be removed any time with ollama rm <model>.
What happens to my data?
During inference, nothing leaves your machine: Thuki talks to a local model on localhost, with no network request for the model's answer. You can verify this by disconnecting from Wi-Fi and using everyday chat. Web tools like /search reach the internet only when you choose to use them.
Can I use Thuki offline?
Yes, after the initial model download, for everyday chat. This makes Thuki useful on flights, in restricted networks, or anywhere connectivity is unreliable. Web tools like /search reach the internet when you use them.
The Short Version
Thuki is the AI assistant that should have existed from the start: free, private, and built around not interrupting you. It lives on your Mac, connects to open-source models running locally via Ollama, and stays out of your way until you need it.
If you have been paying for a cloud AI subscription and wondering whether there is a better way, this is it.
Next steps:
- See what else Thuki does on the features overview
- Read the step-by-step setup guide on GitHub
- Star the project on GitHub if you find it useful