
Run AI Locally on Mac: Meet Thuki, the Free Private Assistant

Thuki is a floating AI assistant for macOS powered by local models. No API keys, no subscriptions, no data leaving your Mac. Set it up in ten minutes.

Written by Logan Nguyen

Most AI assistants have the same three problems: they cost money every month, they require you to open a browser tab to use them, and they send everything you type to a server you do not control.

Thuki is built to fix all three at once.

Thuki is a floating AI assistant for macOS that runs entirely on your machine. It sits above every app, including full-screen ones. You summon it with a hotkey, ask your question, and dismiss it without ever leaving what you were doing. No subscription. No API keys. No data leaves your Mac.


Why Thuki Exists

The workflow most people use for AI today looks like this: stop what you are doing, switch to a browser, open ChatGPT, type your question, read the answer, switch back to what you were doing. Every single time.

That context switch is the problem. It takes you out of flow. And you are paying $20 a month for the privilege.

Thuki was built around a different idea: your AI assistant should come to you, not the other way around. It should appear when you need it and disappear when you do not. It should never touch your data. And it should not cost anything.

The name comes from the Vietnamese word thư kí, meaning secretary. That is exactly what it is: a quiet, capable assistant that is always available but never in the way.


What Makes Thuki Different

It Floats Above Everything

Thuki is a macOS overlay, not a standard app window. It renders above every other window, including full-screen apps, video calls, and games. You never need to minimize or switch workspaces to use it.

One Hotkey From Anywhere

Double-tap Control from any app, any context, any moment. Thuki appears. Ask your question. Double-tap again to dismiss it. Your original app is exactly where you left it.

Your Highlighted Text Becomes the Question

Highlight any text in any app, then double-tap Control. Thuki opens with that text pre-filled as context. No copying, no pasting, no switching windows. You are halfway done asking your question before Thuki finishes appearing.

You can also use /screen to capture your entire screen and attach it as context. Ask Thuki to explain an error message, summarize a document, or describe what it sees.
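
Thuki's exact request format is not documented here, but Ollama's generate API accepts base64-encoded images for multimodal models, so a screen capture attached as context plausibly reduces to a payload like the sketch below. The model name `llava` and the helper function are illustrative assumptions, not Thuki's actual code:

```python
import base64
import json

def build_screen_query(prompt: str, screenshot_png: bytes, model: str = "llava") -> str:
    """Build an Ollama /api/generate request body that attaches a
    screenshot as image context (requires a multimodal model)."""
    payload = {
        "model": model,
        "prompt": prompt,
        # Ollama accepts images as base64-encoded strings.
        "images": [base64.b64encode(screenshot_png).decode("ascii")],
    }
    return json.dumps(payload)

# Placeholder bytes stand in for real PNG data:
body = build_screen_query("Explain the error on my screen", b"\x89PNG...")
```

Because the image travels inside the request body to localhost, the screenshot never leaves the machine any more than the text does.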

Completely Local and Private

Thuki does not have a backend. There is no server that processes your queries. No analytics, no telemetry, no account required. The AI model runs on your CPU and GPU using Apple's Metal framework. Turn off your Wi-Fi and Thuki still works exactly the same.

Free Forever

There is no free tier with limits and a paid tier without them. Thuki is free. The models it uses are free and open source. The only thing powering Thuki is your own hardware.


How Thuki Works Under the Hood

The overlay layer: NSPanel via Tauri

Thuki is built with Tauri, a Rust-based framework for building native desktop apps with a web frontend. On macOS, the overlay window is not a standard NSWindow; it is an NSPanel, a subclass of NSWindow that macOS treats differently at the window manager level.

The distinction matters because a standard NSWindow, even when set to the highest window level, cannot appear above native fullscreen applications; the macOS window server simply blocks it. An NSPanel configured as a floating, nonactivating panel bypasses this restriction, which is why Thuki can appear on top of a fullscreen video, a game, or a fullscreen editor without you needing to leave that space.

Thuki uses the tauri-nspanel crate to convert the Tauri window into a floating NSPanel at startup. The panel is configured with is_floating_panel: true and can_become_key_window: true, so it can receive keyboard input the moment it appears without any explicit focus management on your part.

The hotkey: CGEventTap at the HID layer

The double-tap Control shortcut works differently from a standard global hotkey. Thuki registers a CGEventTap at CGEventTapLocation::HID (the Human Interface Device layer), which is the lowest point in the macOS input stack, below the application layer. This means Thuki sees your keystrokes before they reach any other application, regardless of what is focused.

When you tap Control twice within 400 milliseconds, the activator fires. A 600-millisecond cooldown prevents accidental double-toggles. The tap runs on a dedicated background thread with its own CFRunLoop, so it never blocks the main thread and adds no perceptible latency to your typing.
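
The Rust implementation is not reproduced here, but the timing logic itself is simple enough to sketch. This Python model of the activator (an illustration, not Thuki's actual code) captures the 400 ms double-tap window and 600 ms cooldown described above:

```python
DOUBLE_TAP_WINDOW = 0.400   # two Control taps within 400 ms fire the activator
COOLDOWN = 0.600            # ignore further activations for 600 ms after firing

class DoubleTapDetector:
    """Pure timing logic for a double-tap activator.
    Feed it the timestamp of each Control key-down; it returns True
    when the overlay should toggle."""

    def __init__(self):
        self.last_tap = None
        self.last_fire = -float("inf")

    def on_tap(self, now: float) -> bool:
        # Inside the cooldown window: swallow the tap entirely.
        if now - self.last_fire < COOLDOWN:
            self.last_tap = None
            return False
        # Second tap close enough to the first: fire and start the cooldown.
        if self.last_tap is not None and now - self.last_tap <= DOUBLE_TAP_WINDOW:
            self.last_tap = None
            self.last_fire = now
            return True
        # Otherwise this becomes the first tap of a potential pair.
        self.last_tap = now
        return False
```

In the real app this state machine would live inside the CGEventTap callback on the background thread, consuming raw key events from the HID layer.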

This approach requires macOS Accessibility permission, which Thuki requests on first launch.

The model layer: Ollama

Under all of this sits Ollama, the open-source runtime that downloads and runs large language models on your Mac. Ollama handles model quantization and GPU acceleration via Apple's Metal framework, and exposes a local HTTP API on localhost that Thuki connects to. When you send a message, Thuki forwards it to Ollama, which runs inference on your CPU and GPU and streams the response back. The entire round trip stays on your machine. Nothing reaches the internet.
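
You can see the shape of that round trip in what Ollama streams back from its /api/generate endpoint on localhost:11434: one JSON object per line, each carrying a fragment of the response, with done set to true on the last. A minimal sketch of assembling such a stream (the canned lines below are illustrative, not a captured transcript):

```python
import json

def collect_stream(ndjson_lines):
    """Assemble a streamed Ollama /api/generate response.
    Ollama emits one JSON object per line; each carries a 'response'
    fragment, and the final object has 'done': true."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# A canned example of the NDJSON Ollama streams over localhost:11434:
stream = [
    '{"model":"llama3.2","response":"Hello","done":false}',
    '{"model":"llama3.2","response":", world.","done":false}',
    '{"model":"llama3.2","response":"","done":true}',
]
print(collect_stream(stream))  # prints "Hello, world."
```

Streaming is why Thuki can start rendering the answer token by token instead of waiting for the full response.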

So why not just use Ollama directly?

Ollama ships with a native macOS chat UI and you can run it through the terminal too. Both work. But both share the same fundamental problem: they are destinations. To use them, you stop what you are doing, switch to the Ollama window, type your question, and switch back. The chat window has no idea what you were working on, cannot read your highlighted text, and cannot see your screen. Every question starts from a blank slate.

Thuki is not a destination. You never switch to it. Double-tap Control and it appears on top of whatever you are already doing, pre-filled with whatever you had highlighted. When you are done, it disappears. Your original app, cursor position, and mental context are exactly where you left them.

Ollama handles the model. Thuki handles the workflow.


How to Set It Up in Ten Minutes

What You Need

  • An Apple Silicon Mac (M1 or later)
  • macOS Ventura (13) or later
  • 8 GB RAM minimum; 16 GB recommended for a comfortable experience

Step 1: Install Ollama

Download Ollama from ollama.com and install it like any macOS app. Once installed, a small icon appears in your menu bar.

Verify it is running:

ollama --version

Step 2: Pull a Model

Thuki works with any model Ollama supports. A good starting point:

Model         Download Size   Best for
llama3.2      2.0 GB          Fast, capable all-rounder
mistral       4.1 GB          Strong reasoning and coding
gemma3n:e4b   4.7 GB          Multilingual, strong instruction following
phi4-mini     2.5 GB          Lightweight, efficient

Pull a model:

ollama pull gemma3n:e4b

Step 3: Install and Launch Thuki

Thuki is open source and free. Follow the Getting Started guide on GitHub for the full installation steps. Once you have it running, Thuki will automatically detect your Ollama instance. Select the model you pulled and you are ready.

Double-tap Control to confirm Thuki appears. Type anything. You should see a response within a few seconds.


Frequently Asked Questions

Is Thuki really free? Yes. Thuki has no paid tier. The app is free, the models are free, and there are no usage limits because all usage happens on your own hardware.

Do I need to create an account? No. Thuki requires no account, no email, and no sign-up. Download and use it immediately.

What Mac do I need? An Apple Silicon Mac (M1 or later) with 8 GB or more of RAM can run Thuki with a small model. Apple Silicon's unified memory lets the GPU work directly on the same RAM as the CPU, which is what makes local inference fast. With 16 GB you can run 7-8B parameter models, which handle most daily tasks well.
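
A back-of-envelope calculation shows why those numbers work out, assuming the roughly 4-bit quantizations Ollama typically ships by default (a sketch of weight storage only, not an exact accounting of KV-cache and runtime overhead):

```python
def approx_model_ram_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough RAM needed just to hold a model's weights.
    Excludes KV-cache and runtime overhead; 4-bit quantization
    is a typical Ollama default."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / (1024 ** 3)

# A 7B model at 4-bit quantization: about 3.3 GB of weights,
# which is why it fits comfortably in 16 GB alongside macOS.
print(round(approx_model_ram_gb(7), 1))  # prints 3.3
```

The same formula explains the 8 GB floor: a 3B model at 4 bits needs only about 1.4 GB for weights, leaving room for the system and your apps.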

Is local AI as capable as ChatGPT? For everyday tasks like summarizing text, drafting emails, explaining code, and answering quick questions, modern 7-8B open-source models handle the work well. The largest frontier cloud models (GPT-4 class and above) still lead on long-context reasoning and complex multi-step tasks. What you get in exchange is privacy, cost, and latency: local models never bill you, never see your data, and start responding without a network round trip.

Which model is best for coding on a Mac? For code-specific tasks, try qwen2.5-coder:7b or deepseek-coder-v2. Both are specifically tuned for programming, understand most languages, and run comfortably on 16 GB of RAM. For a single general-purpose model that also handles code well, mistral is a strong default.

What macOS version do I need? macOS Ventura (13) or later. The overlay relies on a few AppKit primitives that are stable on Ventura and newer, but not available on older versions. If you are on Monterey or earlier, an OS update is required before Thuki will run.

How much disk space do I need? The Thuki app itself is small (under 100 MB). Models are the bulk of the storage: each model takes 2-8 GB. Plan for 5-15 GB total if you keep one or two models installed. Models live at ~/.ollama/models and can be deleted any time with ollama rm <model>.

What happens to my data? Nothing leaves your machine. Thuki connects to a local Ollama server running on localhost. There is no network request during inference. You can verify this by disconnecting from Wi-Fi while using Thuki; it works the same.

Can I use Thuki offline? Yes, after the initial model download. Ollama and Thuki work fully without an internet connection. This makes Thuki useful on flights, in restricted network environments, or anywhere connectivity is unreliable.


The Short Version

Thuki is the AI assistant that should have existed from the start: local, free, private, and built around not interrupting you. It runs on your Mac, connects to open-source models via Ollama, and stays out of your way until you need it.

If you have been paying for a cloud AI subscription and wondering whether there is a better way, this is it.
