
Ambient Local AI on macOS: How Thuki Compares

A 2026 comparison of local AI apps on Mac: LM Studio, Jan, Msty, AnythingLLM, GPT4All, Raycast AI, and Thuki. Where each excels and why ambient design matters.

Written by Logan Nguyen

The macOS local-AI space has gotten crowded. There are at least six well-known apps that let you run language models on your Mac without sending anything to a cloud provider. They differ on a lot of details, but the most important difference is one almost nobody talks about: whether the AI is something you go to, or something that comes to you.

This post defines that distinction (ambient AI), then walks through the current landscape of local AI tools on macOS in 2026 and where each one fits.


What "Ambient AI" Means

Ambient AI is AI that lives on top of your existing workflow, summoned without leaving the app you are already in. It has three properties:

  1. No dedicated window to navigate to. The AI is reachable from anywhere with a hotkey or gesture, not by switching apps or workspaces.
  2. Context-aware by default. Whatever you have highlighted, screenshotted, or focused on is automatically available to the AI without copy-paste.
  3. Dismissable in one action. Closing the AI returns you to exactly where you were, with no lost state.

Most AI tools fail at least two of these. They are windowed chat apps: you open them, you type, you read the answer, you close them, you go back to what you were doing. That cycle works, but it has a hidden cost. Every question is a context switch. The friction is small per query, but it compounds across a day of work.

Ambient AI removes that friction by reversing the relationship. You do not visit the AI; the AI shows up where you are.
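
To make those three properties concrete, here is a minimal AppKit sketch of properties 1 and 3, assuming an NSPanel named `panel` already holds the chat UI (the FAQ below covers why it is a panel and not a window). The Option-Space binding and the `OverlayController` name are illustrative, not Thuki's actual code:

```swift
import AppKit

// Sketch of "summon from anywhere": a global event monitor sees the
// hotkey even while another app has focus. Note that global key
// monitoring requires the Accessibility / Input Monitoring permission
// in System Settings > Privacy & Security.
final class OverlayController {
    let panel: NSPanel            // assumed to exist; see the NSPanel FAQ below
    private var monitor: Any?

    init(panel: NSPanel) {
        self.panel = panel
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { [weak self] event in
            // Illustrative binding: Option-Space (keyCode 49 is the space bar).
            if event.modifierFlags.contains(.option) && event.keyCode == 49 {
                self?.toggle()
            }
        }
    }

    func toggle() {
        if panel.isVisible {
            // One action to dismiss. Because a nonactivating panel never
            // took focus from the frontmost app, nothing needs restoring.
            panel.orderOut(nil)
        } else {
            panel.makeKeyAndOrderFront(nil)
        }
    }
}
```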


The macOS Local AI Landscape in 2026

Seven options are worth knowing about. They fall into two architectural categories.

Windowed chat apps (open the app, chat in its window):

  • LM Studio (lmstudio.ai): most popular model lab. Strong developer tooling.
  • Jan (jan.ai): open-source ChatGPT-style desktop chat, 5.5M+ downloads.
  • Msty (msty.ai): private AI workspace with knowledge stacks, agents, and personas.
  • AnythingLLM (anythingllm.com): document-focused chat with built-in agents.
  • GPT4All (nomic.ai/gpt4all): older but still maintained, runs on Windows/Mac/Linux.

Overlay / launcher AIs (summon from anywhere):

  • Raycast AI (raycast.com/ai): productivity launcher with AI bolted onto it. Requires a subscription. Not local-first.
  • Thuki: floating overlay, local-first, free, open source. The subject of this post.

Notably, Ollama itself (the model runtime) shipped a cloud tier in 2026, with $20/month Pro and $100/month Max plans. Ollama still ships the free local runtime that Thuki delegates to (and that several other tools in this list can use), but the project's center of gravity is shifting toward cloud monetization.
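
A useful way to see why so many apps can share that runtime: Ollama is just a local HTTP server on port 11434. Here is a minimal sketch of a call against it, assuming a model you have already pulled (the `llama3.2` name is an example; substitute whatever `ollama list` shows on your machine):

```swift
import Foundation

// Decodes the one field we need from Ollama's non-streaming reply.
struct OllamaReply: Decodable { let response: String }

// POST /api/generate against the local Ollama server. Nothing leaves
// the machine; "local-first" is literally a call to localhost.
func askLocally(_ prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "llama3.2",  // example model; any pulled model works
        "prompt": prompt,
        "stream": false       // one JSON object instead of a token stream
    ])
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(OllamaReply.self, from: data).response
}
```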


The Two Design Axes

Two questions determine how a local AI tool feels to use day-to-day.

Axis 1: Windowed chat or ambient overlay? A windowed chat app has its own application window that you Cmd-Tab to. An ambient overlay appears on top of whatever you are currently doing.

Axis 2: Local-only or flexible source? Some tools only run local models. Others let you mix local with cloud models via bring-your-own-key (BYOK) or built-in cloud connectors.

Mapping the tools onto these axes:

| Tool | Architecture | Model source |
| --- | --- | --- |
| LM Studio | Windowed | Local-only (plus headless SDK) |
| Jan | Windowed | Local + cloud BYOK |
| Msty | Windowed | Local + cloud, side by side |
| AnythingLLM | Windowed | Local + cloud |
| GPT4All | Windowed | Local + thousands of models |
| Raycast AI | Overlay/launcher | Cloud-only (subscription) |
| Thuki | Overlay (NSPanel) | Local default, BYOK planned |

The interesting empty quadrant is "ambient overlay + local-first." Raycast AI is overlay but not local-first. Every local-first option except Thuki is a windowed app.


Feature-by-Feature Comparison

| Feature | LM Studio | Jan | Msty | AnythingLLM | GPT4All | Raycast AI | Thuki |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Free to use | Yes | Yes | Yes (free tier) | Yes (free tier) | Yes | No (paid) | Yes |
| Open source | No | Yes | No | Yes | Yes | No | Yes |
| Native macOS app | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Runs locally | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Cloud BYOK | No | Yes | Yes | Yes | No | Built-in only | Planned |
| Floating overlay (above fullscreen) | No | No | No | No | No | Partial | Yes |
| Hotkey summon from anywhere | No | No | No | No | No | Yes | Yes |
| Highlighted-text auto-capture | No | No | No | No | No | Limited | Yes |
| Screen attach (screenshot) | No | No | Partial | Partial | No | No | Yes |

The pattern is clear. Most local AI tools focus on the model side (which models, how to manage them, how to serve them) and treat the UI as a standard chat window. Raycast goes the other way, focusing on the overlay experience while staying cloud-only. Thuki is the only tool in the comparison that pairs both.


Where Each Tool Wins

This is not an attempt to claim Thuki is best at everything. Each of these tools is good at something specific, and for some users that specific thing matters more than ambient design.

LM Studio wins if you want a model lab. It exposes per-model parameters, has the best built-in model catalog, ships SDKs for JavaScript and Python, and supports headless server deployment via llmster. If you are building an app on top of local models, LM Studio is the strongest base layer. It is not trying to be a daily-driver chat assistant; it is trying to be a workshop.

Jan wins if you want an open-source ChatGPT clone with serious polish. The UI is clean, it handles both local and cloud models gracefully, and the project has the most active community of the open-source options (41.9K GitHub stars, 5.5M+ downloads). If a chat-tab interface is what you actually want, Jan is the strongest pick.

Msty wins if you want a workspace with knowledge stacks, personas, and prompt workflows. It is the most feature-rich of the windowed options, with separate products for agents (Claw), enterprise inference (Stack), and even a legal vertical (Msty Law). The trade-off is complexity; Msty has more surface area than a simple chat app needs.

AnythingLLM wins if you do a lot of document work. The product is built around RAG: pull in PDFs, codebases, online docs, and chat with them. It also has built-in AI Agents for automation. For knowledge-base use cases, this is the strongest pick.

GPT4All wins on model variety. Nomic's catalog supports thousands of models, and the app runs cleanly on Windows, macOS, and Linux. It is older than some of the others, so the UI feels less polished, but it works.

Raycast AI wins if you already live in Raycast. If you have built your day around Raycast's launcher (commands, snippets, calendar, clipboard history), adding AI on top is a small step. The AI itself is fine. The constraint is that it requires a subscription and is not local-first; your queries go to Raycast's chosen providers, not your machine.

Thuki wins on ambient design. It is the only local-first tool in the list that summons on top of every other app via a hotkey, captures highlighted text as context automatically, and disappears back into the background when you are done. The model layer is delegated to Ollama, so the same model catalog is available; the distinguishing trait is the workflow on top of that model.


Why Ambient Matters

The case for ambient is easier to feel than to read about, but here are the specific moments where the difference shows up.

Reading a long document. You hit a paragraph you do not understand. With a windowed chat app, you copy the paragraph, switch apps, paste, type your question. With an ambient overlay, you highlight the paragraph and double-tap a hotkey. The same answer, in one tenth of the actions.

Debugging an error in your editor. A stack trace appears. With a windowed app, you copy the error, alt-tab, paste, switch back to fix it. With ambient, you highlight the trace, summon the overlay, read the answer, dismiss.

Reviewing a UI in another app. You see a button label that seems off. Windowed AI cannot see your screen; you would need to screenshot manually and attach. Thuki captures the screen with /screen and keeps you in the app you were reviewing.

Quick fact lookup while writing. Halfway through a sentence in your editor, you need a date or a definition. With a windowed app, your cursor is gone. With ambient, the question and answer happen on top of your text, and your cursor stays where it was.

None of these are revolutionary on a per-query basis. The point is the compounding effect. Removing one or two seconds of friction from a dozen daily AI interactions adds up to real time saved, and more importantly, it preserves the mental state of the work you were doing. The cognitive cost of an app switch is much higher than the cognitive cost of glancing at an overlay.


Where Thuki Does Not Win

Honesty matters. Thuki is not the right tool for every use case:

  • Heavy document RAG workflows. AnythingLLM and Msty are better if your primary use case is uploading PDFs and chatting with a knowledge base. Thuki is built around quick contextual questions, not persistent document workspaces.
  • Multi-agent automation. Msty Claw and AnythingLLM both ship agent runners. Thuki does not run multi-step autonomous tasks today.
  • Frontier model quality. Thuki runs whatever you have in Ollama. Until BYOK ships, that means open-source models only, which lag the frontier on some tasks. (Once BYOK ships, this gap disappears for users who choose to plug in a key.)
  • Cross-platform. Thuki is macOS-only. The NSPanel overlay design is intrinsically tied to AppKit. If you need a tool that works on Linux or Windows, Jan, LM Studio, or AnythingLLM are better choices.

Frequently Asked Questions

What is the difference between Thuki and Raycast AI? Raycast AI is a cloud-only feature inside a paid productivity launcher; queries go to Raycast's chosen providers, not your machine. Thuki is a standalone local-first overlay: the model runs on your Mac via Ollama, the app is free, and the source is open. The overlay UX is similar in spirit, but the underlying architecture and pricing model are different.

Is Thuki really free, or is there a paid tier hiding somewhere? Thuki is free, with no paid tier and no plans to add one. The project is open source under Apache 2.0. Local inference has no marginal cost, so there is nothing to charge per query.

Can I use Thuki with cloud models like GPT-4o or Claude? Bring-your-own-key (BYOK) support is on the roadmap. When it ships, you will be able to plug in your own API key and route Thuki's queries directly from your machine to OpenAI, Anthropic, or any OpenAI-compatible provider. Thuki itself does not run a cloud service or proxy your keys.
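
BYOK has not shipped yet, but "OpenAI-compatible" has a concrete meaning, and it explains why no proxy is needed. The sketch below shows the general shape of such a request, sent straight from your machine to the provider; the endpoint, model name, and function are illustrative, not Thuki's implementation:

```swift
import Foundation

// Illustrative BYOK-style request: your key goes in the Authorization
// header and travels only to the provider you chose. Any server that
// accepts this request shape is "OpenAI-compatible".
func askCloud(_ prompt: String, apiKey: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "gpt-4o",  // example model name
        "messages": [["role": "user", "content": prompt]]
    ])
    let (data, _) = try await URLSession.shared.data(for: request)
    return data  // reply text lives at choices[0].message.content
}
```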

Why is there no overlay option in LM Studio, Jan, or Msty? Building a true macOS overlay that floats above fullscreen apps requires an NSPanel (a specific AppKit subclass), not a standard NSWindow. Cross-platform desktop frameworks like Electron do not expose this primitive cleanly, which is why most local AI tools (which are built cross-platform) are standard windowed apps. Thuki uses Tauri with the tauri-nspanel crate to bridge into native AppKit.
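
For the curious, here is roughly what that primitive looks like in Swift. The exact style mask and window level are illustrative choices, not a dump of Thuki's configuration:

```swift
import AppKit

// An NSPanel configured to float above other apps, including their
// fullscreen spaces, without making our app active when it appears.
let panel = NSPanel(
    contentRect: NSRect(x: 0, y: 0, width: 480, height: 320),
    styleMask: [.nonactivatingPanel, .titled, .fullSizeContentView],
    backing: .buffered,
    defer: false
)
panel.level = .floating  // above normal windows
// canJoinAllSpaces: follow the user across Spaces.
// fullScreenAuxiliary: allowed to overlay fullscreen apps.
panel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
panel.hidesOnDeactivate = false  // stay visible when our app is not active
panel.orderFrontRegardless()
```

A standard NSWindow cannot opt into `.nonactivatingPanel`, the flag that keeps focus in the app you were already using, and that single constraint is much of the reason the overlay category is so sparsely populated.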

Which of these tools is best for me? If you want a workshop for testing models: LM Studio. If you want a clean chat-tab interface: Jan. If you want knowledge bases and document RAG: AnythingLLM or Msty. If you live in Raycast already and do not need local-first: Raycast AI. If you want quick, contextual, hotkey-summoned AI that stays out of your way: Thuki.


The Short Version

The macOS local AI space in 2026 has plenty of good windowed chat apps. What it has had almost none of, until now, is an ambient overlay design paired with local-first inference. Thuki is built specifically to fill that gap. For users whose primary AI use case is "I want quick contextual answers without leaving my current app," the comparison is not really between Thuki and LM Studio or Jan; it is between Thuki and not having an ambient AI at all.
