The brief sounds simple: a small AI window that floats above whatever you are doing, summons with a hotkey, captures whatever you have highlighted, and runs entirely on your machine. Most of the user-facing surface area is straightforward. The macOS internals underneath are not.
This post walks through the actual engineering: which AppKit primitives Thuki uses, why standard cross-platform desktop frameworks fall short, the gotchas that bit during development, and what would be different if it were being built from scratch today. Code snippets are from the Thuki repo on GitHub and reflect the shipping implementation.
The Constraints
Four hard requirements shaped every decision:
- Must appear above native fullscreen apps. Not "always on top." Above fullscreen specifically (a fullscreen YouTube video, a fullscreen editor, a fullscreen game).
- Must summon from anywhere with a hotkey. No "open the app first."
- Must auto-capture the focused app's selected text as context, without breaking the user's clipboard.
- Must stay out of the way the rest of the time. No persistent dock icon, no notification, no background CPU.
Each of those constraints rules out an obvious cross-platform approach. Together they force a stack: native AppKit primitives, called from Rust, wrapped in a Tauri webview.
Architecture
The four backend layers are independent: the activator only knows how to fire an event, the panel manager only knows how to show the window, the context capturer only knows how to read selected text and screen bounds, and the model client only knows how to stream tokens from Ollama. They communicate through Tauri's event bus, not through direct calls. This makes each piece testable on its own and means a failure in one (e.g. the AX API returning nothing for an Electron app) does not cascade.
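The decoupling can be illustrated with plain std channels standing in for Tauri's event bus. This is a simplified sketch: the event names and payloads are illustrative, not Thuki's actual code.

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-in for Tauri's event bus; event names are illustrative.
#[derive(Debug, PartialEq)]
enum AppEvent {
    Activated,
    ContextCaptured(String),
}

// Each layer only sends or receives events; none calls another directly,
// so a failing capturer degrades to an empty payload instead of cascading.
fn run_bus() -> usize {
    let (tx, rx) = mpsc::channel::<AppEvent>();

    // The "activator" only knows how to fire an event.
    let activator_tx = tx.clone();
    thread::spawn(move || {
        activator_tx.send(AppEvent::Activated).unwrap();
    });

    // The "context capturer": AX returned nothing for this app,
    // so it sends empty context rather than an error.
    thread::spawn(move || {
        tx.send(AppEvent::ContextCaptured(String::new())).unwrap();
    });

    // The consumer drains the bus; recv() errors once all senders are gone.
    let mut count = 0;
    while rx.recv().is_ok() {
        count += 1;
    }
    count
}
```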
Deep Dive 1: Floating Above Fullscreen with NSPanel
The first surprise during development was that a standard NSWindow, even with its window level cranked up to kCGScreenSaverWindowLevel, still cannot appear above a fullscreen application's space. macOS treats fullscreen apps as their own dedicated NSSpace, and the window server refuses to composite windows from other spaces over it. This is not something you can flag your way out of.
The fix is NSPanel, a subclass of NSWindow with different default behavior around floating, key-window behavior, and space membership. With the right style mask, an NSPanel is allowed to render above a fullscreen space.
Tauri 2 does not expose NSPanel natively. Bridging requires the tauri-nspanel crate, which converts a regular Tauri window into an NSPanel at runtime using objc2 bindings. The configuration is a macro:
#[cfg(target_os = "macos")]
mod _thuki_panel {
use tauri::Manager;
tauri_nspanel::tauri_panel! {
panel!(ThukiPanel {
config: {
can_become_key_window: true,
is_floating_panel: true
}
})
}
}
Two flags matter:
- is_floating_panel: true triggers NSWindowStyleMaskNonactivatingPanel plus collection behavior flags that allow the panel to appear in any space, including fullscreen ones.
- can_become_key_window: true is required for the WebView to receive keyboard input. Without it, the user could see the overlay but not type into it.
One macro gotcha: tauri_panel! emits use statements at the call site, so two invocations in the same module cause name collisions. Thuki has two panels (overlay + settings), each in its own internal module:
#[cfg(target_os = "macos")]
mod _settings_panel {
use tauri::Manager;
tauri_nspanel::tauri_panel! {
panel!(ThukiSettingsPanel {
config: {
can_become_key_window: true,
is_floating_panel: false // settings is a normal window
}
})
}
}
The settings window is not a floating panel, because it is a destination you open intentionally, not an overlay.
tauri.conf.json also needs the macOS private-API flag enabled, since tauri-nspanel reaches into AppKit internals that Tauri normally hides:
{
"app": {
"macOSPrivateApi": true,
"windows": [
{
"label": "main",
"transparent": true,
"decorations": false,
"alwaysOnTop": false,
"visible": false
}
]
}
}
alwaysOnTop: false is intentional: the panel manages its own layering through NSPanel mechanics, not through Tauri's higher-level always-on-top toggle.
Deep Dive 2: Hotkey Detection at the HID Layer
The second non-obvious problem is the hotkey. The natural-sounding solution is tauri-plugin-global-shortcut or the Carbon Event Manager. Both register a system-wide shortcut at the session level: macOS notifies your app when a specific key combination is pressed.
This approach breaks under three real conditions:
- macOS 15 Sequoia changed focus-based filtering for session-level taps. Session-level event taps (kCGSessionEventTap) sit above the window-server routing layer and are subject to focus-based filtering: they silently receive zero events from other apps. The shortcut "works" in your own app but not when another app is focused, which defeats the entire purpose.
- Secure input mode disables most session-level interceptors. When iTerm has Secure Keyboard Entry on or you focus a password field, session-level taps go dark.
- A single global shortcut collides with everything. Cmd+Space is Spotlight. Cmd+Option+Space is often Alfred or Raycast. There is no key combination that does not already belong to something.
Thuki takes a different approach: a CGEventTap at the HID layer (CGEventTapLocation::HID), which is the lowest point in the macOS input stack. Events reach the HID tap before they reach any application, before window-server routing, before focus filtering. This is the same layer Karabiner-Elements, BetterTouchTool, and every reliable system-wide key interceptor uses.
let tap_result = CGEventTap::new(
CGEventTapLocation::HID,
CGEventTapPlacement::HeadInsertEventTap,
CGEventTapOptions::Default,
vec![CGEventType::FlagsChanged],
move |_proxy, event_type, event: &CGEvent| -> CallbackResult {
// ... evaluate keycode and modifier flags
let keycode = event.get_integer_value_field(EventField::KEYBOARD_EVENT_KEYCODE);
let flags = event.get_flags();
if keycode != KC_PRIMARY_L && keycode != KC_PRIMARY_R {
return CallbackResult::Keep;
}
let is_press = flags.contains(CGEventFlags::CGEventFlagControl);
let mut s = cb_state.lock().unwrap();
if evaluate_activation(&mut s, is_press) {
cb_on_activation();
}
CallbackResult::Keep
},
);
A few details:
- CGEventTapOptions::Default, not ListenOnly. Active taps at the HID layer are not disabled by secure input mode. ListenOnly taps are. We never modify or drop events (every callback returns Keep), but the tap option matters for survivability.
- Only FlagsChanged is registered. The disable-event sentinels (TapDisabledByTimeout, TapDisabledByUserInput) have values that overflow the bitmask and cannot be included explicitly. macOS delivers them to the callback automatically anyway.
- Two keycodes monitored: KC_PRIMARY_L: 0x3b (left Control) and KC_PRIMARY_R: 0x3e (right Control). Either qualifies.
The double-tap detection lives in evaluate_activation:
const ACTIVATION_WINDOW: Duration = Duration::from_millis(400);
const ACTIVATION_COOLDOWN: Duration = Duration::from_millis(600);
Two presses of Control within 400ms count as an activation. A 600ms cooldown prevents the user from accidentally re-triggering on the second release.
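A minimal reconstruction of that state machine, with the timestamp injected so the logic is testable. The field and function names here are illustrative, not copied from the Thuki repo.

```rust
use std::time::{Duration, Instant};

const ACTIVATION_WINDOW: Duration = Duration::from_millis(400);
const ACTIVATION_COOLDOWN: Duration = Duration::from_millis(600);

// Hypothetical reconstruction of the double-tap state machine.
#[derive(Default)]
struct TapState {
    last_press: Option<Instant>,
    last_activation: Option<Instant>,
    key_down: bool,
}

fn evaluate_activation_at(s: &mut TapState, is_press: bool, now: Instant) -> bool {
    // Only act on the press edge; FlagsChanged also fires on release.
    if !is_press {
        s.key_down = false;
        return false;
    }
    if s.key_down {
        return false; // duplicate flag event while the key is held
    }
    s.key_down = true;

    // Respect the cooldown after a successful activation.
    if let Some(t) = s.last_activation {
        if now.duration_since(t) < ACTIVATION_COOLDOWN {
            return false;
        }
    }

    // Second press within the window -> activate.
    let activated =
        matches!(s.last_press, Some(t) if now.duration_since(t) <= ACTIVATION_WINDOW);
    s.last_press = Some(now);
    if activated {
        s.last_activation = Some(now);
        s.last_press = None; // require a fresh double-tap next time
    }
    activated
}
```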
The tap runs on its own thread with its own CFRunLoop. The application's main thread never sees a single key event; it only sees the activation callback, which fires at most a couple of times per minute. The HID tap is genuinely zero-overhead from the main app's perspective.
Self-healing: macOS auto-disables event taps whose callback is too slow. When that happens, the callback receives TapDisabledByTimeout and Thuki stops the run loop. The outer retry loop reinstalls the tap. This is invisible to the user but matters for long-running reliability.
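The reinstall loop can be sketched with the run-loop body abstracted behind a closure. The names and the exit enum are hypothetical, not Thuki's actual code; the point is that a timeout-disabled tap triggers a reinstall while a fatal exit does not.

```rust
// Hypothetical exit conditions for one install-and-run cycle of the tap.
enum TapExit {
    DisabledByTimeout, // callback was too slow; macOS killed the tap
    Fatal,             // e.g. accessibility permission revoked
}

// Outer retry loop: reinstall the tap after a timeout-disable,
// up to a bounded number of retries. Returns the number of attempts made.
fn run_tap_with_retries(
    mut install_and_run: impl FnMut() -> TapExit,
    max_retries: u32,
) -> u32 {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match install_and_run() {
            TapExit::DisabledByTimeout if attempts <= max_retries => {
                // Real code would sleep briefly before reinstalling.
                continue;
            }
            _ => return attempts,
        }
    }
}
```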
Deep Dive 3: Highlighted-Text Capture with AXUIElement
When the activator fires, Thuki has about 100ms before drawing its window to capture context from the focused app. The mechanism is the macOS Accessibility API (AXUIElement), the same protocol screen readers use.
extern "C" {
fn AXUIElementCreateSystemWide() -> AXUIElementRef;
fn AXUIElementCopyAttributeValue(
element: AXUIElementRef,
attribute: CFStringRef,
value: *mut CFTypeRef,
) -> AXError;
}
AXUIElementCreateSystemWide() returns a handle to the system-wide accessibility tree. From there, Thuki queries the focused application's focused element for AXSelectedText, then reads the selection's screen bounds for window-positioning math.
The AX protocol is a moving target. Most native macOS apps (Pages, Notes, Safari, Mail) implement it cleanly. Electron-based apps (VS Code, Slack, Discord) implement it inconsistently, and apps using Monaco editor or custom canvas-based text rendering often expose nothing useful at all.
For those cases, Thuki has a clipboard fallback: save the current clipboard, synthesize a Cmd+C key event via CGEventCreateKeyboardEvent, poll until the pasteboard changes, then restore the original clipboard contents.
fn clipboard_fallback() -> Option<String> {
let before = clipboard_text();
unsafe { simulate_cmd_c() };
let mut after = before.clone();
for delay_ms in [10, 20, 40, 80] {
std::thread::sleep(std::time::Duration::from_millis(delay_ms));
after = clipboard_text();
if after != before {
break;
}
}
// Always restore the original clipboard.
if after != before {
write_clipboard(&before);
}
if !after.is_empty() && after != before {
Some(after)
} else {
None
}
}
The exponential backoff (10ms, 20ms, 40ms, 80ms) is a compromise: fast machines respond in under 10ms, slower machines need up to ~150ms total. A fixed sleep(150ms) would feel sluggish on a fast machine; a fixed sleep(10ms) would miss the copy on a slow one.
The clipboard fallback is a workaround, not a feature. The proper path is AX. But for the 20% of apps where AX returns nothing, the fallback means the user gets the same experience they would with AX, paying only ~30ms of latency.
Deep Dive 4: Screen Attach with the Vision Framework
For the /screen command (and for image-based questions in general), Thuki captures the current screen and either passes the pixels to a vision-capable model or runs OCR via the macOS Vision framework first.
The Cargo.toml binding:
objc2-vision = { version = "0.3", features = [
"VNRecognizeTextRequest",
"VNRequestHandler",
"VNTypes",
"VNDefines"
] }
VNRecognizeTextRequest is Apple's on-device OCR. It runs on the Neural Engine on Apple Silicon, takes ~100-200ms per typical screen, and handles a dozen languages including Vietnamese, Mandarin, and Japanese. Most importantly, it never makes a network request: the recognized text is computed entirely on-device. This matters because Thuki's privacy guarantee would be meaningless if /screen sent your screenshot to a cloud OCR API.
The screenshot itself uses CGWindowListCreateImage from Core Graphics. There is nothing exotic here, but the timing matters: the screenshot has to be taken before the Thuki overlay is shown, or the screenshot would include Thuki itself.
Deep Dive 5: Talking to Ollama
The model layer is intentionally boring. Thuki does not bundle its own runtime. It speaks the Ollama HTTP API on localhost:11434, the same endpoint Ollama's own CLI uses.
let response = reqwest::Client::new()
.post("http://localhost:11434/api/generate")
.json(&request_body)
.send()
.await?;
let mut stream = response.bytes_stream();
while let Some(chunk) = stream.next().await {
// Stream tokens to the frontend as they arrive.
}
A few non-obvious details:
- Streaming aborts incrementally, not at the end. Body-size caps are enforced inside the chunk loop. A misbehaving model that streams 50MB of repeated tokens gets cut off mid-stream rather than after the response completes.
- /api/tags is authoritative. Thuki maintains an in-memory cache of the user's last-selected model, but on every cold start it queries /api/tags to verify the model still exists. If it does not (the user ran ollama rm), Thuki falls back to the first installed model and prompts the user to pick.
- No SDK dependency. The Ollama HTTP API is small and stable. Adding a Rust SDK would add a dependency, more code surface, and no real benefit.
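The incremental size cap can be sketched as a per-chunk check: the cap is enforced before each chunk is appended, so an endless stream is cut off mid-flight instead of after the body completes. The 2 MB limit here is an assumed illustrative value, not Thuki's actual configuration.

```rust
const MAX_BODY_BYTES: usize = 2 * 1024 * 1024; // illustrative limit

// Accumulate streamed chunks until the cap would be exceeded.
// Returns the body collected so far and whether the stream was truncated.
fn take_until_cap<'a, I>(chunks: I) -> (Vec<u8>, bool)
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut body = Vec::new();
    for chunk in chunks {
        if body.len() + chunk.len() > MAX_BODY_BYTES {
            return (body, true); // abort the stream here, mid-flight
        }
        body.extend_from_slice(chunk);
    }
    (body, false)
}
```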
When BYOK ships, the model client gets a second backend that targets OpenAI-compatible APIs. The rest of the stack (panel manager, activator, context capturer) does not change at all.
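One plausible shape for that second backend is a small trait the rest of the stack depends on. This is a hedged sketch: the trait and type names are hypothetical, and the generate calls are stubbed rather than making real HTTP requests.

```rust
// Hypothetical backend abstraction; the panel manager, activator, and
// context capturer would depend only on this trait.
trait ModelBackend {
    fn name(&self) -> &str;
    fn generate(&self, prompt: &str) -> Result<String, String>;
}

struct OllamaBackend; // talks to localhost:11434
struct OpenAiCompat {
    base_url: String, // BYOK: any OpenAI-compatible endpoint
}

impl ModelBackend for OllamaBackend {
    fn name(&self) -> &str {
        "ollama"
    }
    fn generate(&self, prompt: &str) -> Result<String, String> {
        // The real implementation streams from /api/generate; stubbed here.
        Ok(format!("[ollama] {prompt}"))
    }
}

impl ModelBackend for OpenAiCompat {
    fn name(&self) -> &str {
        "openai-compat"
    }
    fn generate(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[{}] {prompt}", self.base_url))
    }
}

// Callers never name a concrete backend.
fn ask(backend: &dyn ModelBackend, prompt: &str) -> Result<String, String> {
    backend.generate(prompt)
}
```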
Why Tauri (and Not Swift, Not Electron)
The framework choice is the question that gets asked most.
Why not Swift / SwiftUI? A native Swift app would be the most macOS-idiomatic choice. The reason against: the chat UI itself benefits from web technologies (markdown rendering, syntax highlighting, animations) where the open-source library ecosystem is decades deeper than SwiftUI's. Building markdown rendering + code block syntax highlighting + streaming-token UI in pure SwiftUI is doable but slow. Doing it in React with react-markdown + shiki is a weekend.
Why not Electron? Electron does not expose NSPanel cleanly, has a much heavier baseline (a full Chromium + Node runtime per app), and the cross-platform conventions actively work against deep macOS integration. The same architectural choice (NSPanel + HID tap + AX API) would be possible in Electron but require similar levels of native bridging, with worse performance and a larger bundle.
Why Tauri? Three reasons:
- Rust backend. The CGEventTap, AX API, and Ollama streaming code all benefit from Rust's memory safety and async story. The activator's dedicated thread with a CFRunLoop is the kind of code that is easy to write correctly in Rust and easy to write incorrectly in C or Objective-C.
- Tauri bundle is ~10 MB, vs ~100+ MB for an equivalent Electron app. For an overlay that should feel as light as a system utility, this matters.
- tauri-nspanel exists. The single most important piece of native bridging was already a community crate. Without it, the choice would have been harder.
The tradeoff: Tauri's WebView2-on-Windows / WebKit-on-macOS / WebKit-on-Linux story means each platform has slightly different rendering quirks. Thuki is macOS-only, so this never matters in practice.
The Permissions Story
Two macOS permissions matter:
- Accessibility is required for the CGEventTap (HID-layer event interception) and for the AX selection capture. One permission, two uses.
- Screen Recording is required for /screen to capture the user's display.
The first launch flow is unavoidably awkward: Thuki cannot do anything useful until the user opens System Settings → Privacy & Security → Accessibility, finds Thuki in the list, and toggles the switch. macOS does not allow apps to programmatically grant themselves these permissions, by design.
What Thuki can do is detect when the permission is missing (AXIsProcessTrusted()) and surface a clean onboarding screen that walks the user through it, then poll for the permission change so the overlay activates automatically the moment the user finishes the setup. There is no app restart required.
const PERMISSION_POLL_INTERVAL: Duration = Duration::from_secs(5);
const MAX_PERMISSION_ATTEMPTS: u32 = 6;
If the permission is still missing after ~30 seconds, the user is reminded with a system tray menu entry. Asking for the permission once and giving up was tempting; polling means the app "just works" the moment the toggle is flipped, even if the user is doing something else when they grant it.
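The poll loop reduces to a bounded retry with an injected checker. This is a sketch: the real checker calls AXIsProcessTrusted and sleeps between attempts; the sleep is elided here so the logic is testable.

```rust
use std::time::Duration;

const PERMISSION_POLL_INTERVAL: Duration = Duration::from_secs(5);
const MAX_PERMISSION_ATTEMPTS: u32 = 6;

// Poll the permission state up to MAX_PERMISSION_ATTEMPTS times.
// Returns the attempt on which the permission appeared, or None if
// the budget (~30s at a 5s interval) is exhausted.
fn wait_for_permission(mut is_trusted: impl FnMut() -> bool) -> Option<u32> {
    for attempt in 1..=MAX_PERMISSION_ATTEMPTS {
        if is_trusted() {
            return Some(attempt); // permission granted; activate the overlay
        }
        // Real code: std::thread::sleep(PERMISSION_POLL_INTERVAL);
        let _ = PERMISSION_POLL_INTERVAL;
    }
    None // fall back to the tray-menu reminder
}
```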
What I Would Do Differently
Three honest retrospectives.
The activator and the panel manager shipped together when they should have shipped sequentially. The first version of Thuki had the CGEventTap firing before the NSPanel conversion was reliable. The result: the hotkey worked, but the window appeared behind fullscreen apps about a third of the time. Splitting these into two independent shipped milestones would have caught the bug a week earlier.
The clipboard fallback should have been there from day one. The original plan was "AX is enough." It is not. Electron apps are too common to skip, and adding the fallback later meant rewriting state-machine code that assumed AX was the only path.
The dual-window architecture (overlay + settings) was added late. Cramming settings into the overlay's NSPanel was the first attempt and produced subtle bugs: settings menus closed unexpectedly when the panel lost focus, and the Dock icon flickered. A second NSPanel subclass with is_floating_panel: false solved both issues in maybe 50 lines of code. Lesson: when the same UI surface needs two different behaviors, split early.
Frequently Asked Questions
Can I use this pattern in my own Tauri app?
Yes, all of it. tauri-nspanel is open source. CGEventTap is a public macOS API. The AX bindings are straightforward FFI. The full architecture is reproducible from public components. The Thuki repo is open source under Apache 2.0 if you want a working reference.
Will Thuki work on Windows or Linux?
Not currently. The NSPanel approach is intrinsically macOS-specific. A Windows version would use a different overlay primitive (likely a layered window with WS_EX_TOPMOST + DWM tricks), and a Linux version would depend on the compositor (X11 or Wayland). These are doable but real ports, not a recompile.
What is the steady-state performance overhead?
The HID event tap callback runs on every keystroke but executes ~5 instructions for the common case (a keycode mismatch returns immediately). Measured CPU overhead is statistically indistinguishable from zero. The model layer is bounded by Ollama and the model itself, not by Thuki.
Why Tauri 2 and not Tauri 1?
Tauri 2 added the multi-window APIs, a cleaner plugin system, and macOS-specific window manipulation that tauri-nspanel depends on. Tauri 1 would have required more invasive forks.
Is this just a wrapper around Ollama?
The model runtime is Ollama. Everything else (NSPanel bridge, HID-layer event tap, AX context capture, screen OCR, dual-panel architecture, streaming response handling with incremental size caps, the activation state machine, the permissions onboarding flow) is Thuki's own engineering. Calling it "a wrapper" is like calling a car a wrapper around the engine.
The Short Version
Building a real macOS overlay for AI is not a one-weekend project, even with modern tools. The hard parts are not the model layer (Ollama handles that) and not the UI (web tech handles that). The hard parts are the AppKit primitives: NSPanel for fullscreen-friendly floating, CGEventTap at the HID layer for reliable hotkeys, AXUIElement for context capture, and the careful glue that makes those three things work together without breaking the user's workflow.
If you have a similar idea in mind, the code is on GitHub. If you have feedback on any of the design choices above, the issues tracker is open.
Next steps:
- New to Thuki? Read the introduction post
- The product philosophy: why Thuki is local-first
- The competitive landscape: how Thuki compares to other macOS local AI tools
- The code: github.com/quiet-node/thuki