Kernel CLI: Free & Freedom
AI Computer Use — without the subscription, without the vendor lock, without the screenshots.
Every AI company is shipping “computer use” in 2026.
Claude charges $20-200/mo. OpenAI charges a subscription. Both lock you to their AI. Both rely on slow, screenshot-based vision-language models (VLMs).
Kernel CLI is free, works with any AI, and uses the OS accessibility tree — crazy fast, deterministic, zero API cost.
The Problem with Screenshot-Based Computer Use
Claude Computer Use / OpenAI Operator / Google Mariner:
1. Take screenshot ~500ms
2. Send to vision model (VLM) ~2000ms
3. VLM: "I see a button at (450,320)" ~1000ms
4. Click at coordinates ~100ms
Total: ~3.6 seconds per action
Accuracy: probabilistic (VLM can hallucinate coordinates)
Cost: VLM inference per screenshot
Vendor: locked to provider's computer use API
It’s slow, expensive, inaccurate, and locked to one vendor.
How Kernel CLI Works
Kernel CLI:
1. Read accessibility tree (AX) ~50ms
2. Find element by name ~1ms
3. Act via AX API ~10ms
Total: ~61ms per action
Accuracy: deterministic (OS-native element identification)
Cost: zero (local binary, no API call)
Vendor: none (works with any AI)
Crazy fast. Deterministic. Free. Vendor-independent.
The accessibility tree is how screen readers work — it’s the OS-native representation of every window, button, text field, and menu. Kernel CLI speaks the OS’s own language.
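The per-action totals quoted above follow directly from the per-step figures. A minimal arithmetic sketch, using only the approximate timings listed on this page (they are the page's estimates, not measurements made here):

```python
# Per-step latencies in milliseconds, copied from the two breakdowns above.
SCREENSHOT_STEPS_MS = {
    "take screenshot": 500,
    "send to VLM": 2000,
    "VLM locates button": 1000,
    "click at coordinates": 100,
}
AX_STEPS_MS = {
    "read accessibility tree": 50,
    "find element by name": 1,
    "act via AX API": 10,
}

def total_ms(steps):
    """Sum the per-step latencies for one action."""
    return sum(steps.values())

screenshot_total = total_ms(SCREENSHOT_STEPS_MS)  # 3600
ax_total = total_ms(AX_STEPS_MS)                  # 61
speedup = screenshot_total / ax_total

print(f"screenshot: {screenshot_total} ms, AX: {ax_total} ms, ratio: {speedup:.0f}x")
```

With these figures the AX path comes out roughly 59 times faster per action; almost all of the difference is the VLM round trip.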
What It Can Do
See
# Every open window
kernel-cli windows
#279 "My Project — VS Code" (Code) FOCUSED
#412 "Pull Request #142" (Chrome)
#89 "Slack — engineering" (Slack)
# Accessibility tree of any app (structural, not visual)
kernel-cli ax-tree --app "Google Chrome" --depth 3
AXWebArea
AXGroup "main content"
AXButton "Merge pull request" ← found by NAME
AXButton "Close pull request"
AXStaticText "All checks passed"
# Running processes
kernel-cli process
claude cpu:70% mem:3%
node cpu:12% mem:2%
# Screenshots (when visual is actually needed)
kernel-cli capture --window "Chrome"
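Because the window list is plain text, a calling script can turn it into structured records. The following is a sketch only: the line shape is inferred from the sample output above and is an assumption, not a documented format, and `Window` and `parse_windows` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed line shape, inferred only from the sample output above:
#   #<id> "<title>" (<app>) [FOCUSED]
WINDOW_LINE = re.compile(r'^#(\d+)\s+"(.*)"\s+\(([^)]*)\)(\s+FOCUSED)?\s*$')

@dataclass
class Window:
    id: int
    title: str
    app: str
    focused: bool

def parse_windows(text):
    """Parse `kernel-cli windows`-style text into Window records, skipping other lines."""
    windows = []
    for line in text.splitlines():
        m = WINDOW_LINE.match(line.strip())
        if m:
            windows.append(
                Window(int(m.group(1)), m.group(2), m.group(3), m.group(4) is not None)
            )
    return windows

# The three sample lines from above (\u2014 is the dash used in the titles).
SAMPLE = (
    '#279 "My Project \u2014 VS Code" (Code) FOCUSED\n'
    '#412 "Pull Request #142" (Chrome)\n'
    '#89 "Slack \u2014 engineering" (Slack)\n'
)

for w in parse_windows(SAMPLE):
    print(w.id, w.app, "focused" if w.focused else "")
```

If the real output differs from the sample, only the `WINDOW_LINE` pattern should need adjusting.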
Act
# Press any button by name (no coordinates)
kernel-cli ax-press "Merge pull request"
# Type text
kernel-cli type "LGTM. Shipping."
# Keyboard shortcuts
kernel-cli key "cmd+shift+enter"
# Window management
kernel-cli window focus "Slack"
kernel-cli window resize "Terminal" 1200 800
# App control
kernel-cli app launch "Safari"
kernel-cli app quit "TextEdit"
# Scroll, click, drag
kernel-cli scroll --down 5
kernel-cli click 450 320
kernel-cli drag 100 100 500 500
Query
# Safari history
kernel-cli query safari --last 24h --keyword "auth"
# Spotlight search
kernel-cli search "connector.json"
# App logs
kernel-cli log --app "Xcode" --last 1h
# Open files by process
kernel-cli lsof --pid 1234
Automate
# AppleScript / JXA
kernel-cli script "tell application \"Finder\" to make new folder at desktop with properties {name:\"AI Output\"}"
# macOS Shortcuts
kernel-cli shortcut "Send Daily Report"
# URL schemes
kernel-cli open "slack://channel?team=T123&id=C456"
AX Tree vs Screenshot: Side by Side
A human asks: “Click the Merge button on GitHub.”
Screenshot approach (Claude Computer Use):
1. kernel-cli capture --window "Chrome" → screenshot.png
2. Send screenshot.png to VLM → $0.01, 2 seconds
3. VLM: "I see 'Merge pull request' button
at approximately (645, 480)"
4. kernel-cli click 645 480 → might miss
5. If missed, take another screenshot... → retry loop
AX tree approach (Kernel CLI):
1. kernel-cli ax-tree --app "Chrome"
→ AXButton "Merge pull request" (found)
2. kernel-cli ax-press "Merge pull request"
→ pressed (deterministic, no coordinates)
Two commands. No VLM. No cost. No retry. No guessing.
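The two-command flow can be wrapped in a small helper that presses a button only when the tree actually contains it. This is a sketch under assumptions: it uses only the `ax-tree --app` and `ax-press` invocations shown above, assumes the tree dump shows labels in double quotes as in the sample output, and `run_cli` and `press_if_present` are hypothetical names. The runner is injectable so the logic can be exercised without the binary installed.

```python
import subprocess

def run_cli(args):
    """Run kernel-cli with the given arguments and return its stdout as text."""
    result = subprocess.run(
        ["kernel-cli", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

def press_if_present(label, app, run=run_cli):
    """Press the element named `label` only if the app's AX tree mentions it.

    Assumption: the tree dump contains the label in double quotes, as in the
    sample output on this page. Returns True if a press was issued.
    """
    tree = run(["ax-tree", "--app", app])
    if f'"{label}"' not in tree:
        return False
    run(["ax-press", label])
    return True

# Stand-in runner so the sketch can be tried without kernel-cli installed.
calls = []
def fake_run(args):
    calls.append(args)
    return 'AXButton "Merge pull request"\n' if args[0] == "ax-tree" else ""

print(press_if_present("Merge pull request", "Chrome", run=fake_run))
```

Checking the tree first means a missing button produces a clean `False` instead of a press attempt against an element that is not there.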
Freedom: You Choose the Harness
Other computer use tools come with built-in restrictions from the vendor:
- Claude Computer Use: Anthropic decides what Claude can access
- OpenAI Operator: OpenAI decides the safety boundary
- Google Mariner: Google decides the permissions
Kernel CLI is raw capability. You decide the harness:
Option 1: No harness (development, testing)
$ kernel-cli ax-press "Delete Everything"
→ executes immediately. BRAVE MODE.
Option 2: NIIA harness (production, shared machines)
$ niia control unlock --scope full --duration 1h
→ OTP sent to your email
→ you enter OTP
→ 1 hour of controlled access
$ niia control ax-press "Delete Everything"
→ executes within unlock window
Option 3: Enterprise harness (future)
→ Policy file defines what's allowed per role
→ Audit log records every action
→ OTP for destructive operations
→ Scope-limited to specific apps/windows
The capability is the same in all three. The control is yours.
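To make the idea of a time-limited unlock concrete, here is a minimal sketch of the check a harness might perform before a control action. It is not NIIA's implementation: `Unlock`, `grant_unlock`, and `control_allowed` are hypothetical names, and OTP delivery and verification are assumed to have happened before `grant_unlock` is called.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Unlock:
    scope: str            # e.g. "full", mirroring --scope above
    expires_at: datetime  # end of the unlock window

def grant_unlock(scope, duration, now):
    """Create an unlock window starting at `now` (call after the OTP is verified)."""
    return Unlock(scope=scope, expires_at=now + duration)

def control_allowed(unlock, now):
    """Control actions run only while an unlock exists and has not expired."""
    return unlock is not None and now < unlock.expires_at

start = datetime(2026, 1, 1, 9, 0)
unlock = grant_unlock("full", timedelta(hours=1), start)
print(control_allowed(unlock, start + timedelta(minutes=30)))  # inside the window
print(control_allowed(unlock, start + timedelta(minutes=61)))  # after expiry
print(control_allowed(None, start))                            # never unlocked
```

Passing `now` explicitly keeps the check deterministic and easy to test; a real harness would read the clock and would also have to persist the unlock somewhere the agent cannot modify.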
With Any AI
Kernel CLI is a standalone binary. It doesn’t know or care which AI calls it.
# Claude uses it
claude> Use kernel-cli to check if CI passed on Chrome
# Codex uses it
codex> kernel-cli ax-tree --app "Chrome" | grep "checks passed"
# Gemini uses it
gemini> Read the accessibility tree of Slack and find unread messages
# A script uses it
#!/bin/bash
kernel-cli ax-press "Deploy to Production"
MCP: Kernel CLI for All
Terminal AI tools (Claude Code, Codex, Gemini CLI) can call kernel-cli directly. But what about AI that doesn’t have a terminal — Claude Desktop, web-based AI, IDE copilots?
Kernel CLI runs as an MCP server. Any AI that supports MCP gets full OS access.
{
"mcpServers": {
"kernel-cli": {
"command": "kernel-cli",
"args": ["--mcp"]
}
}
}
One server entry in the config. The AI gets these tools:
| MCP Tool | What it does |
|---|---|
| snapshot | Windows + AX tree + clipboard in one call |
| windows | List all open windows |
| ax_tree | Read UI structure of any app |
| ax_press | Press any button by name |
| ax_set | Set value on any UI element |
| type_text | Type text |
| key | Press keyboard shortcuts |
| click | Click at coordinates |
| scroll | Scroll any element |
| capture | Take screenshot |
| app | Launch, quit, switch apps |
| process | List running processes |
| query | Search Safari/Chrome/Notes/Photos |
| script | Run AppleScript/JXA |
| search | Spotlight search |
| window | Manage windows (resize, move, focus) |
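In practice the `mcpServers` entry shown before the tool table usually has to be merged into a client config that already lists other servers. A small sketch of that merge, assuming the config is a JSON object with an `mcpServers` key as in the example; `add_kernel_cli` is a hypothetical helper and `other-server` is a made-up placeholder entry.

```python
import json

# The server entry from the config example on this page.
KERNEL_CLI_SERVER = {"command": "kernel-cli", "args": ["--mcp"]}

def add_kernel_cli(config):
    """Return a copy of an MCP client config with the kernel-cli server added.

    Existing servers are preserved; an existing "kernel-cli" entry is replaced.
    The input dict is not modified.
    """
    servers = dict(config.get("mcpServers", {}))
    servers["kernel-cli"] = dict(KERNEL_CLI_SERVER)
    return {**config, "mcpServers": servers}

# "other-server" is a placeholder standing in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
print(json.dumps(add_kernel_cli(existing), indent=2))
```

Where the config file lives depends on the MCP client, so reading and writing the file is left out here.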
What this means
Before MCP:
Only terminal AI can control the OS.
Claude Desktop, web AI, IDE copilots → blind to OS.
After MCP:
ANY AI with MCP support controls the OS.
Claude Desktop → ax_press "Merge pull request"
Cursor → ax_tree to read Chrome
Web AI → query safari history
Terminal AI: kernel-cli ax-press "OK" ← direct binary call
MCP AI: mcp__kernel-cli__ax_press("OK") ← MCP tool call
Result: identical
The capability is the same. The access path differs. Terminal or MCP — same OS control, same AX tree, same speed.
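The correspondence between the two access paths is mostly mechanical. The sketch below maps a CLI subcommand to the MCP tool name used on this page; the mapping rule (hyphen to underscore, with `type` becoming `type_text`) is inferred from the command examples and the tool table, and the `mcp__<server>__<tool>` form is taken from the comparison above. The helper names are hypothetical.

```python
SERVER = "kernel-cli"

# CLI subcommands whose MCP tool name is not just a hyphen-to-underscore swap,
# going by the command examples and the tool table on this page.
SPECIAL_CASES = {"type": "type_text"}

def mcp_tool_name(cli_subcommand):
    """Map a CLI subcommand (e.g. "ax-press") to its MCP tool name ("ax_press")."""
    return SPECIAL_CASES.get(cli_subcommand, cli_subcommand.replace("-", "_"))

def qualified_name(cli_subcommand, server=SERVER):
    """Build the mcp__<server>__<tool> form used in the comparison above."""
    return f"mcp__{server}__{mcp_tool_name(cli_subcommand)}"

for cmd in ["ax-press", "ax-tree", "type", "windows"]:
    print(cmd, "->", qualified_name(cmd))
```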
Real example: Claude Desktop merging a PR
Claude Desktop has no terminal. But with Kernel CLI MCP:
1. ax_tree(app="Chrome")
→ sees: AXButton "Merge pull request"
2. ax_press(title="Merge pull request")
→ button pressed. PR merged.
3. ax_tree(app="Slack")
→ sees: AXTextField "Message #engineering"
4. type_text("Feature shipped. PR #142 merged.")
→ typed into Slack
5. key(combo="enter")
→ message sent
No screenshot. No VLM. No coordinates. Claude Desktop just pressed a button and typed a message — through MCP, using the OS accessibility tree.
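The five steps can also be written down as data and run in order, stopping at the first failure so that, for example, nothing is posted to Slack if the merge press did not go through. This is an illustrative sketch: `call_tool` is a hypothetical stand-in for whatever MCP client is in use, and the `text` argument name for `type_text` is an assumption (the example above passes the message positionally).

```python
# The five steps above as (tool, arguments) pairs. Argument names `app`,
# `title`, and `combo` come from the example; `text` is an assumed name.
PLAN = [
    ("ax_tree", {"app": "Chrome"}),
    ("ax_press", {"title": "Merge pull request"}),
    ("ax_tree", {"app": "Slack"}),
    ("type_text", {"text": "Feature shipped. PR #142 merged."}),
    ("key", {"combo": "enter"}),
]

def run_plan(plan, call_tool):
    """Call each tool in order; stop at the first failure.

    `call_tool(name, args)` is a hypothetical stand-in for an MCP client call
    and should raise an exception when a call fails.
    Returns the number of steps completed.
    """
    done = 0
    for name, args in plan:
        try:
            call_tool(name, args)
        except Exception as exc:
            print(f"step {done + 1} ({name}) failed: {exc}")
            break
        done += 1
    return done

# Recorder standing in for a real client: every call succeeds.
log = []
print(run_plan(PLAN, lambda name, args: log.append(name)))
```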
Harness applies to MCP too
When Kernel CLI runs through niia’s MCP registration, the OTP harness applies:
Direct MCP (kernel-cli --mcp):
→ BRAVE MODE. No gate.
Via niia MCP (niia as MCP proxy):
→ OTP required for control tools
→ observe tools always available
→ same physical harnessing as CLI
In connector.json
{
"connector": "2.0",
"name": "full-cycle",
"pipeline": {
"phases": [
{
"name": "code",
"model": "opus",
"prompt": "Implement the feature. Push to branch.",
"capabilities": { "pty": true }
},
{
"name": "verify",
"model": "sonnet",
"prompt": "Open Chrome. Check GitHub CI status. If green, merge.",
"capabilities": {
"kernel": {
"observe": ["windows", "ax-tree"],
"control": ["ax-press"],
"requires_otp": true
}
}
},
{
"name": "notify",
"model": "haiku",
"prompt": "Open Slack. Post to #deploys: 'Feature shipped.'",
"capabilities": {
"kernel": {
"control": ["ax-press", "type", "key"],
"requires_otp": true
}
}
}
]
}
}
Phase 1: Code in terminal. Phase 2: Verify in browser. Phase 3: Notify in Slack.
One pipeline. Terminal + OS. Three different AI models. Each with exactly the capabilities it needs.
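One way to read the `capabilities.kernel` blocks is as a per-phase permission check. The sketch below shows such a check under assumed semantics, which the page does not spell out: a command is allowed only if it is listed under `observe` or `control`, and `requires_otp` applies to `control` commands only. `kernel_permission` is a hypothetical name, not part of the connector runtime.

```python
def kernel_permission(phase, command):
    """Return (allowed, needs_otp) for a kernel command in one pipeline phase.

    Assumed semantics: a command is allowed only if it is listed under
    "observe" or "control"; only "control" commands are subject to
    "requires_otp".
    """
    kernel = phase.get("capabilities", {}).get("kernel")
    if kernel is None:
        return (False, False)
    if command in kernel.get("observe", []):
        return (True, False)
    if command in kernel.get("control", []):
        return (True, bool(kernel.get("requires_otp", False)))
    return (False, False)

# Two phases trimmed from the connector.json above.
verify_phase = {
    "name": "verify",
    "capabilities": {
        "kernel": {
            "observe": ["windows", "ax-tree"],
            "control": ["ax-press"],
            "requires_otp": True,
        }
    },
}
code_phase = {"name": "code", "capabilities": {"pty": True}}

print(kernel_permission(verify_phase, "ax-tree"))   # observe command
print(kernel_permission(verify_phase, "ax-press"))  # control command
print(kernel_permission(verify_phase, "type"))      # not listed in this phase
print(kernel_permission(code_phase, "ax-press"))    # phase has no kernel capability
```

Under these assumptions the verify phase can read the tree freely, can press buttons only after an OTP, and cannot type at all, which matches the intent of giving each phase exactly the capabilities it needs.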
Comparison
| | Claude CU | OpenAI Operator | Kernel CLI |
|---|---|---|---|
| Price | $20-200/mo | Subscription | Free |
| Method | Screenshot + VLM | Screenshot + VLM | AX tree (native) |
| Speed | ~3.6s/action | ~3s/action | ~61ms/action |
| Accuracy | Probabilistic | Probabilistic | Deterministic |
| VLM cost | Per screenshot | Per screenshot | Zero |
| Platform | macOS only (2026) | Web only | macOS (Linux planned) |
| Vendor lock-in | Anthropic | OpenAI | None |
| Works with | Claude only | GPT only | Any AI |
| Harness | Vendor-controlled | Vendor-controlled | You choose |
| MCP server | No | No | Yes |
| Offline | No | No | Yes |
| Source | Closed | Open (Apache 2.0) | Binary distributed |
Get Started
# Install via OpenCLIs
openclis install niia # includes kernel-cli
# Or use directly
kernel-cli windows # see what's open
kernel-cli ax-tree # read the UI structure
kernel-cli ax-press "OK" # press a button
# With NIIA harness
niia observe windows # always allowed
niia control unlock --scope full # OTP required
niia control ax-press "OK" # after unlock
Kernel CLI is part of the NIIA toolkit by Monolex.
Distributed as a signed binary. Source is proprietary.