
Kernel CLI: Free & Freedom

AI Computer Use — without the subscription, without the vendor lock, without the screenshots. Every AI company is shipping “computer use” in 2026. Claude charges $20-200/mo; OpenAI charges a subscription. Both lock you to their AI, and both rely on slow, screenshot-based vision models (VLMs). Kernel CLI is free, works with any AI, and uses the OS accessibility tree — crazy fast, deterministic, zero API cost.

The Problem with Screenshot-Based Computer Use

Claude Computer Use / OpenAI Operator / Google Mariner:
  1. Take screenshot                    ~500ms
  2. Send to vision model (VLM)         ~2000ms
  3. VLM: "I see a button at (450,320)" ~1000ms
  4. Click at coordinates               ~100ms

  Total: ~3.6 seconds per action
  Accuracy: probabilistic (VLM can hallucinate coordinates)
  Cost: VLM inference per screenshot
  Vendor: locked to provider's computer use API
It’s slow, expensive, inaccurate, and locked to one vendor.

How Kernel CLI Works

Kernel CLI:
  1. Read accessibility tree (AX)       ~50ms
  2. Find element by name               ~1ms
  3. Act via AX API                     ~10ms

  Total: ~61ms per action
  Accuracy: deterministic (OS-native element identification)
  Cost: zero (local binary, no API call)
  Vendor: none (works with any AI)
Crazy fast. Deterministic. Free. Vendor-independent. The accessibility tree is how screen readers work — it’s the OS-native representation of every window, button, text field, and menu. Kernel CLI speaks the OS’s own language.
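The per-action totals above are just sums of the step estimates. A quick sanity check in Python (the step names in the dicts are illustrative, not kernel-cli internals):

```python
# Per-action latency from the estimates above, in milliseconds.
screenshot_pipeline = {"capture": 500, "vlm_send": 2000, "vlm_reason": 1000, "click": 100}
ax_pipeline = {"read_tree": 50, "find_element": 1, "ax_action": 10}

screenshot_ms = sum(screenshot_pipeline.values())  # 3600 ms, i.e. ~3.6 s
ax_ms = sum(ax_pipeline.values())                  # 61 ms

print(f"screenshot: {screenshot_ms} ms | ax: {ax_ms} ms | "
      f"speedup: ~{screenshot_ms // ax_ms}x")  # ~59x per action
```

At roughly 59x per action, the gap compounds quickly over a multi-step workflow.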

What It Can Do

See

# Every open window
kernel-cli windows
  #279 "My Project — VS Code" (Code) FOCUSED
  #412 "Pull Request #142" (Chrome)
  #89  "Slack — engineering" (Slack)

# Accessibility tree of any app (structural, not visual)
kernel-cli ax-tree --app "Google Chrome" --depth 3
  AXWebArea
    AXGroup "main content"
      AXButton "Merge pull request" found by NAME
      AXButton "Close pull request"
      AXStaticText "All checks passed"

# Running processes
kernel-cli process
  claude cpu:70% mem:3%
  node cpu:12% mem:2%

# Screenshots (when visual is actually needed)
kernel-cli capture --window "Chrome"
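Because the ax-tree output is structural, "find element by name" is a plain depth-first search rather than a vision problem. A minimal sketch over a mock of the Chrome tree shown above (the data layout and function are illustrative, not kernel-cli's actual implementation):

```python
# Mock of the structural ax-tree output above; a node is (role, name, children).
tree = ("AXWebArea", "", [
    ("AXGroup", "main content", [
        ("AXButton", "Merge pull request", []),
        ("AXButton", "Close pull request", []),
        ("AXStaticText", "All checks passed", []),
    ]),
])

def find_by_name(node, name):
    """Depth-first search for an element by its accessibility name."""
    role, node_name, children = node
    if node_name == name:
        return node
    for child in children:
        hit = find_by_name(child, name)
        if hit:
            return hit
    return None

hit = find_by_name(tree, "Merge pull request")
print(hit[0])  # AXButton — an exact match, no coordinates involved
```

The lookup either finds the exact element or returns nothing; there is no "approximately (450, 320)" step to get wrong.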

Act

# Press any button by name (no coordinates)
kernel-cli ax-press "Merge pull request"

# Type text
kernel-cli type "LGTM. Shipping."

# Keyboard shortcuts
kernel-cli key "cmd+shift+enter"

# Window management
kernel-cli window focus "Slack"
kernel-cli window resize "Terminal" 1200 800

# App control
kernel-cli app launch "Safari"
kernel-cli app quit "TextEdit"

# Scroll, click, drag
kernel-cli scroll --down 5
kernel-cli click 450 320
kernel-cli drag 100 100 500 500

Query

# Safari history
kernel-cli query safari --last 24h --keyword "auth"

# Spotlight search
kernel-cli search "connector.json"

# App logs
kernel-cli log --app "Xcode" --last 1h

# Open files by process
kernel-cli lsof --pid 1234

Automate

# AppleScript / JXA
kernel-cli script "tell application \"Finder\" to make new folder at desktop with properties {name:\"AI Output\"}"

# macOS Shortcuts
kernel-cli shortcut "Send Daily Report"

# URL schemes
kernel-cli open "slack://channel?team=T123&id=C456"

AX Tree vs Screenshot: Side by Side

A human asks: “Click the Merge button on GitHub.”

Screenshot approach (Claude Computer Use):

1. kernel-cli capture --window "Chrome"     → screenshot.png
2. Send screenshot.png to VLM               → $0.01, 2 seconds
3. VLM: "I see 'Merge pull request' button
         at approximately (645, 480)"
4. kernel-cli click 645 480                 → might miss
5. If missed, take another screenshot...    → retry loop

AX tree approach (Kernel CLI):

1. kernel-cli ax-tree --app "Chrome"
     → AXButton "Merge pull request" (found)
2. kernel-cli ax-press "Merge pull request"
     → pressed (deterministic, no coordinates)
Two commands. No VLM. No cost. No retry. No guessing.
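The retry loop in step 5 has a quantifiable cost: if each VLM-guided click lands with probability p, attempts per action follow a geometric distribution with mean 1/p. A small illustration (the success rates are hypothetical):

```python
# Expected attempts per action for a probabilistic click (geometric mean 1/p),
# versus exactly 1 attempt for a deterministic AX press.
def expected_attempts(p: float) -> float:
    return 1 / p

for p in (0.9, 0.8, 0.5):
    print(f"success rate {p:.0%}: ~{expected_attempts(p):.2f} attempts, "
          f"~{expected_attempts(p) * 3.6:.1f} s at 3.6 s/attempt")
```

Even a 90% hit rate means extra screenshots and extra VLM calls on roughly one action in ten.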

Freedom: You Choose the Harness

Other computer use tools come with built-in restrictions from the vendor:
  • Claude Computer Use: Anthropic decides what Claude can access
  • OpenAI Operator: OpenAI decides the safety boundary
  • Google Mariner: Google decides the permissions
Kernel CLI is raw capability. You decide the harness:
Option 1: No harness (development, testing)
  $ kernel-cli ax-press "Delete Everything"
  → executes immediately. BRAVE MODE.

Option 2: NIIA harness (production, shared machines)
  $ niia control unlock --scope full --duration 1h
  → OTP sent to your email
  → you enter OTP
  → 1 hour of controlled access
  $ niia control ax-press "Delete Everything"
  → executes within unlock window

Option 3: Enterprise harness (future)
  → Policy file defines what's allowed per role
  → Audit log records every action
  → OTP for destructive operations
  → Scope-limited to specific apps/windows
The capability is the same in all three. The control is yours.
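The three options differ only in the gate placed in front of the same capability. A hypothetical sketch of the time-boxed unlock in Option 2 (the class and method names are made up for illustration, not niia's actual implementation):

```python
import time

class UnlockWindow:
    """Time-boxed permission window, as granted after OTP verification."""
    def __init__(self, duration_s: float):
        self.expires_at = time.monotonic() + duration_s

    def allows(self) -> bool:
        # Control actions execute only while the window is open.
        return time.monotonic() < self.expires_at

window = UnlockWindow(duration_s=3600)   # --duration 1h
assert window.allows()                   # inside the window: action executes

expired = UnlockWindow(duration_s=-1)
assert not expired.allows()              # window closed: action refused
```

Option 1 is this code with the check removed; Option 3 is this code with a policy file and audit log wrapped around it.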

With Any AI

Kernel CLI is a standalone binary. It doesn’t know or care which AI calls it.
# Claude uses it
claude> Use kernel-cli to check if CI passed on Chrome

# Codex uses it
codex> kernel-cli ax-tree --app "Chrome" | grep "checks passed"

# Gemini uses it
gemini> Read the accessibility tree of Slack and find unread messages

# A script uses it
#!/bin/bash
kernel-cli ax-press "Deploy to Production"

MCP: Kernel CLI for All

Terminal AI tools (Claude Code, Codex, Gemini CLI) can call kernel-cli directly. But what about AI that doesn’t have a terminal — Claude Desktop, web-based AI, IDE copilots? Kernel CLI runs as an MCP server. Any AI that supports MCP gets full OS access.
{
  "mcpServers": {
    "kernel-cli": {
      "command": "kernel-cli",
      "args": ["--mcp"]
    }
  }
}
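The registration above is ordinary JSON, so a client can sanity-check it before loading. A quick check with Python's json module:

```python
import json

raw = """
{
  "mcpServers": {
    "kernel-cli": { "command": "kernel-cli", "args": ["--mcp"] }
  }
}
"""
config = json.loads(raw)
server = config["mcpServers"]["kernel-cli"]
assert server["command"] == "kernel-cli"
assert server["args"] == ["--mcp"]
```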
One config entry. The AI gets these tools:
MCP Tool     What it does
snapshot     Windows + AX tree + clipboard in one call
windows      List all open windows
ax_tree      Read UI structure of any app
ax_press     Press any button by name
ax_set       Set value on any UI element
type_text    Type text
key          Press keyboard shortcuts
click        Click at coordinates
scroll       Scroll any element
capture      Take screenshot
app          Launch, quit, switch apps
process      List running processes
query        Search Safari/Chrome/Notes/Photos
script       Run AppleScript/JXA
search       Spotlight search
window       Manage windows (resize, move, focus)

What this means

Before MCP:
  Only terminal AI can control the OS.
  Claude Desktop, web AI, IDE copilots → blind to OS.

After MCP:
  ANY AI with MCP support controls the OS.
  Claude Desktop → ax_press "Merge pull request"
  Cursor → ax_tree to read Chrome
  Web AI → query safari history
Terminal AI:     kernel-cli ax-press "OK"           ← direct binary call
MCP AI:          mcp__kernel-cli__ax_press("OK")    ← MCP tool call
Result:          identical
The capability is the same. The access path differs. Terminal or MCP — same OS control, same AX tree, same speed.

Real example: Claude Desktop merging a PR

Claude Desktop has no terminal. But with Kernel CLI MCP:
1. ax_tree(app="Chrome")
   → sees: AXButton "Merge pull request"

2. ax_press(title="Merge pull request")
   → button pressed. PR merged.

3. ax_tree(app="Slack")
   → sees: AXTextField "Message #engineering"

4. type_text("Feature shipped. PR #142 merged.")
   → typed into Slack

5. key(combo="enter")
   → message sent
No screenshot. No VLM. No coordinates. Claude Desktop just pressed a button and typed a message — through MCP, using the OS accessibility tree.

Harness applies to MCP too

When Kernel CLI runs through niia’s MCP registration, the OTP harness applies:
Direct MCP (kernel-cli --mcp):
  → BRAVE MODE. No gate.

Via niia MCP (niia as MCP proxy):
  → OTP required for control tools
  → observe tools always available
  → same physical harnessing as CLI

In connector.json

{
  "connector": "2.0",
  "name": "full-cycle",
  "pipeline": {
    "phases": [
      {
        "name": "code",
        "model": "opus",
        "prompt": "Implement the feature. Push to branch.",
        "capabilities": { "pty": true }
      },
      {
        "name": "verify",
        "model": "sonnet",
        "prompt": "Open Chrome. Check GitHub CI status. If green, merge.",
        "capabilities": {
          "kernel": {
            "observe": ["windows", "ax-tree"],
            "control": ["ax-press"],
            "requires_otp": true
          }
        }
      },
      {
        "name": "notify",
        "model": "haiku",
        "prompt": "Open Slack. Post to #deploys: 'Feature shipped.'",
        "capabilities": {
          "kernel": {
            "control": ["ax-press", "type", "key"],
            "requires_otp": true
          }
        }
      }
    ]
  }
}
Phase 1: Code in terminal. Phase 2: Verify in browser. Phase 3: Notify in Slack. One pipeline. Terminal + OS. Three different AI models. Each with exactly the capabilities it needs.
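The per-phase capability blocks above lend themselves to a simple allow-list check. A hypothetical sketch of how a runner might enforce the verify phase's scope (the function and dict are illustrative, not the real connector schema handler):

```python
# Capabilities declared by the "verify" phase in the connector.json above.
verify_caps = {
    "kernel": {
        "observe": ["windows", "ax-tree"],
        "control": ["ax-press"],
        "requires_otp": True,
    }
}

def is_allowed(caps: dict, category: str, command: str) -> bool:
    """A kernel command is allowed only if the phase declared it."""
    return command in caps.get("kernel", {}).get(category, [])

assert is_allowed(verify_caps, "control", "ax-press")   # declared: allowed
assert not is_allowed(verify_caps, "control", "type")   # only the notify phase has this
```

Anything not declared is simply absent from the allow-list, so each phase gets exactly the capabilities it needs and nothing more.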

Comparison

                  Claude CU            OpenAI Operator      Kernel CLI
Price             $20-200/mo           Subscription         Free
Method            Screenshot + VLM     Screenshot + VLM     AX tree (native)
Speed             ~3.6s/action         ~3s/action           ~61ms/action
Accuracy          Probabilistic        Probabilistic        Deterministic
VLM cost          Per screenshot       Per screenshot       Zero
Platform          macOS only (2026)    Web only             macOS (Linux planned)
Vendor lock-in    Anthropic            OpenAI               None
Works with        Claude only          GPT only             Any AI
Harness           Vendor-controlled    Vendor-controlled    You choose
MCP server        No                   No                   Yes
Offline           No                   No                   Yes
Source            Closed               Open (Apache 2.0)    Binary distributed

Get Started

# Install via OpenCLIs
openclis install niia    # includes kernel-cli

# Or use directly
kernel-cli windows       # see what's open
kernel-cli ax-tree       # read the UI structure
kernel-cli ax-press "OK" # press a button

# With NIIA harness
niia observe windows              # always allowed
niia control unlock --scope full  # OTP required
niia control ax-press "OK"        # after unlock

Kernel CLI is part of the NIIA toolkit by Monolex. Distributed as a signed binary. Source is proprietary.