Dimension 5: The Operating System

The first four dimensions operate inside the terminal. The fifth dimension breaks out of it.

Recap: Four Dimensions Inside the Terminal

Dimension 1: Agent count       → N agents in PTY sessions
Dimension 2: Agent direction   → bidirectional, mesh, meeting
Dimension 3: Agent depth       → recursive teams, fractal orchestration
Dimension 4: Agent machines    → cross-machine via gateway/P2P
All four dimensions live inside the terminal. Agents read files, write code, run commands. Powerful — but blind to everything outside the terminal window.

Dimension 5: Agent Sees the OS

Dimensions 1-4:  AI operates inside the terminal
Dimension 5:     AI sees and controls the operating system

  Terminal world:    files, code, git, shell commands
  OS world:          windows, buttons, apps, processes, screen, clipboard
An agent in dimension 5 can:
See:     All open windows and their positions
         Accessibility tree of any application
         Running processes and their resource usage
         Clipboard contents
         Browser history, Notes, Photos (via query)
         Screenshots of any window or screen

Do:      Press any button in any application (by name, not coordinates)
         Type text into any field
         Send keyboard shortcuts (Cmd+S, Cmd+Tab, etc.)
         Open/close/resize/move windows
         Launch and quit applications
         Run AppleScript/JXA for complex automation
         Scroll, drag, click at coordinates

The Capability Layer: Kernel CLI

Kernel CLI is a standalone binary that provides raw OS access:
# See all windows
kernel-cli windows
  #279 "Project — Warp" (Warp) FOCUSED
  #9099 "GitHub PR #142" (Chrome)
  #9237 "terminal" (iTerm2)

# Read accessibility tree of any app
kernel-cli ax-tree --app "Google Chrome" --depth 3

# Press a button by name (no coordinates needed)
kernel-cli ax-press "Merge pull request"

# Type text
kernel-cli type "Approved. Ship it."

# Send keyboard shortcut
kernel-cli key "cmd+shift+enter"

# Take screenshot
kernel-cli capture --window "Google Chrome"

# Query Safari history
kernel-cli query safari --last 24h --keyword "auth"

# List processes
kernel-cli process
  claude cpu:70% mem:3%
  monolex cpu:85% mem:1%
Kernel CLI uses the accessibility tree (AX), not screenshots. This means:
  • No VLM (vision model) needed
  • Structural understanding, not pixel guessing
  • Button names, not coordinates
  • Fast and deterministic
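
The "button names, not coordinates" point can be sketched as a depth-first search over a nested accessibility tree. The tree shape below is hypothetical; real AX APIs (e.g. macOS AXUIElement) expose similar role/title/children attributes.

```python
def find_element(node, name):
    """Depth-first search for the first element whose title matches `name`."""
    if node.get("title") == name:
        return node
    for child in node.get("children", []):
        found = find_element(child, name)
        if found:
            return found
    return None

# A toy AX tree for a browser window showing a GitHub PR page.
tree = {
    "role": "AXWindow", "title": "GitHub PR #142",
    "children": [
        {"role": "AXGroup", "title": "pr-actions", "children": [
            {"role": "AXButton", "title": "Merge pull request"},
            {"role": "AXButton", "title": "Close pull request"},
        ]},
    ],
}

button = find_element(tree, "Merge pull request")
print(button["role"])  # → AXButton
```

Because the lookup is a structural match on the element's name, it either finds the button or it doesn't; there is no coordinate to get wrong.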

The Harness Layer: niia observe/control

Kernel CLI is raw capability. niia wraps it with physical harnessing.
Raw (kernel-cli):          Harnessed (niia):
  Everything allowed.        OTP required for control.
  No gate. No audit.         Time-limited. Scope-limited.
  BRAVE MODE.                Human holds the key.
# Observe (some commands always allowed)
niia observe windows        ✅ always allowed
niia observe process        ✅ always allowed

# Observe (gated — exposes screen content)
niia observe ax-tree        🔒 requires: niia control unlock --scope observe
niia observe capture        🔒 requires: niia control unlock --scope observe

# Control (always gated)
niia control unlock --scope full --duration 1h
  → OTP sent to human's email
  → Human enters OTP
  → 1 hour of control access

niia control ax-press "OK"  ✅ (within unlock window)
niia control type "hello"   ✅ (within unlock window)
niia control key "cmd+s"    ✅ (within unlock window)

# After 1 hour:
niia control ax-press "OK"  🔒 locked again
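
The time-limited, scope-limited unlock window above can be sketched as a small gate object. The class and method names here are illustrative, not niia's actual API.

```python
import time

class ControlGate:
    def __init__(self):
        self._unlocks = {}  # scope -> expiry timestamp (monotonic clock)

    def unlock(self, scope, duration_s, otp, expected_otp):
        """Grant `scope` for `duration_s` seconds if the human's OTP matches."""
        if otp != expected_otp:
            raise PermissionError("bad OTP")
        self._unlocks[scope] = time.monotonic() + duration_s

    def allowed(self, scope):
        """A scope is allowed only inside an active unlock window."""
        expiry = self._unlocks.get(scope)
        return expiry is not None and time.monotonic() < expiry

gate = ControlGate()
print(gate.allowed("full"))   # locked by default → False
gate.unlock("full", duration_s=3600, otp="492817", expected_otp="492817")
print(gate.allowed("full"))   # within the 1-hour window → True
```

The key property: the default state is locked, and relocking requires no action. When the window expires, `allowed` simply starts returning False again.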

Why AX Tree > Screenshots

Claude Computer Use (screenshot approach):
  1. Take screenshot                           ~500ms
  2. Send to VLM                               ~2000ms
  3. VLM reasons: "I see a button that says Merge" ~1000ms
  4. VLM outputs coordinates: click(450, 320)
  5. Click at coordinates                       ~100ms
  Total: ~3600ms, probabilistic, can misclick

kernel-cli (AX tree approach):
  1. Read AX tree                              ~50ms
  2. Find element by name: "Merge pull request" ~1ms
  3. Press via AX API                          ~10ms
  Total: ~61ms, deterministic, never misclicks
Crazy fast. Deterministic. No VLM cost. No hallucinated coordinates. The AX tree is how screen readers work — it’s the OS-native way to understand UI structure. Kernel CLI speaks the same language the OS speaks.
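
As a back-of-envelope check of the latency figures quoted above:

```python
# Per-step latencies (ms) as quoted in the comparison above.
screenshot_path = {"screenshot": 500, "vlm_upload": 2000, "vlm_reason": 1000, "click": 100}
ax_path = {"read_tree": 50, "find_by_name": 1, "ax_press": 10}

print(sum(screenshot_path.values()))  # → 3600 ms
print(sum(ax_path.values()))          # → 61 ms
print(round(sum(screenshot_path.values()) / sum(ax_path.values())))  # ≈ 59x faster
```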

In connector.json

{
  "connector": "2.0",
  "name": "full-deploy",

  "pipeline": {
    "phases": [
      {
        "name": "implement",
        "model": "opus",
        "prompt": "Implement the feature. Commit and push.",
        "capabilities": { "pty": true }
      },
      {
        "name": "verify-ci",
        "model": "sonnet",
        "prompt": "Open Chrome. Navigate to GitHub PR. Check CI status. If green, click Merge.",
        "capabilities": {
          "pty": true,
          "kernel": {
            "observe": ["windows", "ax-tree"],
            "control": ["ax-press", "key"],
            "requires_otp": true
          }
        }
      },
      {
        "name": "notify",
        "model": "haiku",
        "prompt": "Open Slack. Send deploy notification to #engineering.",
        "capabilities": {
          "kernel": {
            "control": ["ax-press", "type", "key"],
            "requires_otp": true
          }
        }
      }
    ]
  }
}
Phase 1: Code in terminal (Dimension 1-4). Phase 2: Check CI in browser, merge PR (Dimension 5). Phase 3: Notify team in Slack (Dimension 5). One pipeline. Terminal + OS. Code + UI. All declarative.
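
A harness consuming this format could walk the pipeline and report which phases request OS control (and therefore require an OTP unlock). The loader below is a hedged sketch; the field names follow the example above, but the parsing code itself is illustrative, not the actual runtime.

```python
import json

# A trimmed version of the connector.json example, inlined for self-containment.
connector = json.loads("""
{
  "connector": "2.0",
  "name": "full-deploy",
  "pipeline": {"phases": [
    {"name": "implement", "capabilities": {"pty": true}},
    {"name": "verify-ci", "capabilities": {"pty": true,
      "kernel": {"control": ["ax-press", "key"], "requires_otp": true}}},
    {"name": "notify", "capabilities": {
      "kernel": {"control": ["ax-press", "type", "key"], "requires_otp": true}}}
  ]}
}
""")

# Phases whose kernel capabilities are gated behind an OTP unlock.
otp_phases = [
    p["name"] for p in connector["pipeline"]["phases"]
    if p.get("capabilities", {}).get("kernel", {}).get("requires_otp")
]
print(otp_phases)  # → ['verify-ci', 'notify']
```

Only the terminal-only phase runs unattended; every phase that touches the OS surfaces in the list and waits on the human's OTP.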

The Five Dimensions

Dimension 1: Agent count       "How many?"
Dimension 2: Agent direction   "Who talks to whom?"
Dimension 3: Agent depth       "How deep does the tree go?"
Dimension 4: Agent machines    "Which physical machines?"
Dimension 5: Agent OS access   "What can the agent see and touch?"

Each dimension is independent.
Each multiplies the others.
connector.json declares all five.
Dimensions 1-4:  AI that writes code
Dimension 5:     AI that uses the computer

Adding Dimension 5 to a pipeline means the AI can:
  - Write code (PTY)
  - Check CI status (browser via AX)
  - Merge the PR (button press via AX)
  - Notify the team (Slack via AX)
  - Monitor production dashboards (observe via AX)
  - Respond to alerts (control via AX)

This is not "AI coding assistant."
This is "AI computer user."

Safety by Design

Dimension 5 is the most powerful and the most dangerous. That’s why it has the strongest harness:
Dimensions 1-4:  Sandbox + worktree (software isolation)
Dimension 5:     OTP (physical isolation)

Software isolation: AI can't write outside worktree
Physical isolation: AI can't control OS without human's OTP

The more powerful the capability,
the stronger the harness.
The strongest harness is physical.
See Physical Harnessing: Why OTP Beats Policy for why this matters.