Dimension 5: The Operating System
The first four dimensions operate inside the terminal.
The fifth dimension breaks out of it.
Recap: Four Dimensions Inside the Terminal
Dimension 1: Agent count → N agents in PTY sessions
Dimension 2: Agent direction → bidirectional, mesh, meeting
Dimension 3: Agent depth → recursive teams, fractal orchestration
Dimension 4: Agent machines → cross-machine via gateway/P2P
All four dimensions live inside the terminal. Agents read files, write code, and run commands. That is powerful, but it leaves them blind to everything outside the terminal window.
Dimension 5: Agent Sees the OS
Dimensions 1-4: AI operates inside the terminal
Dimension 5: AI sees and controls the operating system
Terminal world: files, code, git, shell commands
OS world: windows, buttons, apps, processes, screen, clipboard
An agent in dimension 5 can:
See:
- All open windows and their positions
- Accessibility tree of any application
- Running processes and their resource usage
- Clipboard contents
- Browser history, Notes, Photos (via query)
- Screenshots of any window or screen
Do:
- Press any button in any application (by name, not coordinates)
- Type text into any field
- Send keyboard shortcuts (Cmd+S, Cmd+Tab, etc.)
- Open/close/resize/move windows
- Launch and quit applications
- Run AppleScript/JXA for complex automation
- Scroll, drag, click at coordinates
The Capability Layer: Kernel CLI
Kernel CLI is a standalone binary that provides raw OS access:
# See all windows
kernel-cli windows
#279 "Project — Warp" (Warp) FOCUSED
#9099 "GitHub PR #142" (Chrome)
#9237 "terminal" (iTerm2)
# Read accessibility tree of any app
kernel-cli ax-tree --app "Google Chrome" --depth 3
# Press a button by name (no coordinates needed)
kernel-cli ax-press "Merge pull request"
# Type text
kernel-cli type "Approved. Ship it."
# Send keyboard shortcut
kernel-cli key "cmd+shift+enter"
# Take screenshot
kernel-cli capture --window "Google Chrome"
# Query Safari history
kernel-cli query safari --last 24h --keyword "auth"
# List processes
kernel-cli process
claude cpu:70% mem:3%
monolex cpu:85% mem:1%
Kernel CLI uses the accessibility tree (AX), not screenshots.
This means:
- No VLM (vision model) needed
- Structural understanding, not pixel guessing
- Button names, not coordinates
- Fast and deterministic
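Addressing a button by name rather than by coordinates amounts to a tree search. The snippet below is a minimal Python illustration, assuming a hypothetical nested-dict shape (`role`, `title`, `children`) for an accessibility tree. It is not Kernel CLI's output format or implementation, and the helper names `walk` and `find_by_title` are invented for this sketch.

```python
from typing import Iterator, Optional

# Hypothetical, simplified accessibility tree. The keys "role", "title",
# and "children" are assumptions for illustration only.
SAMPLE_TREE = {
    "role": "window",
    "title": "GitHub PR #142",
    "children": [
        {"role": "group", "title": "", "children": [
            {"role": "button", "title": "Close pull request", "children": []},
            {"role": "button", "title": "Merge pull request", "children": []},
        ]},
        {"role": "textfield", "title": "Leave a comment", "children": []},
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield every node in the tree, depth-first, parents before children."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_by_title(tree: dict, title: str, role: Optional[str] = None) -> Optional[dict]:
    """Return the first node whose title matches exactly, or None."""
    for node in walk(tree):
        if node.get("title") == title and (role is None or node.get("role") == role):
            return node
    return None


button = find_by_title(SAMPLE_TREE, "Merge pull request", role="button")
print(button["role"] if button else "not found")
```

Because the lookup is an exact string match on structure, the same tree always yields the same element, which is the sense in which the approach is deterministic.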
The Harness Layer: niia observe/control
Kernel CLI is raw capability. niia wraps it with physical harnessing.
Raw (kernel-cli): Everything allowed. No gate. No audit. BRAVE MODE.
Harnessed (niia): OTP required for control. Time-limited. Scope-limited. Human holds the key.
# Observe (some commands always allowed)
niia observe windows ✅ always allowed
niia observe process ✅ always allowed
# Observe (gated — exposes screen content)
niia observe ax-tree 🔒 requires: niia control unlock --scope observe
niia observe capture 🔒 requires: niia control unlock --scope observe
# Control (always gated)
niia control unlock --scope full --duration 1h
→ OTP sent to human's email
→ Human enters OTP
→ 1 hour of control access
niia control ax-press "OK" ✅ (within unlock window)
niia control type "hello" ✅ (within unlock window)
niia control key "cmd+s" ✅ (within unlock window)
# After 1 hour:
niia control ax-press "OK" 🔒 locked again
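The unlock behavior above (a scope, a duration, and a lock that returns when time runs out) can be modeled in a few lines. This is a hypothetical Python sketch, not niia's implementation: the class name `UnlockGate`, the treatment of `full` as also covering observe commands, and the injected clock are all assumptions made for illustration. OTP delivery and verification are out of scope and are represented only by the `unlock` call.

```python
import time
from typing import Callable, Optional

# Commands the text lists as always allowed, observe-gated, and control-gated.
ALWAYS_ALLOWED = {"windows", "process"}
OBSERVE_GATED = {"ax-tree", "capture"}
CONTROL_GATED = {"ax-press", "type", "key"}


class UnlockGate:
    """Hypothetical model of a time-limited, scope-limited unlock window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._scope: Optional[str] = None
        self._expires_at = 0.0

    def unlock(self, scope: str, duration_s: float) -> None:
        """Open a window; assumed to be called only after the human enters the OTP."""
        if scope not in ("observe", "full"):
            raise ValueError(f"unknown scope: {scope}")
        self._scope = scope
        self._expires_at = self._clock() + duration_s

    def allows(self, command: str) -> bool:
        if command in ALWAYS_ALLOWED:
            return True
        if self._scope is None or self._clock() >= self._expires_at:
            return False  # never unlocked, or the window has expired
        if command in OBSERVE_GATED:
            return True  # assumption: both "observe" and "full" cover observe commands
        if command in CONTROL_GATED:
            return self._scope == "full"
        return False  # unknown commands are denied


# Usage with a fake clock so the expiry can be shown without waiting an hour.
now = [0.0]
gate = UnlockGate(clock=lambda: now[0])
gate.unlock("full", duration_s=3600)
print(gate.allows("ax-press"))  # inside the window
now[0] = 3601.0
print(gate.allows("ax-press"))  # after the window
```

Denying by default (unknown commands and expired windows both return `False`) keeps the sketch aligned with the idea that control is always gated.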
Why AX Tree > Screenshots
Claude Computer Use (screenshot approach):
1. Take screenshot ~500ms
2. Send to VLM ~2000ms
3. VLM reasons: "I see a button that says Merge" ~1000ms
4. VLM outputs coordinates: click(450, 320)
5. Click at coordinates ~100ms
Total: ~3600ms, probabilistic, can misclick
kernel-cli (AX tree approach):
1. Read AX tree ~50ms
2. Find element by name: "Merge pull request" ~1ms
3. Press via AX API ~10ms
Total: ~61ms, deterministic, never misclicks
About 59x faster by these estimates. Deterministic. No VLM cost. No hallucinated coordinates.
The AX tree is how screen readers work: it is the OS-native way to understand UI structure. Kernel CLI speaks the same language the OS speaks.
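The totals in the comparison above are simple sums of the per-step estimates. The short Python check below reproduces them and the resulting ratio. The figures are the illustrative estimates from the text, not measurements, and step 4 of the screenshot approach is assumed to add no separate time since none is listed.

```python
# Per-step latency estimates in milliseconds, copied from the comparison above.
screenshot_steps = {
    "take screenshot": 500,
    "send to VLM": 2000,
    "VLM reasons": 1000,
    "click at coordinates": 100,
}
ax_steps = {
    "read AX tree": 50,
    "find element by name": 1,
    "press via AX API": 10,
}

screenshot_total = sum(screenshot_steps.values())
ax_total = sum(ax_steps.values())
speedup = screenshot_total / ax_total

print(f"screenshot: ~{screenshot_total}ms")
print(f"ax tree:    ~{ax_total}ms")
print(f"ratio:      ~{speedup:.0f}x")
```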
In connector.json
{
"connector": "2.0",
"name": "full-deploy",
"pipeline": {
"phases": [
{
"name": "implement",
"model": "opus",
"prompt": "Implement the feature. Commit and push.",
"capabilities": { "pty": true }
},
{
"name": "verify-ci",
"model": "sonnet",
"prompt": "Open Chrome. Navigate to GitHub PR. Check CI status. If green, click Merge.",
"capabilities": {
"pty": true,
"kernel": {
"observe": ["windows", "ax-tree"],
"control": ["ax-press", "key"],
"requires_otp": true
}
}
},
{
"name": "notify",
"model": "haiku",
"prompt": "Open Slack. Send deploy notification to #engineering.",
"capabilities": {
"kernel": {
"control": ["ax-press", "type", "key"],
"requires_otp": true
}
}
}
]
}
}
Phase 1: Code in terminal (Dimensions 1-4).
Phase 2: Check CI in browser, merge PR (Dimension 5).
Phase 3: Notify team in Slack (Dimension 5).
One pipeline. Terminal + OS. Code + UI. All declarative.
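To make the declarative structure concrete, the sketch below reads a pipeline shaped like the one above and reports which phases request OS access and which need an OTP. It is an illustrative Python reader written against the field names shown in the example (`pipeline.phases`, `capabilities.kernel`, `requires_otp`). It is not the Monolex connector loader, and the helper name `summarize_phases` is hypothetical.

```python
import json

# A trimmed copy of the example pipeline above (prompts omitted for brevity).
CONNECTOR_JSON = """
{
  "connector": "2.0",
  "name": "full-deploy",
  "pipeline": {
    "phases": [
      {"name": "implement", "model": "opus",
       "capabilities": {"pty": true}},
      {"name": "verify-ci", "model": "sonnet",
       "capabilities": {"pty": true, "kernel": {
         "observe": ["windows", "ax-tree"],
         "control": ["ax-press", "key"],
         "requires_otp": true}}},
      {"name": "notify", "model": "haiku",
       "capabilities": {"kernel": {
         "control": ["ax-press", "type", "key"],
         "requires_otp": true}}}
    ]
  }
}
"""


def summarize_phases(config: dict) -> list[dict]:
    """Return one summary row per phase: terminal access, OS access, OTP need."""
    rows = []
    for phase in config["pipeline"]["phases"]:
        caps = phase.get("capabilities", {})
        kernel = caps.get("kernel")  # absent means no OS access requested
        rows.append({
            "name": phase["name"],
            "pty": bool(caps.get("pty", False)),
            "os_access": kernel is not None,
            "control": kernel.get("control", []) if kernel else [],
            "requires_otp": bool(kernel.get("requires_otp", False)) if kernel else False,
        })
    return rows


for row in summarize_phases(json.loads(CONNECTOR_JSON)):
    print(row)
```

A summary like this makes it easy to see at a glance that only the first phase stays inside the terminal, while the other two reach into the OS and are gated by an OTP.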
The Five Dimensions
Dimension 1: Agent count "How many?"
Dimension 2: Agent direction "Who talks to whom?"
Dimension 3: Agent depth "How deep does the tree go?"
Dimension 4: Agent machines "Which physical machines?"
Dimension 5: Agent OS access "What can the agent see and touch?"
Each dimension is independent.
Each multiplies the others.
connector.json declares all five.
Dimensions 1-4: AI that writes code
Dimension 5: AI that uses the computer
Adding Dimension 5 to a pipeline means the AI can:
- Write code (PTY)
- Check CI status (browser via AX)
- Merge the PR (button press via AX)
- Notify the team (Slack via AX)
- Monitor production dashboards (observe via AX)
- Respond to alerts (control via AX)
This is not "AI coding assistant."
This is "AI computer user."
Safety by Design
Dimension 5 is the most powerful and the most dangerous.
That’s why it has the strongest harness:
Dimensions 1-4: Sandbox + worktree (software isolation)
Dimension 5: OTP (physical isolation)
Software isolation: AI can't write outside worktree
Physical isolation: AI can't control OS without human's OTP
The more powerful the capability,
the stronger the harness.
The strongest harness is physical.
See Physical Harnessing: Why OTP Beats Policy for why this matters.