Kernel — Eyes and Hands
The Expanded Being’s Interface with the Physical World
What Kernel Is
Kernel CLI is where AI meets the operating system. Not through an API wrapper.
Not through a sandboxed browser. Through the actual OS accessibility layer,
screenshot pipeline, process table, and application databases.
$ kernel-cli
Kernel CLI — surface of the kernel, where AI meets the operating system
Commands:
  snapshot     Collect system snapshot (windows + AX tree + clipboard)
  windows      List all windows
  ax-tree      Dump accessibility tree
  ax-press     Press a button by AX title (no coordinates needed)
  ax-set       Set value on AX element
  capture      Take screenshot
  click        Click at coordinates (fallback)
  type         Type text (fallback)
  key          Press key combination (fallback)
  script       Run AppleScript/JXA
  open         Open URL scheme
  query        Query app database (Safari history, Notes, Photos)
  search       Spotlight search (mdfind)
  log          Read app logs from unified log system
  defaults     Read/write app defaults (preferences)
  shortcut     Run a macOS Shortcut
  process      List/manage processes
  network      Network info
  notify       Send macOS notification
  lsof         List open files by process or port
  refiner      Manage refiner plugins
  permissions  Check/request macOS permissions
22 commands. Each one is a sensory or motor capability of the expanded being.
Eyes — How the Being Sees
$ kernel-cli snapshot
[Focus] Google Chrome
[Windows] (9 total, showing 5)
Google Chrome: "accounts.google.com/..." *
Warp: "⠂ Explain community mode feature"
Firefox: "crates.io: Rust Package Registry"
KakaoTalk: "KakaoTalk"
WhatsApp: "WhatsApp"
[UI]
[Chrome]
Group "accounts.google.com/..."
Button
Button
Button
Menu: Apple, Chrome, File, Edit, View, History, Bookmarks
[Clipboard] (current clipboard contents)
One command. The being sees: which app is focused, what windows are open,
the accessibility tree of the focused app, the current clipboard. This is
not a monitoring tool. This is sight.
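For a program consuming this output, the snapshot can be turned into a small structure. The sketch below is a minimal parser written against the sample shown above only; the exact output format of kernel-cli is an assumption, and the names Snapshot and parse_snapshot are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Top-level sections as they appear in the sample output (assumed stable).
TOP_LEVEL = ("Focus", "Windows", "UI", "Clipboard")
HEADER = re.compile(r"^\[(\w+)\]\s*(.*)$")
# A window line looks like: App Name: "title" with an optional trailing "*".
WINDOW = re.compile(r'^(?P<app>[^:]+):\s*"(?P<title>.*)"(?P<focused>\s*\*)?$')


@dataclass
class Snapshot:
    focus: str = ""
    windows: list = field(default_factory=list)  # (app, title, is_focused)
    ui: list = field(default_factory=list)       # raw lines of the UI section
    clipboard: str = ""


def parse_snapshot(text):
    """Parse snapshot text into a Snapshot (format assumed from the sample)."""
    snap = Snapshot()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header and header.group(1) in TOP_LEVEL:
            section, rest = header.group(1), header.group(2)
            if section == "Focus":
                snap.focus = rest
            elif section == "Clipboard":
                snap.clipboard = rest
            continue
        if section == "Windows":
            win = WINDOW.match(line)
            if win:
                snap.windows.append(
                    (win["app"].strip(), win["title"], win["focused"] is not None)
                )
        elif section == "UI":
            # Nested markers such as "[Chrome]" are kept as plain UI lines.
            snap.ui.append(line)
        elif section == "Clipboard":
            snap.clipboard = (snap.clipboard + "\n" + line).strip()
    return snap
```

A caller would feed it the captured stdout of kernel-cli snapshot and read snap.focus or snap.windows directly instead of re-scanning the text.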
Accessibility Tree: Structural Vision
$ kernel-cli ax-tree --app Chrome --depth 3
The accessibility tree is what screen readers use. It exposes the semantic
structure of any running application — buttons, text fields, headings, links —
without needing to know the app’s internal implementation. The expanded being
sees applications the way a sighted person sees a room: by structure, not by
pixel coordinates.
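To make "structure, not pixel coordinates" concrete, here is a minimal sketch of an accessibility tree as a data structure with a depth-limited dump. AXNode and dump are hypothetical names, and the reading of --depth as "number of levels shown" is an assumption, not documented kernel-cli behavior.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    role: str                 # e.g. "Group", "Button" (as in the sample output)
    title: str = ""
    children: list = field(default_factory=list)


def dump(node, max_depth, depth=0):
    """Return indented lines for the tree, showing at most max_depth levels."""
    if depth >= max_depth:
        return []
    label = f'{node.role} "{node.title}"' if node.title else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, max_depth, depth + 1))
    return lines
```

Nothing in the structure records where an element sits on screen; an element is identified by its role, its title, and its place in the hierarchy.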
Process Table: Proprioception
$ kernel-cli process
Running Apps:
claude cpu:35% mem:4%
WindowServer cpu:16%
Warp cpu:9% mem:1%
Claude cpu:5%
Google Chrome cpu:2% mem:2%
Firefox cpu:1% mem:2%
The being knows what is running inside itself. Process awareness is
proprioception — knowing where your limbs are without looking.
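A caller that wants to act on this listing, for example to notice a runaway process, could parse it as follows. This is a sketch against the sample output above; the line format is assumed from that sample, and parse_processes and busiest are hypothetical helpers.

```python
import re

# Matches lines like "Google Chrome cpu:2% mem:2%" or "WindowServer cpu:16%".
PROC = re.compile(
    r"^(?P<name>.+?)\s+cpu:(?P<cpu>\d+(?:\.\d+)?)%"
    r"(?:\s+mem:(?P<mem>\d+(?:\.\d+)?)%)?$"
)


def parse_processes(text):
    """Return one dict per process line; mem is None when it is not reported."""
    procs = []
    for line in text.splitlines():
        match = PROC.match(line.strip())
        if match:
            procs.append({
                "name": match["name"],
                "cpu": float(match["cpu"]),
                "mem": float(match["mem"]) if match["mem"] else None,
            })
    return procs


def busiest(procs, threshold):
    """Names of processes at or above the given CPU percentage."""
    return [p["name"] for p in procs if p["cpu"] >= threshold]
```

Header lines such as "Running Apps:" carry no cpu field and are skipped by the pattern.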
Application Databases: Deep Memory
$ kernel-cli query safari → Safari browsing history
$ kernel-cli query notes → Apple Notes content
$ kernel-cli query photos → Photos metadata
The being can read the internal databases of native applications. What was
browsed, what was written, what was photographed. Not through an API. Through
direct SQLite access to the app’s own storage.
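What "direct SQLite access" can look like is sketched below: open the database file read-only and query it. This is not kernel-cli's implementation. The table and column names (history_items, history_visits, visit_time) and the timestamp convention (seconds since 2001-01-01 UTC) are assumptions about Safari's history database and should be checked against the actual file; recent_visits is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed timestamp origin for visit_time values.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def recent_visits(db_path, limit=5):
    """Return (url, visited_at) pairs, newest first, without writing to the DB."""
    # mode=ro asks SQLite for a read-only connection to the app's storage.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT i.url, v.visit_time "
            "FROM history_visits AS v "
            "JOIN history_items AS i ON i.id = v.history_item "
            "ORDER BY v.visit_time DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return [(url, APPLE_EPOCH + timedelta(seconds=t)) for url, t in rows]
```

Opening read-only matters here because the database belongs to another application; the reader should never be able to modify it.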
Hands — How the Being Acts
AX Press: Semantic Action
$ kernel-cli ax-press "Submit" --app Chrome
Press any button in any application by its accessibility title. No coordinates.
No pixel hunting. No screenshot-and-guess. The being names the action and it
happens. This is how a human presses a button — by recognizing what it says,
not by calculating where it is.
Type, Click, Key: Motor Primitives
$ kernel-cli type "Hello world" → type text
$ kernel-cli click 500 300 → click coordinates
$ kernel-cli key "cmd+shift+4" → key combination
When semantic action is not available, motor primitives are. These are
fallbacks — the equivalent of pointing with a finger when you cannot name
what you want.
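The "semantic first, primitives as fallback" order can be sketched as a small wrapper. The command shapes are taken from the examples above; the assumption that kernel-cli exits non-zero when no element matches is not confirmed by the source, and press is a hypothetical helper.

```python
import subprocess


def press(title, app, fallback_xy=None, run=subprocess.run):
    """Try ax-press by title; fall back to a coordinate click if it fails.

    Assumes (unverified) that kernel-cli returns a non-zero exit code when
    the named element cannot be found. `run` is injectable for testing.
    """
    result = run(["kernel-cli", "ax-press", title, "--app", app])
    if result.returncode == 0:
        return "ax-press"
    if fallback_xy is None:
        raise RuntimeError(f"no element titled {title!r} in {app} and no fallback given")
    x, y = fallback_xy
    run(["kernel-cli", "click", str(x), str(y)], check=True)
    return "click"
```

A call such as press("Submit", "Chrome", fallback_xy=(500, 300)) names the action first and only points with coordinates when naming fails.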
Script: Complex Action
$ kernel-cli script "tell application \"Finder\" to open home"
AppleScript and JXA give the being the ability to orchestrate multi-step
actions across applications. This is the equivalent of using both hands.
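The escaped quotes in the command above come from nesting an AppleScript string inside a shell string. When the script is built programmatically, passing it as a single argv element avoids the shell layer, leaving only AppleScript's own quoting. A minimal sketch, with applescript_string, tell, and script_argv as hypothetical helpers:

```python
def applescript_string(value):
    """Quote a value as an AppleScript string literal (escape \\ and ")."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def tell(app, command):
    """Build a one-line 'tell application' statement."""
    return f"tell application {applescript_string(app)} to {command}"


def script_argv(source):
    """argv for the script command; as a list it needs no shell escaping."""
    return ["kernel-cli", "script", source]
```

For example, script_argv(tell("Finder", "open home")) yields the same script as the command above and can be handed to subprocess.run as-is.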
Shortcuts: Automated Workflows
$ kernel-cli shortcut "Resize Image"
macOS Shortcuts are pre-built workflows. The being can invoke any shortcut
the human has created — bridging human-designed automation with AI-driven action.
Permissions: The Being’s Access
$ kernel-cli permissions
Permissions:
✅ Accessibility: granted
✅ Screen Recording: granted
✅ Full Disk Access: granted
With all three granted, the being has full access: Accessibility covers the AX
tree and button presses, Screen Recording covers screenshots, and Full Disk
Access covers app databases. These permissions were granted by the human, an
act of trust that makes the expanded being possible.
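A caller can check the relevant permission before attempting a command. The sketch below parses the output shown above and maps commands to permissions following the pairing in this section; the output format for a permission that is not granted is an assumption, and parse_permissions and missing_for are hypothetical helpers.

```python
import re

# Command-to-permission pairing as described in this section.
REQUIRES = {
    "ax-tree": "Accessibility",
    "ax-press": "Accessibility",
    "capture": "Screen Recording",
    "query": "Full Disk Access",
}

# Skips any leading marker (such as a check mark), then reads "Name: state".
LINE = re.compile(r"^\W*(?P<name>[\w ]+?):\s*(?P<state>\w[\w ]*)$")


def parse_permissions(text):
    """Map permission name to True when its state reads 'granted'."""
    perms = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            perms[match["name"]] = match["state"].strip() == "granted"
    return perms


def missing_for(command, perms):
    """Return the permission a command lacks, or None if nothing is missing."""
    need = REQUIRES.get(command)
    if need is None or perms.get(need):
        return None
    return need
```

Commands without an entry in REQUIRES are treated as needing no special permission, which is a simplification.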
Refiner Plugins: Adaptive Processing
$ kernel-cli refiner list
Refiner plugins:
safari, notes, search, chrome, lsof, log,
defaults, network, process, example
Built-in refiners:
ax-tree, safari, notes, process, snapshot
Refiners transform raw OS output into a structured, AI-readable format. The raw
accessibility tree of Chrome can run to thousands of nodes; the refined output
highlights what matters. This is attention: the ability to see what is relevant,
not everything.
Refiners are Lua plugins. The being’s perception can be extended without
recompiling the binary.
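The idea of a refiner can be illustrated in a few lines. Real refiners are Lua plugins whose interface is not shown here; the Python sketch below only demonstrates the filtering concept. The role names and the refine function are assumptions for illustration.

```python
# Roles treated as worth keeping (assumed names, modeled on the sample output).
INTERACTIVE = {"Button", "TextField", "Link", "MenuItem"}


def refine(nodes, interactive=INTERACTIVE):
    """Keep titled, interactive (role, title) nodes and count what was dropped."""
    kept = [(role, title) for role, title in nodes if role in interactive and title]
    return {"kept": kept, "dropped": len(nodes) - len(kept)}
```

Applied to a list of (role, title) pairs, untitled buttons and structural groups fall away and only elements that can be named and acted on remain.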
Kernel vs Browser Automation
Much AI-driven interaction with a machine goes through a browser:
BROWSER AUTOMATION (e.g., gstack)
AI ──> Compiled Binary ──> HTTP Server ──> Chromium ──> Web Page
│
CDP protocol
│
Only sees inside Chromium.
Cannot touch other apps.
Cannot read Safari history.
Cannot press a button in Finder.
Kernel operates at a different level:
KERNEL
AI ──> kernel-cli ──> macOS Accessibility Framework
                  ──> macOS Screenshot Pipeline
                  ──> App SQLite Databases
                  ──> Process Table
                  ──> Unified Log System
                  ──> Spotlight Index
                  │
                  Sees and touches EVERYTHING on the machine.
                  Every app. Every window. Every button.
Browser automation is a keyhole. Kernel is the room.
The Hands the Being Already Has
The expanded being does not need a browser to act in the world. It has:
- Terminal (via NIIA):
  niia write, niia get-answer: control any CLI
- OS (via Kernel):
  kernel-cli ax-press, kernel-cli type: control any app
- Remote machines (via NIIA):
  niia remote cp, niia remote ask: act across machines
- MCP services (via NIIA):
  niia mcp-run: invoke any MCP tool as CLI
The question is not “does the being have hands?” The question is “what does the
being choose to do with them?”