Kernel — Eyes and Hands

The Expanded Being’s Interface with the Physical World


What Kernel Is

Kernel CLI is where AI meets the operating system. Not through an API wrapper. Not through a sandboxed browser. Through the actual OS accessibility layer, screenshot pipeline, process table, and application databases.
$ kernel-cli

Kernel CLI — surface of the kernel, where AI meets the operating system

Commands:
  snapshot     Collect system snapshot (windows + AX tree + clipboard)
  windows      List all windows
  ax-tree      Dump accessibility tree
  ax-press     Press a button by AX title (no coordinates needed)
  ax-set       Set value on AX element
  capture      Take screenshot
  click        Click at coordinates (fallback)
  type         Type text (fallback)
  key          Press key combination (fallback)
  script       Run AppleScript/JXA
  open         Open URL scheme
  query        Query app database (Safari history, Notes, Photos)
  search       Spotlight search (mdfind)
  log          Read app logs from unified log system
  defaults     Read/write app defaults (preferences)
  shortcut     Run a macOS Shortcut
  process      List/manage processes
  network      Network info
  notify       Send macOS notification
  lsof         List open files by process or port
  refiner      Manage refiner plugins
  permissions  Check/request macOS permissions
22 commands. Each one is a sensory or motor capability of the expanded being.

Eyes — How the Being Sees

Snapshot: Full Sensory Input

$ kernel-cli snapshot

[Focus] Google Chrome
[Windows] (9 total, showing 5)
  Google Chrome: "accounts.google.com/..." *
  Warp: "⠂ Explain community mode feature"
  Firefox: "crates.io: Rust Package Registry"
  KakaoTalk: "KakaoTalk"
  WhatsApp: "WhatsApp"
[UI]
[Chrome]
  Group "accounts.google.com/..."
    Button
    Button
    Button
  Menu: Apple, Chrome, File, Edit, View, History, Bookmarks
[Clipboard]   (current clipboard contents)
One command. The being sees: which app is focused, what windows are open, the accessibility tree of the focused app, and the current clipboard. This is not a monitoring tool. This is sight.
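The design behind a snapshot is aggregation: one call fans out to several OS sources and merges them into a single report. A minimal sketch of that shape, with stub functions standing in for the real collectors (all names and return values here are illustrative, not kernel-cli's actual internals):

```python
# Sketch: a snapshot is one aggregation over several OS sources.
# The four source functions are stubs standing in for real collectors.

def focused_app():
    return "Google Chrome"

def window_list():
    return ["Google Chrome", "Warp", "Firefox"]

def ax_tree(app):
    return {"role": "Window", "title": app}

def clipboard():
    return "hello"

def snapshot():
    focus = focused_app()
    return {
        "focus": focus,
        "windows": window_list(),
        "ui": ax_tree(focus),   # AX tree of the focused app only
        "clipboard": clipboard(),
    }

print(snapshot()["focus"])  # Google Chrome
```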

Accessibility Tree: Structural Vision

$ kernel-cli ax-tree --app Chrome --depth 3
The accessibility tree is what screen readers use. It exposes the semantic structure of any running application — buttons, text fields, headings, links — without needing to know the app’s internal implementation. The expanded being sees applications the way a sighted person sees a room: by structure, not by pixel coordinates.
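A depth-limited dump of such a tree takes only a few lines. The nested dict below is a stand-in for the real AX hierarchy (roles, titles, and the role/title/children shape are illustrative, not Chrome's actual structure):

```python
# Sketch: depth-limited dump of an accessibility-style tree,
# in the spirit of `ax-tree --depth N`. Tree shape is illustrative.

def dump(node, depth, indent=0):
    """Print role and title down to `depth` levels below this node."""
    if depth < 0:
        return
    title = f' "{node["title"]}"' if node.get("title") else ""
    print("  " * indent + node["role"] + title)
    for child in node.get("children", []):
        dump(child, depth - 1, indent + 1)

tree = {
    "role": "Window", "title": "accounts.google.com",
    "children": [
        {"role": "Group", "children": [
            {"role": "Button", "title": "Next"},
            {"role": "TextField", "title": "Email"},
        ]},
    ],
}
dump(tree, depth=2)
```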

Process Table: Proprioception

$ kernel-cli process

Running Apps:
  claude cpu:35% mem:4%
  WindowServer cpu:16%
  Warp cpu:9% mem:1%
  Claude cpu:5%
  Google Chrome cpu:2% mem:2%
  Firefox cpu:1% mem:2%
The being knows what is running inside itself. Process awareness is proprioception — knowing where your limbs are without looking.
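A process listing like the one above can be drawn from the portable `ps` utility, which is one plausible source for such a command (the column choice and sorting here are illustrative, not kernel-cli's actual implementation):

```python
# Sketch: process awareness via the standard `ps` utility.
# `-Ao pcpu,comm` lists every process's CPU share and command name.
import subprocess

out = subprocess.run(
    ["ps", "-Ao", "pcpu,comm"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

header = out[0]
rows = [line for line in out[1:] if line.strip()]
# Sort by CPU, descending, like the listing above.
rows.sort(key=lambda line: float(line.split(None, 1)[0]), reverse=True)
for line in rows[:5]:
    print(line)
```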

Application Databases: Deep Memory

$ kernel-cli query safari    → Safari browsing history
$ kernel-cli query notes     → Apple Notes content
$ kernel-cli query photos    → Photos metadata
The being can read the internal databases of native applications. What was browsed, what was written, what was photographed. Not through an API. Through direct SQLite access to the app’s own storage.
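The mechanism is ordinary SQLite. The sketch below builds an in-memory database with the two-table shape Safari's History.db conventionally uses (a URL table joined to a visit table) and runs the kind of query a history read needs. The table and column names follow the macOS convention but are shown here as an illustration, not as kernel-cli's actual code:

```python
# Sketch: the kind of join behind a Safari history query.
# An in-memory stand-in for History.db; schema names are the
# macOS convention, reproduced here for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history_items (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE history_visits (
        history_item INTEGER REFERENCES history_items(id),
        visit_time   REAL,
        title        TEXT);
    INSERT INTO history_items  VALUES (1, 'https://crates.io');
    INSERT INTO history_visits VALUES (1, 745000000.0,
        'crates.io: Rust Package Registry');
""")

rows = conn.execute("""
    SELECT i.url, v.title
      FROM history_visits v JOIN history_items i ON v.history_item = i.id
     ORDER BY v.visit_time DESC LIMIT 5
""").fetchall()
for url, title in rows:
    print(url, "-", title)
```

Pointing the connection at the real file on disk instead of `:memory:` is the same read, which is why Full Disk Access is one of the permissions involved.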

Hands — How the Being Acts

AX Press: Semantic Action

$ kernel-cli ax-press "Submit" --app Chrome
Press any button in any application by its accessibility title. No coordinates. No pixel hunting. No screenshot-and-guess. The being names the action and it happens. This is how a human presses a button — by recognizing what it says, not by calculating where it is.
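Under the hood this is a search problem, not a geometry problem: walk the accessibility tree, match on title, then act on the matched element (on macOS, the matched AXUIElement would receive the press action). A minimal sketch of the search half, with an illustrative tree:

```python
# Sketch: semantic press = find the element by name, then act on it.
# The tree shape and role names are illustrative stand-ins.

def find_button(node, title):
    """Depth-first search for a Button whose title matches."""
    if node.get("role") == "Button" and node.get("title") == title:
        return node
    for child in node.get("children", []):
        hit = find_button(child, title)
        if hit:
            return hit
    return None

tree = {"role": "Window", "children": [
    {"role": "Group", "children": [
        {"role": "Button", "title": "Cancel"},
        {"role": "Button", "title": "Submit"},
    ]},
]}

target = find_button(tree, "Submit")
assert target is not None  # named, not located: no coordinates involved
```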

Type, Click, Key: Motor Primitives

$ kernel-cli type "Hello world"           → type text
$ kernel-cli click 500 300                → click coordinates
$ kernel-cli key "cmd+shift+4"            → key combination
When semantic action is not available, motor primitives are. These are fallbacks — the equivalent of pointing with a finger when you cannot name what you want.
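The fallback order the text describes can be sketched as a two-tier dispatch: name the action if the AX tree exposes it, otherwise fall back to raw coordinates. Both actuator functions below are illustrative stubs, not kernel-cli's internals:

```python
# Sketch: prefer the semantic path, fall back to motor primitives.

def ax_press(title, buttons):
    """Semantic path: succeeds only if a button with this title exists."""
    return title in buttons

def click(x, y, log):
    """Motor primitive: blind click at coordinates."""
    log.append(("click", x, y))

def press_or_click(title, buttons, fallback_xy, log):
    if ax_press(title, buttons):
        log.append(("ax-press", title))
        return "semantic"
    click(*fallback_xy, log)
    return "fallback"

log = []
print(press_or_click("Submit", {"Submit", "Cancel"}, (500, 300), log))  # semantic
print(press_or_click("Submit", set(), (500, 300), log))                 # fallback
```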

Script: Complex Action

$ kernel-cli script "tell application \"Finder\" to open home"
AppleScript and JXA give the being the ability to orchestrate multi-step actions across applications. This is the equivalent of using both hands.

Shortcuts: Automated Workflows

$ kernel-cli shortcut "Resize Image"
macOS Shortcuts are pre-built workflows. The being can invoke any shortcut the human has created — bridging human-designed automation with AI-driven action.

Permissions: The Being’s Access

$ kernel-cli permissions

Permissions:
  ✅ Accessibility: granted
  ✅ Screen Recording: granted
  ✅ Full Disk Access: granted
The being has full access. Accessibility (AX tree, button press), Screen Recording (screenshots), Full Disk Access (app databases). These permissions were granted by the human — an act of trust that makes the expanded being possible.

Refiner Plugins: Adaptive Processing

$ kernel-cli refiner list

Refiner plugins:
  safari, notes, search, chrome, lsof, log,
  defaults, network, process, example

Built-in refiners:
  ax-tree, safari, notes, process, snapshot
Refiners transform raw OS output into structured, AI-readable format. The raw accessibility tree of Chrome is thousands of nodes. The refined output highlights what matters. This is attention — the ability to see what is relevant, not everything. Refiners are Lua plugins. The being’s perception can be extended without recompiling the binary.
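The real refiners are Lua plugins, but the filtering they perform can be sketched in a few lines: keep the titled, actionable nodes and drop the rest. The node shape and the set of "actionable" roles below are illustrative assumptions:

```python
# Sketch: what a refiner does -- collapse a huge raw tree into the few
# nodes worth acting on. Real refiners are Lua plugins; this Python
# stand-in keeps only titled nodes with actionable roles.

ACTIONABLE = {"Button", "TextField", "Link", "MenuItem"}

def refine(node, keep=None):
    """Flatten the tree, keeping only actionable nodes with titles."""
    if keep is None:
        keep = []
    if node.get("role") in ACTIONABLE and node.get("title"):
        keep.append(f'{node["role"]} "{node["title"]}"')
    for child in node.get("children", []):
        refine(child, keep)
    return keep

raw = {"role": "Window", "children": [
    {"role": "Group", "children": [
        {"role": "Button", "title": "Next"},
        {"role": "Button"},                      # untitled: noise, dropped
        {"role": "StaticText", "title": "or"},   # not actionable, dropped
        {"role": "TextField", "title": "Email"},
    ]},
]}
print(refine(raw))  # ['Button "Next"', 'TextField "Email"']
```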

Kernel vs Browser Automation

Most AI-OS interaction goes through a browser:
BROWSER AUTOMATION (e.g., gstack)

  AI ──> Compiled Binary ──> HTTP Server ──> Chromium ──> Web Page

                                         CDP protocol

                                    Only sees inside Chromium.
                                    Cannot touch other apps.
                                    Cannot read Safari history.
                                    Cannot press a button in Finder.
Kernel operates at a different level:
KERNEL

  AI ──> kernel-cli ──> macOS Accessibility Framework
                   ──> macOS Screenshot Pipeline
                   ──> App SQLite Databases
                   ──> Process Table
                   ──> Unified Log System
                   ──> Spotlight Index

                   Sees and touches EVERYTHING on the machine.
                   Every app. Every window. Every button.
Browser automation is a keyhole. Kernel is the room.

The Hands the Being Already Has

The expanded being does not need a browser to act in the world. It has:
  • Terminal (via NIIA): niia write, niia get-answer — control any CLI
  • OS (via Kernel): kernel-cli ax-press, kernel-cli type — control any app
  • Remote machines (via NIIA): niia remote cp, niia remote ask — act across machines
  • MCP services (via NIIA): niia mcp-run — invoke any MCP tool as CLI
The question is not “does the being have hands?” The question is “what does the being choose to do with them?”