Introducing agent-doc: Documents as the UI for AI Agents

AI assistants changed how I write software, but the conversation interface hasn't kept up. Terminal chat scrolls away. Web UIs lock you into rigid turn-taking. You can't edit an AI's response, reorganize a conversation, or version-control it alongside your code. I built agent-doc to fix this.

The Problem with AI Chat: Every AI coding assistant I've used has the same limitation: the conversation is ephemeral. You get a response, scroll past it, and it's gone. If the agent gives you a mediocre answer, you can't edit it — you have to re-prompt and hope for better. If a conversation meanders, you can't prune the irrelevant parts. You can't reorganize. You can't annotate.

Most critically, these conversations happen outside your development environment. They live in a browser tab or terminal session, disconnected from your editor, your version control, and your project structure. When a session ends, the context is lost.

I wanted something different: a conversation that lives in a file, tracked by git, editable in my IDE, and persistent across sessions.
Documents as the UI: agent-doc's core insight is simple: a markdown file is already a great conversation UI. Your editor has syntax highlighting, folding, split panes, search, and undo. Git gives you history, branching, and diffs. Markdown gives you rich formatting. Why build another chat interface when all of this already exists?

With agent-doc, you edit a markdown file. You write in a ## User section. You press a hotkey. agent-doc computes the diff since your last submit, sends it to the AI agent, and appends the response as a ## Assistant section. The conversation grows in the document. You can:

Edit the agent's responses — fix errors, improve clarity, trim noise

Delete old sections — the agent sees the deletions in the diff

Reorganize freely — move sections around, add headers, restructure

Annotate inline — add blockquotes or comments anywhere as prompts

Version control everything — git tracks every exchange

The document is a living artifact, not a disposable chat log.
How It Works: The core loop is straightforward:

Snapshot — agent-doc stores a snapshot of the document after each submit

Diff — on the next submit, it computes a unified diff between the snapshot and current content

Send — the diff (plus full document for context) goes to the AI agent

Append — the agent's response is appended as a new ## Assistant block

Merge — if you edited during the response, a 3-way merge preserves your changes

Commit — auto-commits before each submit so your editor shows diff gutters for what the agent added

The diff-based approach is key. The agent doesn't just see your latest message — it sees exactly what changed. If you deleted three paragraphs and added a question, the agent knows. If you edited its previous response to correct a fact, it sees that correction. This gives the agent much richer context than a simple "last message" approach.
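
To make the diff step concrete, here is a minimal Python sketch of comparing a stored snapshot with the current document. It is not agent-doc's code (the tool is written in Rust and uses the similar crate); it relies on Python's standard difflib, and the function name `diff_since_snapshot` and the sample text are made up for illustration.

```python
import difflib


def diff_since_snapshot(snapshot: str, current: str) -> str:
    """Return a unified diff from the stored snapshot to the current document."""
    return "".join(
        difflib.unified_diff(
            snapshot.splitlines(keepends=True),
            current.splitlines(keepends=True),
            fromfile="snapshot",
            tofile="current",
        )
    )


# Hypothetical document state at the last submit.
snapshot = "## User\nWhat is a snapshot?\n\n## Assistant\nA saved copy of the file.\n"

# The user corrected the agent's answer and added a follow-up question.
current = (
    "## User\nWhat is a snapshot?\n\n"
    "## Assistant\nA saved copy of the document.\n\n"
    "## User\nWhere is it stored?\n"
)

print(diff_since_snapshot(snapshot, current))
```

The printed diff carries both the correction to the earlier answer and the new question, which is the extra context a "last message only" approach would drop.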
Comment Stripping: Your Private Scratchpad: One subtle but important feature: HTML comments (<!-- ... -->) and link reference comments ([//]: # (...)) are stripped before computing the diff. This means you can leave private notes in your document (reminders, TODOs, context for yourself) without triggering an agent response.

But if you uncomment text (remove the comment markers), that IS treated as a real change. So comments function as a staging area: write something in a comment, think about it, then uncomment when you're ready for the agent to see it.
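
A rough Python sketch of the idea follows. The regular expressions and helper names (`strip_comments`, `visible_lines`, `has_visible_change`) are assumptions for illustration, not agent-doc's implementation. The sketch is deliberately naive: it ignores blank-line-only changes and does not treat comments inside code fences specially.

```python
import re

# HTML comments, possibly spanning several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Link-reference comments such as `[//]: # (note to self)`, one per line.
LINK_REF_COMMENT = re.compile(r"^\[//\]: # \(.*\)[ \t]*$", re.MULTILINE)


def strip_comments(text: str) -> str:
    """Remove both comment styles from the document text."""
    text = HTML_COMMENT.sub("", text)
    return LINK_REF_COMMENT.sub("", text)


def visible_lines(text: str) -> list:
    """Non-blank lines that remain once comments are stripped."""
    return [line.rstrip() for line in strip_comments(text).splitlines() if line.strip()]


def has_visible_change(snapshot: str, current: str) -> bool:
    """True if the documents differ in anything other than comments."""
    return visible_lines(snapshot) != visible_lines(current)


before = "## User\nHow do I deploy?\n"
with_note = before + "<!-- ask about rollback -->\n[//]: # (private todo)\n"
uncommented = before + "ask about rollback\n"

print(has_visible_change(before, with_note))       # adding comments only
print(has_visible_change(with_note, uncommented))  # removing the comment markers
```

Adding the two comments produces no visible change, while removing the markers around "ask about rollback" does, which matches the staging-area behavior described above.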
Concurrent Edit Merging: A common problem with AI tools: you're waiting for a response, and you want to keep editing. Most tools force you to wait. agent-doc handles this with a 3-way merge using git merge-file.

When the agent finishes responding, agent-doc re-reads the document. If it changed since the response started (because you kept editing), it performs a 3-way merge between:

The document as it was when the agent started

The document with the agent's response appended

The current document (with your concurrent edits)

If the edits are in different regions, the merge is clean. If they overlap, you get standard conflict markers with clear labels: agent-response, original, and your-edits.
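
For readers who want to try the merge by hand, the sketch below shells out to `git merge-file` from Python. The label order and the use of `--diff3` (which is what makes the original label appear in conflict output) are my assumptions about how such a call could look, not a copy of agent-doc's invocation. The helper name `three_way_merge` and the temporary file names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def three_way_merge(base: str, agent: str, yours: str) -> tuple[str, int]:
    """Merge with `git merge-file`; return (merged text, number of conflicts)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, text in (("agent", agent), ("base", base), ("yours", yours)):
            paths[name] = Path(tmp) / f"{name}.md"
            paths[name].write_text(text, encoding="utf-8")
        proc = subprocess.run(
            [
                "git", "merge-file",
                "-p",        # write the result to stdout instead of the first file
                "--diff3",   # include the base version in conflict hunks
                "-L", "agent-response", "-L", "original", "-L", "your-edits",
                str(paths["agent"]), str(paths["base"]), str(paths["yours"]),
            ],
            capture_output=True,
            text=True,
        )
    # git merge-file exits with the number of conflicts, or an error status.
    if proc.returncode < 0 or proc.returncode > 127:
        raise RuntimeError(proc.stderr.strip() or "git merge-file failed")
    return proc.stdout, proc.returncode
```

Calling `three_way_merge(base, agent, yours)` with an agent response appended at the end and a user edit near the top should return the combined text with zero conflicts. If both sides change the same line, the text contains conflict markers and the count is positive.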
Components: Re-renderable Regions: Sometimes you want the agent to maintain a specific section of a document — a status table, a summary, a dashboard widget. agent-doc has "components" for this:

| Service | State   |
|---------|---------|
| api     | healthy |
Paired markers create an unambiguous boundary. External scripts can update components via agent-doc patch:

agent-doc patch dashboard.md status "| api | degraded |"
echo "deploy complete" | agent-doc patch dashboard.md log
Components can be configured for different modes — replace (default), append, or prepend — with optional max entries, timestamps, and pre/post-patch hooks. Combined with the agent-doc watch daemon, this enables live dashboards where external events trigger agent analysis.
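
The three modes are easy to picture with a small Python sketch. Everything here is hypothetical: the `<!-- agent:NAME -->` and `<!-- /agent:NAME -->` marker syntax is invented for the example (the real marker format is not shown in this post), and `patch_component` is not agent-doc's API. It only illustrates how replace, append, and prepend differ once a region has unambiguous boundaries.

```python
import re


def patch_component(doc: str, name: str, content: str, mode: str = "replace") -> str:
    """Rewrite the body between a pair of (hypothetical) component markers."""
    open_marker = f"<!-- agent:{name} -->"    # hypothetical marker syntax
    close_marker = f"<!-- /agent:{name} -->"  # hypothetical marker syntax
    pattern = re.compile(
        re.escape(open_marker) + r"\n(.*?)" + re.escape(close_marker), re.DOTALL
    )
    match = pattern.search(doc)
    if match is None:
        raise KeyError(f"component {name!r} not found")

    body = match.group(1)
    line = content.rstrip("\n") + "\n"
    if mode == "replace":
        new_body = line
    elif mode == "append":
        new_body = body + line
    elif mode == "prepend":
        new_body = line + body
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Splice the new body back in, leaving the markers and the rest untouched.
    return doc[: match.start(1)] + new_body + doc[match.end(1):]


doc = "# Dashboard\n<!-- agent:log -->\nstarted\n<!-- /agent:log -->\nfooter\n"
print(patch_component(doc, "log", "deploy complete", mode="append"))
```

Only the text between the markers changes, so anything you wrote outside the component survives each patch.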
tmux Routing: Persistent Agent Sessions: A document needs a dedicated agent session. agent-doc solves this with tmux routing: each document gets a UUID in its frontmatter (agent_doc_session), mapped to a tmux pane in a session registry.

When you press the submit hotkey, agent-doc routes to the correct pane — or auto-starts one if needed. This means:

Persistent sessions — close your editor, reopen, press submit — same agent session

Multiple documents — each gets its own pane and conversation

Layout sync — agent-doc sync mirrors your editor's split layout in tmux

IDE integration — JetBrains and VS Code plugins detect split positions and sync automatically

The tmux routing was complex enough that I extracted it as a standalone library, tmux-router, with 46 tests covering the reconciliation algorithm.
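
The routing idea itself fits in a few lines. The Python sketch below is a simplification under stated assumptions: frontmatter is treated as a simple `key: value` block between `---` lines, the registry is a plain dictionary from session UUID to pane ID, and `start_pane` is a hypothetical callback standing in for whatever launches a new tmux pane. It does not reflect tmux-router's reconciliation algorithm and does not check whether a registered pane is still alive.

```python
import re


def read_session_id(doc: str):
    """Return the agent_doc_session value from the frontmatter, if present."""
    match = re.match(r"---\n(.*?)\n---\n", doc, re.DOTALL)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "agent_doc_session":
            return value.strip() or None
    return None


def route(doc: str, registry: dict, start_pane) -> str:
    """Return the pane bound to this document, starting one if needed."""
    session_id = read_session_id(doc)
    if session_id is None:
        raise ValueError("document has no agent_doc_session")
    if session_id not in registry:
        # First submit for this document: start a pane and remember it.
        registry[session_id] = start_pane()
    return registry[session_id]


doc = (
    "---\n"
    "agent_doc_session: 123e4567-e89b-12d3-a456-426614174000\n"
    "---\n\n## User\nHello\n"
)
registry = {}
print(route(doc, registry, start_pane=lambda: "%7"))  # starts a pane
print(route(doc, registry, start_pane=lambda: "%8"))  # reuses the first pane
```

Because the UUID lives in the document rather than in the editor, reopening the file later resolves to the same registry entry.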
Editor Integration: agent-doc ships with plugins for JetBrains IDEs and VS Code. The philosophy: the CLI does the work, plugins are thin wrappers.

Both plugins provide:

Submit hotkey (Ctrl+Shift+Alt+A) — save and route to agent

Claim hotkey (Ctrl+Shift+Alt+C) — bind document to a tmux pane

Layout sync (Ctrl+Shift+Alt+L) — sync editor splits to tmux

Tab-change sync — auto-focus the right tmux pane when you switch editor tabs

Permission prompt polling — surfaces Claude's permission requests as IDE overlays

The JetBrains plugin renders permission prompts as a JLayeredPane overlay so you can approve or deny tool use without leaving your editor. The VS Code extension keeps things minimal — all logic lives in the CLI.
Claude Code Integration: agent-doc integrates with Claude Code via a skill definition (/agent-doc). The skill handles the full workflow: read document, compute diff, respond in the console (streaming), write the response back to the document, and update the snapshot.

This dual-mode approach — CLI for automation, skill for interactive use — means agent-doc works whether you're running Claude as a subprocess or using it interactively in the terminal.
Built with Rust: agent-doc is written in Rust. This was a practical choice:

Fast startup — the CLI runs on every hotkey press; latency matters

Reliable diffing — the similar crate handles unified diffs without external dependencies

Single binary — no runtime, no node_modules, just cargo install agent-doc

Cross-platform — works on Linux and macOS (anywhere tmux runs)

The project has 150+ tests across unit, integration, and property-based testing. The tmux-router library adds another 46. I take correctness seriously because the merge-safe write path and snapshot consistency are what keep user work from being lost.
Design Philosophy: A few principles guided the design:

Documents over chat. Terminal chat is ephemeral; documents are persistent and curated. The user owns the conversation artifact.

Diffs over messages. The agent sees what changed, not just the latest message. This captures intent — deletions, edits, reorganization all carry meaning.

CLI is dumb, skill is smart. The CLI handles plumbing: diffing, snapshots, routing, merging. The AI skill handles interpretation: understanding what the diff means and responding appropriately. This separation keeps the tool agent-agnostic.

Git-native. Auto-commits create a timeline. Diff gutters show what the agent added. Branches isolate experiments. The document's history IS the conversation's history.

Concurrent editing is normal. Users don't stop working while waiting for a response. The 3-way merge treats this as the default case, not an edge case.
Getting Started: Install from crates.io:

cargo install agent-doc
Initialize a session document:

agent-doc init plan.md "Project Plan"
Edit the document in your IDE, write in the ## User section, then submit:

agent-doc run plan.md
Or use the editor plugins for a hotkey-driven workflow. See the GitHub repo for full documentation.
What's Next: agent-doc is at v0.9.0. The core workflow is solid and I use it daily. Upcoming features:

agent-doc compact — auto-summarize old exchanges to keep documents focused

agent-doc deep — fan-out parallel subagents for complex research tasks

Direct API backend — skip the CLI subprocess, call Claude API directly

Dashboard templates — agent-doc init --dashboard for common monitoring patterns

The project is open source. If you're tired of ephemeral AI chat and want your conversations to be persistent, editable, and version-controlled — give agent-doc a try.
