# Agent Web UI — Architecture

Version: 1.0
Date: 2026-02-26

## Purpose

This document describes the high-level architecture of the agent web UI project: the protocols between the single-page app (SPA) and the Flask server, how streaming and truncated LLM replies are handled, where logs are written, and practical commands for searching and debugging errors.

## Components

- `agent_src/agent.py` — The `Agent` orchestrator: constructs messages, calls the LLM client, parses model replies, and invokes tools from the registry.
- `agent_src/llm.py` — LLM client that talks to OpenRouter (via `requests`), supports streaming assembly, raw-stream logging, and sentinel markers on stream errors.
- `agent_src/tools.py` — Tool registry; includes `AskUserTool` which can either return a synchronous answer (CLI) or a pending token for the web flow.
- `agent_src/interfaces.py` — Adapter interfaces: `UserInterfaceAdapter` base, `CLIAdapter`, and web-compatible adapter semantics.
- `agent_src/ui_state.py` — In-memory pending store (`PENDING`) for tokens and resume snapshots.
- `web.py` — Flask server exposing REST endpoints and serving the SPA from `web_static/`.
- `web_static/index.html`, `web_static/app.js` — Browser SPA (vanilla JS) that sends chat requests, displays conversation history, shows non-blocking pending prompts, and resumes pending tokens.

## High-level Flow

1. Browser SPA → POST `/api/chat` with `{ "instruction": "..." }`.
2. Flask loads the `Agent` and calls `Agent.run(instruction)`.
3. Agent iterates: sends messages to `chat_completion()` (LLM client), parses the assistant reply, and if a tool is to be invoked, calls that tool.
4. If a tool (e.g., `AskUser`) requires user input in the web flow, it returns `{ "awaiting": "<token>", "prompt": "..." }`.
5. Flask returns that awaiting dict to the browser. The SPA creates a non-blocking pending item in the UI.
6. The user answers the pending prompt in the SPA; SPA POSTs `{ "answer": "..." }` to `/api/respond/<token>`.
7. The server loads the snapshot from `agent_src/ui_state.PENDING[token]`, resumes the `Agent` (calls `Agent.resume(token, answer)`), and returns the resumed result.

Note: pending state is in-memory by default; server restart clears pending tokens unless persistence is implemented.

## SPA ↔ Server Protocol (examples)

- `POST /api/chat`
  - Request: `Content-Type: application/json`
    ```json
    { "instruction": "Translate the following..." }
    ```
  - Possible responses:
    - Final answer: `{ "answer": "..." }`
    - Await user: `{ "awaiting": "<token>", "prompt": "Follow-up question..." }`
    - Error: `{ "error": "..." }`

- `POST /api/respond/<token>`
  - Request: `Content-Type: application/json`
    ```json
    { "answer": "User reply text" }
    ```
  - Response: `{ "result": <string | object> }` or `{ "error": "..." }`

- Headers: optional `X-API-KEY` if `WEB_API_KEY` is set on the server.
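A client-side view of this protocol, sketched with `requests`. The `chat` helper, `answer_fn` hook, and base URL are illustrative; only the endpoints, payload shapes, and `X-API-KEY` header come from this document:

```python
import os

import requests

def chat(base_url, instruction, answer_fn=input, session=requests):
    """POST an instruction; if the agent pauses with an awaiting token,
    collect an answer via answer_fn and resume. Returns the final payload."""
    headers = {}
    if os.environ.get("WEB_API_KEY"):
        headers["X-API-KEY"] = os.environ["WEB_API_KEY"]
    data = session.post(f"{base_url}/api/chat",
                        json={"instruction": instruction},
                        headers=headers).json()
    if "awaiting" in data:
        # Non-blocking in the SPA; modeled here as a simple callback.
        answer = answer_fn(data["prompt"])
        data = session.post(f"{base_url}/api/respond/{data['awaiting']}",
                            json={"answer": answer},
                            headers=headers).json()
    return data  # {"answer": ...}, {"result": ...}, or {"error": ...}
```

Injecting `session` keeps the helper testable with a fake transport.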

## How streamed / truncated JSON is handled

- The LLM client (`agent_src/llm.py`) attempts streaming assembly of deltas when the provider supports it.
- If the stream shows a mid-stream provider error or is cut off, the client appends a sentinel like `<<STREAM_ERROR: ...>>` and returns the assembled partial text.
- `Agent._run_loop` uses `parse_json_response()` to extract JSON. That function:
  - Strips Markdown code fences (optionally labeled `json`) from the reply.
  - Tries `json.loads()` on the cleaned text.
  - If that fails, heuristically extracts the first balanced `{...}` substring and tries again.
  - If parsing still fails, the agent runs regex-based fallbacks to at least determine the `tool` field so execution may continue.

This combination enables the agent to continue working even when the provider returns wrapped, partial, or slightly malformed JSON.
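The fallback chain above might look roughly like this. This is an illustrative sketch, not the repo's actual `parse_json_response()`; the regexes and the naive brace scan (which ignores braces inside strings) are assumptions:

```python
import json
import re

def parse_json_response(text):
    """Sketch of the extraction strategy: strip fences, parse, then
    fall back to balanced-object extraction and a regex for "tool"."""
    # 1. Strip fenced code blocks, optionally labeled json.
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    cleaned = (m.group(1) if m else text).strip()
    # 2. Try a straight parse of the cleaned text.
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # 3. Heuristic: first balanced {...} substring (naive depth count).
    start = cleaned.find("{")
    if start != -1:
        depth = 0
        for i, ch in enumerate(cleaned[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(cleaned[start:i + 1])
                    except json.JSONDecodeError:
                        break
    # 4. Regex fallback: recover at least the "tool" field so the
    #    agent loop can keep going on truncated replies.
    m = re.search(r'"tool"\s*:\s*"([^"]+)"', cleaned)
    if m:
        return {"tool": m.group(1)}
    return None
```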

## Pending tokens and resume state

- When a tool signals it needs a user answer, the agent stores a snapshot in `agent_src/ui_state.PENDING[token]` containing:
  - `messages` (deep-copied conversation history up to this point)
  - `step` (the agent loop step number)
  - `tool_name`, `tool_args`
  - `model`, `max_steps`
  - `prompt` (the question presented to the user)
  - `created_at` (ISO8601 UTC timestamp)
- `Agent.resume(token, answer)` appends the user's answer into `messages` and continues the run loop from the next step.
- Current implementation: in-memory only. Consider persisting to JSON or Redis for resilience.
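The snapshot and resume mechanics can be sketched as follows. The field names come from the list above; the `save_pending`/`resume` helpers and the token scheme are illustrative, not the repo's code:

```python
import copy
import secrets
from datetime import datetime, timezone

PENDING = {}  # token -> snapshot, mirroring agent_src/ui_state.PENDING

def save_pending(messages, step, tool_name, tool_args, model, max_steps, prompt):
    """Store a resume snapshot and return the opaque token handed to the SPA."""
    token = secrets.token_urlsafe(16)
    PENDING[token] = {
        "messages": copy.deepcopy(messages),  # history up to this point
        "step": step,
        "tool_name": tool_name,
        "tool_args": tool_args,
        "model": model,
        "max_steps": max_steps,
        "prompt": prompt,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return token

def resume(token, answer):
    """Pop the snapshot and append the user's answer; the real
    Agent.resume() would re-enter the run loop from snap["step"] + 1."""
    snap = PENDING.pop(token)
    snap["messages"].append({"role": "user", "content": answer})
    return snap
```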

## Logs and where to look

- `web.log` — main server log (configured in `web.py` via `logging.basicConfig`). Contains endpoint events, created/updated/deleted skeletons, saved pending tokens, and server exceptions.
- Agent module logs — use `logging.getLogger(__name__)` inside `agent_src/*`; messages are visible in `web.log` and console.
- Optional raw stream log — set environment `OPENROUTER_STREAM_LOG=/path/to/stream.log` to capture raw streaming lines from the provider; this is critical for debugging truncated JSON.

Key log messages to search:

- `Saved pending token <token> for tool <tool> prompt=<...>` — indicates the agent paused and saved the resume snapshot.
- `Step <n> – Model reply: <...>` — the raw assembled assistant reply for each loop iteration.
- `Failed to parse JSON from model:` — agent couldn't parse; check the raw reply in logs or raw stream.
- `<<STREAM_ERROR:` marker inside raw stream log — provider stream ended abnormally.

## Useful shell commands (copy/paste)

- Tail server log:
```sh
tail -f web.log
```
- Search for saved pending tokens:
```sh
grep -n "Saved pending token" web.log | tail -n 50
```
- Search for parse failures:
```sh
grep -n "Failed to parse JSON" web.log | tail -n 50
```
- Inspect raw OpenRouter stream (if enabled):
```sh
tail -n 200 "$OPENROUTER_STREAM_LOG"
grep -n "<<STREAM_ERROR" "$OPENROUTER_STREAM_LOG" || true
```
- Dump pending keys (caveat: this starts a fresh interpreter, so it reflects the live server's `PENDING` only if persistence is implemented; for the running process, use an authenticated debug endpoint instead):
```sh
python - <<'PY'
import json
from agent_src import ui_state
print(json.dumps(list(ui_state.PENDING.keys()), indent=2))
PY
```

## Operational recommendations

- Enable `OPENROUTER_STREAM_LOG` when diagnosing truncation or mid-stream errors.
- Add persistence for `ui_state.PENDING` (simple JSON or Redis) to survive server restarts.
- Rotate `web.log` with `RotatingFileHandler` or ship structured logs to a log store (ELK/CloudWatch).
- Protect `/api/respond/<token>` behind `WEB_API_KEY` or other auth in production.
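The log-rotation recommendation can be sketched with the standard library's `RotatingFileHandler`; the size, backup count, and format string here are illustrative:

```python
import logging
from logging.handlers import RotatingFileHandler

# Rotate web.log at ~5 MB, keeping 5 backups (web.log.1 ... web.log.5).
handler = RotatingFileHandler("web.log", maxBytes=5_000_000, backupCount=5)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(name)s: %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
```

Module loggers created with `logging.getLogger(__name__)` propagate to the root logger, so agent logs land in the rotated file without further changes.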

## Next steps and extensions

- Persist `PENDING` to disk or Redis with TTL and recovery on startup.
- Add a secure debug endpoint (authenticated) to list pending tokens and metadata for operators.
- Add structured logging (JSON) and request IDs for tracing across SPA → server → LLM streams.
- Add unit tests for `parse_json_response()` covering fenced blocks, truncated JSON, and balanced-object extraction.
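As a starting point for the first item, a minimal JSON-file persistence sketch for `PENDING` (the file name is illustrative; a production version would add TTL, atomic writes, and locking, or use Redis):

```python
import json
from pathlib import Path

PENDING_FILE = Path("pending.json")  # hypothetical location

def save_pending_store(pending):
    """Write the in-memory PENDING dict to disk. The snapshot fields
    listed earlier (messages, step, timestamps, ...) are JSON-serializable."""
    PENDING_FILE.write_text(json.dumps(pending, indent=2))

def load_pending_store():
    """Recover PENDING on startup; empty dict if nothing was persisted."""
    if PENDING_FILE.exists():
        return json.loads(PENDING_FILE.read_text())
    return {}
```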

---

File maintained in the repository for developer reference.
