# Agent Web UI — Architecture

Version: 1.0
Date: 2026-02-26

## Purpose

This document describes the high-level architecture of the agent web UI project: the protocols between the single-page app (SPA) and the Flask server, how streaming and truncated LLM replies are handled, where logs are written, and practical commands for searching and debugging errors.

## Components

- `agent_src/agent.py` — The `Agent` orchestrator: constructs messages, calls the LLM client, parses model replies, and invokes tools from the registry.
- `agent_src/llm.py` — LLM client that talks to OpenRouter (via `requests`), supports streaming assembly, raw-stream logging, and sentinel markers on stream errors.
- `agent_src/tools.py` — Tool registry; includes `AskUserTool` which can either return a synchronous answer (CLI) or a pending token for the web flow.
- `agent_src/interfaces.py` — Adapter interfaces: `UserInterfaceAdapter` base, `CLIAdapter`, and web-compatible adapter semantics.
- `agent_src/ui_state.py` — In-memory pending store (`PENDING`) for tokens and resume snapshots.
- `web.py` — Flask server exposing REST endpoints and serving the SPA from `web_static/`.
- `web_static/index.html`, `web_static/app.js` — Browser SPA (vanilla JS) that sends chat requests, displays conversation history, shows non-blocking pending prompts, and resumes pending tokens.

## High-level Flow

1. Browser SPA → POST `/api/chat` with `{ "instruction": "..." }`.
2. Flask loads the `Agent` and calls `Agent.run(instruction)`.
3. Agent iterates: sends messages to `chat_completion()` (LLM client), parses the assistant reply, and if a tool is to be invoked, calls that tool.
4. If a tool (e.g., `AskUser`) requires user input in the web flow, it returns `{ "awaiting": "<token>", "prompt": "..." }`.
5. Flask returns that awaiting dict to the browser. The SPA creates a non-blocking pending item in the UI.
6. The user answers the pending prompt in the SPA; SPA POSTs `{ "answer": "..." }` to `/api/respond/<token>`.
7. The server loads the snapshot from `agent_src/ui_state.PENDING[token]`, resumes the `Agent` (calls `Agent.resume(token, answer)`), and returns the resumed result.

Note: pending state is in-memory by default; server restart clears pending tokens unless persistence is implemented.

## SPA ↔ Server Protocol (examples)

- `POST /api/chat`
  - Request: `Content-Type: application/json`
    ```json
    { "instruction": "Translate the following..." }
    ```
  - Possible responses:
    - Final answer: `{ "answer": "..." }`
    - Await user: `{ "awaiting": "<token>", "prompt": "Follow-up question..." }`
    - Error: `{ "error": "..." }`

- `POST /api/respond/<token>`
  - Request: `Content-Type: application/json`
    ```json
    { "answer": "User reply text" }
    ```
  - Response: `{ "result": <string | object> }` or `{ "error": "..." }`

- Headers: optional `X-API-KEY` if `WEB_API_KEY` is set on the server.
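A client-side view of this protocol, sketched with `requests`. The `chat` helper, `answer_fn` hook, and base URL are illustrative; only the endpoints, payload shapes, and `X-API-KEY` header come from this document:

```python
import os

import requests

def chat(base_url, instruction, answer_fn=input, session=requests):
    """POST an instruction; if the agent pauses with an awaiting token,
    collect an answer via answer_fn and resume. Returns the final payload."""
    headers = {}
    if os.environ.get("WEB_API_KEY"):
        headers["X-API-KEY"] = os.environ["WEB_API_KEY"]
    data = session.post(f"{base_url}/api/chat",
                        json={"instruction": instruction},
                        headers=headers).json()
    if "awaiting" in data:
        # Non-blocking in the SPA; modeled here as a simple callback.
        answer = answer_fn(data["prompt"])
        data = session.post(f"{base_url}/api/respond/{data['awaiting']}",
                            json={"answer": answer},
                            headers=headers).json()
    return data  # {"answer": ...}, {"result": ...}, or {"error": ...}
```

Injecting `session` keeps the helper testable with a fake transport.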

## How streamed / truncated JSON is handled

- The LLM client (`agent_src/llm.py`) attempts streaming assembly of deltas when the provider supports it.
- If the stream shows a mid-stream provider error or is cut off, the client appends a sentinel like `<<STREAM_ERROR: ...>>` and returns the assembled partial text.
- `Agent._run_loop` uses `parse_json_response()` to extract JSON. That function:
  - Strips Markdown code fences (optionally labeled `json`) from the reply.
  - Tries `json.loads()` on the cleaned text.
  - If that fails, heuristically extracts the first balanced `{...}` substring and tries again.
  - If parsing still fails, the agent runs regex-based fallbacks to at least determine the `tool` field so execution may continue.

This combination enables the agent to continue working even when the provider returns wrapped, partial, or slightly malformed JSON.
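The fallback chain above might look roughly like this. This is an illustrative sketch, not the repo's actual `parse_json_response()`; the regexes and the naive brace scan (which ignores braces inside strings) are assumptions:

```python
import json
import re

def parse_json_response(text):
    """Sketch of the extraction strategy: strip fences, parse, then
    fall back to balanced-object extraction and a regex for "tool"."""
    # 1. Strip fenced code blocks, optionally labeled json.
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    cleaned = (m.group(1) if m else text).strip()
    # 2. Try a straight parse of the cleaned text.
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # 3. Heuristic: first balanced {...} substring (naive depth count).
    start = cleaned.find("{")
    if start != -1:
        depth = 0
        for i, ch in enumerate(cleaned[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(cleaned[start:i + 1])
                    except json.JSONDecodeError:
                        break
    # 4. Regex fallback: recover at least the "tool" field so the
    #    agent loop can keep going on truncated replies.
    m = re.search(r'"tool"\s*:\s*"([^"]+)"', cleaned)
    if m:
        return {"tool": m.group(1)}
    return None
```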

## Pending tokens and resume state

- When a tool signals it needs a user answer, the agent stores a snapshot in `agent_src/ui_state.PENDING[token]` containing:
  - `messages` (deep-copied conversation history up to this point)
  - `step` (the agent loop step number)
  - `tool_name`, `tool_args`
  - `model`, `max_steps`
  - `prompt` (the question presented to the user)
  - `created_at` (ISO8601 UTC timestamp)
- `Agent.resume(token, answer)` appends the user's answer into `messages` and continues the run loop from the next step.
- Current implementation: in-memory only. Consider persisting to JSON or Redis for resilience.
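The snapshot and resume mechanics can be sketched as follows. The field names come from the list above; the `save_pending`/`resume` helpers and the token scheme are illustrative, not the repo's code:

```python
import copy
import secrets
from datetime import datetime, timezone

PENDING = {}  # token -> snapshot, mirroring agent_src/ui_state.PENDING

def save_pending(messages, step, tool_name, tool_args, model, max_steps, prompt):
    """Store a resume snapshot and return the opaque token handed to the SPA."""
    token = secrets.token_urlsafe(16)
    PENDING[token] = {
        "messages": copy.deepcopy(messages),  # history up to this point
        "step": step,
        "tool_name": tool_name,
        "tool_args": tool_args,
        "model": model,
        "max_steps": max_steps,
        "prompt": prompt,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return token

def resume(token, answer):
    """Pop the snapshot and append the user's answer; the real
    Agent.resume() would re-enter the run loop from snap["step"] + 1."""
    snap = PENDING.pop(token)
    snap["messages"].append({"role": "user", "content": answer})
    return snap
```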

## Logs and where to look

- `web.log` — main server log (configured in `web.py` via `logging.basicConfig`). Contains endpoint events, created/updated/deleted skeletons, saved pending tokens, and server exceptions.
- Agent module logs — use `logging.getLogger(__name__)` inside `agent_src/*`; messages are visible in `web.log` and console.
- Optional raw stream log — set environment `OPENROUTER_STREAM_LOG=/path/to/stream.log` to capture raw streaming lines from the provider; this is critical for debugging truncated JSON.

Key log messages to search:

- `Saved pending token <token> for tool <tool> prompt=<...>` — indicates the agent paused and saved the resume snapshot.
- `Step <n> – Model reply: <...>` — the raw assembled assistant reply for each loop iteration.
- `Failed to parse JSON from model:` — agent couldn't parse; check the raw reply in logs or raw stream.
- `<<STREAM_ERROR:` marker inside raw stream log — provider stream ended abnormally.

## Useful shell commands (copy/paste)

- Tail server log:
```sh
tail -f web.log
```
- Search for saved pending tokens:
```sh
grep -n "Saved pending token" web.log | tail -n 50
```
- Search for parse failures:
```sh
grep -n "Failed to parse JSON" web.log | tail -n 50
```
- Inspect raw OpenRouter stream (if enabled):
```sh
tail -n 200 "$OPENROUTER_STREAM_LOG"
grep -n "<<STREAM_ERROR" "$OPENROUTER_STREAM_LOG" || true
```
- Dump pending keys (caveat: this starts a fresh interpreter, so it reflects the live server's `PENDING` only if persistence is implemented; for the running process, use an authenticated debug endpoint instead):
```sh
python - <<'PY'
import json
from agent_src import ui_state
print(json.dumps(list(ui_state.PENDING.keys()), indent=2))
PY
```

## Operational recommendations

- Enable `OPENROUTER_STREAM_LOG` when diagnosing truncation or mid-stream errors.
- Add persistence for `ui_state.PENDING` (simple JSON or Redis) to survive server restarts.
- Rotate `web.log` with `RotatingFileHandler` or ship structured logs to a log store (ELK/CloudWatch).
- Protect `/api/respond/<token>` behind `WEB_API_KEY` or other auth in production.
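The log-rotation recommendation can be sketched with the standard library's `RotatingFileHandler`; the size, backup count, and format string here are illustrative:

```python
import logging
from logging.handlers import RotatingFileHandler

# Rotate web.log at ~5 MB, keeping 5 backups (web.log.1 ... web.log.5).
handler = RotatingFileHandler("web.log", maxBytes=5_000_000, backupCount=5)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(name)s: %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
```

Module loggers created with `logging.getLogger(__name__)` propagate to the root logger, so agent logs land in the rotated file without further changes.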

## Next steps and extensions

- Persist `PENDING` to disk or Redis with TTL and recovery on startup.
- Add a secure debug endpoint (authenticated) to list pending tokens and metadata for operators.
- Add structured logging (JSON) and request IDs for tracing across SPA → server → LLM streams.
- Add unit tests for `parse_json_response()` covering fenced blocks, truncated JSON, and balanced-object extraction.
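As a starting point for the first item, a minimal JSON-file persistence sketch for `PENDING` (the file name is illustrative; a production version would add TTL, atomic writes, and locking, or use Redis):

```python
import json
from pathlib import Path

PENDING_FILE = Path("pending.json")  # hypothetical location

def save_pending_store(pending):
    """Write the in-memory PENDING dict to disk. The snapshot fields
    listed earlier (messages, step, timestamps, ...) are JSON-serializable."""
    PENDING_FILE.write_text(json.dumps(pending, indent=2))

def load_pending_store():
    """Recover PENDING on startup; empty dict if nothing was persisted."""
    if PENDING_FILE.exists():
        return json.loads(PENDING_FILE.read_text())
    return {}
```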

---

File maintained in the repository for developer reference.
