Layered technical illustration of agent infrastructure beyond the protocol layer, including orchestration, policy, durability, observability, and operator controls

What Comes After MCP: The Next Layer of Agent Infrastructure

The live demo repo for this series is 67ailab/harness-engineering. For this final post, I did not change the repo before publishing; the codebase discussed here is the current public state at commit 7d01dae, the same commit introduced in the previous post when the repo gained a real blueprint export. That matters because this article is not about an imaginary next step. It is about what the current repo already makes obvious once you stop looking at MCP as the finish line. ...

May 13, 2026 · 67 AI Lab
Layered technical diagram of an agent harness with CLI, runner, policy, tools, tracing, memory, workflow, approval gate, and persisted artifacts

A Reference Blueprint for a Production Agent Harness

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new repo commit is 7d01dae, which adds a real blueprint export to the demo so the architecture in this article is not just a hand-drawn diagram in prose. You can now run:

    PYTHONPATH=src python3 -m harness_engineering.cli blueprint --pretty
    PYTHONPATH=src python3 -m harness_engineering.cli blueprint --format markdown
    PYTHONPATH=src python3 -m harness_engineering.cli blueprint --format mermaid

That feature lives mainly in: ...

May 12, 2026 · 67 AI Lab
Technical dashboard showing token streams, latency bars, throughput gauges, and an approval-gated agent workflow

Cost, Latency, and Throughput Engineering for Agents

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new repo commit is b9a60e8, which adds per-step timing metadata, lightweight workload and token estimates, and performance/cost rollups to the harness traces and summaries. That change lives mainly in:

    src/harness_engineering/models.py
    src/harness_engineering/runner.py
    src/harness_engineering/tracing.py
    src/harness_engineering/store.py
    tests/test_harness.py
    README.md

The core additions are:

    new timing and metrics fields on StepResult in models.py
    wall-clock measurement inside RetryPolicy.call() in runner.py
    step-level workload estimation in HarnessRunner._estimate_step_metrics()
    aggregated performance and cost rollups in build_trace_summary() in tracing.py
    operator-facing rollups in RunStore.build_summary() in store.py

This is the right place for Post 12 to land, because cost and latency problems in agent systems almost never come from one bad prompt. They come from system shape: ...
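
As a rough illustration of wall-clock measurement inside a retry wrapper, here is a minimal sketch. It is not the repo's RetryPolicy.call(); the names StepTiming and call_with_timing, and the injectable clock parameter, are assumptions made for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class StepTiming:
    """Hypothetical timing record; not the repo's StepResult."""
    attempts: int = 0
    total_seconds: float = 0.0
    attempt_seconds: List[float] = field(default_factory=list)


def call_with_timing(
    fn: Callable[[], Any],
    max_attempts: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Any, StepTiming]:
    """Call fn up to max_attempts times, timing every attempt (failed ones too)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    timing = StepTiming()
    last_error = None
    for _ in range(max_attempts):
        started = clock()
        try:
            result, error = fn(), None
        except Exception as exc:  # a real harness would likely narrow this
            result, error = None, exc
        elapsed = clock() - started
        timing.attempts += 1
        timing.attempt_seconds.append(elapsed)
        timing.total_seconds += elapsed
        if error is None:
            return result, timing
        last_error = error
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Timing failed attempts as well as the successful one is the point of the sketch: retries are a common source of latency that a single end-to-end number hides.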

May 11, 2026 · 67 AI Lab
Technical illustration of an agent runtime protected by a glowing policy boundary with config and key symbols outside the boundary

Security, Auth, and Policy in Agent Harnesses

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new repo commit is 3f2ec5d, which adds a checked-in baseline policy file at policy/default.json and tightens PolicyEngine so relative policy paths resolve from the policy file location rather than from the caller’s current working directory. That sounds like a small change. It is small in lines of code. It is not small in meaning. ...
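
To make the path-resolution point concrete, here is a small sketch of resolving relative entries against the policy file's own directory instead of the current working directory. It is an assumption-laden illustration, not the repo's PolicyEngine: the function name load_allowed_roots and the JSON key allowed_output_roots are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Union


def load_allowed_roots(policy_file: Union[str, Path]) -> List[Path]:
    """Read a policy file and resolve relative roots from the file's location."""
    policy_path = Path(policy_file).resolve()
    data = json.loads(policy_path.read_text(encoding="utf-8"))
    roots = []
    # "allowed_output_roots" is a hypothetical key used only for this sketch.
    for raw in data.get("allowed_output_roots", []):
        root = Path(raw)
        if not root.is_absolute():
            # Anchor on the policy file's directory, not on os.getcwd().
            root = policy_path.parent / root
        roots.append(root.resolve())
    return roots
```

With this shape, the same policy file yields the same roots no matter which directory the caller happens to launch from.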

May 10, 2026 · 67 AI Lab
Technical illustration of planner, executor, and reviewer components connected by explicit handoffs and an approval gate before a final file write

Multi-Agent Systems Without the Theater

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit dadf203, which adds a small but real multi-agent mode to the demo: the harness can now run with explicit planner, executor, and reviewer roles, persist role activity, record handoffs, and expose those artifacts through the CLI and saved run files. The core changes are in: ...
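
One way to picture explicit roles with recorded handoffs is the sketch below. It is illustrative only: the Handoff record, the ROLE_ORDER tuple, and run_roles are assumed names for this example and do not describe the repo's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Role names come from the excerpt; the fixed ordering is an assumption.
ROLE_ORDER = ("planner", "executor", "reviewer")


@dataclass
class Handoff:
    """Hypothetical record of one role passing work to the next."""
    from_role: str
    to_role: str
    payload: str


def run_roles(
    task: str, handlers: Dict[str, Callable[[str], str]]
) -> Tuple[str, List[Handoff]]:
    """Run each role in order and record every handoff between roles."""
    handoffs: List[Handoff] = []
    current = task
    for index, role in enumerate(ROLE_ORDER):
        current = handlers[role](current)
        if index + 1 < len(ROLE_ORDER):
            handoffs.append(Handoff(role, ROLE_ORDER[index + 1], current))
    return current, handoffs
```

Because each handoff is a plain record, it can be serialized alongside the run and inspected later, which is what separates a recorded handoff from roles that exist only in a prompt.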

May 9, 2026 · 67 AI Lab
Technical illustration of an agent workflow passing through a policy gate before a filesystem write inside an allowed directory boundary

Sandboxing, Isolation, and Safe Execution

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 98c6302, which adds an explicit policy layer to the harness: tools now carry action categories, risky writes are checked against allowed output roots before execution, and policy decisions are persisted in traces and summaries. The key code changes are in:

    src/harness_engineering/policy.py
    src/harness_engineering/tools.py
    src/harness_engineering/runner.py
    src/harness_engineering/cli.py
    src/harness_engineering/mcp.py
    src/harness_engineering/tracing.py
    src/harness_engineering/store.py
    src/harness_engineering/workflow.py
    sample_data/policy/restrictive.json

That matters because “sandboxing” gets used too loosely in agent conversations. Sometimes people mean a real OS sandbox. Sometimes they mean a container. Sometimes they mean “the model only has a few tools.” Those are not the same thing. ...
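
The idea of checking a risky write against allowed output roots before execution can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in policy.py: PolicyDecision and check_write are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Union


@dataclass
class PolicyDecision:
    """Hypothetical decision record that could be persisted in a trace."""
    allowed: bool
    reason: str


def check_write(
    target: Union[str, Path], allowed_roots: Iterable[Union[str, Path]]
) -> PolicyDecision:
    """Allow a write only if the resolved target sits inside an allowed root."""
    # Resolve first so "../" segments cannot escape the root.
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return PolicyDecision(True, f"inside allowed root {root_resolved}")
    return PolicyDecision(False, f"{resolved} is outside all allowed roots")
```

Comparing path components rather than string prefixes avoids the classic mistake of treating a sibling directory with a similar name as if it were inside the allowed root. Note that this is a policy check inside the process, not an OS sandbox or a container.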

May 8, 2026 · 67 AI Lab
Technical illustration of an agent workflow feeding event traces into a compact observability panel and evaluation checklist

Tracing, Observability, and Evals for Agent Systems

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 85c762c, which adds two concrete things the repo was missing:

    a persisted trace-summary surface for every run
    a lightweight eval runner with trace-aware fixtures

The key changes are in src/harness_engineering/tracing.py, src/harness_engineering/store.py, src/harness_engineering/cli.py, and the new src/harness_engineering/evals.py module, plus starter fixtures in sample_data/evals/basic.json. That matters because a lot of agent writing still treats observability as an afterthought and evals as a benchmark spreadsheet. In practice, most production pain shows up somewhere else: ...
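
A trace summary paired with a trace-aware fixture check might look roughly like this. The event fields (kind, step, status) and the fixture keys (min_events, allow_failures) are assumptions for illustration; the real formats in tracing.py, evals.py, and sample_data/evals/basic.json may differ.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_trace(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collapse raw trace events into a small summary (hypothetical event shape)."""
    kinds = Counter(event["kind"] for event in events)
    failed = [e["step"] for e in events if e.get("status") == "error"]
    return {
        "event_count": len(events),
        "by_kind": dict(kinds),
        "failed_steps": failed,
    }


def check_fixture(summary: Dict[str, Any], fixture: Dict[str, Any]) -> List[str]:
    """Return human-readable failures; an empty list means the fixture passed."""
    failures = []
    for kind, minimum in fixture.get("min_events", {}).items():
        seen = summary["by_kind"].get(kind, 0)
        if seen < minimum:
            failures.append(f"expected at least {minimum} '{kind}' events, saw {seen}")
    if summary["failed_steps"] and not fixture.get("allow_failures", False):
        failures.append(f"unexpected failed steps: {summary['failed_steps']}")
    return failures
```

The eval here asserts on what the run did, as recorded in its trace, and not only on the final output text.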

May 7, 2026 · 67 AI Lab
Technical illustration of an agent workflow paused at an approval gate while a human reviewer decides whether to continue

Human-in-the-Loop Done Properly

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 352fba2, which adds a first-class pending-approval inspection surface to the existing approval-gated harness. The key changes are in src/harness_engineering/runner.py, src/harness_engineering/cli.py, and src/harness_engineering/store.py. That matters because most writing about “human in the loop” in agent systems is still weirdly sloppy. A model says “should I proceed?”, a human types “yes”, and the demo declares the governance problem solved. It is not solved. In production, approval is not a vibe, not a chat convention, and not a magical hidden boolean inside the runtime. It is a workflow boundary with state, context, inspection, and recovery semantics. ...
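
To show what "a workflow boundary with state" can mean in code, here is a minimal sketch of a persisted approval record with an explicit transition. The PendingApproval fields and the helper names are hypothetical and are not taken from the repo's runner.py or store.py.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class PendingApproval:
    """Hypothetical approval record: enough context to inspect and resume."""
    run_id: str
    step: str
    proposed_action: str
    status: str = "pending"  # pending -> approved | rejected
    decided_by: Optional[str] = None


def save_approval(record: PendingApproval, path: Union[str, Path]) -> None:
    """Persist the record so the decision survives a process restart."""
    Path(path).write_text(json.dumps(asdict(record), indent=2), encoding="utf-8")


def load_approval(path: Union[str, Path]) -> PendingApproval:
    return PendingApproval(**json.loads(Path(path).read_text(encoding="utf-8")))


def decide(record: PendingApproval, approved: bool, reviewer: str) -> PendingApproval:
    """Apply a decision exactly once; a second decision is an error."""
    if record.status != "pending":
        raise ValueError(f"approval already {record.status}")
    record.status = "approved" if approved else "rejected"
    record.decided_by = reviewer
    return record
```

The record lives on disk, so an operator can inspect what is waiting, and the run can resume after a restart without relying on a chat transcript.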

May 6, 2026 · 67 AI Lab
Multimodal radiotherapy contouring with CT, PET, clinical text, and AI fusion

LLM and VLM for Radiotherapy Contouring: State of the Art, Gaps, and Opportunities

Radiotherapy contouring is entering a new phase. For years, progress was driven mainly by image segmentation: better backbones, larger datasets, and stronger 3D architectures improved the automatic outlining of visible anatomy. That approach remains highly effective for organs-at-risk (OARs), where the task is largely to identify and delineate structures that can be seen directly on imaging. Target contouring is different. Gross tumor volume (GTV), clinical target volume (CTV), nodal target volumes, and postoperative beds are not defined by pixels alone. They are shaped by disease extent, stage, pathology, surgical status, laterality, risk patterns of spread, institutional practice, and protocol logic. In real clinical workflow, radiation oncologists do not contour from images alone; they contour from images interpreted in context. ...

May 5, 2026 · 67 AI Lab
Layered agent memory diagram showing working context, session state, and retrieval memory around a checkpointed workflow

Memory Architecture for Agents: Context, Sessions, and State

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit d20e352, which adds an explicit memory-layer model to the demo instead of treating every stored value as one blurry thing called “memory.” The core addition is src/harness_engineering/memory.py, plus wiring in src/harness_engineering/store.py and src/harness_engineering/cli.py so every run now emits a memory.json snapshot and the CLI exposes a memory command. ...
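
A layered memory snapshot could be modeled along these lines. The three layer names follow the diagram description for this post (working context, session state, retrieval memory); the MemorySnapshot class, its field types, and the JSON shape are assumptions and not the actual contents of memory.py or memory.json.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class MemorySnapshot:
    """Hypothetical snapshot that keeps the memory layers separate."""
    # Short-lived values the current step is actively using.
    working_context: Dict[str, Any] = field(default_factory=dict)
    # Durable state for this run, such as checkpoints and approvals.
    session_state: Dict[str, Any] = field(default_factory=dict)
    # Items fetched from longer-term storage for this run.
    retrieval_memory: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MemorySnapshot":
        return cls(**json.loads(text))
```

Keeping the layers as separate fields means a reader of the snapshot can tell which values are transient and which are meant to survive the run.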

May 5, 2026 · 67 AI Lab