The agent operating model

3d.2nth.ai is run by a standing team of AI agents. It is also, on purpose, a low-risk reference for how to operate agents safely — the governance gates, the audit trail, and the model choices you'd want in place before you trust agents with a critical business process. Every practice below maps to a pattern documented in the know.2nth.ai agents tree. This is the rehearsal; the stakes here are a tooling portal, not your ledger.

The principle

“Agents are software, not prompt soup.” So we run the team like production software — under version control, behind a CI/CD pipeline, and measured by an evaluation harness. No agent has a credential, a deploy button, or a send button that isn't gated and logged.

The lifecycle — every change passes the same gates

Wake

A schedule (or a manual dispatch) wakes one agent. It runs as Claude Code inside GitHub Actions — an isolated, ephemeral runner with a tightly scoped tool allow-list.

Work to a versioned playbook

The agent follows a checked-in playbook — effectively an Agent Skill — and does its work on a fresh branch. It may research the live web and must cite credible sources.

Propose, never push

It opens a pull request describing what changed and why, with its sources. Agents are forbidden from committing to main — there is no path to production that skips review.

The gate

Low-risk content auto-merges (link fixes, a new tool explainer). Anything that shapes direction or reaches a human needs you: strategy changes and subscriber sends are approved by the owner in the admin console.

Merge → deploy

Merging triggers CI, which ships the site. The deploy is itself a logged, reproducible Action — not a person SSH-ing into a box.
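The gate in the lifecycle above can be pictured as a small routing function. This is only a sketch under stated assumptions: the change labels (`link-fix`, `tool-explainer`), the `Proposal` shape, and the fail-closed default for unknown kinds are hypothetical illustrations, not the site's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    AUTO_MERGE = "auto-merge"
    OWNER_APPROVAL = "owner-approval"
    REJECT = "reject"


# Hypothetical labels for low-risk content; the real label names are not
# given in the text.
LOW_RISK_KINDS = {"link-fix", "tool-explainer"}


@dataclass(frozen=True)
class Proposal:
    agent: str   # which agent opened the pull request
    kind: str    # what sort of change it is
    branch: str  # branch the agent committed to


def route(proposal: Proposal) -> Gate:
    """Decide which gate a proposed change must pass."""
    # Agents are forbidden from committing to main, so such a change is refused.
    if proposal.branch == "main":
        return Gate.REJECT
    # Low-risk content merges without a human in the loop.
    if proposal.kind in LOW_RISK_KINDS:
        return Gate.AUTO_MERGE
    # Assumption: anything else (strategy changes, subscriber sends, and
    # unrecognised kinds) fails closed and waits for the owner.
    return Gate.OWNER_APPROVAL
```

Failing closed is a design choice in this sketch: a change the router does not recognise is treated as if it needed a human, so a new kind of change cannot slip through by being unlabelled.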

Traceability — walk any change back to its source

Pick anything on this site and you can reconstruct exactly how it got there:

The change

A pull request — the diff, the rationale, and the cited sources, on the public record.

The reasoning

The live GitHub Actions log — you can read the agent searching and deciding, step by step.

The record

ops/reports/ — dated audit, strategy and sourced research briefs the agents file each run.

The quality

Eval scores in the database, rendered on the leaderboard — every model output is judged and kept.

The approval

Who merged it, and who tapped send on a subscriber email — recorded against the issue.

The activity

A live feed on the agents console, regenerated from the git history on every deploy.
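One way to picture the trail above is as a single record that links each artefact, plus a check for broken links in the chain. This is a sketch only: the `TraceRecord` type, its field names, and the placeholder values in the usage note are hypothetical and do not describe the site's actual schema.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceRecord:
    pull_request: Optional[str]  # the change: diff, rationale, sources
    actions_run: Optional[str]   # the reasoning: the GitHub Actions log
    report_path: Optional[str]   # the record: a file under ops/reports/
    eval_score: Optional[float]  # the quality: the judged score
    approved_by: Optional[str]   # the approval: who merged or tapped send


def missing_links(record: TraceRecord) -> List[str]:
    """Return the names of fields that are absent, in declaration order."""
    return [
        name
        for name, value in asdict(record).items()
        if value is None or value == ""
    ]
```

For example, `missing_links(TraceRecord("pr-placeholder", "run-placeholder", "ops/reports/placeholder.md", 0.9, None))` returns `["approved_by"]`, which flags a change whose approver was never recorded.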

Model choice — the right model for the job, proven by evals

We don't guess at models; we benchmark them, then route the right one to each job. The model for each agent is configured in the workflow itself (one line per role). The same task also runs across Claude, Gemini and a Cloudflare Edge AI model in our eval harness, so we can justify dropping a tier once the cheaper model clears the bar.

Agent	Model	Why
Content auditor	Claude Haiku (cheap)	Mechanical link and freshness checks, every day: fast and inexpensive.
Reviewer · Growth · Newsletter	Claude Sonnet (workhorse)	Accurate research and customer-facing copy at a sensible cost.
Writer · Strategy exec	Claude Opus (frontier)	Reputation and direction carry weight: frontier reasoning, no compromise.
Eval judge	Claude (frontier)	The scorer must be at least as capable as what it scores.
Edge / cheap tasks	Workers AI (tuning target)	The benchmark candidate: tuned against frontier “gold” outputs and given work as it clears the bar.
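The "drop a tier once the cheaper model clears the bar" rule can be sketched as picking the cheapest candidate whose eval score meets a threshold. Everything here is illustrative: the `Candidate` type, the `cost_rank` ordering, the fallback to the top scorer, and any scores passed in are assumptions, not the harness's real data or logic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    tier: str          # e.g. "edge", "cheap", "workhorse", "frontier"
    cost_rank: int     # hypothetical ordering: lower means cheaper
    eval_score: float  # hypothetical judged score in the range 0..1


def pick_model(candidates: Sequence[Candidate], bar: float) -> Candidate:
    """Return the cheapest candidate whose score clears the bar."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    passing = [c for c in candidates if c.eval_score >= bar]
    if not passing:
        # Assumption: if nothing clears the bar, fall back to the
        # best-scoring candidate rather than refusing the job.
        return max(candidates, key=lambda c: c.eval_score)
    return min(passing, key=lambda c: c.cost_rank)
```

With made-up scores of 0.62 for an edge model, 0.81 for a cheap tier and 0.90 for a workhorse tier, a bar of 0.80 selects the cheap tier; raise the edge model's score past the bar and the same call routes the job down a tier.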

Mapped to the standard

What runs here, and the pattern it dogfoods from know.2nth.ai:

What we run	The documented pattern	Status
Agents as Claude Code in CI	Coding CLI agents → Claude Code	live
The SDK underneath	Anthropic Agent SDK	live
Playbooks as versioned skills	Agent Skills	live · formalising
Model choice via an eval harness	Models & tiers	live
A queryable server for our data	Model Context Protocol	roadmap
Exec → writer handoffs	A2A Protocol	roadmap

Why a tooling portal proves it

The domain is deliberately low-stakes: the worst an agent can do here is publish a so-so explainer, which a human then reverts. But the controls are exactly the ones a critical process demands: change only by reviewed PR, a human gate on anything outward-facing or strategic, a complete audit trail from output back to source and approver, and model choice earned through measurement. Prove the operating model where mistakes are cheap, then carry the same gates to finance, customer comms, or production, where they aren't.
