The agent operating model
3d.2nth.ai is run by a standing team of AI agents. It is also, on purpose, a low-risk reference for how to operate agents safely — the governance gates, the audit trail, and the model choices you'd want in place before you trust agents with a critical business process. Every practice below maps to a pattern documented in the know.2nth.ai agents tree. This is the rehearsal; the stakes here are a tooling portal, not your ledger.
The principle
“Agents are software, not prompt soup.” So we run the team like production software — under version control, behind a CI/CD pipeline, and measured by an evaluation harness. No agent has a credential, a deploy button, or a send button that isn't gated and logged.
The lifecycle — every change passes the same gates
Wake
A schedule (or a manual dispatch) wakes one agent. It runs as Claude Code inside GitHub Actions — an isolated, ephemeral runner with a tightly scoped tool allow-list.
Work to a versioned playbook
The agent follows a checked-in playbook — effectively an Agent Skill — and does its work on a fresh branch. It may research the live web and must cite credible sources.
Propose, never push
It opens a pull request describing what changed and why, with its sources. Agents are forbidden from committing to main — there is no path to production that skips review.
The gate
Low-risk content auto-merges (link fixes, a new tool explainer). Anything that shapes direction or reaches a human needs you: strategy changes and subscriber sends are approved by the owner in the admin console.
Merge → deploy
Merging triggers CI, which ships the site. The deploy is itself a logged, reproducible Action — not a person SSH-ing into a box.
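The gate step above can be sketched as a small routing function. This is a minimal illustration, not the site's actual implementation: the change labels (`link-fix`, `tool-explainer`), the `ProposedChange` type, and the `gate` function are all hypothetical names, chosen to mirror the examples in the text. The one design assumption worth noting is that the gate fails closed, so anything not explicitly listed as low-risk goes to the owner.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_MERGE = "auto-merge"
    OWNER_APPROVAL = "owner-approval"


# Hypothetical labels standing in for "link fixes, a new tool explainer".
LOW_RISK_KINDS = {"link-fix", "tool-explainer"}


@dataclass(frozen=True)
class ProposedChange:
    kind: str         # what the pull request changes (hypothetical label)
    head_branch: str  # the branch the agent committed its work to


def gate(change: ProposedChange) -> Decision:
    """Route a proposed change to auto-merge or owner approval."""
    # Agents propose from a fresh branch; work committed directly to main is rejected.
    if change.head_branch == "main":
        raise ValueError("agents must propose via a pull request, never commit to main")
    if change.kind in LOW_RISK_KINDS:
        return Decision.AUTO_MERGE
    # Fail closed: strategy changes, subscriber sends, and anything
    # unrecognised all wait for the owner.
    return Decision.OWNER_APPROVAL
```

Under these assumptions, `gate(ProposedChange("link-fix", "agent/fix-links"))` returns `Decision.AUTO_MERGE`, while a subscriber send or an unknown kind returns `Decision.OWNER_APPROVAL`.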
Traceability — walk any change back to its source
Pick anything on this site and you can reconstruct exactly how it got there:
A pull request — the diff, the rationale, and the cited sources, on the public record.
The live GitHub Actions log — you can read the agent searching and deciding, step by step.
ops/reports/ — dated audit, strategy and sourced research briefs the agents file each run.
Eval scores in the database, rendered on the leaderboard — every model output is judged and kept.
Who merged it, and who tapped send on a subscriber email — recorded against the issue.
A live feed on the agents console, regenerated from the git history on every deploy.
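As a rough sketch of the last item, a feed can be rebuilt from git history by asking `git log` for a machine-readable format and parsing it. The function names, the feed entry shape, and the choice of fields are assumptions for illustration; the source does not describe how the console builds its feed. The format string uses standard `git log` placeholders (`%H` commit hash, `%an` author name, `%aI` strict ISO 8601 author date, `%s` subject, `%x1f` a unit-separator byte).

```python
import subprocess
from typing import Dict, List

FIELD_SEP = "\x1f"  # unit separator, unlikely to appear in a commit subject
GIT_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


def read_git_log(repo_dir: str = ".", limit: int = 20) -> str:
    """Return the raw `git log` output for the most recent commits."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-n", str(limit),
         f"--pretty=format:{GIT_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_git_log(raw: str) -> List[Dict[str, str]]:
    """Turn raw log text into feed entries, skipping blank or malformed lines."""
    entries = []
    for line in raw.split("\n"):
        if not line:
            continue
        parts = line.split(FIELD_SEP, 3)
        if len(parts) != 4:
            continue  # not in the expected four-field shape
        commit, author, date, subject = parts
        entries.append(
            {"commit": commit, "author": author, "date": date, "subject": subject}
        )
    return entries
```

Calling `parse_git_log(read_git_log())` inside a checked-out repository during the deploy step would yield one dictionary per commit, newest first, ready to render.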
Model choice — the right model for the job, proven by evals
We don't guess at models: we benchmark them, then route the right one to each job. The model per agent is configured in the workflow itself (one line per role). The same task runs across Claude, Gemini and a Cloudflare Workers AI edge model in our eval harness, so we can justify dropping a tier once the cheaper model clears the bar.
| Agent | Model | Why |
|---|---|---|
| Content auditor | Claude Haiku (cheap) | Mechanical link and freshness checks, every day: fast and inexpensive. |
| Reviewer · Growth · Newsletter | Claude Sonnet (workhorse) | Accurate research and customer-facing copy at a sensible cost. |
| Writer · Strategy exec | Claude Opus (frontier) | Reputation and direction carry weight: frontier reasoning, no compromise. |
| Eval judge | Claude (frontier) | The scorer must be at least as capable as what it scores. |
| Edge / cheap tasks | Workers AI (tuning target) | The benchmark candidate, tuned against frontier “gold” outputs and given work as it clears the bar. |
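The routing in the table, and the rule for dropping a tier, can be sketched as follows. Everything here is illustrative: the role keys, the tier ordering (in particular, treating the edge model as the cheapest tier), the score scale, and both function names are assumptions rather than the site's real configuration.

```python
from typing import Dict, Optional

# Tiers from least to most capable; placing "edge" first is an assumption.
TIER_ORDER = ["edge", "cheap", "workhorse", "frontier"]

# One entry per role, mirroring the table above (hypothetical keys).
ROLE_TIERS: Dict[str, str] = {
    "content-auditor": "cheap",
    "reviewer": "workhorse",
    "growth": "workhorse",
    "newsletter": "workhorse",
    "writer": "frontier",
    "strategy-exec": "frontier",
    "eval-judge": "frontier",
}


def cheapest_passing_tier(scores: Dict[str, float], bar: float) -> Optional[str]:
    """Return the lowest tier whose eval score meets the bar, or None if none does."""
    for tier in TIER_ORDER:
        if tier in scores and scores[tier] >= bar:
            return tier
    return None


def judge_can_score(judge_tier: str, candidate_tier: str) -> bool:
    """The scorer must be at least as capable as what it scores."""
    return TIER_ORDER.index(judge_tier) >= TIER_ORDER.index(candidate_tier)
```

Given eval scores per tier for one task and an agreed bar, `cheapest_passing_tier` names the tier that task could drop to, and `judge_can_score` guards against scoring a frontier output with a weaker judge.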
Mapped to the standard
What runs here, and the pattern it dogfoods from know.2nth.ai:
| What we run | The documented pattern | Status |
|---|---|---|
| Agents as Claude Code in CI | Coding CLI agents → Claude Code | live |
| The SDK underneath | Anthropic Agent SDK | live |
| Playbooks as versioned skills | Agent Skills | live · formalising |
| Model choice via an eval harness | Models & tiers | live |
| A queryable server for our data | Model Context Protocol | roadmap |
| Exec → writer handoffs | A2A Protocol | roadmap |