The agent operating model

3d.2nth.ai is run by a standing team of AI agents. It is also, on purpose, a low-risk reference for how to operate agents safely — the governance gates, the audit trail, and the model choices you'd want in place before you trust agents with a critical business process. Every practice below maps to a pattern documented in the know.2nth.ai agents tree. This is the rehearsal; the stakes here are a tooling portal, not your ledger.

The principle

“Agents are software, not prompt soup.” So we run the team like production software: under version control, behind a CI/CD pipeline, and measured by an evaluation harness. No agent has a credential, a deploy button, or a send button that isn't gated and logged.

The lifecycle — every change passes the same gates

1. Wake

A schedule (or a manual dispatch) wakes one agent. It runs as Claude Code inside GitHub Actions — an isolated, ephemeral runner with a tightly scoped tool allow-list.

2. Work to a versioned playbook

The agent follows a checked-in playbook — effectively an Agent Skill — and does its work on a fresh branch. It may research the live web and must cite credible sources.

3. Propose, never push

It opens a pull request describing what changed and why, with its sources. Agents are forbidden from committing to main — there is no path to production that skips review.

4. The gate

Low-risk content auto-merges (link fixes, a new tool explainer). Anything that shapes direction or reaches a human needs you: strategy changes and subscriber sends are approved by the owner in the admin console.

5. Merge → deploy

Merging triggers CI, which ships the site. The deploy is itself a logged, reproducible Action — not a person SSH-ing into a box.
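
The gate in step 4 can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the site's actual implementation: the `Change` fields, the category names (`link_fix`, `tool_explainer`, `strategy`, `subscriber_send`) and the `route` function are hypothetical, chosen only to mirror the examples given in the steps above.

```python
from dataclasses import dataclass

# Hypothetical change categories, mirroring the examples in the text.
AUTO_MERGE_KINDS = {"link_fix", "tool_explainer"}
OWNER_APPROVAL_KINDS = {"strategy", "subscriber_send"}


@dataclass(frozen=True)
class Change:
    kind: str               # what the proposed change does
    via_pull_request: bool  # True if it arrived as a PR from a branch


def route(change: Change) -> str:
    """Return 'reject', 'owner_approval' or 'auto_merge' for a proposed change."""
    # Agents never commit directly: anything that is not a PR is refused.
    if not change.via_pull_request:
        return "reject"
    # Outward-facing or strategic work always waits for the owner.
    if change.kind in OWNER_APPROVAL_KINDS:
        return "owner_approval"
    # Only explicitly listed low-risk kinds merge on their own.
    if change.kind in AUTO_MERGE_KINDS:
        return "auto_merge"
    # Unknown kinds fail closed: default to a human decision.
    return "owner_approval"
```

The design choice worth noting is the last line: a change the policy does not recognise is sent to a human rather than merged, so adding a new kind of agent work never widens the auto-merge path by accident.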

Traceability — walk any change back to its source

Pick anything on this site and you can reconstruct exactly how it got there:

The change

A pull request — the diff, the rationale, and the cited sources, on the public record.

The reasoning

The live GitHub Actions log — you can read the agent searching and deciding, step by step.

The record

ops/reports/ — dated audit, strategy and sourced research briefs the agents file each run.

The quality

Eval scores in the database, rendered on the leaderboard — every model output is judged and kept.

The approval

Who merged it, and who tapped send on a subscriber email — recorded against the issue.

The activity

A live feed on the agents console, regenerated from the git history on every deploy.
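
As an illustration of the last item, an activity feed can be rebuilt from git history with nothing more than `git log`. The sketch below is an assumption about how such a feed might be produced, not the console's real code: `FeedEntry`, `parse_log` and `read_feed` are hypothetical names, and only the `git log` options and format placeholders are standard git.

```python
import subprocess
from dataclasses import dataclass

# Tab-separated fields: full hash, author name, ISO 8601 author date, subject.
LOG_FORMAT = "%H%x09%an%x09%aI%x09%s"


@dataclass(frozen=True)
class FeedEntry:
    commit: str
    author: str
    date: str
    subject: str


def parse_log(raw: str) -> list[FeedEntry]:
    """Turn `git log --pretty=format:<LOG_FORMAT>` output into feed entries."""
    entries = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any tabs inside the subject intact.
        commit, author, date, subject = line.split("\t", 3)
        entries.append(FeedEntry(commit, author, date, subject))
    return entries


def read_feed(repo_dir: str = ".", limit: int = 20) -> list[FeedEntry]:
    """Read the newest `limit` commits from the repository at `repo_dir`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--max-count={limit}",
         f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

Because the feed is derived from history rather than stored separately, regenerating it on every deploy cannot drift from what was actually merged.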

Model choice — the right model for the job, proven by evals

We don't guess at models: we benchmark them, then route the right one to each job. The model for each agent is configured in the workflow itself (one line per role), and the same task runs across Claude, Gemini and a Cloudflare Edge AI model in our eval harness, so we can justify dropping to a cheaper tier once the cheaper model clears the bar.

Agent | Model | Why
Content auditor | Claude Haiku (cheap) | Mechanical link & freshness checks, every day: fast and inexpensive.
Reviewer · Growth · Newsletter | Claude Sonnet (workhorse) | Accurate research and customer-facing copy at a sensible cost.
Writer · Strategy exec | Claude Opus (frontier) | Reputation and direction carry weight: frontier reasoning, no compromise.
Eval judge | Claude (frontier) | The scorer must be at least as capable as what it scores.
Edge / cheap tasks | Workers AI (tuning target) | The benchmark candidate, tuned against frontier “gold” outputs and given work as it earns the bar.
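
The routing rule behind this table, "use the cheapest model that clears the bar", can be sketched in a few lines. This is an illustrative assumption about the selection logic, not the harness itself: `cheapest_passing`, the score dictionary and the pass threshold `bar` are hypothetical, and the tier labels are whatever names the caller uses.

```python
def cheapest_passing(
    scores: dict[str, float],
    tiers_cheapest_first: list[str],
    bar: float,
) -> str:
    """Pick the cheapest tier whose eval score meets the bar.

    `scores` maps a tier label to its eval score for one task type.
    `tiers_cheapest_first` lists tier labels in ascending cost order.
    If no tier clears the bar, fall back to the most capable (last) tier.
    """
    if not tiers_cheapest_first:
        raise ValueError("at least one tier is required")
    for tier in tiers_cheapest_first:
        # A tier with no recorded score is treated as not having passed.
        if scores.get(tier, float("-inf")) >= bar:
            return tier
    return tiers_cheapest_first[-1]
```

Run per task type, a function like this makes a tier change a consequence of measured scores: when a cheaper tier's score rises past the bar, the next call returns it without anyone editing the policy.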

Mapped to the standard

What runs here, and the pattern it dogfoods from know.2nth.ai:

What we run | The documented pattern | Status
Agents as Claude Code in CI | Coding CLI agents → Claude Code | live
The SDK underneath | Anthropic Agent SDK | live
Playbooks as versioned skills | Agent Skills | live · formalising
Model choice via an eval harness | Models & tiers | live
A queryable server for our data | Model Context Protocol | roadmap
Exec → writer handoffs | A2A Protocol | roadmap

Why a tooling portal proves it

The domain is deliberately low-stakes — the worst an agent can do here is publish a so-so explainer, which a human unmerges. But the controls are exactly the ones a critical process demands: change only by reviewed PR, a human gate on anything outward-facing or strategic, a complete audit trail from output back to source and approver, and model choice earned through measurement. Prove the operating model where mistakes are cheap, then carry the same gates to finance, customer comms, or production — where they aren't.
See the live agent team → · Owner console · Source & PRs · The patterns (know.2nth.ai)