Forge: Giving My Agent a Factory

A few weeks ago I wrote a post arguing that my AI agent should orchestrate work rather than do all of it itself — the agent gets the eyes and the brain, deterministic tooling keeps the hands. That post was about cluster updates. Tonight CC and I took the same idea somewhere bigger: if the agent is the orchestrator, what does it orchestrate with?

The answer we built is called Forge. It lets CC — my Claude Code agent — dispatch other Claude Code agents into my Kubernetes cluster, in containers, to build software while I sleep. This is the first real step toward something I’ve wanted for a while: my homelab as an autonomous build substrate. I generate and vet ideas; CC turns them into running code without needing me at the keyboard for every line.

We got the platform working end to end in a single evening. I want to be honest that “a single evening” is doing some heavy lifting — it was a single evening because of the model driving it, and I’ll come back to that.

The shape of the thing

The org chart I’m aiming for is simple. I’m the president: I set direction, vet ideas, approve what ships. CC is the executive: it takes an idea we’ve talked through, decomposes it, and dispatches workers to do the building. The workers are ephemeral Claude Code processes running headless (claude -p) inside containers on my production cluster — kubecluster01, seven nodes of mini PCs.

Here’s how a task flows through Forge:

1. A task becomes an issue. The work queue is just a private Git repo (forge-tasks) on my self-hosted Forgejo. Open an issue, label it forge:queued, put the task in the body. Directives like  or  tune the run.
2. An orchestrator picks it up. A small reconcile loop runs in-cluster every couple of minutes, finds queued issues, and dispatches a worker for each one as a Kubernetes Job.
3. A worker does the work. The container clones the target repo, runs Claude Code headlessly against the task prompt, commits its output to a branch, and reports back: cost, turn count, and a summary land as a comment on the original issue, which then closes itself.
4. I get a ping. Every run notifies my Telegram with what happened and what it cost.
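To make the pickup step concrete, here is a minimal sketch of the decision a single reconcile pass has to make: which queued issues still need a worker. It is not Forge's actual code. The Issue type, the job_name scheme, and plan_dispatch are hypothetical names for illustration; only the forge:queued label comes from the setup described above.

```python
from __future__ import annotations

from dataclasses import dataclass

QUEUED_LABEL = "forge:queued"  # the queue label named in this post


@dataclass(frozen=True)
class Issue:
    """Hypothetical, trimmed-down view of a Forgejo issue."""
    number: int
    labels: frozenset[str]
    body: str


def job_name(issue: Issue) -> str:
    # Assumed naming scheme: deterministic, so a repeat pass maps the same
    # issue to the same Job name instead of minting a second worker.
    return f"forge-task-{issue.number}"


def plan_dispatch(issues: list[Issue], existing_jobs: set[str]) -> list[str]:
    """Return the Job names one reconcile pass should create."""
    to_create = []
    for issue in sorted(issues, key=lambda i: i.number):
        if QUEUED_LABEL not in issue.labels:
            continue  # not in the queue
        name = job_name(issue)
        if name in existing_jobs:
            continue  # a Job for this issue already exists
        to_create.append(name)
    return to_create


issues = [
    Issue(7, frozenset({"forge:queued"}), "build the web app"),
    Issue(8, frozenset({"bug"}), "unrelated issue"),
    Issue(9, frozenset({"forge:queued"}), "write the docs"),
]
print(plan_dispatch(issues, existing_jobs={"forge-task-7"}))  # ['forge-task-9']
```

Keeping the pass a pure function of "issues I see" and "Jobs that exist" means running it twice in a row is harmless, which matters for a loop that fires every couple of minutes.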

The whole point is that step 2 onward needs no human. I can file a task from my phone and a worker materializes in my cluster, builds the thing, and tells me what it spent.
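The "tells me what it spent" part is mostly string formatting. Below is a rough sketch of turning a worker's JSON result into the issue comment. The field names (total_cost_usd, num_turns, result, is_error) are my assumption about what the headless run emits, and format_report is a hypothetical helper, so treat this as an illustration rather than the real reporting code.

```python
import json


def format_report(result_json: str) -> str:
    """Turn a worker's JSON result into an issue comment body."""
    result = json.loads(result_json)
    # Field names below are assumptions; missing fields fall back to defaults.
    cost = float(result.get("total_cost_usd", 0.0))
    turns = int(result.get("num_turns", 0))
    summary = str(result.get("result", "")).strip() or "(no summary)"
    status = "failed" if result.get("is_error") else "succeeded"
    return (
        f"Forge worker {status}.\n"
        f"Cost: ${cost:.2f}\n"
        f"Turns: {turns}\n\n"
        f"{summary}"
    )


sample = json.dumps(
    {"total_cost_usd": 1.5, "num_turns": 12, "result": "Built the app.", "is_error": False}
)
print(format_report(sample))
```

The same text can go to both the issue comment and the Telegram notification, so the two records of a run never disagree.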

Why not just use the hosted thing?

Anthropic has a managed agents product — they run the agent loop and the sandbox for you. It’s genuinely nice. But it doesn’t run on Amazon Bedrock, and my entire stack authenticates to Claude through Bedrock in my own AWS account. So the hosted path was out from the start.

That turned out to be fine, because self-hosting on my own cluster is more in the spirit of the project anyway. The homelab is the point. If the goal is “my infrastructure builds things for me,” then the build agents should live on my infrastructure.

Guardrails, because this is a loaded gun

Handing autonomous agents a container on your production cluster with permission prompts disabled is exactly as alarming as it sounds, so most of the evening went into the boring, important part: blast radius.

A dedicated, least-privilege identity. Workers authenticate to Bedrock with a brand-new IAM user whose entire policy is bedrock:InvokeModel. A worker that goes off the rails can spend model tokens and nothing else — it can’t touch S3, IAM, billing, or any other corner of my AWS account.
A locked-down network. A Kubernetes NetworkPolicy lets workers reach DNS, my Git server, and the public internet — and nothing else. No cluster API, no other namespaces, no access to the rest of my LAN or my gateway. The container is the sandbox, which is what makes running the agent with permission prompts off a defensible choice rather than a reckless one.
Spend and concurrency caps, three ways. Each worker runs under a dollar budget (claude --max-budget-usd). A namespace quota caps how many can run at once. Every job has a hard two-hour wall-clock kill and never auto-retries — because an agent re-running a half-finished task is how you get duplicated side effects.
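For the curious, the wall-clock and concurrency caps map onto plain Kubernetes fields. Here is a sketch of the two manifests as Python dicts, under stated assumptions: the forge namespace, the FORGE_BUDGET_USD environment variable, the object names, and the choice of a count/jobs.batch quota are all illustrative guesses, not the real manifests.

```python
NAMESPACE = "forge"  # assumed namespace name


def worker_job(name: str, image: str, budget_usd: float) -> dict:
    """Build a batch/v1 Job manifest for one worker as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "backoffLimit": 0,              # a failed pod is never retried
            "activeDeadlineSeconds": 7200,  # hard two-hour wall-clock kill
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        # Hypothetical variable: the entrypoint would hand this
                        # value to the agent's dollar-budget flag.
                        "env": [{"name": "FORGE_BUDGET_USD", "value": str(budget_usd)}],
                    }],
                }
            },
        },
    }


def worker_quota(max_jobs: int) -> dict:
    """Build a ResourceQuota capping how many Job objects the namespace holds."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "forge-workers", "namespace": NAMESPACE},
        "spec": {"hard": {"count/jobs.batch": str(max_jobs)}},
    }


job = worker_job("forge-task-9", "registry.example/forge-worker:latest", 5.0)
print(job["spec"]["backoffLimit"], job["spec"]["activeDeadlineSeconds"])  # 0 7200
```

One consequence of an object-count quota like this: finished Jobs still count against it until they are deleted, so something has to clean them up.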

The principle from the cluster-updates post carried straight over: the agent owns judgment, but every dangerous capability sits behind an explicit, boring gate.

[Figure: a Forge worker's blast radius. The worker pod can reach Bedrock (InvokeModel only), the Git server, and the public internet, but is blocked from the Kubernetes API, other namespaces, the LAN, and the gateway. This is enforced by a NetworkPolicy egress allowlist, least-privilege IAM, and quota plus budget caps.]
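An egress allowlist like this can be sketched as a NetworkPolicy built in Python, plus a tiny checker to sanity-test the address rules. Everything specific here is an assumption for illustration: the Git server address, the namespace, the kube-dns labels, and the idea of expressing "public internet but not my LAN" as 0.0.0.0/0 minus the private ranges. The checker only looks at ipBlock rules and ignores ports.

```python
import ipaddress

PRIVATE_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
GIT_SERVER_CIDR = "192.168.1.50/32"  # hypothetical Git server address


def worker_egress_policy() -> dict:
    """Build a networking.k8s.io/v1 NetworkPolicy as a plain dict."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "forge-worker-egress", "namespace": "forge"},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {   # DNS, assuming the usual kube-dns labels in kube-system
                    "to": [{
                        "namespaceSelector": {"matchLabels": {
                            "kubernetes.io/metadata.name": "kube-system"}},
                        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                    }],
                    "ports": [{"protocol": "UDP", "port": 53},
                              {"protocol": "TCP", "port": 53}],
                },
                {   # the Git server, and only over HTTPS
                    "to": [{"ipBlock": {"cidr": GIT_SERVER_CIDR}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
                {   # public internet: everything except private ranges
                    "to": [{"ipBlock": {"cidr": "0.0.0.0/0",
                                        "except": PRIVATE_RANGES}}],
                },
            ],
        },
    }


def ip_allowed(policy: dict, ip: str) -> bool:
    """Check an address against the policy's ipBlock rules (ports ignored)."""
    addr = ipaddress.ip_address(ip)
    for rule in policy["spec"]["egress"]:
        for peer in rule.get("to", []):
            block = peer.get("ipBlock")
            if block is None or addr not in ipaddress.ip_network(block["cidr"]):
                continue
            excluded = any(addr in ipaddress.ip_network(c)
                           for c in block.get("except", []))
            if not excluded:
                return True
    return False


policy = worker_egress_policy()
print(ip_allowed(policy, "8.8.8.8"))      # True: public internet
print(ip_allowed(policy, "192.168.1.1"))  # False: LAN gateway
```

Because NetworkPolicy rules are additive, the single Git server address can sit inside an excluded private range and still be reachable through its own rule.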

The bug, because there’s always a bug

The first real task I gave it — building a small web app from a spec — failed immediately, and the failure was a good one. The worker couldn’t clone the repo.

The cause is a classic homelab gremlin: hairpin NAT. A pod inside my cluster tried to reach my Git server at its public-facing ingress IP, and the traffic died trying to loop back out and in again. Image pulls worked fine, since those happen at a lower level on the node, but the pod’s own outbound clone hung for two minutes and then timed out. The fix is to point workers at Forgejo’s internal cluster address instead of its external one, and to widen the NetworkPolicy just enough to allow it. I caught the failure, the orchestrator cleanly held the task with a diagnostic note instead of stranding it, and applying the fix is tomorrow’s first job.

I’m including this on purpose. The interesting thing about an autonomous build system isn’t the happy path — it’s whether it fails legibly. This one did: the worker reported exactly why it couldn’t proceed, the task got parked rather than lost, and I could read the whole story from a Telegram message and a Git comment.
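The clone-path fix described above is, at its core, a URL rewrite: swap the external hostname for the in-cluster one before cloning. A minimal sketch follows, with made-up hostnames and a hypothetical to_internal_url helper. The real service name, port, and scheme depend on how Forgejo is deployed.

```python
from urllib.parse import urlsplit, urlunsplit


def to_internal_url(url: str, external_host: str, internal_netloc: str,
                    internal_scheme: str = "http") -> str:
    """Rewrite a clone URL on the external host to the in-cluster address.

    URLs on any other host are returned unchanged. Credentials embedded in
    the original URL are dropped, not carried over.
    """
    parts = urlsplit(url)
    if parts.hostname != external_host:
        return url
    return urlunsplit(
        (internal_scheme, internal_netloc, parts.path, parts.query, parts.fragment)
    )


# Hypothetical hostnames for illustration only.
print(to_internal_url(
    "https://git.example.com/me/forge-tasks.git",
    external_host="git.example.com",
    internal_netloc="forgejo-http.forgejo.svc.cluster.local:3000",
))
# http://forgejo-http.forgejo.svc.cluster.local:3000/me/forge-tasks.git
```

Doing the rewrite in one place keeps task bodies free to use the public URL while workers only ever dial the internal one.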

A word about the engine

I’d be misrepresenting the evening if I said I designed all this and CC just typed it out. The reality is closer to a real collaboration, and that’s only possible because of what’s driving CC now: Claude Fable 5, Anthropic’s newest and most capable model.

The difference is in how long it can productively think. Designing Forge wasn’t a series of one-liners — it was holding a whole system in working memory at once: the Bedrock constraint, the IAM blast radius, the network policy, the queue semantics, the failure modes, how the orchestrator should self-heal. Earlier models could write any one piece on request. Fable 5 reasons across the whole thing, anticipates the failure modes before they happen, and sustains that across an entire build session without losing the plot. The self-healing logic in the orchestrator — re-queue a task whose worker vanished, garbage-collect finished jobs to free quota — wasn’t something I asked for line by line. It was the model thinking ahead about how this breaks at 3am when no one’s watching.

That’s the unlock that makes the whole “agent as executive” idea real rather than aspirational. You can’t delegate to something that needs its hand held every step. You can delegate to something that holds the goal, reasons about the edges, and tells you honestly when it hit a wall.
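The two self-healing behaviors mentioned above (re-queue a task whose worker vanished, garbage-collect finished jobs to free quota) reduce to a small comparison between what the queue believes and what the cluster shows. This is a sketch of that comparison only. The JobPhase and Actions types and the self_heal function are hypothetical, not the orchestrator's real code.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class JobPhase(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class Actions:
    requeue: list[int]      # issue numbers to put back in the queue
    delete_jobs: list[int]  # issue numbers whose finished Job can be deleted


def self_heal(in_progress: set[int], jobs: dict[int, JobPhase]) -> Actions:
    """Compare issues marked in progress against the Jobs that actually exist.

    in_progress: issue numbers the queue believes have a live worker.
    jobs: phase of each existing Job, keyed by issue number.
    """
    # A task is orphaned when it is marked in progress but has no Job at all.
    requeue = sorted(n for n in in_progress if n not in jobs)
    # Any Job that is no longer running is finished and only occupies quota.
    delete_jobs = sorted(n for n, phase in jobs.items()
                         if phase is not JobPhase.RUNNING)
    return Actions(requeue=requeue, delete_jobs=delete_jobs)


actions = self_heal(
    in_progress={7, 9},
    jobs={9: JobPhase.RUNNING, 4: JobPhase.SUCCEEDED},
)
print(actions.requeue, actions.delete_jobs)  # [7] [4]
```

Issue 7 has no Job behind it, so it goes back in the queue; the Job for issue 4 has finished, so deleting it frees a slot.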

Where this is going

Tonight was the foundation. The roadmap from here:

Fix the clone path and let the first real product build run start to finish.
Add a review gate, so workers propose changes as pull requests and nothing merges to a main branch without my sign-off. I stay president.
Let my Telegram assistant file tasks, so “build me X” from my phone becomes a worker in my cluster without me opening a laptop.
Build longer-horizon agents: not just one-shot build tasks, but agents that own a project across many sessions, keep their own notes, and pick up where they left off.

That last one is the real goal, and it’s the thing I’m most excited about. Not an agent that answers a question, but a fleet of them living on my own hardware, building the backlog of ideas I never have time for, with me steering and approving rather than typing. Tonight we built the factory floor. Now we find out what it can make.