Backups That Don't Share Fate

I was about to start patching my Kubernetes cluster (kernel updates, node reboots, the works) when I hit a wall of my own making. Draining a node evicts whatever’s running on it, and two of my nodes each run a PostgreSQL database with no standby replica and no backup. Reboot one of them while a write is mid-flight, have anything go wrong, and there’s nothing to recover from.

So before any patching, I had to solve the backup problem. And the moment I started thinking about it seriously, one offhand question from me reshaped the entire design: “doesn’t the NAS become a single point of failure?”

It does. And answering that properly is most of this post.

Replicas are not backups

First, a trap I want to name because it’s the one that gets people. My databases sit on Longhorn, which keeps three replicas of every volume across three nodes. It’s tempting to look at that and think “I’m covered — three copies!”

You’re not. Replicas protect against a node dying. They’re live mirrors: if your database writes corruption, or a bad migration drops a table, or an app upgrade mangles a schema, all three replicas faithfully copy the damage in real time. Three perfect copies of broken data.

Replicas are availability. Backups are recovery. They protect against completely different disasters, and you need both. The good news was my replicas were already there — what was missing was the entire recovery half.

3-2-1, and the word that matters

The standard backup rule is 3-2-1: three copies, on two types of media, with one off-site. Easy to recite. The interesting part is how the copies get made, and that’s where my NAS question bit.

The obvious design is a chain:

app  ──►  NAS  ──►  cloud

Each app dumps to the NAS, and then one process syncs the NAS to the cloud. Clean, simple — and it stacks two single points of failure in series. If the NAS is down, nothing reaches the cloud. If that one sync job breaks, every app’s off-site copy silently stops. One failure, total loss of off-site coverage.

What I actually wanted was fan-out:

        ┌──►  NAS (fast, local, free)
app  ───┤
        └──►  cloud (off-site, survives the NAS)

Each app writes to both destinations, independently. The NAS dying doesn’t stop the cloud copy. One app’s backup failing doesn’t touch any other app’s. There’s no shared chokepoint and no shared fate. That’s the whole idea in three words: don’t share fate.
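To make the fan-out shape concrete, here is a minimal Python sketch. It is an illustration, not my actual job script: the `fan_out` helper and the two writer functions are hypothetical stand-ins for "copy to the NAS" and "upload to the cloud". The point is that each destination is attempted on its own, so one failing never skips the other.

```python
from typing import Callable, Dict, List

# Hypothetical writer type: takes the dump bytes, returns normally or raises.
Writer = Callable[[bytes], None]


def fan_out(payload: bytes, destinations: Dict[str, Writer]) -> Dict[str, str]:
    """Attempt every destination independently and report each outcome."""
    results: Dict[str, str] = {}
    for name, write in destinations.items():
        try:
            write(payload)
            results[name] = "ok"
        except Exception as exc:
            # The failure is recorded for this destination only;
            # the loop carries on to the next one.
            results[name] = f"failed: {exc}"
    return results


def nas_down(_: bytes) -> None:
    # Simulates the NAS being unreachable.
    raise OSError("nas unreachable")


cloud_objects: List[bytes] = []


def cloud_ok(data: bytes) -> None:
    # Simulates a successful off-site upload.
    cloud_objects.append(data)


results = fan_out(b"dump", {"nas": nas_down, "cloud": cloud_ok})
print(results)
```

With the NAS simulated as down, the cloud copy still lands, and the result map says which destination failed. In a chain, that same NAS outage would have stopped the off-site copy as well.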

It’s more moving parts than a chain. But that extra overhead buys you the property that actually matters when things go wrong: failures stay isolated.

A key per app

If every app pushes its own copy to the cloud independently, each one needs cloud credentials. The lazy version is one master key shared everywhere — which means a leak anywhere exposes everything.

The disciplined version is a scoped key per app. My cloud storage (Backblaze B2) lets you mint application keys restricted to a single bucket and even a single path prefix, with only the permissions you choose. So each app gets a key that can read and write only its own folder and is blind to everything else.

I like verifying claims like this rather than trusting them, so I tested it: authenticated as the database app’s key and tried to write into the photo app’s folder.

An error occurred (AccessDenied) when calling the PutObject operation: not entitled

Exactly what I wanted to see. A leaked database key cannot touch the photo backups, cannot even list them. The blast radius of any single compromised credential is one app’s backups. Decoupling isn’t just about surviving failures — it’s about containing them too.
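The rule the scoped key enforces is simple enough to model in a few lines. This is a toy model of the idea, not the B2 API: the `ScopedKey` class, the bucket name `backups`, the action names, and the object names are all assumptions for illustration. Only the `database/` and `immich/` prefixes come from my setup.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ScopedKey:
    """Toy model of a key limited to one bucket, one prefix, chosen actions."""
    bucket: str
    prefix: str
    actions: FrozenSet[str]

    def allows(self, action: str, bucket: str, object_name: str) -> bool:
        # All three conditions must hold; any mismatch is a denial.
        return (
            action in self.actions
            and bucket == self.bucket
            and object_name.startswith(self.prefix)
        )


# Hypothetical key for the database app, scoped to its own folder.
db_key = ScopedKey("backups", "database/", frozenset({"read", "write", "list"}))

own_write = db_key.allows("write", "backups", "database/forgejo.dump")
foreign_write = db_key.allows("write", "backups", "immich/immich.dump")
foreign_list = db_key.allows("list", "backups", "immich/")
print(own_write, foreign_write, foreign_list)
```

The key can write under its own prefix and gets a denial for both writing and listing under the other app’s prefix, which is the behaviour the real test showed.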

Two layers, two different jobs

Backups aren’t one thing. I ended up with two layers that do genuinely different work:

Layer 1 — logical dumps. A scheduled job runs the database’s own export tool and writes a clean, application-consistent snapshot — fanned out to NAS and cloud. This is the real safety net. It’s granular (restore one database, one table, even one row), and it’s guaranteed-clean because the database itself produced it. This is what you reach for after a bad migration or a dropped table.

Layer 2 — volume snapshots. Longhorn takes fast, local, point-in-time snapshots of the disk. These are crash-consistent (like snapshotting a running machine) and live on the cluster. They’re the “oops, undo the last hour” button — seconds to revert, right after a bad change, before you even need the dumps.

Here’s the subtle decision that ties back to the no-shared-fate principle: my volume layer is deliberately local-only. Longhorn can ship snapshots off-site, but it does it through a single global backup target — one centralized chain for every volume. That’s exactly the shared-fate pattern I was trying to kill. So Longhorn stays the fast local convenience, and the off-site job belongs to the per-app dump threads, which are independent by design. The principle picked the architecture.
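For the Layer 1 dump, the core of the job is just building a `pg_dump` invocation with a timestamped output name. The sketch below shows one plausible shape. The helper name, the staging directory, the file naming scheme, and the choice of the custom archive format are my assumptions here, not a record of the real script. Connection details are assumed to come from the usual `PGHOST`, `PGUSER`, and related environment variables.

```python
from datetime import datetime, timezone
from typing import List


def dump_command(database: str, out_dir: str, now: datetime) -> List[str]:
    """Build a pg_dump command that writes one timestamped archive file."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_file = f"{out_dir}/{database}-{stamp}.dump"
    # --format=custom produces an archive that pg_restore can read,
    # including restoring selected objects rather than everything.
    return ["pg_dump", "--format=custom", f"--file={out_file}", database]


# Hypothetical values for illustration only.
example_time = datetime(2024, 1, 2, 3, 0, 0, tzinfo=timezone.utc)
cmd = dump_command("forgejo", "/tmp/dumps", example_time)
print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, and the file it produces is what gets fanned out to the NAS and the cloud.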

Put the whole thing in one picture and the shape is clear. Every database is its own column, and no two columns share a path off the cluster:

                          kubecluster01
   ┌───────────────────────────────────────────────────────────┐
   │                                                             │
   │   shared PostgreSQL              Immich PostgreSQL          │
   │   (nextcloud, forgejo,           (photo metadata)           │
   │    paperless, gitlab)                                       │
   │        │                              │                     │
   │        │ Layer 2: Longhorn local snapshot (24h + 7d)        │
   │        │ ── crash-consistent, seconds to revert, ON-CLUSTER │
   │        ▼                              ▼                     │
   │   [3 replicas]                   [3 replicas]   availability│
   │        │                              │                     │
   │        │ Layer 1: hourly pg_dump (application-consistent)   │
   │        │                              │                     │
   │   ┌────┴─────┐                   ┌────┴─────┐  fan-out:     │
   │   ▼          ▼                   ▼          ▼   two indep.  │
   └───┼──────────┼───────────────────┼──────────┼──── threads ─┘
       │          │                   │          │
       ▼          ▼                   ▼          ▼
   ┌───────┐  ┌────────────┐      ┌───────┐  ┌────────────┐
   │ nas2  │  │  B2 cloud  │      │ nas2  │  │  B2 cloud  │
   │/Backup│  │ database/  │      │/Backup│  │  immich/   │
   │  /db  │  │ (scoped🔑) │      │ /imm  │  │ (scoped🔑) │
   └───────┘  └────────────┘      └───────┘  └────────────┘
     local      off-site            local      off-site

   each 🔑 can touch ONLY its own prefix — leaked db key
   gets "not entitled" on immich/, and vice-versa

The thing to read off that diagram: there is no single box on the backup path that, if it dies, takes out more than one database’s off-site copy. nas2 dying loses zero off-site copies (B2 still has them). The database’s B2 key leaking exposes only the database folder. No shared fate, drawn out.

How fast, how much loss?

The two numbers that define a backup system are RPO (recovery point objective: how much data you can afford to lose) and RTO (recovery time objective: how long recovery takes).

My databases are small: a couple hundred megabytes for the shared one, under a gig for the photo metadata. At that size you can be generous and it costs next to nothing: hourly backups (so at most about an hour of loss) that restore in about two minutes. The local snapshots revert in seconds. For a homelab holding my photos, my git server, my documents, and my paperless archive, that’s genuinely strong, and the storage bill is a rounding error.
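The hourly figure is worth one line of arithmetic. A dump captures the database as of the moment it starts, and it only counts once it has finished and uploaded. So the worst case is a failure just before the next dump completes: you fall back to the previous one, which is one interval plus one dump-and-upload time old. The two-minute duration below is an assumption for illustration, not a measurement.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, dump_and_upload: timedelta) -> timedelta:
    """Oldest the newest usable dump can be at the moment of failure."""
    return interval + dump_and_upload


# Hourly schedule from the post; 2 minutes per dump is an assumed value.
rpo = worst_case_rpo(timedelta(hours=1), timedelta(minutes=2))
print(rpo)  # 1:02:00
```

For databases this small the extra couple of minutes is noise, but the same sum matters more if a dump ever grows to take a large fraction of the interval.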

The part everyone skips

Here’s the discipline that separates a backup from a hope: I restored one.

A backup you’ve never restored is a guess. So I pulled a dump from the cloud (the off-site copy, not the local one, using that app’s scoped key), restored it into a throwaway database, and compared a row count against the live database.

live forgejo user count: 4
restored forgejo user count: 4
DRILL PASS: counts match (4)

Four users in, four users out, recovered end-to-end from the off-site copy. Now it’s a backup. Until that moment it was just files with optimistic names.
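The comparison step of the drill is small enough to show as a sketch. The function name and the failure message format are hypothetical. Only the pass line mirrors the output above. The two counts are assumed to come from running the same count query against the live and the restored database.

```python
def drill_verdict(live_count: int, restored_count: int) -> str:
    """Compare a row count from the live database with the restored copy."""
    if live_count == restored_count:
        return f"DRILL PASS: counts match ({live_count})"
    # Hypothetical failure format; surface both numbers for debugging.
    return f"DRILL FAIL: live={live_count} restored={restored_count}"


passing = drill_verdict(4, 4)
failing = drill_verdict(4, 3)
print(passing)
print(failing)
```

A matching row count is a coarse check. It proves the dump restores and holds the right number of rows, not that every value is identical, so a stricter drill could compare checksums of a few tables as well.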

What I deliberately didn’t do

The temptation with a system like this is to back up everything at once and call it a platform. I didn’t. I built it for exactly the two databases that were blocking my cluster patching, as the first instances of a reusable pattern — same image, same script shape, same credential model — that I can stamp out for the next app, and the next.

A pile of stuff is explicitly future work, written down so it doesn’t get lost: the rest of the app data that has no off-site copy yet, decoupling the big media libraries from the old NAS-based sweep, and — the one that decoupling demands — monitoring. Because here’s the catch with independent threads: they fail independently, which only helps if you notice. Decoupled execution, but centralized observability. A quiet failure in a thread nobody’s watching is just a slower way to lose data.
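The monitoring I have in mind can start as a staleness check: each job records when it last succeeded, and one central check flags anything that has gone quiet. This is a sketch of that future work, not something already running. The job names, timestamps, and the two-hour threshold are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_jobs(
    last_success: Dict[str, datetime], now: datetime, max_age: timedelta
) -> List[str]:
    """Return the names of jobs whose last success is older than max_age."""
    return sorted(
        name for name, finished in last_success.items() if now - finished > max_age
    )


# Hypothetical state: one off-site job silently stopped five hours ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_success = {
    "database->nas": now - timedelta(minutes=30),
    "database->b2": now - timedelta(hours=5),
    "immich->nas": now - timedelta(minutes=40),
    "immich->b2": now - timedelta(minutes=50),
}
stale = stale_jobs(last_success, now, max_age=timedelta(hours=2))
print(stale)
```

The jobs stay independent, but the check sees all of them in one place, so the one that stopped shows up by name instead of failing unnoticed.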

The meta moment

As usual: my AI assistant designed this, pushed back on my chain-versus-fan-out instinct until it was actually sound, minted and scoped the cloud keys, built and tested the container image, wrote the jobs, ran the restore drill, and committed the whole thing to git with the architecture documented. I steered; it built.

But the most valuable thing it did wasn’t writing YAML. It was taking my vague worry — “isn’t the NAS a single point of failure?” — and refusing to let it stay vague until it had become a concrete principle that chose the architecture. No shared fate. Everything else fell out of that.

Now I can go patch that cluster. Which was, you’ll recall, the entire point — about four hours and one backup system ago.