Durable Objects: A Coordination Primitive Built From Failure

- KinfeMichael Tariku

I've already mentioned this on a podcast we did at Devtopia, but one thing I genuinely appreciate about Cloudflare is how they drive new infrastructure primitives instead of just reselling old ideas with a nicer API. Storage, networking, edge compute: they keep pushing the boundary.

One of those inventions is Durable Objects.

Durable Objects are not perfect. In fact, they are very much an invention born from failure, shaped by the hard limits of the system they run inside. But that's exactly what makes them interesting.

At a high level, a Durable Object is a special kind of Worker (still running inside a V8 isolate, not a VM) but with a strong opinion:

it behaves like a tiny server with memory + disk, living inside Cloudflare's edge.

And that distinction matters a lot.
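To make that concrete, here is a minimal sketch of what a Durable Object class looks like. The class name `Counter` and the key `"count"` are illustrative, not from this article; the general shape (a class with a `fetch` handler and a `state.storage` API) is the standard Workers pattern.

```ts
// Minimal sketch of a Durable Object class (class and key names are
// illustrative). It behaves like a tiny server: fields survive between
// requests while the instance stays warm, and state.storage is its disk.
export class Counter {
  state: DurableObjectState;

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    // Persistent "disk": survives restarts and migrations.
    let count = (await this.state.storage.get<number>("count")) ?? 0;
    count += 1;
    await this.state.storage.put("count", count);
    return new Response(String(count));
  }
}
```

Every request addressed to this object's ID lands on this one instance, which is the property the rest of this post builds on.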

Why serverless makes distributed state hard

Traditional serverless systems are stateless by design. V8 isolates are lightweight, fast, and disposable, but that also means:

  • isolates can be destroyed at any time

  • memory is ephemeral

  • no shared state between isolates

  • no guarantees about execution order

On top of that, Workers intentionally cannot access many low-level VM APIs. This is not accidental; it's a consequence of sandboxing and the risk of side-channel attacks. The isolation model is a core security feature.

All of this is great for scalability and safety, but it makes distributed coordination extremely hard.

If you try to do things like:

  • syncing shared state across isolates

  • coordinating writes

  • managing counters, locks, or sessions

you quickly fall into the classic distributed systems traps.

The wrong approach: syncing everywhere

One possible approach is to say:

"Let's just sync everything."

For example:

  • distribute a WAL across workers

  • let every isolate talk to SQLite

  • coordinate writes like a distributed DB

This is roughly the problem space D1 lives in.

But doing this does not guarantee race-condition safety unless you introduce:

  • leaders

  • locks

  • consensus

  • conflict resolution

And at that point, you're basically rebuilding a database, inside a serverless runtime, across multiple isolates.

That's expensive, complex, and fragile.

Durable Objects choose a different model: ownership, not sync

Durable Objects flip the model entirely.

Instead of saying "everyone syncs", they say:

"Choose an owner."

You bind a piece of shared state to one logical instance:

  • per organization

  • per chat room

  • per user

  • per shard

Everything related to that state is localized to that one Durable Object instance.

No traditional syncing.

No distributed WAL.

No multi-writer conflicts.

Requests are routed to the same instance, and that instance becomes the coordinator.

This is the key insight.

Durable Object stubs: how ownership actually works

Ownership only works if Cloudflare can reliably route every request to the correct instance.
That mechanism is the Durable Object stub.

A stub is a stable handle to a specific Durable Object instance, identified by its ID.

The stub is not the object itself. It does not contain state. It contains identity.

When a Worker sends a request through a stub, Cloudflare resolves where the object currently lives and routes the request there.

If the object is restarted or migrated, the stub remains valid.

This is how Cloudflare enforces ownership without exposing location, networking, or replication details.

You are not load-balancing across instances. You are always talking to the single instance that owns the state.
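Here is a rough sketch of the Worker side. The `ROOMS` binding name and the path-based routing scheme are hypothetical; `idFromName`, `get`, and the stub's `fetch` are the standard namespace API.

```ts
// Worker that routes each request to the Durable Object owning a "room".
// ROOMS is an assumed binding configured in wrangler; the path scheme is
// just an example.
export interface Env {
  ROOMS: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const roomName = new URL(request.url).pathname.slice(1) || "lobby";

    // Same name -> same ID -> same owning instance, wherever it lives.
    const id = env.ROOMS.idFromName(roomName);
    const stub = env.ROOMS.get(id);

    // The stub carries identity, not state; Cloudflare resolves the
    // object's current location and routes the call there.
    return stub.fetch(request);
  },
};
```

If the object restarts or migrates, the same `idFromName` call still resolves to it, so the caller never needs to know where it actually runs.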

But that introduces a big trade-off: single-threaded execution

To make this model correct, Cloudflare enforces a strong rule:

A Durable Object processes one event at a time.

That means:

  • no parallel request execution

  • no concurrent mutation

  • everything is serialized

This is where input and output gatekeeping comes in.

Input & output gates: correctness over cleverness

Cloudflare introduced input gates and output gates to control side effects.

Input gate

Blocks new incoming events while asynchronous work (such as storage operations) is still in flight.

Output gate

Prevents sending responses or external effects until all writes are safely committed.

This solves two hard problems at once:

  • race conditions

  • optimistic writes being acknowledged before they are actually durable

You don't get to tell the outside world "this worked" until the system knows it worked.
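In practice, that means a read-modify-write inside a Durable Object needs no explicit locking. A sketch, written as a method on a class like the `Counter` above (names are illustrative):

```ts
// While the get/put below are in flight, the input gate holds back other
// incoming events, so two overlapping increments cannot both read the
// same stale value. The output gate keeps the return value from reaching
// any caller until the write is durably committed.
async increment(): Promise<number> {
  let count = (await this.state.storage.get<number>("count")) ?? 0;
  count += 1;
  await this.state.storage.put("count", count);
  return count;
}
```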

Where batching and O(1) complexity come in

This is the clever part.

Originally, storage operations (get() / put()) were expensive:

  • every put() meant a network round trip

  • more writes = more latency

  • request time scaled with the number of writes → O(n)

Cloudflare changed this model.

Now:

  • put() writes to an in-memory cache

  • it completes almost instantly

  • even if you await it

The real I/O happens at the output gate.

All writes during a request are:

  • coalesced

  • batched

  • flushed together in a constant number of network round trips

So instead of:

n writes → n network waits → O(n)

you get:

n writes → 1 commit → O(1)

This is huge.

It means:

  • input gates stay blocked for far less time

  • throughput improves

  • latency becomes predictable

  • developers don't have to manually batch writes

This is one of those infra decisions that looks small but fundamentally changes how you design apps.
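Here is roughly what that looks like from inside a Durable Object. The handler below is a sketch; the JSON shape and key names are made up.

```ts
// Each awaited put() resolves against the in-memory write cache almost
// immediately. The buffered writes are coalesced and flushed before the
// output gate releases the response, so the cost stays roughly constant
// no matter how many keys you touch.
async fetch(request: Request): Promise<Response> {
  const data = (await request.json()) as Record<string, unknown>;

  for (const [key, value] of Object.entries(data)) {
    // Looks like n round trips; actually buffered and batched.
    await this.state.storage.put(key, value);
  }

  // By the time this response escapes the output gate, the batch has
  // been committed.
  return new Response("ok");
}
```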

Why this works (and why it's safe)

Even though `put()` returns immediately, Cloudflare does not lie to the outside world.

Because of the output gate:

  • no response is sent

  • no side effects escape

  • until storage is actually committed

So correctness is preserved, while performance improves.

It's very similar to how databases use:

  • write-ahead logs

  • transactional commits

  • delayed fsyncs

Just adapted to a serverless edge runtime.

The downsides (and they matter)

This model isn't free.

Some real drawbacks:

  • single-threaded bottlenecks under high contention

  • long-running requests block everything behind them

  • batching can hide write cost until the end, causing latency spikes

  • debugging performance issues requires understanding gates

  • you have to choose shard boundaries carefully, and think about where each Durable Object instance lives, since placement is implicit by default

Durable Objects reward good domain modeling and punish lazy modeling.
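A small sketch of what that difference looks like in practice (the `SESSIONS` binding and naming scheme are hypothetical):

```ts
// Lazy modeling: every request funnels through one single-threaded
// instance, which becomes the bottleneck under load.
function lazyStub(env: Env): DurableObjectStub {
  return env.SESSIONS.get(env.SESSIONS.idFromName("global"));
}

// Better modeling: one object per user, so contention stays local to
// each user's own state.
function stubForUser(env: Env, userId: string): DurableObjectStub {
  return env.SESSIONS.get(env.SESSIONS.idFromName(`user:${userId}`));
}

interface Env {
  SESSIONS: DurableObjectNamespace;
}
```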

Final thought

Durable Objects are not magic.

They are not a general database.

They are not "serverless but with state".

They are a carefully constrained coordination primitive, built to work within the hard limits of V8 isolates and edge security.

And honestly?

That constraint-driven design is what makes them impressive.

If you treat them like a tiny server with ownership, not like a distributed cache, they shine.

If you fight the model, they will absolutely fight back.
