The Algorithm That Powers Google Docs Also Powers AI Training

By Nitin Wadhawan

Last week, I watched 47 people simultaneously edit a Google Doc during an all-hands meeting. No conflicts. No "document locked" messages. Everyone's cursor dancing across the screen in real time.

That same evening, I read about NVIDIA's latest distributed training framework synchronizing gradients across 4,096 GPUs.

And it got me thinking: these systems share the same fundamental goal, convergence under concurrency.

Not the same algorithm. But the same challenge. And understanding one illuminates the other.


The Hidden Connection

When you type in Google Docs, you're not just editing text. You're participating in one of the most sophisticated distributed systems ever built. One that processes 2 million operations per second while maintaining consistency across millions of users worldwide.

The algorithm that makes this possible? Operational Transformation (OT).

Studying OT deeply made me realize: the patterns of thinking that power real-time document collaboration translate directly to distributed AI training, real-time video generation, and large-scale inference systems.

Let me show you what I mean.


🔄 Pattern 1: Convergence Under Concurrency (Same Goal, Different Mechanisms)

The Google Docs Problem

When Alice inserts "Hello" at position 5 and Bob simultaneously deletes characters at position 3, their operations conflict. Without intervention, you get corrupted text.

Google's solution: Transform Bob's operation based on Alice's. If Alice inserted 5 characters before Bob's target position, shift Bob's position by 5. The operations are non-commutative. Order matters. So OT must transform them to preserve user intent.
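Here's a minimal sketch of that transform rule in Python (illustrative only, not Google's implementation; real OT engines handle many more operation pairs): the concurrent delete is shifted only if the insert landed at or before its target.

def transform_delete_against_insert(delete_pos: int, insert_pos: int, insert_len: int) -> int:
    """Shift a concurrent delete so it still targets the intended character
    after an insert of `insert_len` characters at `insert_pos` is applied first."""
    if insert_pos <= delete_pos:
        return delete_pos + insert_len  # the insert pushed the target to the right
    return delete_pos                   # the insert happened after the target; no shift

# Alice inserts "Hello" (5 chars) at position 5; Bob concurrently deletes position 3.
# Bob's target is before Alice's insert, so his delete is unchanged.
print(transform_delete_against_insert(delete_pos=3, insert_pos=5, insert_len=5))  # -> 3

# If Bob had deleted position 7 instead, it would shift to 12 after Alice's insert.
print(transform_delete_against_insert(delete_pos=7, insert_pos=5, insert_len=5))  # -> 12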

The Distributed Training Parallel

In distributed training across multiple GPUs, each GPU computes gradients independently. These gradients must be synchronized to update the model consistently.

Here's the key difference: gradient addition is commutative. GPU1_grad + GPU2_grad equals GPU2_grad + GPU1_grad. No transformation needed.

So NCCL doesn't "transform" gradients the way OT transforms text operations. Instead, it solves a different problem: how do you efficiently aggregate values across thousands of nodes with minimal bandwidth and latency?
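A tiny simulation of that difference, assuming nothing beyond NumPy (a real system would call a collective such as torch.distributed.all_reduce instead): because addition is order-independent, any aggregation schedule, ring, tree, or otherwise, produces the same averaged gradient.

import numpy as np

# Simulate per-GPU gradients; summation is commutative, so aggregation order is irrelevant.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(4) for _ in range(8)]  # 8 "GPUs", toy 4-parameter model

order_a = sum(grads[i] for i in range(8))            # aggregate in rank order
order_b = sum(grads[i] for i in reversed(range(8)))  # aggregate in reverse order

# Identical up to floating-point rounding -- no OT-style transformation required.
assert np.allclose(order_a, order_b)
avg_grad = order_a / 8   # what an all-reduce (mean) would hand back to every GPU
print(avg_grad)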

The Shared Insight

Both systems must achieve convergence despite massive parallelism. OT solves semantic concurrency (preserving intent). NCCL solves numerical aggregation (optimizing throughput). Same goal, different mechanisms.

Why this matters: Understanding why the mechanisms differ (commutativity) is more valuable than forcing a false equivalence. It's the kind of nuance that separates senior engineers from principal engineers.


⚖️ Pattern 2: The Consistency Spectrum

The Google Docs Trade-off

Google chose eventual consistency for Docs. When you type, you see your keystroke immediately (optimistic update), even though it takes 50-200ms to sync with the server. Users don't notice the lag. But they absolutely notice a "document locked" modal.

This trade-off (user experience over strict consistency) drives the entire architecture.

The Distributed Training Parallel

In distributed training, you face a similar spectrum:

Approach         | Consistency | Trade-off
Synchronous SGD  | Strong      | Every GPU waits for all others. Consistent but slow. One straggler blocks everyone.
Asynchronous SGD | Eventual    | GPUs proceed independently. Faster but may have "stale gradients" that hurt convergence.

The middle ground is bounded staleness (the "stale synchronous parallel" idea): GPUs can be slightly out of sync (eventual) but only within defined bounds (not too eventual). Mainstream frameworks like PyTorch FSDP and DeepSpeed keep updates synchronous and instead hide the wait by overlapping communication with computation.
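Here's a toy sketch of the bounded-staleness rule (illustrative names, not any framework's API): a coordinator lets a worker start its next step only if it is within max_staleness iterations of the slowest worker.

class BoundedStalenessCoordinator:
    def __init__(self, num_workers: int, max_staleness: int):
        self.clock = [0] * num_workers      # per-worker iteration counter
        self.max_staleness = max_staleness

    def can_proceed(self, worker_id: int) -> bool:
        """A worker may start its next step only if it is no more than
        `max_staleness` iterations ahead of the slowest worker."""
        slowest = min(self.clock)
        return self.clock[worker_id] - slowest <= self.max_staleness

    def finish_step(self, worker_id: int):
        self.clock[worker_id] += 1

coord = BoundedStalenessCoordinator(num_workers=4, max_staleness=2)
coord.finish_step(0); coord.finish_step(0)   # worker 0 races ahead by 2 steps
print(coord.can_proceed(0))  # True  -- at the staleness bound
coord.finish_step(0)
print(coord.can_proceed(0))  # False -- must wait for stragglers to catch up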

The Shared Insight

The consistency spectrum isn't binary. The best systems find the sweet spot where the end result (user experience or model quality) doesn't suffer, but you gain massive performance benefits.

Google Docs taught me to always ask: "What's the actual consistency requirement here?" Often it's less strict than we assume.


📜 Pattern 3: Event Sourcing for Time Travel

The Google Docs Approach

Google Docs doesn't store your document as text. It stores a log of every operation ever performed:

Operation 1: INSERT("H", position=0)
Operation 2: INSERT("e", position=1)
Operation 3: INSERT("l", position=2)
...

This "event sourcing" pattern enables:

  • Version history (replay to any point)
  • Undo/redo (reverse operations)
  • Efficient sync (send only new operations)
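A minimal sketch of the replay idea (illustrative format, not Docs' internal one): the current document, or any historical version, is derived by replaying the append-only log.

from dataclasses import dataclass

@dataclass
class InsertOp:
    char: str
    position: int

def replay(ops: list[InsertOp], up_to: int | None = None) -> str:
    """Rebuild document state by applying operations in order."""
    text = ""
    for op in ops[:up_to]:
        text = text[:op.position] + op.char + text[op.position:]
    return text

log = [InsertOp("H", 0), InsertOp("e", 1), InsertOp("l", 2), InsertOp("l", 3), InsertOp("o", 4)]
print(replay(log))           # "Hello" -- current state
print(replay(log, up_to=3))  # "Hel"   -- version history: replay to any point in time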

The AI Infrastructure Parallel

This is the same pattern behind modern ML experiment tracking:

  • Training checkpoints are snapshots (like Google Docs' periodic snapshots)
  • Training logs are event streams (loss, gradients, hyperparameters)
  • Experiment reproducibility requires replaying operations from a known state

Tools like MLflow, Weights & Biases, and Git-based ML versioning (DVC) all build on event sourcing, the same pattern Google has been refining in Docs since 2006.

The Shared Insight

Any system that needs history, reproducibility, or undo is secretly an append-only log with snapshots. Once you see this pattern, you see it everywhere.


🔌 Pattern 4: The Stateful Connection Problem

The Google Docs Challenge

Real-time collaboration requires persistent WebSocket connections. At peak, that's 10 million concurrent connections. Each connection is stateful (knows which document, which user, which cursor position).

Scaling stateful connections is notoriously hard. Google uses:

  • Sticky sessions (all users of Document X → same server)
  • Document-level sharding (different documents → different servers)
  • Graceful degradation (WebSocket fails → fall back to polling)
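A sketch of document-level sharding with sticky routing (hypothetical server names, and simple modulo hashing rather than whatever Google actually uses): hashing the document ID sends every collaborator on the same document to the same stateful server.

import hashlib

SERVERS = ["collab-1", "collab-2", "collab-3", "collab-4"]  # hypothetical server pool

def route(document_id: str) -> str:
    """Sticky, document-level sharding: the same document always maps to the same
    server, so all of its collaborators share one stateful connection endpoint."""
    digest = hashlib.sha256(document_id.encode()).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

# Every user editing doc-42 hits the same server; different docs spread across the pool.
print(route("doc-42"), route("doc-42"), route("doc-7"))

A production router would use consistent hashing instead of a plain modulo, so adding a server doesn't reshuffle every document.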

The AI Infrastructure Parallel

Real-time AI inference has the same structural challenge:

  • Streaming LLM responses (like ChatGPT's typing effect) require persistent connections
  • Each connection has state (conversation history, user context, KV cache)
  • You can't just round-robin requests. Context must stay with the connection.

The solutions rhyme: sticky sessions, connection pooling, stateful inference servers. Different domain, similar architectural patterns.

The Shared Insight

The hardest part of real-time systems isn't the algorithm. It's maintaining millions of stateful connections without losing context. This is true whether you're building collaborative documents or streaming AI inference.


⚡ Pattern 5: Optimistic Updates for Perceived Performance

The Google Docs UX

When you type, the character appears instantly. The system assumes your operation will succeed and shows the result immediately. If something fails, it rolls back.

This "optimistic UI" is why Docs feels instant despite 200ms network latency.

The AI Infrastructure Parallel

Real-time AI video generation (like Runway, Pika, or NVIDIA's research) uses a similar principle:

  • Show a lower-quality preview immediately (optimistic)
  • Stream higher-quality frames as they render
  • User perceives instant response even though full generation takes seconds

Speculative decoding in LLMs is another example: predict multiple tokens optimistically, verify them in parallel, keep the correct ones. The principle: assume success, verify later, roll back if wrong.
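A toy sketch of that draft-and-verify loop (hypothetical stand-in models; real speculative decoding verifies all draft positions with a single parallel forward pass of the target model and compares token probabilities rather than exact matches):

# Toy sketch: a cheap draft model proposes several tokens, the expensive target
# model verifies them, and we keep the longest correct prefix.

def draft_model(prefix: list[str], k: int) -> list[str]:
    # Hypothetical fast-but-sloppy model: guesses k tokens ahead.
    guesses = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}  # last guess is wrong
    out, last = [], prefix[-1]
    for _ in range(k):
        last = guesses.get(last, "<eos>")
        out.append(last)
    return out

def target_model_next(prefix: list[str]) -> str:
    # Hypothetical slow-but-correct model, consulted to verify each position.
    truth = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return truth.get(prefix[-1], "<eos>")

prefix = ["the"]
proposed = draft_model(prefix, k=4)                  # optimistic: guess 4 tokens at once
accepted = []
for tok in proposed:
    if target_model_next(prefix + accepted) == tok:  # verify each guess against the target
        accepted.append(tok)
    else:
        accepted.append(target_model_next(prefix + accepted))  # fix the first wrong guess
        break                                        # roll back everything after it
print(prefix + accepted)  # ['the', 'cat', 'sat', 'on', 'the'] -- 3 guesses kept, 1 corrected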

The Shared Insight

Perceived latency matters more than actual latency. Design for human perception, not for technical correctness. This lesson from Google Docs applies directly to AI product design.


💡 Why This Matters

I'm not claiming Google Docs and NVIDIA NCCL use the same algorithm. They don't. OT transforms non-commutative operations; AllReduce aggregates commutative values.

But studying one deeply taught me to ask better questions about the other:

  • What's the actual consistency requirement?
  • Can we use optimistic approaches?
  • Where does state live, and how do we scale it?
  • What's the right trade-off on the consistency spectrum?

The Meta-Lesson

The best infrastructure engineers don't just solve problems. They recognize patterns of thinking across domains. Not false equivalences, but genuine structural similarities that illuminate design choices.

Google Docs taught me how to think about convergence, consistency, and real-time state management. Those mental models transfer, even when the specific mechanisms don't.


What patterns of thinking have you seen transfer across domains? I'd love to hear examples in the comments.