The Problem of Reference
In a static document, addressing is trivial: "The 5th paragraph." In a dynamic editor, especially with real-time collaboration, indices are meaningless.
If User A deletes Paragraph 1, then "Paragraph 5" instantly becomes "Paragraph 4" for User A, but remains "Paragraph 5" for User B until the network packet arrives. This Index Divergence is the root cause of 90% of collaboration bugs.
Mutable References (Paths)
- Address:
[0, 4, 1] - Pro: ZERO memory overhead.
- Con: Invalidated by any precedent sibling mutation.
- Use Case: Local rendering, simple scripts.
Immutable References (IDs)
-
Address:
"b9f2-k8l1" - Pro: Mathematically stable forever.
- Con: High memory cost (Reference storage).
- Use Case: Collaboration, React Keys, Comments.
1. Path Addressing (Mutable)
Path addressing treats the document as a Coordinate System. A node is defined by its position in the tree hierarchy.
// A "Path" is a trail of array indices
type Path = number[];
// Example: Root -> 1st Child -> 4th Child
const target = [0, 3];The Volatility Problem
Scenario: You are tracking
a comment on Paragraph 10 path: [10].
Event: User deletes Paragraph 0.
All paths > [0] must be decremented.
Your comment now points to [10], which is actually the OLD Paragraph
11.
To make Paths work in a real editor, you need Operational Transformation (OT). You must mathematically "Transform" every single stored path against every incoming operation. This is computationally expensive (O(History_Length)) and prone to subtle bugs.
2. ID Addressing (Immutable)
ID addressing decouples Identity from Position. A node is defined by "Who it is", not "Where it is".
UUID Collision Mathematics
Developers often fear "What if two IDs collide?". Let's look at the math for UUIDv4 (122 random bits).
| ID Type | Bits | Collision Risk (at 1B nodes) |
|---|---|---|
| Incrementing Int | 32/64 | 100% (in distributed systems) |
| ShortID (8 chars) | 48 | Rare (0.01%) |
| UUIDv4 | 122 | Negligible ($$10^-18$$) |
Never use Auto-Increment Integers. If User A creates "Node 5" offlin and User B creates "Node 5" offline, you have an unresolvable ID conflict upon merge. Distributed systems require a standard random distribution.
Performance Analysis (V8)
While IDs are mathematically superior, they incur real machine costs.
String Interning & Memory
In V8 (Chrome/Node), strings are immutable. Heavy usage of string IDs can put pressure on the heap.
- Object Shape Overhead: An object property key `node["id_123"]` is often "Interned" (cached globally). If you have 100,000 unique IDs, you have 100,000 entries in the V8 String Table.
- Pointer Size: A sequential integer fits in a "Smi" (Small Integer, stored directly in the pointer). A string ID requires a pointer to a Heap Object (64-bit pointer + Object Header + String Data).
// MEMORY COST COMPARISON
// Integer ID: 0 bytes overhead (Stored in pointer)
const id = 1275;
// String ID: ~40-60 bytes overhead
// (Pointer + Map Reference + Hash + Char Data)
const id = "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d";*Mitigation:* Modern editors often use NanoID (URL-friendly, shorter) or Base64 encoding to reduce the byte storage size compared to raw hex UUIDs.
Implementation Patterns
1. The Reverse Lookup Map
Clicking a DOM node and finding the corresponding Model Node is a critical "Hot Path" (runs on every click/hover). Naive search is O(N). We need O(1).
class Editor {
// Using a WeakMap prevents memory leaks!
// If the DOM node is removed, this entry is GC'd.
private domToModel = new WeakMap<HTMLElement, NodeID>();
registerNode(domNode: HTMLElement, modelId: NodeID) {
this.domToModel.set(domNode, modelId);
// Also handy:
domNode.dataset.id = modelId;
}
getIdFromDom(domNode: HTMLElement): NodeID | null {
// Allows O(1) lookup from Event.target
return this.domToModel.get(domNode);
}
}2. Hybrid Addressing (Standard)
Most professional editors uses Dual Addressing.
- Storage uses IDs: The document model (JSON/CRDT) stores only IDs. This ensures sync safety.
- Selection uses Paths: The caret is defined as `(NodeID, Offset)` OR `(Path, Offset)`. Selection is ephemeral and local, so using Paths (which are faster to compute during typing) is acceptable, provided they are re-validated against IDs before saving.
3. ID Generation Algorithms
Not all IDs are created equal. The choice of algorithm has profound implications for storage size, database performance, and network bandwidth.
Compact NanoID
NanoID uses a larger alphabet (A-Z, a-z, 0-9, -, _) to pack more entropy into fewer characters. This is the recommended choice for Text Editors.
UUID (36 chars)
f47ac10b-58cc-4372-a567-0e02b2c3d479 NanoID (21 chars)
V1StGXR8_Z5jdHi6B-myT 40% Smaller Payload. For a document with 1000 nodes, this saves ~15KB of JSON compared to UUIDs. Over a websocket, this adds up.
Time-Sortable ULID & CUID
Universally Unique Lexicographically Sortable Identifier. These IDs start with a Timestamp and end with Randomness.
// ULID Structure
"01ARZ3NDEKTSV4RRFFQ69G5FAV"
// [ Timestamp ] [ Randomness ]
// Benefit: Newer nodes always sort AFTER older nodes
const sorted = nodes.sort((a, b) => a.id.localeCompare(b.id));
// ^ This gives you creation order for FREE!When to use: If you store Document versions or snapshots in a database, use ULIDs for the Snapshot ID. But for internal node IDs within the document, plain NanoID is sufficient since order is determined by the document structure.
4. Database Engineering Impact
If you persist documents to a database, your ID choice can make or break query performance.
B-Tree Index Fragmentation
Relational Databases (PostgreSQL, MySQL) use B-Tree indexes for Primary Keys. B-Trees are optimized for Sequential Insertion.
Sequential IDs (1, 2, 3...)
- New rows append to the end of the index.
- Pages stay full and compact.
- Buffer cache hit ratio: High.
Random UUIDs
- New rows insert at random positions.
- Constant Page Splits.
- Buffer cache hit ratio: Low.
- Write Amplification: 2-5x.
Solution: Use ULID or CUID for
database Primary Keys (they are time-sortable),
or use a separate auto-increment internal_id for the PK and store the random ID as a secondary
indexed column.
5. Frontend Architecture
On the frontend, stable IDs are essential for performance and UX.
React Reconciliation & Key Prop
React uses the key prop to identify
which virtual DOM nodes correspond to which real
DOM nodes.
Bad: key={index}
If you insert a paragraph at the
top, every key shifts. React
destroys and re-creates all components.
Focus is lost. IME state is
reset.
Good: key={node.id}
The node's key remains stable. React sees the new ordering and moves the existing DOM nodes. Focus and IME are preserved.
Virtualization & Scroll Anchoring
For documents with 1000+ nodes, you must use Virtualization (only rendering visible nodes). Libraries like
react-window or tanstack-virtual rely on stable IDs.
// Virtualization with stable IDs
const sizeCache = new Map<NodeID, number>();
function onNodeResize(id: NodeID, height: number) {
sizeCache.set(id, height);
}
// When the user scrolls back up, we know the EXACT height
// of off-screen nodes. No scroll jumping!
function getItemSize(id: NodeID): number {
return sizeCache.get(id) ?? ESTIMATED_HEIGHT;
}Without stable IDs, if content above the viewport changes, the scroll position jumps erratically because heights cannot be cached.
6. Security Implications
IDs are often exposed in URLs. Your choice of ID format has security implications.
| URL Pattern | Security Risk |
|---|---|
| /doc/1055 | Enumeration Attack. Attacker iterates through /doc/1056, /doc/1057... and scrapes your entire database. |
| /doc/b9f2... | Secure. The search space is too large to brute-force. Cannot guess adjacent document IDs. |
| ULID timestamps | Information Leakage. ULIDs reveal creation time. May be acceptable or not depending on your use case. |
Best Practice: Use random IDs
(NanoID/UUID) for public-facing URLs. If you need
time-sorting internally, use ULID for database
PKs but expose a random public_id in the API.
7. Real-world Editor Analysis
How do popular open-source editors handle node identity?
ProseMirror
Uses Path-based addressing with Mapping for OT. Nodes do not have inherent IDs. This makes it simpler but requires careful transform logic for collaboration plugins.
Slate.js
Uses Path-based addressing. Nodes can optionally have IDs, but
it's up to the developer to add them.
Most real-world Slate apps add id to every Element.
Lexical (Meta)
Uses ID-based addressing with internal __key property
on every node. This is used for React reconciliation
and node lookup. The key is a simple incrementing
integer per editor instance (safe because
it's local, not synced).
Notion / BlockSuite
Uses CRDT-based ID addressing. Every block has a stable ID that is used for real-time sync. The underlying CRDT (Yjs) manages conflict resolution using Lamport Timestamps embedded in the ID.