Scenario

Content Normalization: Paste, Whitespace, and DOM Hygiene

Architecting a consistent document state by neutralizing browser inconsistencies in HTML insertion and character encoding.

Category: architecture
Scenario ID: scenario-content-normalization

Overview

Browsers generate different HTML when a user pastes content or presses Enter. A robust editor must normalize this “browser soup” into a predictable internal schema to prevent data corruption and layout breakage.

Critical Normalization Zones

1. Paste Filter & Cleansing

When content is pasted from external sources (Word, Excel, web pages), the clipboard HTML carries large amounts of hidden metadata and proprietary CSS, both in <style> blocks and in inline attributes. Strict sanitization is required to strip non-standard elements and attributes before the content reaches the document.
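As an illustration of what that hidden payload looks like, the sketch below detects typical Office markers and removes the noisiest blocks with a coarse string pre-pass. The function names (looksLikeOfficeHtml, stripClipboardNoise) and the marker list are assumptions made for this example, not part of any library. A regex pass like this only reduces volume; it is not a sanitizer, and the allowlist step on the parsed DOM is still required.

```typescript
// Heuristic markers commonly seen in HTML copied from Microsoft Office apps.
// The list is an assumption for illustration, not an exhaustive detector.
const OFFICE_MARKERS: RegExp[] = [
  /urn:schemas-microsoft-com:office/i, // Office XML namespaces on <html>
  /\bmso-[a-z-]+\s*:/i,                // proprietary mso-* CSS properties
  /class="?Mso[A-Za-z]+/i,             // MsoNormal, MsoListParagraph, ...
];

function looksLikeOfficeHtml(html: string): boolean {
  return OFFICE_MARKERS.some((re) => re.test(html));
}

// Coarse string-level pre-pass: drops blocks that never belong in editor
// content. Order matters: comments go first because Office often wraps the
// body of <style> in an HTML comment.
function stripClipboardNoise(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")                           // comments, incl. conditional ones
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "") // <style>/<script> with content
    .replace(/<\/?(meta|link)\b[^>]*>/gi, "")                  // metadata tags
    .replace(/<\/?[a-z]+:[a-z]+\b[^>]*>/gi, "");               // namespaced tags such as <o:p>
}
```

Attributes such as class=MsoNormal survive this pass on purpose; removing attributes safely is easier on a parsed tree than on a string.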

2. Whitespace & &nbsp; Management

Under the default CSS white-space: normal, browsers collapse consecutive spaces into a single space when rendering. To maintain visual fidelity, editors often insert non-breaking spaces (&nbsp;, U+00A0) instead, which brings two problems:

  • Contamination: &nbsp; prevents line breaks between the words it joins, so long runs overflow their container. The effect is most severe in plaintext-only mode.
  • Conversion: Chrome/Edge frequently convert non-breaking spaces back to regular spaces during editing, causing intended alignment to collapse.
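The collapsing behaviour and the contamination problem can both be shown on plain strings. The sketch below only approximates white-space: normal (real layout also trims spaces at line edges), and the helper names renderCollapsed and countGluingNbsp are hypothetical.

```typescript
const NBSP = "\u00A0";

// Approximates how `white-space: normal` renders a text run: runs of
// collapsible whitespace (space, tab, line feed, carriage return) shrink to
// one space. U+00A0 is deliberately absent from the character class. Note
// that the JavaScript `\s` class also matches U+00A0, so it cannot be used.
function renderCollapsed(text: string): string {
  return text.replace(/[ \t\n\r]+/g, " ");
}

// Counts non-breaking spaces that sit directly between two non-whitespace
// characters. These glue words together and suppress line wrapping.
function countGluingNbsp(text: string): number {
  const matches = text.match(/\S\u00A0(?=\S)/g);
  return matches ? matches.length : 0;
}
```

A serializer could use a count like this to flag pasted text in which every ordinary word gap has been replaced by U+00A0.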

3. Empty Node Pruning

Rapid editing often leaves empty <span>, <b>, or <div> tags in the DOM. These “Ghost Tags” are usually invisible but break selection logic and node-count-based features.
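A pruning pass can be sketched as a bottom-up walk. To keep the example runnable outside a browser, it operates on a plain-object tree that stands in for DOM nodes; the types, the VOID_TAGS set, and the pruneEmpty function are all assumptions for illustration.

```typescript
// Plain-object stand-in for DOM nodes so the sketch runs outside a browser.
type TextNode = { kind: "text"; text: string };
type ElementNode = { kind: "element"; tag: string; children: TreeNode[] };
type TreeNode = TextNode | ElementNode;

// Elements that are meaningful without children and must never be pruned.
const VOID_TAGS = new Set(["br", "hr", "img"]);

// Returns a copy of `nodes` with empty text nodes and childless non-void
// elements removed. Children are pruned first, so a wrapper that only held
// ghost tags becomes empty and is removed as well.
function pruneEmpty(nodes: TreeNode[]): TreeNode[] {
  const kept: TreeNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      if (node.text.length > 0) kept.push(node);
      continue;
    }
    const children = pruneEmpty(node.children);
    if (children.length > 0 || VOID_TAGS.has(node.tag)) {
      kept.push({ ...node, children });
    }
  }
  return kept;
}
```

In a live editor the same walk would need to skip any node that currently contains the selection, since removing it would move or destroy the caret. An empty paragraph kept as a placeholder survives here only if it holds a <br>.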

Normalization Strategy

The Parser Pipeline

Intercept the paste or beforeinput event, cancel the default insertion, and run the incoming HTML through a DOMParser. Apply a strict allowlist of tags and attributes before inserting the result.
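The allowlist step itself might look like the sketch below. In a browser the input tree would come from new DOMParser().parseFromString(html, "text/html"); here a plain-object tree stands in for it so the example runs anywhere. The schema in ALLOWED_TAGS, the drop list, and the href rule are hypothetical choices, not a recommended policy.

```typescript
type SafeText = { kind: "text"; text: string };
type SafeElement = {
  kind: "element";
  tag: string;
  attrs: Record<string, string>;
  children: SafeNode[];
};
type SafeNode = SafeText | SafeElement;

// Hypothetical schema: allowed tags mapped to their allowed attributes.
const ALLOWED_TAGS = new Map<string, readonly string[]>([
  ["p", []], ["br", []], ["b", []], ["i", []],
  ["ul", []], ["ol", []], ["li", []],
  ["a", ["href"]],
]);

// Tags removed together with everything inside them.
const DROP_WITH_CONTENT = new Set(["style", "script", "meta", "link", "title"]);

// Assumption: only absolute http(s) and mailto links are kept.
function isSafeHref(value: string): boolean {
  return /^(https?:|mailto:)/i.test(value.trim());
}

function sanitize(nodes: SafeNode[]): SafeNode[] {
  const out: SafeNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      out.push(node);
      continue;
    }
    const tag = node.tag.toLowerCase();
    if (DROP_WITH_CONTENT.has(tag)) continue;
    const children = sanitize(node.children);
    const allowedAttrs = ALLOWED_TAGS.get(tag);
    if (allowedAttrs === undefined) {
      out.push(...children); // unknown tag: unwrap it, keep its content
      continue;
    }
    const attrs: Record<string, string> = {};
    for (const name of allowedAttrs) {
      const value = node.attrs[name];
      if (value === undefined) continue;
      if (name === "href" && !isSafeHref(value)) continue;
      attrs[name] = value;
    }
    out.push({ kind: "element", tag, attrs, children });
  }
  return out;
}
```

Unknown tags are unwrapped rather than deleted so that text inside a proprietary wrapper is not lost; only tags on the drop list take their content with them.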

Whitespace Preservation (CSS over entities)

Prefer the CSS declaration white-space: pre-wrap on the editable element to preserve spacing, rather than relying on chains of &nbsp;. If manual intervention is required, use a beforeinput handler to insert \u00A0 only when the text before the caret already ends in a regular space.
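The manual fallback reduces to one decision per typed space. The sketch below isolates that decision as a pure function; spaceToInsert and typeSpaces are hypothetical helpers, and only the trailing-space case described above is covered.

```typescript
const NBSP = "\u00A0";

// Chooses the character to insert when the user types a space.
// A second regular space would collapse under `white-space: normal`,
// so U+00A0 is used only when the preceding character is a regular space.
function spaceToInsert(textBeforeCaret: string): string {
  return textBeforeCaret.endsWith(" ") ? NBSP : " ";
}

// Simulates typing `count` spaces at the end of `text`.
function typeSpaces(text: string, count: number): string {
  let result = text;
  for (let i = 0; i < count; i++) {
    result += spaceToInsert(result);
  }
  return result;
}
```

Typing three spaces after "a" yields a regular space, then U+00A0, then a regular space, so no two collapsible spaces ever sit next to each other. In a browser this decision would run inside a beforeinput listener that checks for inputType "insertText" with data " ", calls preventDefault(), and inserts the chosen character itself. With white-space: pre-wrap in place, none of this is needed.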

Variants

Each row is a concrete case for this scenario, with a dedicated document and playground.

Case | OS | Device | Browser | Keyboard | Status
ce-0102-consecutive-spaces-collapsed | Windows 11 | Desktop or Laptop Any | Chrome 120.0 | US | draft
ce-0117-nbsp-converted-to-space | Windows 11 | Desktop or Laptop Any | Chrome 120.0 | US | draft
ce-0153-nbsp-line-break-prevention | Windows 11 | Desktop or Laptop Any | Chrome 120.0 | US | draft

Related Scenarios

Other scenarios that share similar tags or category.

Tags: paste, whitespace

Code block indentation lost on paste or format

Leading spaces and tabs in pasted code can collapse to a single space or be stripped when the editor normalizes to paragraphs or applies pre-wrap inconsistently.

1 case

Tags: paste, whitespace

Trailing and leading whitespace on paste

Firefox and other browsers may preserve or normalize trailing newlines and spaces differently when pasting plain text. As a result, collaborative editors and diffs see unexpected whitespace changes.

1 case

Tags: paste

Clipboard API paste does not work in contenteditable

When the Clipboard API (navigator.clipboard.readText() or navigator.clipboard.read()) is used to programmatically paste content into a contenteditable region, the paste operation may fail or not work as expected.

2 cases

Tags: whitespace

Code block editing behavior varies across browsers

Editing text within code blocks (<pre><code>) in contenteditable elements behaves inconsistently across browsers. Line breaks, indentation, whitespace preservation, and formatting may be handled differently, making it difficult to maintain code formatting.

4 cases
