Phenomenon
The Web Speech API delivers `SpeechRecognitionResult` events asynchronously. Editor integrations often snapshot `window.getSelection().getRangeAt(0)` when recognition starts and call `insertNode` / `insertText` when `result.isFinal` is true. Between those two moments the user may tab away, click a toolbar, or focus another field. The saved `Range` can become invalid, point at detached nodes, or no longer represent "where the user was dictating." Chrome does not provide a built-in "bind this transcript to this contenteditable" primitive; correctness is entirely the application's responsibility. Additionally, `beforeinput` / `input` do not fire for direct programmatic DOM mutation, so framework state can skip the usual pipeline entirely.
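A minimal sketch of the fragile snapshot-then-insert pattern described above, assuming a `SpeechRecognition` instance and an `editor` element (the names and wiring are illustrative, not a library API):

```javascript
// Sketch of the fragile pattern: the range is cloned when recognition
// starts and reused much later, with no re-validation in between.
// That gap is the bug surface this report describes.
function bindDictation(recognition, editor) {
  let savedRange = null;

  recognition.onstart = () => {
    const sel = window.getSelection();
    if (sel.rangeCount > 0 && editor.contains(sel.getRangeAt(0).startContainer)) {
      savedRange = sel.getRangeAt(0).cloneRange(); // snapshot at start
    }
  };

  recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal && savedRange) {
      // Danger: savedRange may be stale, detached, or in a blurred editor.
      savedRange.insertNode(document.createTextNode(result[0].transcript));
    }
  };
}
```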
Reproduction Steps
- Create a `contenteditable` region and a focusable control outside it (e.g. a `<button>`).
- Add a speech recognition script: on `start`, clone the current selection range if it is inside the editor.
- Speak until a final result arrives (or simulate `onresult` with a timeout).
- Before the final result fires, click the outside control so `document.activeElement` is no longer the editor.
- In `onresult`, apply the transcript using the cloned range without checking `editor.contains(range.startContainer)` or `document.activeElement`.
- Observe DOM errors in the console, insertion into unexpected locations, or silent failure depending on fallback logic.
Observed Behavior
- Stale range: `Range` operations may throw `InvalidNodeTypeError` or no-op if boundary nodes were removed by a re-render.
- Wrong active element: fallbacks that re-read `getSelection()` insert relative to wherever focus moved, not the original editor.
- Missing input events: programmatic mutation does not mirror keyboard-driven `beforeinput`/`input`, so listeners that sync the model on `input` never run.
- No composition lifecycle: unlike IME, the Speech API does not emit `composition*` events; editors that only sync on `compositionend` miss updates.
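These failure modes can be classified up front with a validity check on the saved range before any fallback runs. `isRangeUsable` below is a hypothetical helper, not a platform API; it only relies on the DOM's `contains()`/`isConnected` contract, so it can be unit-tested with plain objects:

```javascript
// Hypothetical guard: decide whether a saved Range is still safe to use.
// activeElement is passed in so the check is explicit and testable.
function isRangeUsable(editor, range, activeElement) {
  if (!range || !range.startContainer) return false;
  if (!range.startContainer.isConnected) return false;      // detached by a re-render
  if (!editor.contains(range.startContainer)) return false; // moved outside the editor
  if (activeElement !== editor && !editor.contains(activeElement)) return false; // focus left
  return true;
}
```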
Expected Behavior
If the editor is not focused and the selection is not confidently inside the same editorial root as when recognition began, the application should not apply the transcript to the old range. It should either cancel, queue until the user refocuses the editor, or prompt. Any intentional programmatic insert should go through the same state update path as keyboard input (custom events or explicit model patches).
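The "queue until the user refocuses" option can be sketched as a small pending buffer flushed from the editor's focus handler. `PendingTranscripts` and `applyFn` are illustrative names; `applyFn` stands in for whatever performs the guarded insert:

```javascript
// Sketch: buffer final transcripts that arrive while the editor is
// blurred, then flush them when focus returns. This class is DOM-free.
class PendingTranscripts {
  constructor(applyFn) {
    this.applyFn = applyFn;
    this.queue = [];
  }
  // Called from onresult: apply immediately if allowed, otherwise hold.
  submit(text, editorHasFocus) {
    if (editorHasFocus) {
      this.applyFn(text);
    } else {
      this.queue.push(text);
    }
  }
  // Called from the editor's focus handler.
  flush() {
    for (const text of this.queue.splice(0)) this.applyFn(text);
  }
}
```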
Impact
- Corrupted documents: Text appears in the wrong block or outside the intended wrapper.
- Framework desync: Virtual DOM assumes events that never fired.
- Accessibility: Voice users may lose data silently when focus shifts to a toolbar or dialog mid-utterance.
Browser Comparison
- Chrome / Edge: Speech Recognition API available; async timing issue is integration-level but very common.
- Safari / Firefox: API availability and behavior differ; do not assume Chrome parity.
- OS dictation: May bypass your Speech API entirely and use native editing paths—different bug surface.
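Given these differences, feature-detect the constructor rather than assume Chrome parity. A common detection sketch (the global is passed in to keep it testable):

```javascript
// Feature-detect the Speech Recognition constructor across engines.
// Chrome/Edge historically expose the prefixed webkitSpeechRecognition;
// returns null where the API is absent.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}
```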
Solutions
- Guard every insert:

```js
function safeInsert(editor, range, text) {
  // Bail out if the saved range points at a node removed by a re-render.
  if (!range.startContainer.isConnected) return false;
  if (!editor.contains(range.startContainer)) return false;
  // Bail out if focus has moved outside the editor.
  if (document.activeElement !== editor && !editor.contains(document.activeElement)) return false;
  range.deleteContents();
  range.insertNode(document.createTextNode(text));
  range.collapse(false); // place the caret after the inserted text
  const sel = window.getSelection();
  sel.removeAllRanges();
  sel.addRange(range);
  return true;
}
```
- Version the session: store an incrementing `sessionId` when starting recognition; if the editor's `focus`/`blur` handlers change `sessionId`, ignore late results.
- Emit a synthetic pipeline: after a programmatic insert, dispatch an `InputEvent` or call your model updater directly so one code path owns truth.
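The session-versioning and synthetic-event bullets can be sketched together. `DictationSession` is an illustrative name, and the `InputEvent` dispatch is shown as one option for rejoining the normal pipeline, not the only one:

```javascript
// Sketch: version each recognition session so late results from a
// superseded session are dropped. This part is DOM-free.
class DictationSession {
  constructor() { this.current = 0; }
  start() { return ++this.current; }   // call when recognition starts
  invalidate() { ++this.current; }     // call from blur/focus handlers
  isLive(id) { return id === this.current; }
}

// After a guarded programmatic insert, rejoin the normal event pipeline
// so framework listeners see the change ('insertText' is a standard
// inputType; whether listeners honor synthetic events is app-specific).
function notifyEditor(editor, text) {
  editor.dispatchEvent(new InputEvent('input', {
    bubbles: true,
    inputType: 'insertText',
    data: text,
  }));
}
```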