Phenomenon
The Web Speech API delivers `SpeechRecognitionResult` events asynchronously. Editor integrations often snapshot `window.getSelection().getRangeAt(0)` when recognition starts and call `insertNode` / `insertText` when `result.isFinal` is true. Between those two moments the user may tab away, click a toolbar, or focus another field. The saved `Range` can become invalid, point at detached nodes, or no longer represent "where the user was dictating." Chrome does not provide a built-in "bind this transcript to this contenteditable" primitive; correctness is entirely the application's responsibility. Additionally, `beforeinput` / `input` do not fire for direct programmatic DOM mutation, so framework state can skip the usual pipeline entirely.
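A minimal sketch of the fragile snapshot-then-insert pattern described above, assuming a `SpeechRecognition` instance and an `editor` element (the names and wiring are illustrative, not a library API):

```javascript
// Sketch of the fragile pattern: the range is cloned when recognition
// starts and reused much later, with no re-validation in between.
// That gap is the bug surface this report describes.
function bindDictation(recognition, editor) {
  let savedRange = null;

  recognition.onstart = () => {
    const sel = window.getSelection();
    if (sel.rangeCount > 0 && editor.contains(sel.getRangeAt(0).startContainer)) {
      savedRange = sel.getRangeAt(0).cloneRange(); // snapshot at start
    }
  };

  recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal && savedRange) {
      // Danger: savedRange may be stale, detached, or in a blurred editor.
      savedRange.insertNode(document.createTextNode(result[0].transcript));
    }
  };
}
```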
Reproduction Steps
- Create a `contenteditable` region and a focusable control outside it (e.g. a `<button>`).
- Add a speech recognition script: on `start`, clone the current selection range if it is inside the editor.
- Speak until a final result arrives (or simulate `onresult` with a timeout).
- Before the final result fires, click the outside control so `document.activeElement` is no longer the editor.
- In `onresult`, apply the transcript using the cloned range without checking `editor.contains(range.startContainer)` or `document.activeElement`.
- Observe DOM errors in the console, insertion into unexpected locations, or silent failure depending on fallback logic.
Observed Behavior
- Stale range: `Range` operations may throw `InvalidNodeTypeError` or no-op if boundary nodes were removed by a re-render.
- Wrong active element: fallbacks that re-read `getSelection()` insert relative to wherever focus moved, not the original editor.
- Missing input events: programmatic mutation does not mirror keyboard-driven `beforeinput`/`input`, so listeners that sync the model on `input` never run.
- No composition lifecycle: unlike IME, the Speech API does not emit `composition*` events; editors that only sync on `compositionend` miss updates.
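These failure modes can be classified up front with a validity check on the saved range before any fallback runs. `isRangeUsable` below is a hypothetical helper, not a platform API; it only relies on the DOM's `contains()`/`isConnected` contract, so it can be unit-tested with plain objects:

```javascript
// Hypothetical guard: decide whether a saved Range is still safe to use.
// activeElement is passed in so the check is explicit and testable.
function isRangeUsable(editor, range, activeElement) {
  if (!range || !range.startContainer) return false;
  if (!range.startContainer.isConnected) return false;      // detached by a re-render
  if (!editor.contains(range.startContainer)) return false; // moved outside the editor
  if (activeElement !== editor && !editor.contains(activeElement)) return false; // focus left
  return true;
}
```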
Expected Behavior
If the editor is not focused and the selection is not confidently inside the same editorial root as when recognition began, the application should not apply the transcript to the old range. It should either cancel, queue until the user refocuses the editor, or prompt. Any intentional programmatic insert should go through the same state update path as keyboard input (custom events or explicit model patches).
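The "queue until the user refocuses" option can be sketched as a small pending buffer flushed from the editor's focus handler. `PendingTranscripts` and `applyFn` are illustrative names; `applyFn` stands in for whatever performs the guarded insert:

```javascript
// Sketch: buffer final transcripts that arrive while the editor is
// blurred, then flush them when focus returns. This class is DOM-free.
class PendingTranscripts {
  constructor(applyFn) {
    this.applyFn = applyFn;
    this.queue = [];
  }
  // Called from onresult: apply immediately if allowed, otherwise hold.
  submit(text, editorHasFocus) {
    if (editorHasFocus) {
      this.applyFn(text);
    } else {
      this.queue.push(text);
    }
  }
  // Called from the editor's focus handler.
  flush() {
    for (const text of this.queue.splice(0)) this.applyFn(text);
  }
}
```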
Impact
- Corrupted documents: Text appears in the wrong block or outside the intended wrapper.
- Framework desync: Virtual DOM assumes events that never fired.
- Accessibility: Voice users may lose data silently when focus shifts to a toolbar or dialog mid-utterance.
Browser Comparison
- Chrome / Edge: Speech Recognition API available; async timing issue is integration-level but very common.
- Safari / Firefox: API availability and behavior differ; do not assume Chrome parity.
- OS dictation: May bypass your Speech API entirely and use native editing paths—different bug surface.
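Given these differences, feature-detect the constructor rather than assume Chrome parity. A common detection sketch (the global is passed in to keep it testable):

```javascript
// Feature-detect the Speech Recognition constructor across engines.
// Chrome/Edge historically expose the prefixed webkitSpeechRecognition;
// returns null where the API is absent.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}
```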
Solutions
- Guard every insert:

```js
function safeInsert(editor, range, text) {
  // Bail out if the saved range points at a node removed by a re-render.
  if (!range.startContainer.isConnected) return false;
  if (!editor.contains(range.startContainer)) return false;
  // Bail out if focus has moved outside the editor.
  if (document.activeElement !== editor && !editor.contains(document.activeElement)) return false;
  range.deleteContents();
  range.insertNode(document.createTextNode(text));
  range.collapse(false); // place the caret after the inserted text
  const sel = window.getSelection();
  sel.removeAllRanges();
  sel.addRange(range);
  return true;
}
```
- Version the session: store an incrementing `sessionId` when starting recognition; if the editor's `focus`/`blur` handlers change `sessionId`, ignore late results.
- Emit a synthetic pipeline: after a programmatic insert, dispatch an `InputEvent` or call your model updater directly so one code path owns truth.
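The session-versioning and synthetic-event bullets can be sketched together. `DictationSession` is an illustrative name, and the `InputEvent` dispatch is shown as one option for rejoining the normal pipeline, not the only one:

```javascript
// Sketch: version each recognition session so late results from a
// superseded session are dropped. This part is DOM-free.
class DictationSession {
  constructor() { this.current = 0; }
  start() { return ++this.current; }   // call when recognition starts
  invalidate() { ++this.current; }     // call from blur/focus handlers
  isLive(id) { return id === this.current; }
}

// After a guarded programmatic insert, rejoin the normal event pipeline
// so framework listeners see the change ('insertText' is a standard
// inputType; whether listeners honor synthetic events is app-specific).
function notifyEditor(editor, text) {
  editor.dispatchEvent(new InputEvent('input', {
    bubbles: true,
    inputType: 'insertText',
    data: text,
  }));
}
```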