Web Speech API final results can insert at the wrong range if selection moves during recognition
OS: Windows 10+ · Device: Desktop Any · Browser: Chrome 120+ · Keyboard: US QWERTY (microphone for speech)
Scenario
OS voice typing, the Web Speech API, and assistive voice tools insert into or mutate contenteditable content without producing the same event sequence as keyboard or IME input. Caret position, selection, and composition events are inconsistent across platforms, which breaks editors that assume beforeinput/input/composition alignment.
Voice-driven editing (browser Web Speech API, OS-level dictation, Dragon-style tools) reaches contenteditable through paths that differ from physical keyboard and IME input. Applications often see bulk insertText-style updates, missing composition* events, or asynchronous inserts that race with focus and selection changes. Web APIs expose no dedicated "voice mode" flag for contenteditable (InputEvent carries no input-source field), so frameworks cannot reliably branch on the input source.
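Because no such flag exists, any branching on the input source has to be a guess. The sketch below shows one such heuristic. It assumes that dictation tends to show up as an `insertText` carrying several characters outside a composition session. `looksLikeBulkInsert` is a hypothetical helper, not a browser API. It will also flag text expanders or multi-character key mappings.

```javascript
// Heuristic only: no web API flag identifies voice as the input source.
// Assumption: dictation often arrives as a single insertText whose data is
// longer than one character while no composition session is open.
function looksLikeBulkInsert(ev, compositionActive) {
  if (ev.inputType !== 'insertText') return false;
  // Composition paths (IME, some dictation) are handled elsewhere.
  if (compositionActive) return false;
  const data = ev.data ?? '';
  // One keystroke usually yields one code point; more than that is flagged.
  return [...data].length > 1;
}

// Browser usage sketch: InputEvent.isComposing reports an open composition.
// editable.addEventListener('beforeinput', (e) => {
//   if (looksLikeBulkInsert(e, e.isComposing)) {
//     // route through a guarded insertion path instead of the keystroke path
//   }
// });
```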
Editor code frequently assumes:
1. beforeinput → DOM change → input, with a coherent getTargetRanges() / selection snapshot.
2. compositionstart / compositionupdate / compositionend around composed text.

Voice input regularly violates (1) and (2). Transcripts may arrive in one chunk or split by word. The selection may be stale if the user moved focus while recognition was in flight. Some platforms never emit composition events for dictation at all. The result is duplicate handling, wrong insertion offsets, and divergent undo stacks. These are the same failure modes seen with mobile dictation, but here they appear on desktop and in custom Speech API integrations.
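To see which of the assumptions above a given input path actually violates, a recorded event log can be checked offline. This is a minimal sketch. It assumes a hypothetical log format of `{ type, inputType }` entries in dispatch order. A `beforeinput` canceled with `preventDefault()` legitimately has no matching `input`, so the counts are diagnostic rather than proof of a bug.

```javascript
// Checks a recorded event log against the two assumptions:
// (1) each input is preceded by a beforeinput with the same inputType,
// (2) composition events appear around composed text.
// Log format (hypothetical): [{ type: 'beforeinput', inputType: 'insertText' }, ...]
function checkEventAssumptions(log) {
  const report = { unpairedInputs: 0, unmatchedBeforeInputs: 0, sawComposition: false };
  let pending = null; // most recent beforeinput still waiting for its input
  for (const entry of log) {
    if (entry.type.startsWith('composition')) {
      report.sawComposition = true;
    } else if (entry.type === 'beforeinput') {
      // A second beforeinput while one is pending means the first never got an input.
      if (pending) report.unmatchedBeforeInputs += 1;
      pending = entry;
    } else if (entry.type === 'input') {
      if (pending && pending.inputType === entry.inputType) {
        pending = null; // assumption (1) holds for this pair
      } else {
        report.unpairedInputs += 1; // input with no matching beforeinput
      }
    }
  }
  if (pending) report.unmatchedBeforeInputs += 1;
  return report;
}
```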
- … recognition.onresult handler, the active range may no longer be inside the intended contenteditable (for example, the user clicked elsewhere).
- … data in one beforeinput) are fragile and collide with some IMEs.

```javascript
// Hypothetical selector: point this at your editor root.
const editable = document.querySelector('[contenteditable]');

editable.addEventListener('beforeinput', (e) => {
  // Voice / Speech paths may still use insertText or insertCompositionText,
  // but composition events may be absent (platform-dependent).
  console.log(e.inputType, e.data, e.getTargetRanges?.()?.length);
});
```
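The logging snippet above prints one event at a time. The sketch below captures a whole session instead, so dictation and keyboard runs can be compared side by side. `recordEditEvents` and the entry shape are hypothetical. `getTargetRanges()` is only recorded when the event actually exposes it.

```javascript
// Editing-related events worth capturing for a dictation vs keyboard comparison.
const EDIT_EVENTS = ['beforeinput', 'input', 'compositionstart', 'compositionupdate', 'compositionend'];

function recordEditEvents(target) {
  const log = [];
  const handler = (e) => {
    log.push({
      type: e.type,
      inputType: e.inputType ?? null, // undefined on composition events
      data: e.data ?? null,
      // getTargetRanges() exists only on InputEvent in supporting engines.
      targetRanges: typeof e.getTargetRanges === 'function' ? e.getTargetRanges().length : null,
      time: e.timeStamp,
    });
  };
  for (const type of EDIT_EVENTS) target.addEventListener(type, handler);
  // Returns the live log plus a function that detaches every listener.
  return {
    log,
    stop() {
      for (const type of EDIT_EVENTS) target.removeEventListener(type, handler);
    },
  };
}

// Browser usage sketch:
// const session = recordEditEvents(editable);
// ...dictate...
// session.stop();
// console.table(session.log);
```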
- … requestAnimationFrame or microtask batch while verifying document.activeElement and a saved Range still belong to your editor root.
- … getSelection().getRangeAt(0) if it is inside the editor; on result, re-validate against editor.contains(range.startContainer) before mutating.
- … data-editor-root checks before mutating.
- … beforeinput / input after dictation
- … beforeinput / input semantics
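The mitigations listed above can be combined into one guarded insertion path for a Web Speech API integration. This sketch makes several assumptions:

- `editor` is the contenteditable root.
- Focus stays in the editor while recognition runs. A microphone button that takes focus will make the `activeElement` check fail.
- Stale transcripts are simply dropped.

`isRangeUsable`, `insertAtRange`, and `createGuardedDictation` are hypothetical names. Note that inserting through Range methods bypasses the browser's native undo history.

```javascript
// True when the saved range and the focused element both still live in the editor.
function isRangeUsable(editor, range, activeElement) {
  if (!range) return false;
  // Node.contains() is true for the node itself, so focus on the root counts.
  if (!editor.contains(activeElement)) return false;
  return editor.contains(range.startContainer) && editor.contains(range.endContainer);
}

// Replaces the range contents with text and leaves the caret after it.
function insertAtRange(doc, range, text) {
  range.deleteContents();
  const node = doc.createTextNode(text);
  range.insertNode(node);
  range.setStartAfter(node);
  range.collapse(true);
  const sel = doc.getSelection();
  sel.removeAllRanges();
  sel.addRange(range);
}

// Saves the selection when recognition starts, batches final results in a
// microtask, and re-validates the saved range before mutating the DOM.
function createGuardedDictation(editor, doc) {
  let saved = null;
  let pending = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    const text = pending.join(' ');
    pending = [];
    if (!text) return;
    // Stale selection: the transcript is discarded here (assumption).
    if (!isRangeUsable(editor, saved, doc.activeElement)) return;
    insertAtRange(doc, saved, text);
  }

  return {
    onStart() {
      const sel = doc.getSelection();
      const range = sel && sel.rangeCount > 0 ? sel.getRangeAt(0) : null;
      // Only keep a copy if the caret is inside this editor when dictation begins.
      saved = range && editor.contains(range.startContainer) ? range.cloneRange() : null;
    },
    onFinalTranscript(transcript) {
      pending.push(transcript);
      if (!scheduled) {
        scheduled = true;
        queueMicrotask(flush);
      }
    },
  };
}

// Browser wiring sketch (Chrome exposes the prefixed constructor):
// const recognition = new webkitSpeechRecognition();
// const dictation = createGuardedDictation(editor, document);
// recognition.onstart = () => dictation.onStart();
// recognition.onresult = (e) => {
//   for (let i = e.resultIndex; i < e.results.length; i++) {
//     if (e.results[i].isFinal) dictation.onFinalTranscript(e.results[i][0].transcript);
//   }
// };
```

Batching in a microtask means several final results delivered in one task become a single DOM mutation. The saved Range is live, so it follows edits the user makes inside the editor while recognition is running.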
Each row is a concrete case for this scenario, with a dedicated document and playground.
| Case | OS | Device | Browser | Keyboard | Status |
|---|---|---|---|---|---|
| ce-0585-chrome-web-speech-api-contenteditable-insertion | Windows 10+ | Desktop Any | Chrome 120+ | US QWERTY (microphone for speech) | draft |
Other scenarios that share similar tags or category:
Ensuring contenteditable editors are navigable for assistive technology users through proper ARIA mapping and engine synchronization.
Keyboard navigation in contenteditable elements must comply with WCAG 2.1.1 (Keyboard) and 2.1.2 (No Keyboard Trap) requirements. The Tab key typically moves focus out of contenteditable, while arrow keys move the caret. Custom keyboard handling must ensure all functionality is keyboard-operable and focus remains visible.
On iOS Safari, input and beforeinput can fire with inputType 'insertText' multiple times (e.g. voice dictation) or with inputType undefined/null. Forcing re-render or changing selection during this flow desyncs the editor model from the DOM and can permanently break subsequent input.
The selection (window.getSelection()) in beforeinput events can differ from the selection in corresponding input events. This mismatch can occur during IME composition, text prediction, or when typing adjacent to formatted elements like links. The selection in beforeinput may include adjacent formatted text, while input selection reflects the final cursor position.
The getTargetRanges() method in beforeinput events may return an empty array or undefined in various scenarios, including text prediction, certain IME compositions, or specific browser/device combinations. When getTargetRanges() is unavailable, developers must rely on window.getSelection() as a fallback, but this may be less accurate.
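A minimal sketch of the getTargetRanges() fallback described in the last item follows. It assumes only the first target range matters. `resolveEditRange` is a hypothetical name. StaticRange objects returned by getTargetRanges() do not track later DOM changes, so the sketch copies them into a live Range. As the selection-mismatch item notes, the selection fallback reflects the state at beforeinput time.

```javascript
// Returns the range a beforeinput event will affect, and where it came from.
// Prefers getTargetRanges(); falls back to the current selection when the
// method is missing or returns an empty array / undefined.
function resolveEditRange(event, doc) {
  const targets = typeof event.getTargetRanges === 'function' ? event.getTargetRanges() : [];
  if (targets && targets.length > 0) {
    const t = targets[0];
    // StaticRange is not live; copy it into a live Range before using it.
    const range = doc.createRange();
    range.setStart(t.startContainer, t.startOffset);
    range.setEnd(t.endContainer, t.endOffset);
    return { range, source: 'targetRanges' };
  }
  const sel = doc.getSelection();
  if (sel && sel.rangeCount > 0) {
    return { range: sel.getRangeAt(0).cloneRange(), source: 'selection' };
  }
  return { range: null, source: 'none' };
}
```

Returning the `source` alongside the range lets calling code log how often the less accurate selection fallback is used on a given browser and device.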