Scenario

Combining characters and complex scripts during IME composition

Scripts that use combining marks, conjuncts, or tone marks (e.g. Thai, Devanagari, Vietnamese, Cyrillic) may compose differently across browsers. Diacritics can be lost, reordered, or split across DOM nodes when the editor normalizes or wraps text during composition.

Scenario ID
scenario-ime-combining-characters-composition

Problem Overview

Unicode combining sequences and grapheme clusters do not always map 1:1 to DOM Text nodes or to editor transactions. Frameworks that slice strings by UTF-16 code unit (JavaScript's String length) or insert zero-width characters as placeholders can split a cluster and break Indic or Thai input.
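
To make the length mismatch concrete, here is a minimal TypeScript sketch. It assumes a runtime that provides Intl.Segmenter; the helper names graphemes and truncateGraphemes are hypothetical and exist only for illustration.

```typescript
// Split text into user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter is available in the target runtime.
function graphemes(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Hypothetical helper: keep at most `max` grapheme clusters,
// never cutting inside one.
function truncateGraphemes(text: string, max: number): string {
  return graphemes(text).slice(0, max).join("");
}

// Vietnamese "ệ" in decomposed form:
// e + U+0323 (dot below) + U+0302 (circumflex).
const decomposed = "e\u0323\u0302";

console.log(decomposed.length);            // 3 UTF-16 code units
console.log(graphemes(decomposed).length); // 1 grapheme cluster

// Slicing by code units strips both marks from the base letter.
console.log(decomposed.slice(0, 1) === "e"); // true

// Slicing by grapheme keeps the cluster whole.
console.log(truncateGraphemes(decomposed, 1) === decomposed); // true
```

The same gap applies to cursor offsets and range arithmetic: an offset that is valid in code units can still point into the middle of a cluster.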

Observed Behavior

  • Tone marks or vowel signs appear in the wrong order relative to the base character.
  • Conjunct formation fails until a space or punctuation is typed.
  • Cyrillic composition behaves differently from Latin IME expectations.

Impact

Corrupted words, broken spell-check ranges, and undo stacks that do not match user expectations.

Browser Comparison

Shaping engines (such as HarfBuzz) and platform IME integrations differ between browsers; as a result, Safari and Chrome on the same OS can diverge for Vietnamese and Thai.

Solutions

  • Prefer grapheme-aware operations (e.g. Intl.Segmenter or libraries) when mutating text during composition.
  • Avoid splitting Text nodes during compositionupdate unless necessary.
  • Defer normalization until compositionend.
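
Holding normalization back until compositionend can be sketched as a small gate. This is an illustrative outline, not any particular editor's implementation: the DeferredNormalizer class is hypothetical, and the choice of NFC as the normalization step is an assumption, since the right normalization depends on the editor.

```typescript
// Hypothetical helper: postpones a normalization callback while an IME
// composition is active and runs it once after the composition ends.
class DeferredNormalizer {
  private composing = false;
  private pending = false;
  private readonly run: () => void;

  constructor(run: () => void) {
    this.run = run;
  }

  // Call from a compositionstart listener.
  compositionStart(): void {
    this.composing = true;
  }

  // Call wherever the editor would normally normalize its content.
  request(): void {
    if (this.composing) {
      this.pending = true; // remember the request, leave the text alone
      return;
    }
    this.run();
  }

  // Call from a compositionend listener.
  compositionEnd(): void {
    this.composing = false;
    if (this.pending) {
      this.pending = false;
      this.run();
    }
  }
}

// Example with an in-memory string instead of a real DOM.
let content = "";
const normalizer = new DeferredNormalizer(() => {
  content = content.normalize("NFC"); // NFC is an assumption for this sketch
});

normalizer.compositionStart();
content = "e\u0323\u0302";   // intermediate, decomposed composition text
normalizer.request();        // deferred: content is left untouched
normalizer.compositionEnd(); // runs once: content becomes "\u1EC7" (ệ)
```

In a browser, compositionStart and compositionEnd would be called from the compositionstart and compositionend listeners on the editable element. Any number of requests made during one composition collapse into a single run at the end.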

Best Practices

  • Test Vietnamese Telex/VNI and Thai with multiple keyboard layouts.
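
When such tests compare committed text across browsers or IMEs, a strict string comparison can report a failure that is only a difference between precomposed and decomposed forms. One possible approach, shown here as a sketch with a hypothetical canonicallyEqual helper, is to compare under NFC so that only real diacritic loss fails the check.

```typescript
// Compare two strings up to canonical equivalence (NFC), so precomposed and
// decomposed spellings of the same text are treated as equal.
function canonicallyEqual(actual: string, expected: string): boolean {
  return actual.normalize("NFC") === expected.normalize("NFC");
}

const precomposed = "vi\u1EC7t";           // "việt" using U+1EC7
const decomposedForm = "vie\u0323\u0302t"; // same word: base letter + two combining marks

console.log(precomposed === decomposedForm);                // false
console.log(canonicallyEqual(precomposed, decomposedForm)); // true
console.log(canonicallyEqual("viet", precomposed));         // false: diacritics lost
```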

Variants

Each row is a concrete case for this scenario, with a dedicated document and playground.

Case                                             OS          Device                 Browser        Keyboard                  Status
ce-0177-thai-ime-tone-mark-positioning-firefox   Windows 11  Desktop or Laptop Any  Firefox 120.0  Thai (IME)                draft
ce-0178-vietnamese-ime-diacritic-loss-chrome     Windows 11  Desktop or Laptop Any  Chrome 120.0   Vietnamese (IME)          draft
ce-0180-hindi-ime-devanagari-conjuncts-chrome    Windows 11  Desktop or Laptop Any  Chrome 120.0   Hindi (IME - Devanagari)  draft
ce-0206-russian-ime-cyrillic-composition-chrome  Windows 11  Desktop or Laptop Any  Chrome 120.0   Russian (IME - Cyrillic)  draft

Cases

This scenario affects multiple languages. Cases are grouped by language/input method below.

  • Hindi/Devanagari: 1 case
  • Russian (IME - Cyrillic): 1 case
  • Thai: 1 case
  • Vietnamese: 1 case

Related Scenarios

Other scenarios that share similar tags or category.

Tags: ime, composition, thai

Tab key during IME composition (focus vs indent vs IME)

Tab moves focus by default. During IME composition, Tab may cancel composition, cycle candidates, or be captured by the editor for indentation. Behavior differs by language (e.g. Chinese, Thai) and by browser (e.g. Safari vs Firefox).

2 cases
Tags: ime, composition, thai

Space key behavior during IME composition

During active IME composition, pressing Space may commit the segment, insert a literal space, be ignored, or cancel composition—depending on language, IME, and browser. Editors that assume Space always inserts U+0020 can lose characters or break composition state.

5 cases
Tags: ime, composition

beforeinput and input events have different inputType values

During IME composition or in certain browser/IME combinations, the beforeinput event may have a different inputType than the corresponding input event. For example, beforeinput may fire with insertCompositionText while input fires with deleteContentBackward. This mismatch can cause handlers to misinterpret the actual DOM change and requires storing beforeinput's targetRanges for use in input event handling.

1 case
Tags: ime, composition

Selection mismatch between beforeinput and input events

The selection (window.getSelection()) in beforeinput events can differ from the selection in corresponding input events. This mismatch can occur during IME composition, text prediction, or when typing adjacent to formatted elements like links. The selection in beforeinput may include adjacent formatted text, while input selection reflects the final cursor position.

1 case
Tags: ime, composition

beforeinput not cancelable during IME composition

Some beforeinput events during IME composition cannot be canceled, either per spec or per implementation. Calling preventDefault() on them has no effect, so editors cannot always block native insertion.

1 case
