Overview
The schema defines what your document structure can contain. It's the contract between your model and the operations that modify it. A well-designed schema ensures your document is always valid and predictable.
The model is the actual document instance that conforms to the schema. It represents the current state of the document in a structured, validated format.
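To make the distinction concrete, the sketch below builds a model node and checks it against a schema. `createNode` and `demoSchema` are hypothetical names used only for illustration, and the spec shape (`nodes`, `attrs`, `default`) is assumed to follow the node specs used throughout this document.

```javascript
// Hypothetical helper: creates a model node that conforms to the schema.
// Unknown node types and unknown attributes are rejected, and missing
// attributes are filled in from the defaults declared in the spec.
function createNode(schema, type, attrs = {}, children = []) {
  const spec = schema.nodes[type];
  if (!spec) {
    throw new Error('Unknown node type: ' + type);
  }
  const resolved = {};
  for (const [name, attrSpec] of Object.entries(spec.attrs ?? {})) {
    // Use the provided value when present, otherwise the declared default
    resolved[name] = name in attrs ? attrs[name] : attrSpec.default;
  }
  for (const name of Object.keys(attrs)) {
    if (!(name in resolved)) {
      throw new Error('Unknown attribute "' + name + '" on ' + type);
    }
  }
  return { type, attrs: resolved, children };
}

// Assumed schema, reduced to two node types
const demoSchema = {
  nodes: {
    paragraph: {},
    heading: { attrs: { level: { default: 1 } } }
  }
};

const heading = createNode(demoSchema, 'heading');
console.log(heading.attrs.level); // 1
```

The schema stays fixed while many model nodes are created from it, so every operation that goes through a helper like this produces nodes the rest of the system can rely on.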
Schema Definition
A schema defines the structure and rules for your document:
Node Specs
Each node type has a spec that defines its properties:
const schema = {
  nodes: {
    document: {
      content: 'block+', // Must contain one or more blocks
    },
    paragraph: {
      content: 'inline*', // Can contain zero or more inlines
      group: 'block', // Belongs to block group
    },
    heading: {
      content: 'inline*',
      group: 'block',
      attrs: {
        level: { default: 1 } // Attribute with default
      }
    },
    text: {
      group: 'inline',
      // Text nodes don't have children
    },
    link: {
      content: 'inline*',
      group: 'inline',
      attrs: {
        href: { default: '' }
      }
    }
  }
};
Mark Specs
Marks define formatting that can be applied to text:
const schema = {
  marks: {
    bold: {
      // Simple mark with no attributes
    },
    italic: {},
    underline: {},
    link: {
      attrs: {
        href: { default: '' },
        title: { default: '' }
      }
    },
    code: {
      // Code mark might exclude other marks
      excludes: 'bold italic underline'
    }
  }
};
Content Rules
Content rules define what can be nested inside each node:
- 'block+': One or more blocks
- 'block*': Zero or more blocks
- 'inline*': Zero or more inlines
- 'paragraph | heading': Paragraph or heading
- '(paragraph | heading)+': One or more paragraphs or headings
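One way to evaluate these rules is to compile each content expression into a regular expression over the child node types. The sketch below is only an illustration under assumptions: `compileContentRule` and `ruleSchema` are hypothetical names, the schema is assumed to have the shape used in this document, and a name inside a rule is assumed to refer to either a node type or a group.

```javascript
// Hypothetical helper: turns a content expression into a predicate
// that accepts or rejects an array of child nodes.
function compileContentRule(rule, schema) {
  // Expand a name into every node type it can stand for
  // (the type itself, or all types that belong to a group of that name)
  const expand = (name) => {
    const types = Object.keys(schema.nodes).filter(
      (type) => type === name || schema.nodes[type].group === name
    );
    if (types.length === 0) {
      throw new Error('Unknown name in content rule: ' + name);
    }
    // Each child is encoded as "type;" so names cannot run together
    return '(?:' + types.map((type) => type + ';').join('|') + ')';
  };
  // Names become alternations; ( ) | + * ? already mean the same in a regex
  const source = (rule ?? '').replace(/[A-Za-z_]\w*/g, expand).replace(/\s+/g, '');
  const pattern = new RegExp('^(?:' + source + ')$');
  return (children) =>
    pattern.test(children.map((child) => child.type + ';').join(''));
}

// Assumed schema, reduced to the fields the rules need
const ruleSchema = {
  nodes: {
    document: { content: 'block+' },
    paragraph: { content: 'inline*', group: 'block' },
    heading: { content: 'inline*', group: 'block' },
    text: { group: 'inline' }
  }
};

const documentRule = compileContentRule('block+', ruleSchema);
console.log(documentRule([{ type: 'paragraph' }, { type: 'heading' }])); // true
console.log(documentRule([])); // false
```

A missing rule compiles to a pattern that only matches an empty child list, which is the behavior you would want for leaf nodes such as text.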
// Example content rules
{
  document: {
    content: 'block+' // Document must have at least one block
  },
  paragraph: {
    content: 'inline*' // Paragraph can have any inlines
  },
  list: {
    content: 'listItem+', // List must have at least one item
    group: 'block'
  },
  listItem: {
    content: 'paragraph block*' // Item starts with paragraph, then optional blocks
    // No group: a list item is only valid inside a list
  }
}
Node Types
Block Nodes
Block nodes are structural elements that typically start on a new line:
- Paragraphs
- Headings (h1-h6)
- Lists (ordered, unordered)
- Code blocks
- Blockquotes
- Tables
// Block node examples
{
  type: 'paragraph',
  children: [
    { type: 'text', text: 'This is a paragraph.' }
  ]
}
{
  type: 'heading',
  attrs: { level: 2 }, // Attributes live in attrs, as declared in the node spec
  children: [
    { type: 'text', text: 'Heading' }
  ]
}
{
  type: 'codeBlock',
  attrs: { language: 'javascript' },
  children: [
    { type: 'text', text: 'const x = 1;' }
  ]
}
Inline Nodes
Inline nodes exist within blocks and don't break the line:
- Links
- Images
- Mentions
- Custom inline elements
// Inline node examples
{
  type: 'link',
  attrs: { href: 'https://example.com' },
  children: [
    { type: 'text', text: 'Example' }
  ]
}
{
  type: 'image',
  attrs: {
    src: 'image.jpg',
    alt: 'Description'
  }
  // Images typically don't have children
}
Text Nodes
Text nodes contain the actual text content and can have marks:
// Text node with marks
{
  type: 'text',
  text: 'Bold and italic',
  marks: [
    { type: 'bold' },
    { type: 'italic' }
  ]
}
// Plain text node
{
  type: 'text',
  text: 'Plain text',
  marks: []
}
Document Structure
Hierarchical Structure
Documents are trees with a root document node:
// Complete document structure
{
  type: 'document',
  children: [
    {
      type: 'heading',
      attrs: { level: 1 },
      children: [
        { type: 'text', text: 'Title' }
      ]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', text: 'First paragraph.' }
      ]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', text: 'Second ' },
        { type: 'text', text: 'paragraph', marks: [{ type: 'bold' }] },
        { type: 'text', text: '.' }
      ]
    }
  ]
}
Nesting Rules
The schema enforces nesting rules to prevent invalid structures:
- Blocks cannot be nested inside inlines
- Text nodes can only appear inside nodes whose content rule allows inline content
- Some nodes have specific content requirements
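The first rule above can be checked by walking the tree while remembering whether an inline ancestor has already been seen. The following is a minimal sketch under assumptions: `findNestingViolations` and `nestingSchema` are hypothetical names, and each node spec is assumed to declare its `group` as in the earlier examples.

```javascript
// Hypothetical helper: returns the path of every block node
// that sits somewhere below an inline node.
function findNestingViolations(node, schema, insideInline = false, path = node.type) {
  const violations = [];
  const group = schema.nodes[node.type]?.group;
  if (insideInline && group === 'block') {
    violations.push(path);
  }
  for (const child of node.children ?? []) {
    violations.push(
      ...findNestingViolations(
        child,
        schema,
        insideInline || group === 'inline',
        path + ' > ' + child.type
      )
    );
  }
  return violations;
}

// Assumed schema, reduced to the group of each node type
const nestingSchema = {
  nodes: {
    paragraph: { group: 'block' },
    link: { group: 'inline' },
    text: { group: 'inline' }
  }
};

const invalidTree = {
  type: 'paragraph',
  children: [
    {
      type: 'link',
      children: [
        { type: 'paragraph', children: [{ type: 'text', text: 'Text' }] }
      ]
    }
  ]
};

console.log(findNestingViolations(invalidTree, nestingSchema));
// [ 'paragraph > link > paragraph' ]
```

Returning paths instead of a boolean makes it easier to report where a document went wrong or to repair it in place.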
// Valid structure
{
  type: 'paragraph',
  children: [
    { type: 'text', text: 'Text' }
  ]
}
// Invalid structure (block inside inline)
{
  type: 'link',
  children: [
    {
      type: 'paragraph', // ❌ Invalid: block inside inline
      children: [...]
    }
  ]
}
// Valid: inline inside block
{
  type: 'paragraph',
  children: [
    {
      type: 'link', // ✅ Valid: inline inside block
      children: [
        { type: 'text', text: 'Link' }
      ]
    }
  ]
}
Mark System
Mark Types
Marks are formatting applied to text nodes:
// Text with single mark
{
  type: 'text',
  text: 'Bold text',
  marks: [{ type: 'bold' }]
}
// Text with multiple marks
{
  type: 'text',
  text: 'Bold and italic',
  marks: [
    { type: 'bold' },
    { type: 'italic' }
  ]
}
// Text with mark that has attributes
{
  type: 'text',
  text: 'Link text',
  marks: [
    {
      type: 'link',
      attrs: { href: 'https://example.com' }
    }
  ]
}
Mark Attributes
Some marks have attributes:
// Link mark with attributes
{
  type: 'text',
  text: 'Example',
  marks: [
    {
      type: 'link',
      attrs: {
        href: 'https://example.com',
        title: 'Example website'
      }
    }
  ]
}
// Color mark with attributes
{
  type: 'text',
  text: 'Red text',
  marks: [
    {
      type: 'color',
      attrs: { color: '#ff0000' }
    }
  ]
}
Mark Exclusivity
Some marks exclude others (e.g., code mark excludes formatting):
const schema = {
  marks: {
    code: {
      excludes: 'bold italic underline link' // Code can't have other marks
    },
    link: {
      // Link can coexist with bold, italic, etc.
    }
  }
};
// Valid: bold and italic together
{
  type: 'text',
  text: 'Bold italic',
  marks: [{ type: 'bold' }, { type: 'italic' }]
}
// Invalid: code with bold
{
  type: 'text',
  text: 'Code bold',
  marks: [
    { type: 'code' },
    { type: 'bold' } // ❌ Code excludes bold
  ]
}
Schema Validation
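Marks can be validated as well as nodes. As a minimal sketch, the `excludes` rule described under Mark Exclusivity could be checked as follows. `validateMarks` and `markSchema` are hypothetical names, `excludes` is assumed to be a space-separated list of mark names, and the result uses the same `{ valid, error }` shape as the node validators in this section.

```javascript
// Hypothetical helper: checks that no mark on a text node
// excludes another mark on the same node.
function validateMarks(marks, schema) {
  for (const mark of marks) {
    const spec = schema.marks[mark.type];
    if (!spec) {
      return { valid: false, error: 'Unknown mark type: ' + mark.type };
    }
    // excludes is a space-separated list of mark names (may be absent)
    const excluded = (spec.excludes ?? '').split(' ').filter(Boolean);
    const clash = marks.find(
      (other) => other !== mark && excluded.includes(other.type)
    );
    if (clash) {
      return { valid: false, error: mark.type + ' excludes ' + clash.type };
    }
  }
  return { valid: true };
}

// Assumed mark schema
const markSchema = {
  marks: {
    bold: {},
    italic: {},
    code: { excludes: 'bold italic underline link' }
  }
};

console.log(validateMarks([{ type: 'bold' }, { type: 'italic' }], markSchema));
// { valid: true }
console.log(validateMarks([{ type: 'code' }, { type: 'bold' }], markSchema));
// { valid: false, error: 'code excludes bold' }
```

The check does not depend on mark order: it reports the conflict whether `code` appears before or after `bold`.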
Structure Validation
Validate that the document structure matches the schema:
function validateDocument(doc, schema) {
  // Check root node type (falls back to 'document' when the schema has no topNode)
  const topNode = schema.topNode ?? 'document';
  if (doc.type !== topNode) {
    return { valid: false, error: 'Invalid root node' };
  }
  // Validate the root and, recursively, every descendant
  return validateNode(doc, schema);
}
function validateNode(node, schema) {
  const spec = schema.nodes[node.type];
  if (!spec) {
    return { valid: false, error: 'Unknown node type: ' + node.type };
  }
  // Leaf nodes such as text have no children array
  const children = node.children ?? [];
  // Validate content matches spec
  if (!matchesContentRule(children, spec.content, schema)) {
    return { valid: false, error: 'Content does not match spec' };
  }
  // Validate attributes (validateAttributes is not shown here)
  if (!validateAttributes(node.attrs, spec.attrs)) {
    return { valid: false, error: 'Invalid attributes' };
  }
  // Recursively validate children
  for (const child of children) {
    const result = validateNode(child, schema);
    if (!result.valid) {
      return result;
    }
  }
  return { valid: true };
}
Content Validation
Validate that node content matches content rules:
function matchesContentRule(children, rule, schema) {
  // A node without a content rule is a leaf and must have no children
  if (!rule) {
    return children.length === 0;
  }
  // Parse content rule (e.g., 'block+', 'inline*'); parseContentRule is not shown here
  const parsed = parseContentRule(rule);
  // Check if children match
  if (parsed.type === 'group') {
    // Check if all children are in the group.
    // The quantifier is not checked here, so an empty 'block+' still passes.
    return children.every(child =>
      isInGroup(child, parsed.group, schema)
    );
  }
  // Handle other rule types...
  return true;
}
function isInGroup(node, group, schema) {
  // Look up the node's spec in the schema that was passed in
  const spec = schema.nodes[node.type];
  return spec?.group === group;
}
HTML Mapping
Rendering and parsing both depend on a mapping between your model and HTML.
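The serialization code in this section calls `escapeHtml` and `serializeAttrs` without defining them. A minimal sketch of both follows. The exact behavior is an assumption: here `serializeAttrs` skips empty values and returns a string with a leading space so it can be appended directly after a tag name.

```javascript
// Assumed helper: escapes the characters that are unsafe in HTML text
// and in double-quoted or single-quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // Must run first so later entities stay intact
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Assumed helper: renders { href: 'x', title: 'y' } as ' href="x" title="y"'.
// Attribute names are taken as-is, so they should come from the schema,
// never from user input.
function serializeAttrs(attrs) {
  return Object.entries(attrs)
    .filter(([, value]) => value !== undefined && value !== null && value !== '')
    .map(([name, value]) => ' ' + name + '="' + escapeHtml(value) + '"')
    .join('');
}

console.log(escapeHtml('a < b & c'));
// a &lt; b &amp; c
console.log('<a' + serializeAttrs({ href: 'https://example.com', title: '' }) + '>');
// <a href="https://example.com">
```

Escaping on output, rather than trusting the model to contain safe strings, keeps the serializer correct even when text nodes hold characters such as `<` or `&`.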
Model to HTML
Serialize your model to HTML:
function serializeNode(node) {
  switch (node.type) {
    case 'document':
      return serializeChildren(node.children);
    case 'paragraph':
      return '<p>' + serializeChildren(node.children) + '</p>';
    case 'heading': {
      // Accept the level from attrs or directly on the node; default to 1
      const level = node.attrs?.level ?? node.level ?? 1;
      return '<h' + level + '>' + serializeChildren(node.children) + '</h' + level + '>';
    }
    case 'text': {
      let html = escapeHtml(node.text);
      // Apply marks; the first mark ends up innermost
      if (node.marks) {
        node.marks.forEach(mark => {
          html = wrapWithMark(html, mark);
        });
      }
      return html;
    }
    case 'link': {
      const href = node.attrs?.href || '';
      return '<a href="' + escapeHtml(href) + '">' + serializeChildren(node.children) + '</a>';
    }
    default:
      // Unknown node types are unwrapped; leaf nodes have no children
      return serializeChildren(node.children ?? []);
  }
}
function wrapWithMark(html, mark) {
  const tagMap = {
    bold: 'strong',
    italic: 'em',
    underline: 'u',
    code: 'code',
    link: 'a' // Link marks carry href and title in attrs
  };
  const tag = tagMap[mark.type];
  if (!tag) return html;
  const attrs = mark.attrs ? serializeAttrs(mark.attrs) : '';
  return '<' + tag + attrs + '>' + html + '</' + tag + '>';
}
HTML to Model
Parse HTML into your model:
function parseHTML(html) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(html, 'text/html');
  return {
    type: 'document',
    children: parseChildren(doc.body.childNodes)
  };
}
function parseChildren(domNodes) {
  // flatMap splices in the arrays returned for unwrapped elements
  return Array.from(domNodes)
    .flatMap(node => parseNode(node))
    .filter(Boolean);
}
function parseNode(domNode) {
  if (domNode.nodeType === Node.TEXT_NODE) {
    return {
      type: 'text',
      text: domNode.textContent,
      marks: extractMarks(domNode)
    };
  }
  if (domNode.nodeType === Node.ELEMENT_NODE) {
    const nodeType = getNodeType(domNode.tagName);
    if (!nodeType) {
      // Unknown element, unwrap and parse children (returns an array)
      return parseChildren(domNode.childNodes);
    }
    return {
      type: nodeType,
      attrs: extractAttributes(domNode, nodeType),
      children: parseChildren(domNode.childNodes)
    };
  }
  return null;
}
function extractMarks(textNode, root = textNode.ownerDocument.body) {
  const marks = [];
  let current = textNode.parentElement;
  // Walk up to the root, collecting marks from the innermost element outward
  while (current && current !== root) {
    const mark = getMarkFromElement(current);
    if (mark) {
      marks.push(mark);
    }
    current = current.parentElement;
  }
  return marks;
}
HTML Normalization
Normalize inconsistent HTML to match your schema:
- Convert <b> to <strong>
- Convert <i> to <em>
- Convert <div> to <p> when appropriate
- Remove invalid attributes
- Fix nesting violations
function normalizeHTML(html) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(html, 'text/html');
  // Normalize elements
  normalizeElements(doc.body);
  // Fix nesting
  fixNesting(doc.body);
  // Remove invalid attributes
  removeInvalidAttributes(doc.body);
  return doc.body.innerHTML;
}
function normalizeElements(element) {
  // Convert b to strong and i to em
  renameElements(element, 'b', 'strong');
  renameElements(element, 'i', 'em');
  // Convert div to p (only if it contains no block-level children)
  renameElements(element, 'div', 'p', div =>
    !div.querySelector('p, div, ul, ol, h1, h2, h3, h4, h5, h6')
  );
}
// Replaces every matching element with a new element of a different tag.
// Child nodes are moved rather than copied through innerHTML, so nested
// matches stay attached to the tree and are still converted.
function renameElements(root, fromTag, toTag, shouldRename = () => true) {
  root.querySelectorAll(fromTag).forEach(oldEl => {
    if (!shouldRename(oldEl)) return;
    // Create the element in the parsed document, not the global one
    const newEl = root.ownerDocument.createElement(toTag);
    while (oldEl.firstChild) {
      newEl.appendChild(oldEl.firstChild);
    }
    oldEl.parentNode.replaceChild(newEl, oldEl);
  });
}
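`normalizeHTML` also calls `removeInvalidAttributes`, which is not defined above. One possible sketch uses an allow-list per tag. The list contents and the `ALLOWED_ATTRIBUTES` name are assumptions for illustration, and a real editor would derive the list from the attributes declared in its schema.

```javascript
// Hypothetical allow-list: attribute names kept per lowercase tag name.
const ALLOWED_ATTRIBUTES = {
  a: ['href', 'title'],
  img: ['src', 'alt']
};

// Removes every attribute that is not on the allow-list for its element.
// Works on any object that exposes querySelectorAll, tagName, attributes
// and removeAttribute, which includes real DOM elements.
function removeInvalidAttributes(root, allowed = ALLOWED_ATTRIBUTES) {
  for (const el of root.querySelectorAll('*')) {
    const keep = allowed[el.tagName.toLowerCase()] ?? [];
    // Copy first: removing attributes mutates the live attribute collection
    for (const attr of Array.from(el.attributes)) {
      if (!keep.includes(attr.name)) {
        el.removeAttribute(attr.name);
      }
    }
  }
}
```

In a browser this can be called as `removeInvalidAttributes(doc.body)`, matching the call in `normalizeHTML`. It only filters attribute names; checking attribute values, such as the URL scheme of an `href`, would be a separate step.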