Model & Schema

Designing your document model and schema: node types, document structure, mark system, validation, and HTML mapping.

Overview

The schema defines what your document structure can contain. It's the contract between your model and the operations that modify it. A well-designed schema ensures your document is always valid and predictable.

The model is the actual document instance that conforms to the schema. It represents the current state of the document in a structured, validated format.

Schema Definition

A schema defines the structure and rules for your document through node specs, mark specs, and content rules.

Node Specs

Each node type has a spec that defines its properties:

const schema = {
  nodes: {
    document: {
      content: 'block+',  // Must contain one or more blocks
    },
    paragraph: {
      content: 'inline*',  // Can contain zero or more inlines
      group: 'block',      // Belongs to block group
    },
    heading: {
      content: 'inline*',
      group: 'block',
      attrs: {
        level: { default: 1 }  // Attribute with default
      }
    },
    text: {
      group: 'inline',
      // Text nodes don't have children
    },
    link: {
      content: 'inline*',
      group: 'inline',
      attrs: {
        href: { default: '' }
      }
    }
  }
};

Mark Specs

Marks define formatting that can be applied to text:

const schema = {
  marks: {
    bold: {
      // Simple mark with no attributes
    },
    italic: {},
    underline: {},
    link: {
      attrs: {
        href: { default: '' },
        title: { default: '' }
      }
    },
    code: {
      // Code mark might exclude other marks
      excludes: 'bold italic underline'
    }
  }
};

Content Rules

Content rules define what can be nested inside each node:

  • 'block+' - One or more blocks
  • 'block*' - Zero or more blocks
  • 'inline*' - Zero or more inlines
  • 'paragraph | heading' - Paragraph or heading
  • '(paragraph | heading)+' - One or more paragraphs or headings

// Example content rules
{
  document: {
    content: 'block+'  // Document must have at least one block
  },
  paragraph: {
    content: 'inline*'  // Paragraph can have any inlines
  },
  list: {
    content: 'listItem+',  // List must have at least one item
    group: 'block'
  },
  listItem: {
    content: 'paragraph block*'  // Item starts with paragraph, then optional blocks
    // No group: a list item may appear only inside a list, never wherever 'block' is allowed
  }
}
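
The content rules listed above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: it assumes node type and group names are plain identifiers and compiles a rule into a regular expression that is tested against the children's type names. The helper names `compileContentRule` and `matchesContent`, and the `nodes` parameter (the `nodes` object of a schema), are hypothetical.

```javascript
// Compile a content rule such as 'paragraph block*' into a RegExp.
// Each child is represented as its type name followed by a single space.
function compileContentRule(rule, nodes) {
  // Expand a name to the node types it covers: the type itself, or every member of a group.
  const expand = (name) => {
    const types = Object.keys(nodes).filter(
      (type) => type === name || nodes[type].group === name
    );
    if (types.length === 0) throw new Error('Unknown node or group: ' + name);
    return '(?:' + types.map((type) => type + ' ').join('|') + ')';
  };

  const tokens = rule.match(/[A-Za-z_]\w*|[()|+*?]/g) ?? [];
  const source = tokens
    .map((token) => {
      if (token === '(') return '(?:';               // grouping
      if (/^[)|+*?]$/.test(token)) return token;     // operators map directly to regex syntax
      return expand(token);                          // node type or group name
    })
    .join('');
  return new RegExp('^(?:' + source + ')$');
}

function matchesContent(children, rule, nodes) {
  if (!rule) return children.length === 0; // no content rule: the node must be a leaf
  const typeString = children.map((child) => child.type + ' ').join('');
  return compileContentRule(rule, nodes).test(typeString);
}
```

Compiling on every call keeps the sketch short; a real schema would probably compile each rule once and cache the result.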

Node Types

Block Nodes

Block nodes are structural elements that typically start on a new line:

  • Paragraphs
  • Headings (h1-h6)
  • Lists (ordered, unordered)
  • Code blocks
  • Blockquotes
  • Tables

// Block node examples
{
  type: 'paragraph',
  children: [
    { type: 'text', text: 'This is a paragraph.' }
  ]
}

{
  type: 'heading',
  attrs: { level: 2 },
  children: [
    { type: 'text', text: 'Heading' }
  ]
}

{
  type: 'codeBlock',
  attrs: { language: 'javascript' },
  children: [
    { type: 'text', text: 'const x = 1;' }
  ]
}

Inline Nodes

Inline nodes exist within blocks and don't break the line:

  • Links
  • Images
  • Mentions
  • Custom inline elements

// Inline node examples
{
  type: 'link',
  attrs: { href: 'https://example.com' },
  children: [
    { type: 'text', text: 'Example' }
  ]
}

{
  type: 'image',
  attrs: {
    src: 'image.jpg',
    alt: 'Description'
  }
  // Images typically don't have children
}

Text Nodes

Text nodes contain the actual text content and can have marks:

// Text node with marks
{
  type: 'text',
  text: 'Bold and italic',
  marks: [
    { type: 'bold' },
    { type: 'italic' }
  ]
}

// Plain text node
{
  type: 'text',
  text: 'Plain text',
  marks: []
}
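
Because marks live on text nodes, editing can leave two neighbouring text nodes with identical marks. A common convention, assumed here rather than required by this page, is to merge such neighbours so that each run of formatting has one canonical representation. The helpers `sameMarks` and `mergeAdjacentText` are hypothetical names for a minimal sketch of that idea.

```javascript
// Two mark lists are treated as equal when they hold the same marks in the same order.
// Attribute comparison via JSON.stringify is key-order sensitive, which is enough for a sketch.
function sameMarks(a = [], b = []) {
  return (
    a.length === b.length &&
    a.every(
      (mark, i) =>
        mark.type === b[i].type &&
        JSON.stringify(mark.attrs ?? {}) === JSON.stringify(b[i].attrs ?? {})
    )
  );
}

// Returns a new children array in which neighbouring text nodes with equal marks are joined.
function mergeAdjacentText(children) {
  const merged = [];
  for (const child of children) {
    const previous = merged[merged.length - 1];
    if (
      child.type === 'text' &&
      previous?.type === 'text' &&
      sameMarks(previous.marks, child.marks)
    ) {
      merged[merged.length - 1] = { ...previous, text: previous.text + child.text };
    } else {
      merged.push(child);
    }
  }
  return merged;
}
```

The function does not mutate its input, so it can be applied to a node's children when producing an updated copy of the document.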

Document Structure

Hierarchical Structure

Documents are trees with a root document node:

// Complete document structure
{
  type: 'document',
  children: [
    {
      type: 'heading',
      attrs: { level: 1 },
      children: [
        { type: 'text', text: 'Title' }
      ]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', text: 'First paragraph.' }
      ]
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', text: 'Second ' },
        { type: 'text', text: 'paragraph', marks: [{ type: 'bold' }] },
        { type: 'text', text: '.' }
      ]
    }
  ]
}
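
Since the document is a plain tree, most read-only operations reduce to a depth-first walk. The sketch below assumes the node shape used in the examples above (`type`, optional `children`, and `text` on text nodes); `walk` and `getTextContent` are hypothetical helper names.

```javascript
// Visit `node` and all of its descendants in document order.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of node.children ?? []) {
    walk(child, visit, depth + 1);
  }
}

// Concatenate the text of every text node under `node`.
function getTextContent(node) {
  let text = '';
  walk(node, (current) => {
    if (current.type === 'text') text += current.text;
  });
  return text;
}
```

The same `walk` function can drive other queries, such as collecting every heading for an outline or counting nodes of a given type.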

Nesting Rules

The schema enforces nesting rules to prevent invalid structures:

  • Blocks cannot be nested inside inlines
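  • Text nodes can appear only inside nodes whose content rule allows inline content (paragraphs, headings, links), never directly under the document root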
  • Some nodes have specific content requirements

// Valid structure
{
  type: 'paragraph',
  children: [
    { type: 'text', text: 'Text' }
  ]
}

// Invalid structure (block inside inline)
{
  type: 'link',
  children: [
    {
      type: 'paragraph',  // ❌ Invalid: block inside inline
      children: [...]
    }
  ]
}

// Valid: inline inside block
{
  type: 'paragraph',
  children: [
    {
      type: 'link',  // ✅ Valid: inline inside block
      children: [
        { type: 'text', text: 'Link' }
      ]
    }
  ]
}
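
The first rule (no blocks inside inlines) can be checked on its own by looking up each node's group while descending the tree. This is a minimal sketch under the assumption that every node spec carries a `group` of either `'block'` or `'inline'`; the function name `findBlocksInsideInlines` and the `nodes` parameter (the `nodes` object of a schema) are hypothetical.

```javascript
// Returns the paths (arrays of child indexes, relative to `node`) of every
// block node that sits somewhere inside an inline node.
function findBlocksInsideInlines(node, nodes, path = [], insideInline = false) {
  const group = nodes[node.type]?.group;
  const violations = insideInline && group === 'block' ? [path] : [];
  (node.children ?? []).forEach((child, index) => {
    violations.push(
      ...findBlocksInsideInlines(
        child,
        nodes,
        [...path, index],
        insideInline || group === 'inline'
      )
    );
  });
  return violations;
}
```

An empty result means the tree passes this particular rule; it says nothing about the other content requirements, which still need full content-rule validation.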

Mark System

Mark Types

Marks are formatting applied to text nodes:

// Text with single mark
{
  type: 'text',
  text: 'Bold text',
  marks: [{ type: 'bold' }]
}

// Text with multiple marks
{
  type: 'text',
  text: 'Bold and italic',
  marks: [
    { type: 'bold' },
    { type: 'italic' }
  ]
}

// Text with mark that has attributes
{
  type: 'text',
  text: 'Link text',
  marks: [
    {
      type: 'link',
      attrs: { href: 'https://example.com' }
    }
  ]
}

Mark Attributes

Some marks have attributes:

// Link mark with attributes
{
  type: 'text',
  text: 'Example',
  marks: [
    {
      type: 'link',
      attrs: {
        href: 'https://example.com',
        title: 'Example website'
      }
    }
  ]
}

// Color mark with attributes
{
  type: 'text',
  text: 'Red text',
  marks: [
    {
      type: 'color',
      attrs: { color: '#ff0000' }
    }
  ]
}

Mark Exclusivity

Some marks exclude others (e.g., code mark excludes formatting):

const schema = {
  marks: {
    code: {
      excludes: 'bold italic underline link'  // Code can't have other marks
    },
    link: {
      // Link can coexist with bold, italic, etc.
    }
  }
};

// Valid: bold and italic together
{
  type: 'text',
  text: 'Bold italic',
  marks: [{ type: 'bold' }, { type: 'italic' }]
}

// Invalid: code with bold
{
  type: 'text',
  text: 'Code bold',
  marks: [
    { type: 'code' },
    { type: 'bold' }  // ❌ Code excludes bold
  ]
}
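
One way to keep mark sets valid is to enforce `excludes` at the moment a mark is applied. The sketch below assumes a "newest mark wins" policy: applying a mark removes any existing mark that conflicts with it in either direction. That policy is an assumption, and an editor could equally refuse to add the new mark. `excludesMark` and `addMark` are hypothetical helpers, and `markSpecs` stands for the `marks` object of a schema.

```javascript
// Does mark type `a` list mark type `b` in its `excludes` string?
function excludesMark(markSpecs, a, b) {
  const excluded = (markSpecs[a]?.excludes ?? '').split(' ').filter(Boolean);
  return excluded.includes(b);
}

// Returns a new marks array with `mark` applied.
// Existing marks of the same type, or in conflict with `mark`, are dropped.
function addMark(marks, mark, markSpecs) {
  const kept = marks.filter(
    (existing) =>
      existing.type !== mark.type &&
      !excludesMark(markSpecs, mark.type, existing.type) &&
      !excludesMark(markSpecs, existing.type, mark.type)
  );
  return [...kept, mark];
}
```

With the `code` spec above, applying `code` to bold italic text leaves only `code`, while applying `italic` to bold text keeps both marks.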

Schema Validation

Structure Validation

Validate that the document structure matches the schema:

function validateDocument(doc, schema) {
  // Check root node type (assumes the schema names its root type,
  // e.g. topNode: 'document')
  if (doc.type !== schema.topNode) {
    return { valid: false, error: 'Invalid root node' };
  }

  // Validate the root's own content rule and attributes, then all descendants
  return validateNode(doc, schema);
}

function validateNode(node, schema) {
  const spec = schema.nodes[node.type];
  if (!spec) {
    return { valid: false, error: 'Unknown node type: ' + node.type };
  }

  // Leaf nodes (text, image) have no children array; treat it as empty
  const children = node.children ?? [];

  // Validate content matches spec
  if (!matchesContentRule(children, spec.content)) {
    return { valid: false, error: 'Content does not match spec for ' + node.type };
  }

  // Validate attributes
  if (!validateAttributes(node.attrs, spec.attrs)) {
    return { valid: false, error: 'Invalid attributes on ' + node.type };
  }

  // Recursively validate children
  for (const child of children) {
    const result = validateNode(child, schema);
    if (!result.valid) {
      return result;
    }
  }

  return { valid: true };
}
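
`validateNode` calls `validateAttributes` without defining it. One possible sketch follows. It assumes two conventions that this page does not state: an attribute that is not declared in the spec is invalid, and a declared attribute without a `default` is required. The companion `applyDefaultAttributes` is a hypothetical helper for filling in missing defaults.

```javascript
// Returns true when `attrs` only uses declared attributes and supplies every required one.
function validateAttributes(attrs = {}, attrSpecs = {}) {
  // Reject attributes the spec does not declare
  for (const name of Object.keys(attrs)) {
    if (!(name in attrSpecs)) return false;
  }
  // Assumption: an attribute without a default is required
  for (const [name, attrSpec] of Object.entries(attrSpecs)) {
    if (!('default' in attrSpec) && !(name in attrs)) return false;
  }
  return true;
}

// Returns a copy of `attrs` with defaults filled in for attributes the node leaves out.
function applyDefaultAttributes(attrs = {}, attrSpecs = {}) {
  const result = { ...attrs };
  for (const [name, attrSpec] of Object.entries(attrSpecs)) {
    if (!(name in result) && 'default' in attrSpec) {
      result[name] = attrSpec.default;
    }
  }
  return result;
}
```

Both functions accept `undefined` for either argument, so nodes without `attrs` and specs without `attrs` validate cleanly.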

Content Validation

Validate that node content matches content rules:

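function matchesContentRule(children = [], rule) {
  // A node without a content rule (e.g. text) must not have children
  if (!rule) {
    return children.length === 0;
  }

  // Parse content rule (e.g., 'block+', 'inline*')
  const parsed = parseContentRule(rule);

  // Check if children match
  if (parsed.type === 'group') {
    // Check if all children are in the group.
    // Note: this checks membership only; the quantifier ('+' or '*')
    // still has to be enforced separately.
    return children.every(child =>
      isInGroup(child, parsed.group)
    );
  }

  // Handle other rule types...
  return true;
}

// Assumes `schema` is available in the enclosing scope (e.g. module-level)
function isInGroup(node, group) {
  const spec = schema.nodes[node.type];
  return spec?.group === group;
}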

HTML Mapping

Mapping between your model and HTML is essential for rendering and parsing.

Model to HTML

Serialize your model to HTML:

function serializeNode(node) {
  switch (node.type) {
    case 'document':
      return serializeChildren(node.children);

    case 'paragraph':
      return '<p>' + serializeChildren(node.children) + '</p>';

    case 'heading': {
      // The schema declares level under attrs; fall back to a top-level `level`, then 1
      const level = node.attrs?.level ?? node.level ?? 1;
      return '<h' + level + '>' + serializeChildren(node.children) + '</h' + level + '>';
    }

    case 'text': {
      let html = escapeHtml(node.text);
      // Apply marks; the first mark in the array ends up innermost
      for (const mark of node.marks ?? []) {
        html = wrapWithMark(html, mark);
      }
      return html;
    }

    case 'link': {
      const href = node.attrs?.href || '';
      return '<a href="' + escapeHtml(href) + '">' + serializeChildren(node.children) + '</a>';
    }

    default:
      // Unknown or leaf node: emit its children, if any
      return serializeChildren(node.children ?? []);
  }
}

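function wrapWithMark(html, mark) {
  const tagMap = {
    bold: 'strong',
    italic: 'em',
    underline: 'u',
    code: 'code',
    link: 'a'  // Link marks carry href/title in attrs
  };

  const tag = tagMap[mark.type];
  if (!tag) return html;  // Marks without an HTML mapping are skipped

  // serializeAttrs is expected to return '' or a string with a leading space,
  // e.g. ' href="..."', with values already escaped
  const attrs = mark.attrs ? serializeAttrs(mark.attrs) : '';
  return '<' + tag + attrs + '>' + html + '</' + tag + '>';
}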
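
The serializer relies on `escapeHtml` and `serializeAttrs`, which are not defined on this page. The following is one possible sketch of both. It assumes attribute names come from the schema and are therefore safe to emit unescaped, and that empty or missing values should be left out of the output.

```javascript
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (char) => HTML_ESCAPES[char]);
}

// Returns a string such as ' href="..." title="..."' (note the leading space),
// or '' when there is nothing to emit.
function serializeAttrs(attrs = {}) {
  return Object.entries(attrs)
    .filter(([, value]) => value !== '' && value != null)
    .map(([name, value]) => ' ' + name + '="' + escapeHtml(value) + '"')
    .join('');
}
```

Escaping both text and attribute values at serialization time means the model itself can store raw, unescaped strings.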

HTML to Model

Parse HTML into your model:

function parseHTML(html) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(html, 'text/html');

  return {
    type: 'document',
    // parseNode may return a node, an array of nodes (unwrapped element), or null;
    // flatMap flattens arrays and drops nulls
    children: Array.from(doc.body.childNodes)
      .flatMap(node => parseNode(node) ?? [])
  };
}

function parseNode(domNode) {
  if (domNode.nodeType === Node.TEXT_NODE) {
    return {
      type: 'text',
      text: domNode.textContent,
      marks: extractMarks(domNode)
    };
  }
  
  if (domNode.nodeType === Node.ELEMENT_NODE) {
    const nodeType = getNodeType(domNode.tagName);
    if (!nodeType) {
      // Unknown element, unwrap and parse children
      return parseChildren(domNode.childNodes);
    }
    
    return {
      type: nodeType,
      attrs: extractAttributes(domNode, nodeType),
      children: parseChildren(domNode.childNodes)
    };
  }
  
  return null;
}

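// `root` is where the upward walk stops: the parsed document's body by default,
// or the editor's root element when reading from a live DOM
function extractMarks(textNode, root = textNode.ownerDocument.body) {
  const marks = [];
  let current = textNode.parentElement;

  // Collect marks from the innermost ancestor outward
  while (current && current !== root) {
    const mark = getMarkFromElement(current);
    if (mark) {
      marks.push(mark);
    }
    current = current.parentElement;
  }

  return marks;
}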
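
The parser above depends on `getNodeType`, `getMarkFromElement`, and `extractAttributes`, none of which are defined on this page. A minimal table-driven sketch follows. The tag tables are assumptions chosen to mirror the serializer (links are parsed as `link` nodes, matching `serializeNode`); a schema that models links as marks would move `A` into the mark table instead.

```javascript
// Assumed tag-to-type tables; extend them to match your own schema.
const NODE_TYPE_BY_TAG = {
  P: 'paragraph',
  H1: 'heading', H2: 'heading', H3: 'heading',
  H4: 'heading', H5: 'heading', H6: 'heading',
  A: 'link',
};

const MARK_TYPE_BY_TAG = {
  STRONG: 'bold', B: 'bold',
  EM: 'italic', I: 'italic',
  U: 'underline',
  CODE: 'code',
};

// Returns the node type for a tag name, or null for elements that should be unwrapped.
function getNodeType(tagName) {
  return NODE_TYPE_BY_TAG[tagName.toUpperCase()] ?? null;
}

// Returns a mark object for formatting elements, or null for everything else.
function getMarkFromElement(element) {
  const type = MARK_TYPE_BY_TAG[element.tagName.toUpperCase()];
  return type ? { type } : null;
}

// Reads the attributes the schema declares for the given node type.
function extractAttributes(element, nodeType) {
  if (nodeType === 'heading') {
    return { level: Number(element.tagName[1]) }; // 'H2' -> 2
  }
  if (nodeType === 'link') {
    return { href: element.getAttribute('href') ?? '' };
  }
  return {};
}
```

Mapping both `B` and `STRONG` to `bold` (and `I` and `EM` to `italic`) lets the parser accept un-normalized HTML directly.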

HTML Normalization

Normalize inconsistent HTML to match your schema:

  • Convert <b> to <strong>
  • Convert <i> to <em>
  • Convert <div> to <p> when appropriate
  • Remove invalid attributes
  • Fix nesting violations

function normalizeHTML(html) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(html, 'text/html');
  
  // Normalize elements
  normalizeElements(doc.body);
  
  // Fix nesting
  fixNesting(doc.body);
  
  // Remove invalid attributes
  removeInvalidAttributes(doc.body);
  
  return doc.body.innerHTML;
}

// Replace `element` with a new <tagName> element, moving (not re-parsing) its children.
// Attributes on the old element are dropped.
function renameElement(element, tagName) {
  const replacement = element.ownerDocument.createElement(tagName);
  while (element.firstChild) {
    replacement.appendChild(element.firstChild);
  }
  element.replaceWith(replacement);
}

function normalizeElements(element) {
  // Convert b to strong
  element.querySelectorAll('b').forEach(b => renameElement(b, 'strong'));

  // Convert i to em
  element.querySelectorAll('i').forEach(i => renameElement(i, 'em'));

  // Convert div to p, but only when it contains no block-level descendants
  // (including nested divs), so a <p> never ends up wrapping a block
  const blockSelector = 'p, div, ul, ol, h1, h2, h3, h4, h5, h6, blockquote, pre, table';
  element.querySelectorAll('div').forEach(div => {
    if (!div.querySelector(blockSelector)) {
      renameElement(div, 'p');
    }
  });
}
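
`normalizeHTML` also calls `removeInvalidAttributes`, which is not defined on this page. Its core decision, which attributes to keep for which tag, can be sketched as a pure function over plain objects. The allowlist below is an assumption derived from the attributes used elsewhere on this page (`href` and `title` on links, `src` and `alt` on images); `filterAttributes` is a hypothetical helper name.

```javascript
// Assumed allowlist: attribute names permitted per (lowercase) tag name.
const ALLOWED_ATTRIBUTES = new Map([
  ['a', ['href', 'title']],
  ['img', ['src', 'alt']],
]);

// Returns a copy of `attrs` that keeps only the names allowed for `tagName`.
function filterAttributes(tagName, attrs) {
  const allowed = ALLOWED_ATTRIBUTES.get(tagName.toLowerCase()) ?? [];
  return Object.fromEntries(
    Object.entries(attrs).filter(([name]) => allowed.includes(name.toLowerCase()))
  );
}
```

In the browser, `removeInvalidAttributes` could iterate over each element's attributes and call `removeAttribute` for every name this filter rejects. An allowlist is used rather than a blocklist so that event handlers such as `onclick` and inline `style` are dropped by default.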