Pretext for AI Chat Interfaces: Instant Bubble Sizing Without DOM Reflow

Apr 3, 2026

AI chat interfaces are one of the hardest layout problems on the web. Messages arrive as a stream of tokens. Bubble heights change mid-render. Users scroll up while new content pushes down. Every token that arrives can trigger a reflow, causing scroll jank that makes the experience feel broken.

Pretext solves this by computing text layout in pure JavaScript — before the DOM ever sees it. This post shows how to use Pretext to build AI chat interfaces that are fast, smooth, and pixel-perfect.

The AI Chat Layout Problem

Traditional chat UIs have a simple layout model: messages arrive one at a time, you append them to the DOM, and the browser handles the rest. AI chat breaks that model in four ways:

  1. Streaming tokens: GPT, Claude, and other LLMs send responses token by token. Each token changes the message length, which changes the bubble height, which shifts every message below it.
  2. Variable-width bubbles: A good chat UI wraps bubbles tightly around text, not stretching them to a fixed max-width. But computing the tightest width that preserves line count requires measuring the text.
  3. Long conversations: AI conversations can contain hundreds of messages. Virtual scrolling is essential for performance, but it requires knowing every message height — even offscreen ones.
  4. Code blocks and mixed content: AI responses frequently contain code, lists, and formatted text with different font sizes and line heights.

Each of these problems involves the same bottleneck: you need to know how much space text will occupy before you render it.

How Traditional Chat UIs Handle This

Most chat UIs use one of these approaches:

Approach 1: Let the Browser Handle It

// Append message, let CSS do the layout
chatContainer.appendChild(messageBubble);
chatContainer.scrollTop = chatContainer.scrollHeight;

This works for simple cases but causes visible scroll jumping during streaming. Every token triggers a reflow, and if the user has scrolled up, the scroll position shifts unpredictably.

Approach 2: Fixed-Width Bubbles

.bubble {
  max-width: 70%;
  /* Browser wraps text within this fixed width */
}

This avoids measurement entirely but wastes horizontal space. A short message like "Sure!" gets a bubble 70% of the container width. It looks sloppy — especially on mobile where screen space is precious.

Approach 3: DOM Pre-Measurement

function measureBubble(text, maxWidth) {
  const hidden = document.createElement('div');
  hidden.style.cssText = `
    position: absolute; visibility: hidden;
    max-width: ${maxWidth}px; font: 14px/20px Inter;
  `;
  hidden.textContent = text;
  document.body.appendChild(hidden);
  const { width, height } = hidden.getBoundingClientRect();
  document.body.removeChild(hidden);
  return { width, height };
}

This gives accurate dimensions but triggers a reflow per measurement. During streaming, you might measure 30+ times per second as tokens arrive — each time forcing the browser to recalculate layout.

The Pretext Approach

Pretext eliminates all three problems at once. Here is the core pattern:

import { prepare, layout } from 'pretext';

// One-time setup — cache font measurements
const engine = prepare({
  fontFamily: 'Inter',
  fontSize: 14,
  lineHeight: 20,
});

function measureMessage(text, maxWidth) {
  const result = layout(engine, text, { maxWidth });
  return {
    width: result.width,
    height: result.height,
    lineCount: result.lines.length,
  };
}

After prepare() runs once, every call to layout() is pure math — no DOM, no reflow, no jank. You can call it hundreds of times per frame without affecting rendering performance.

Pattern 1: Smooth Streaming Without Scroll Jank

The biggest win for AI chat is during streaming. As tokens arrive, you need to know whether the new token creates a new line (changing the bubble height) or fits on the current line (no height change).

function handleStreamToken(messageId, fullText) {
  const { height: newHeight } = measureMessage(fullText, maxBubbleWidth);
  const prevHeight = heightCache.get(messageId) || 0;

  if (newHeight !== prevHeight) {
    // Height changed — update the bubble and adjust scroll
    heightCache.set(messageId, newHeight);
    updateBubbleHeight(messageId, newHeight);

    // Only adjust scroll if user is near the bottom
    if (isNearBottom()) {
      scrollToBottom({ behavior: 'smooth' });
    }
  }

  // Update text content without triggering layout measurement
  updateBubbleText(messageId, fullText);
}

The key insight: you only adjust scroll position when the height actually changes. Most tokens do not create new lines, so most of the time no scroll adjustment is needed. Without Pretext, you would need a DOM measurement for every single token to know this.
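The helpers in this snippet (`heightCache`, `updateBubbleHeight`, `isNearBottom`) are app-level glue, not Pretext APIs. As one sketch, `isNearBottom` can be a pure function over scroll metrics, where the 80px threshold is an arbitrary choice:

```javascript
// Should auto-scroll follow new content? A user who has scrolled more
// than `threshold` px above the bottom is reading history; leave them be.
function isNearBottom(scrollTop, clientHeight, scrollHeight, threshold = 80) {
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= threshold;
}

// With a real container element:
//   isNearBottom(el.scrollTop, el.clientHeight, el.scrollHeight)
```

Keeping it pure makes the follow/don't-follow decision unit-testable without a DOM.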

Pattern 2: Tight-Wrap Bubbles

Tight-wrapping means finding the minimum bubble width that keeps the same number of lines. This eliminates the wasted space of fixed max-width bubbles.

function tightWrap(text, maxWidth) {
  // First, get the natural layout at max width
  const natural = layout(engine, text, { maxWidth });
  const lineCount = natural.lines.length;

  // Binary search for the minimum width that preserves line count
  let lo = 0;
  let hi = maxWidth;

  while (hi - lo > 1) {
    const mid = (lo + hi) / 2;
    const test = layout(engine, text, { maxWidth: mid });

    if (test.lines.length === lineCount) {
      hi = mid; // Can go narrower
    } else {
      lo = mid; // Too narrow, lines increased
    }
  }

  return {
    width: hi,
    height: natural.height,
  };
}

This binary search calls layout() about 10–12 times per message. With DOM measurement, that would mean 10–12 reflows per bubble — unusable during streaming. With Pretext, it completes in microseconds.

The result: every bubble fits its content perfectly, just like native messaging apps.
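Even though each tight-wrap is cheap, re-renders and virtual-scroll passes tend to measure the same text repeatedly. A memoization wrapper (a hypothetical helper, not part of Pretext) makes every repeat lookup a map hit:

```javascript
// Wrap any (text, maxWidth) -> size function with a Map-backed cache.
// Chat text is immutable once fully streamed, so entries never invalidate.
function makeMeasureCache(measure) {
  const cache = new Map();
  return (text, maxWidth) => {
    const key = `${maxWidth}\u0000${text}`; // assumes NUL never appears in chat text
    let hit = cache.get(key);
    if (hit === undefined) {
      hit = measure(text, maxWidth);
      cache.set(key, hit);
    }
    return hit;
  };
}

// const tightWrapCached = makeMeasureCache(tightWrap);
```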

Pattern 3: Virtual Scroll for Long Conversations

AI conversations can easily reach hundreds of messages. Rendering all of them tanks performance. Virtual scroll renders only the visible messages, but it needs to know the height of every message to calculate scroll positions.

const engine = prepare({
  fontFamily: 'Inter',
  fontSize: 14,
  lineHeight: 20,
});

function buildHeightMap(messages) {
  return messages.map(msg => {
    const result = layout(engine, msg.text, {
      maxWidth: maxBubbleWidth,
    });

    // Add padding, avatar space, timestamp height
    const bubbleHeight = result.height;
    const padding = 24; // top + bottom padding
    const metadata = 20; // timestamp row
    return bubbleHeight + padding + metadata;
  });
}

// Compute all heights instantly — no DOM needed
const heights = buildHeightMap(allMessages);
const totalHeight = heights.reduce((sum, h) => sum + h, 0);

Computing heights for 1,000 messages takes under 10ms with Pretext. The same operation with DOM measurement would take 500ms+ and freeze the UI.
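Heights alone are not enough for a virtual scroller; it also has to map a `scrollTop` to the first and last visible messages. One way to do that, independent of how the heights were produced, is a prefix-sum offset table plus binary search:

```javascript
// Precompute cumulative offsets: offsets[i] is the y-position of message i,
// and the last entry is the total scroll height.
function buildOffsets(heights) {
  const offsets = new Array(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Binary search: index of the message containing vertical position y.
function indexAt(offsets, y) {
  let lo = 0;
  let hi = offsets.length - 2;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (offsets[mid + 1] <= y) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// The render window for a given scroll position.
function visibleRange(offsets, scrollTop, viewportHeight) {
  return {
    first: indexAt(offsets, scrollTop),
    last: indexAt(offsets, scrollTop + viewportHeight - 1),
  };
}
```

Because the offsets come straight from the Pretext-derived height map, the window is exact, so there is no need for placeholder sizing or a post-render correction pass.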

Pattern 4: Pre-Computing Heights During Fetch

One of Pretext's unique advantages is that you can compute layout before the component mounts. This means you can calculate heights while chat history is loading:

async function loadChatHistory(chatId) {
  const messages = await fetchMessages(chatId);

  // Compute heights immediately — no need to wait for mount
  const heights = messages.map(msg => {
    const result = layout(engine, msg.text, { maxWidth: 400 });
    return result.height + 24; // + padding
  });

  return { messages, heights };
}

When the component mounts, it already knows every message height. There is no measurement pass, no layout shift, and no flash of incorrectly-sized content.

Handling Code Blocks

AI responses often contain code blocks with a different font. Pretext handles this by creating separate engines for each font configuration:

const textEngine = prepare({
  fontFamily: 'Inter',
  fontSize: 14,
  lineHeight: 20,
});

const codeEngine = prepare({
  fontFamily: 'JetBrains Mono',
  fontSize: 13,
  lineHeight: 20,
});

function measureAIResponse(blocks) {
  let totalHeight = 0;

  for (const block of blocks) {
    const engine = block.type === 'code' ? codeEngine : textEngine;
    const maxWidth = block.type === 'code' ? codeBlockWidth : textWidth;
    const result = layout(engine, block.content, { maxWidth });
    totalHeight += result.height;

    // Add block spacing
    totalHeight += block.type === 'code' ? 32 : 8; // code blocks have more padding
  }

  return totalHeight;
}

Performance Numbers

Here are rough benchmarks for a typical AI chat scenario (1,000 messages, average 50 words each):

| Operation | DOM Measurement | Pretext |
| --- | --- | --- |
| Initial height map | ~500ms | ~8ms |
| Per-token streaming measurement | ~0.5ms (with reflow) | ~0.01ms |
| Tight-wrap (12 iterations) | ~6ms | ~0.1ms |
| Memory overhead | Creates/destroys DOM nodes | ~50KB cached engine |

The streaming measurement is the most critical number. At 30 tokens per second (typical for LLM streaming), DOM measurement spends roughly 15ms of every second forcing layout, and because each forced reflow lands in the middle of a frame, individual frames routinely blow past the 16.7ms budget at 60fps. Pretext uses about 0.3ms per second: essentially free.

Integration with React

Here is a minimal React hook for using Pretext in a chat component:

import { prepare, layout } from 'pretext';
import { useRef, useMemo } from 'react';

function usePretextEngine(fontConfig) {
  // Create the engine once per component. Later changes to fontConfig are
  // ignored, so pass a stable configuration.
  const engineRef = useRef(null);

  if (!engineRef.current) {
    engineRef.current = prepare(fontConfig);
  }

  return engineRef.current;
}

function useBubbleSize(text, maxWidth, fontConfig) {
  const engine = usePretextEngine(fontConfig);

  return useMemo(() => {
    if (!text) return { width: 0, height: 0 };
    const result = layout(engine, text, { maxWidth });
    return { width: result.width, height: result.height };
  }, [engine, text, maxWidth]);
}

// Usage in a message component
function ChatBubble({ message, maxWidth }) {
  const { width, height } = useBubbleSize(
    message.text,
    maxWidth,
    { fontFamily: 'Inter', fontSize: 14, lineHeight: 20 }
  );

  return (
    <div style={{ width, minHeight: height }} className="chat-bubble">
      {message.text}
    </div>
  );
}

When Not to Use Pretext for Chat

Pretext works with plain text strings. If your AI chat renders rich HTML (bold, italic, links, inline images), Pretext cannot measure the mixed layout. In those cases, you have two options:

  1. Use Pretext for height estimation: Measure the plain text version for scroll calculations, and let the browser handle the final rich render. The estimate will be close enough for smooth scrolling.
  2. Use Pretext for the text segments: Parse the markdown into blocks, measure each text block with Pretext, and add fixed heights for non-text elements (images, embeds).
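For option 1, a rough plain-text projection is usually close enough for scroll math. Here is a sketch that keeps the words and drops common inline syntax (a hypothetical helper, intentionally lossy):

```javascript
// Project markdown down to plain text for height estimation.
// Word content drives line count, so stripping syntax markers keeps
// layout() estimates close to the final rich render.
function toPlainText(markdown) {
  return markdown
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1') // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')  // links -> label text
    .replace(/[*_~`]/g, '')                   // inline emphasis markers
    .replace(/^#{1,6}\s+/gm, '')              // heading markers
    .trim();
}
```

Fenced code blocks and other block-level syntax would need their own handling before a pass like this.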

Conclusion

AI chat interfaces push web layout to its limits. Streaming tokens, tight-wrap bubbles, and long conversation histories all demand text measurement at a scale that DOM-based approaches cannot handle without jank.

Pretext gives you exact text dimensions through pure JavaScript computation. No reflows, no hidden elements, no main thread blocking. For AI chat, the benefits are immediate and dramatic: smooth streaming, pixel-perfect bubbles, and instant virtual scroll — all without touching the DOM.

Try it yourself in the Pretext Playground, or open the tight chat bubbles demo to compare CSS max-width and Pretext-wrapped bubbles side by side.

Pretext.js Team
