Markdown to blocks to WXR
Chapter 1 cleaned a single HTML string. Chapter 2 staged a folder of Markdown files in memory. This chapter turns each of those files into the actual format the WordPress importer plugin reads: WXR, an extended-RSS export. Along the way we meet three more components — Markdown, BlockParser, and DataLiberation — and watch them compose into something none of them could do alone.
Markdown to block markup
The Markdown component does one thing well: it bridges Markdown and WordPress block markup, in either direction. The MarkdownConsumer class takes a Markdown string and returns a result object containing both the rendered block markup and any frontmatter parsed from the document's leading YAML.
Two outputs come back: the post metadata (read with get_meta_value() for scalars, or get_all_metadata() for the raw structure) and the block markup itself, which is the <!-- wp:heading -->…<!-- /wp:heading --> string that WordPress stores in post_content. From here on we treat that string the way WordPress treats it.
Audit the produced blocks
Before we ship the converted post into a WXR file, the importer should sanity-check what came out. Did Markdown conversion produce blocks the destination site can render? Do any heading levels skip a step? Are there blocks the importer doesn't know how to handle? WP_Block_Parser parses block markup the same way WordPress core does and gives us a structured tree:
Two patterns to keep. The flat counter (a queue that walks innerBlocks) answers any "how many" or "does it use" question. The level checker is a domain-specific rule — accessibility wants no jumps in heading depth — but every audit you'll write follows the same shape: walk the tree, gate by blockName, ask the question. The reference page for BlockParser covers both patterns in more depth.
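As a rough illustration of those two patterns, here is a minimal sketch in plain PHP. It assumes the tree has the shape WP_Block_Parser::parse() returns (arrays with blockName, attrs, and innerBlocks keys); the $blocks array is hand-written sample data, and the function names count_blocks(), heading_levels(), and find_heading_jumps() are hypothetical, not part of any component. The counter uses a queue because order doesn't matter for counting; the level checker walks depth-first because it needs headings in document order.

```php
<?php
// Hand-written stand-in for the array WP_Block_Parser::parse() returns.
// Only the keys this audit reads are included.
$blocks = array(
    array( 'blockName' => 'core/heading', 'attrs' => array( 'level' => 1 ), 'innerBlocks' => array() ),
    array(
        'blockName'   => 'core/group',
        'attrs'       => array(),
        'innerBlocks' => array(
            array( 'blockName' => 'core/paragraph', 'attrs' => array(), 'innerBlocks' => array() ),
            // This heading jumps from level 1 straight to level 3.
            array( 'blockName' => 'core/heading', 'attrs' => array( 'level' => 3 ), 'innerBlocks' => array() ),
        ),
    ),
    // Freeform content between blocks parses to a null blockName.
    array( 'blockName' => null, 'attrs' => array(), 'innerBlocks' => array() ),
);

/** Flat counter: drain a queue, pushing innerBlocks as we go. */
function count_blocks( array $blocks ): array {
    $counts = array();
    $queue  = $blocks;
    while ( $queue ) {
        $block = array_shift( $queue );
        if ( null !== $block['blockName'] ) {
            $name            = $block['blockName'];
            $counts[ $name ] = ( $counts[ $name ] ?? 0 ) + 1;
        }
        foreach ( $block['innerBlocks'] as $inner ) {
            $queue[] = $inner;
        }
    }
    return $counts;
}

/** Collect heading levels depth-first, so they come back in document order. */
function heading_levels( array $blocks ): array {
    $levels = array();
    foreach ( $blocks as $block ) {
        if ( 'core/heading' === $block['blockName'] ) {
            // core/heading defaults to level 2 when the attribute is absent.
            $levels[] = $block['attrs']['level'] ?? 2;
        }
        $levels = array_merge( $levels, heading_levels( $block['innerBlocks'] ) );
    }
    return $levels;
}

/** Level checker: report every place a heading skips one or more levels. */
function find_heading_jumps( array $blocks ): array {
    $jumps    = array();
    $previous = null;
    foreach ( heading_levels( $blocks ) as $level ) {
        if ( null !== $previous && $level > $previous + 1 ) {
            $jumps[] = "h{$previous} -> h{$level}";
        }
        $previous = $level;
    }
    return $jumps;
}

$counts = count_blocks( $blocks );
$jumps  = find_heading_jumps( $blocks );
```

With the sample tree, $counts records two core/heading blocks and $jumps contains a single entry for the h1 to h3 skip. In a real audit the $blocks array would come from the parser rather than being written by hand.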
Apply chapter 1's cleaner inside the blocks
Block markup is HTML wrapped in HTML comment delimiters. Chapter 1's clean_post_html() takes an HTML string and returns a clean one. We can run it on the whole block-markup string in one pass, because WP_HTML_Tag_Processor is happy to walk the HTML between the block-comment delimiters. The block comments themselves don't look like tags to the processor, so they pass through untouched:
Notice how <!-- wp:heading --> survived the walk verbatim. The Tag Processor only sees real HTML tags; comments and text aren't tags to it. That's why combining the two components here works without any special-casing — Markdown produces block markup, the cleaner walks the HTML inside it, and the comments pass through as plain bytes.
Stream a WXR file with DataLiberation
We now have post titles, post content (clean block markup), and post metadata. The format the WordPress importer reads is WXR (WordPress eXtended RSS), an XML dialect with a fixed shape. DataLiberation's WXRWriter takes ImportEntity objects and streams them into a byte sink, one entity at a time, without ever holding the whole export in memory:
The writer holds only what it needs to close currently-open XML tags — fewer than ten kilobytes of state for any reasonable pipeline. Every append_entity() writes one item to the underlying byte sink and forgets it. You can build a WXR from twenty thousand posts on a host with sixty-four megabytes of RAM and the importer code looks no different from the two-post version above.
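To make the write-and-forget idea concrete, here is a minimal, dependency-free sketch of a streaming writer. It is not DataLiberation's WXRWriter: the class name Sketch_WXR_Stream and its append_post() and finish() methods are hypothetical, and the output is a deliberately reduced WXR (a real export carries more elements per item). What it shows is the shape of the technique: open the envelope once, write each item straight to the sink, and keep nothing afterwards.

```php
<?php
/**
 * Illustrative streaming writer. The class and method names are
 * hypothetical and only demonstrate the one-item-at-a-time idea.
 */
final class Sketch_WXR_Stream {
    /** @var resource Any writable PHP stream: a file, php://memory, a socket. */
    private $sink;

    public function __construct( $sink ) {
        $this->sink = $sink;
        // Open the envelope once; these are the only tags left open.
        fwrite(
            $sink,
            '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
            . '<rss version="2.0"'
            . ' xmlns:content="http://purl.org/rss/1.0/modules/content/"'
            . ' xmlns:wp="http://wordpress.org/export/1.2/">' . "\n"
            . "<channel>\n<wp:wxr_version>1.2</wp:wxr_version>\n"
        );
    }

    /** Write one complete item to the sink and retain nothing. */
    public function append_post( string $title, string $slug, string $content ): void {
        // A literal "]]>" would end the CDATA section early, so split it.
        $safe_content = str_replace( ']]>', ']]]]><![CDATA[>', $content );
        fwrite(
            $this->sink,
            "<item>\n"
            . '<title>' . htmlspecialchars( $title, ENT_XML1 | ENT_QUOTES, 'UTF-8' ) . "</title>\n"
            . '<content:encoded><![CDATA[' . $safe_content . "]]></content:encoded>\n"
            . '<wp:post_name>' . htmlspecialchars( $slug, ENT_XML1 | ENT_QUOTES, 'UTF-8' ) . "</wp:post_name>\n"
            . "<wp:post_type>post</wp:post_type>\n"
            . "<wp:status>publish</wp:status>\n"
            . "</item>\n"
        );
    }

    /** Close the tags the constructor opened. */
    public function finish(): void {
        fwrite( $this->sink, "</channel>\n</rss>\n" );
    }
}

// Usage: an in-memory sink here; a file handle would work the same way.
$sink   = fopen( 'php://memory', 'w+' );
$writer = new Sketch_WXR_Stream( $sink );
$writer->append_post( 'Hello & welcome', 'hello', "<!-- wp:paragraph -->\n<p>Hi</p>\n<!-- /wp:paragraph -->" );
$writer->append_post( 'Second post', 'second-post', '<!-- wp:paragraph --><p>Again</p><!-- /wp:paragraph -->' );
$writer->finish();

rewind( $sink );
$wxr = stream_get_contents( $sink );
```

Because each append_post() call writes directly to the stream, the loop that feeds it can run over two posts or twenty thousand without the writer's memory use growing.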
End-to-end: Markdown folder to WXR file
Now we wire it all together. The pipeline reads the staged Markdown files from chapter 2, converts each to block markup, cleans the HTML inside it, builds an ImportEntity with title and slug from frontmatter, and streams the whole thing into a WXR document. This is the importer's first complete end-to-end run:
One pass. Three components composed (Markdown for parsing, HTML for cleaning, DataLiberation for WXR), each doing one thing well. The output is real WXR — drop it on a WordPress site through the importer plugin and you get three published posts with the cleaned content, the right slugs, and the frontmatter titles.
Refinement: rewrite URLs across an existing WXR
The pattern above (build WXR from sources) is one half of DataLiberation. The other half is reading and transforming an existing WXR. WXREntityReader emits one entity at a time from a WXR document, and you can wire it to a WXRWriter to produce a transformed copy:
The same pattern handles every "transform an export between sites" job — staging-to-production URL rewrites, theme migrations, slug normalization. Reader on the left, writer on the right, your transformation in the middle. Feed the reader bytes incrementally (instead of append_bytes( $source ) all at once) and pipe the writer to a file sink (instead of MemoryPipe), and the same code processes a 10 GB export with the memory footprint of one entity at a time.
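The reader, transformation, writer arrangement can be sketched with plain PHP generators. This is an illustration of the pattern only: read_entities() and rewrite_urls() are hypothetical names, entities are simple arrays rather than ImportEntity objects, the example.com URLs are placeholders, and the $written array stands in for a real writer. The str_replace() rewrite is also deliberately naive; a production rewrite would need to handle URLs in attributes and encoded forms more carefully.

```php
<?php
/**
 * Hypothetical reader: yields one entity at a time. A real reader would
 * parse entities out of WXR bytes; plain arrays stand in for them here.
 */
function read_entities( iterable $source ): Generator {
    foreach ( $source as $entity ) {
        yield $entity;
    }
}

/** Transformation: rewrite the site URL in each entity as it streams past. */
function rewrite_urls( iterable $entities, string $from, string $to ): Generator {
    foreach ( $entities as $entity ) {
        $entity['content'] = str_replace( $from, $to, $entity['content'] );
        yield $entity;
    }
}

// Sample data standing in for an existing export.
$source = array(
    array(
        'title'   => 'One',
        'content' => '<p><a href="https://staging.example.com/about">About</a></p>',
    ),
    array(
        'title'   => 'Two',
        'content' => '<p>No links here.</p>',
    ),
);

$pipeline = rewrite_urls(
    read_entities( $source ),
    'https://staging.example.com',
    'https://example.com'
);

// Only one entity is in flight at a time: each is pulled through the
// transformation on demand and handed on before the next one is read.
$written = array();
foreach ( $pipeline as $entity ) {
    // A real pipeline would pass the entity to the writer at this point.
    $written[] = $entity['content'];
}
```

Generators are lazy, so nothing is read until the foreach asks for the next entity. That is what keeps the footprint at one entity regardless of how large the source is.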
Recap
You can now:
- Convert Markdown plus YAML frontmatter into block markup with MarkdownConsumer.
- Walk the produced block tree with WP_Block_Parser to count, audit, or rewrite blocks.
- Apply HTML rewrites to block markup without breaking the surrounding block comments.
- Stream a WXR document with WXRWriter in constant memory regardless of input size.
- Read an existing WXR with WXREntityReader and pipe its entities through a transformation into a new WXR.
The importer is now functionally complete for text content. What's missing is the network: when a Markdown post references an image, the destination site doesn't have that file. Chapter 4 fixes that.