Recap and where to go next
Across four chapters you built a working content importer. It reads a ZIP of Markdown posts, cleans the HTML inside each one, frontloads referenced images over HTTP, and streams a WXR file the WordPress importer plugin will accept. None of it required curl, libzip, libxml2, or DOMDocument; all of it runs on PHP 7.2 through 8.3 and inside a browser via WordPress Playground.
What you built
| Chapter 1 | → | clean_post_html() using WP_HTML_Tag_Processor: lazy-load images, rewrite URLs, strip scripts, all in one pass. |
| Chapter 2 | → | Read the input ZIP through ZipFilesystem, stage it in InMemoryFilesystem, defend against zip-slip with ZipDecoder::sanitize_path(). |
| Chapter 3 | → | Convert each post with MarkdownConsumer, audit the output with WP_Block_Parser, stream the WXR with WXRWriter. |
| Chapter 4 | → | Frontload images with HttpClient through a sliding-window event loop; mount remote archives with SeekableRequestReadStream. |
What the toolkit does that the tutorial didn't touch
The importer used eight components. The toolkit ships eighteen. Here's what's left, with the use case each one shows up in:
- Git — snapshot your importer's runs into a pure-PHP Git repository for revision history. Useful for "what changed between last week's import and this week's." Reference →
- Merge — three-way diff and merge for content sync. If posts are edited on both the source and the destination side, this is how you reconcile them. Reference →
- HttpServer — a tiny local HTTP server for OAuth callbacks during a CLI workflow, fixture servers for HttpClient tests, or a status page during a long import. Not for production traffic. Reference →
- CORSProxy — when you ship the importer as a browser tool, a server-side proxy to fetch URLs that don't send the right CORS headers. Reference →
- CLI — POSIX-style argument parser to wrap your importer as importer.php --site-url=… --dry-run. Reference →
- Encoding — UTF-8 validation and scrubbing for inputs that may contain mixed encodings. Most importers eventually need it. Reference →
- XML — the cursor-based XML processor underneath DataLiberation; reach for it directly when you need to walk export-sized files. Reference →
- Blueprints — declarative site setup. Spin up the destination WordPress with the right plugins and options before running the importer against it. Reference →
- Polyfill — WordPress-shaped helpers (esc_html, add_filter, __) so toolkit code can run outside WordPress without ifdefs. Reference →
- ToolkitCodingStandards — PHPCS sniffs encoding the project's review feedback as enforceable rules. Borrow if your project follows WordPress style. Reference →
Patterns worth keeping
Three shapes recurred across the tutorial. Watch for them in your own code:
Cursor over a string
WP_HTML_Tag_Processor walks a string forward, records edits as a side-buffer of byte-range replacements, and emits the modified string only when you call get_updated_html(). The result is byte-honest — bytes you didn't edit come through bit-identical. When you need to make small changes to large markup, that property is gold. The XML component's XMLProcessor applies the same pattern to XML.
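The deferred-edit idea can be sketched without WordPress loaded. The class below is a hypothetical toy (DeferredEdits is not a toolkit class, and it does not parse HTML); it only shows the shape described above: queue byte-range replacements while scanning, then build the output once, copying every untouched byte verbatim. It assumes the queued ranges do not overlap.

```php
<?php
// Hypothetical illustration only: this is NOT WP_HTML_Tag_Processor.
// It demonstrates queuing byte-range edits and applying them in one pass.
final class DeferredEdits {
    private $source;
    private $edits = array(); // Each entry: array( start, length, replacement ).

    public function __construct( $source ) {
        $this->source = $source;
    }

    // Queue a replacement of $length bytes starting at byte offset $start.
    public function replace( $start, $length, $replacement ) {
        $this->edits[] = array( $start, $length, $replacement );
    }

    // Apply all queued edits left to right. Assumes edits do not overlap.
    public function get_updated() {
        $edits = $this->edits;
        usort( $edits, function ( $a, $b ) {
            return $a[0] - $b[0];
        } );

        $out    = '';
        $cursor = 0;
        foreach ( $edits as $edit ) {
            list( $start, $length, $replacement ) = $edit;
            // Copy the untouched bytes before this edit verbatim.
            $out   .= substr( $this->source, $cursor, $start - $cursor );
            $out   .= $replacement;
            $cursor = $start + $length;
        }
        // Copy whatever remains after the last edit.
        return $out . substr( $this->source, $cursor );
    }
}

$html   = '<p>Hi</p><img src="http://old.example/a.png"><p>Bye</p>';
$needle = 'http://old.example';

$buffer = new DeferredEdits( $html );
$offset = 0;
while ( false !== ( $at = strpos( $html, $needle, $offset ) ) ) {
    $buffer->replace( $at, strlen( $needle ), 'https://new.example' );
    $offset = $at + strlen( $needle );
}

$updated = $buffer->get_updated();
echo $updated, "\n";
```

The source string is never mutated while scanning, so offsets recorded early stay valid until the final pass. That is the property that makes small edits to large markup cheap.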
Pull / consume streams
ZipFilesystem::open_read_stream(), HttpClient response bodies, InflateReadStream, and the rest all share the same shape: pull(N) reads up to N bytes from the underlying source into an internal buffer and returns how many ended up there; consume(N) reads N bytes from that buffer and advances past them. Memory used is bounded by the chunk size, never by the file size. Once you internalize this loop you can compose any byte source with any byte sink.
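As a rough sketch of that loop, here is a toy reader over a plain PHP stream handle. ToyReadStream is a hypothetical class written for this recap; its pull() and consume() follow the description above, and the toolkit's real stream classes may differ in signatures and error handling.

```php
<?php
// Hypothetical toy reader: the method names mirror the prose above,
// not necessarily the toolkit's actual ByteStream API.
final class ToyReadStream {
    private $handle;
    private $buffer = '';

    public function __construct( $handle ) {
        $this->handle = $handle;
    }

    // Read up to $n bytes from the source into the internal buffer.
    // Returns how many bytes were added (0 at end of input).
    public function pull( $n ) {
        $chunk = fread( $this->handle, $n );
        if ( false === $chunk ) {
            return 0;
        }
        $this->buffer .= $chunk;
        return strlen( $chunk );
    }

    // Return up to $n buffered bytes and advance past them.
    public function consume( $n ) {
        $bytes        = (string) substr( $this->buffer, 0, $n );
        $this->buffer = (string) substr( $this->buffer, strlen( $bytes ) );
        return $bytes;
    }
}

// An 8000-byte in-memory source stands in for a ZIP entry or HTTP body.
$source = fopen( 'php://memory', 'w+b' );
fwrite( $source, str_repeat( 'abcdefgh', 1000 ) );
rewind( $source );

$sink          = fopen( 'php://memory', 'w+b' );
$reader        = new ToyReadStream( $source );
$chunk_size    = 1024;
$largest_chunk = 0;

// The loop: pull a chunk, consume it, hand it to the sink, repeat.
while ( ( $pulled = $reader->pull( $chunk_size ) ) > 0 ) {
    $bytes         = $reader->consume( $pulled );
    $largest_chunk = max( $largest_chunk, strlen( $bytes ) );
    fwrite( $sink, $bytes );
}

rewind( $sink );
$copied = stream_get_contents( $sink );
```

Because each iteration consumes everything it pulled, the reader never holds more than one chunk at a time, no matter how large the source is.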
One interface, multiple backends
Code that takes a Filesystem rather than a path doesn't care if the filesystem is on disk, in memory, in a SQLite database, or inside a ZIP. That's how the importer's staging step works for both production (memory) and debugging (local disk) without a code change. The same pattern shows up in HttpClient (curl or sockets transport) and ByteStream (file, memory, deflate, and hash streams all implement the same byte-stream interface).
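A minimal sketch of the idea, using a deliberately tiny interface invented for this recap: ToyFilesystem, ToyMemoryFilesystem, ToyLocalFilesystem, and stage_post() are all hypothetical names, and the toolkit's real Filesystem interface is larger. The toy local backend also does no path sanitization.

```php
<?php
// Hypothetical mini interface for illustration; the toolkit's Filesystem
// component has its own, richer API.
interface ToyFilesystem {
    public function put_contents( $path, $data );
    public function get_contents( $path );
}

// Backend 1: everything lives in a PHP array.
final class ToyMemoryFilesystem implements ToyFilesystem {
    private $files = array();

    public function put_contents( $path, $data ) {
        $this->files[ $path ] = $data;
    }

    public function get_contents( $path ) {
        return $this->files[ $path ];
    }
}

// Backend 2: files live under a root directory on local disk.
final class ToyLocalFilesystem implements ToyFilesystem {
    private $root;

    public function __construct( $root ) {
        $this->root = rtrim( $root, '/' );
    }

    public function put_contents( $path, $data ) {
        file_put_contents( $this->root . '/' . ltrim( $path, '/' ), $data );
    }

    public function get_contents( $path ) {
        return file_get_contents( $this->root . '/' . ltrim( $path, '/' ) );
    }
}

// The caller depends only on the interface, never on a concrete backend.
function stage_post( ToyFilesystem $fs, $slug, $markdown ) {
    $path = $slug . '.md';
    $fs->put_contents( $path, $markdown );
    return strlen( $fs->get_contents( $path ) );
}

$memory_bytes = stage_post( new ToyMemoryFilesystem(), 'hello', "# Hello\n" );

$disk_root = sys_get_temp_dir() . '/toy-stage-' . uniqid();
mkdir( $disk_root );
$disk_bytes = stage_post( new ToyLocalFilesystem( $disk_root ), 'hello', "# Hello\n" );
```

Swapping the backend is a one-line change at the call site, which is what lets the same staging code run in memory in production and on disk while debugging.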