PHP Toolkit

Quickstart

By the end of this page you will have rewritten an HTML attribute in five lines of PHP, in a runtime that lives inside this browser tab. You'll see the shape every chapter of the tutorial follows: a problem in plain English, a small chunk of code, and a paragraph that points at what to look at.

Install

You don't need to install anything to follow the tutorial — the snippets on this site run in your browser. If you want to run the same code in your own project later, this is the line you'll copy:

composer require wp-php-toolkit/html

Each component installs separately; you only pull in what you use. The HTML component depends on nothing except PHP itself.

Rewrite an attribute

Here's the smallest useful thing the toolkit does. The example feeds a snippet of HTML into WP_HTML_Tag_Processor, finds every <img> tag, and adds loading="lazy" if the author didn't already set loading themselves.
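For reference outside the browser, here is a minimal sketch of what such a snippet could look like. It assumes the package was installed with Composer into the default vendor/ directory, and that the class is available under the global name WP_HTML_Tag_Processor used above; check whether your installed version needs a namespace import. The add_lazy_loading() helper and the sample HTML are hypothetical names chosen for illustration, not part of the toolkit.

```php
<?php
// Assumption: Composer's default autoloader path; adjust for your project.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical helper: adds loading="lazy" to every <img> that has no
// loading attribute yet, and leaves every other byte alone.
function add_lazy_loading(string $html): string {
    $processor = new WP_HTML_Tag_Processor($html);
    while ($processor->next_tag('img')) {
        // get_attribute() returns null when the attribute is absent.
        if (null === $processor->get_attribute('loading')) {
            $processor->set_attribute('loading', 'lazy');
        }
    }
    return $processor->get_updated_html();
}

// Illustrative input: one image without loading, one that already sets it.
$input = <<<HTML
<article>
  <p>Two images follow.</p>
  <img src="first.jpg">
  <img src="second.jpg" loading="eager">
</article>
HTML;

echo add_lazy_loading($input);
```

The body of the helper is the five lines the introduction promised: construct the processor, loop over matching tags, check the attribute, set it, and ask for the updated HTML.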

Click Run. The first run on this page boots a PHP runtime in WebAssembly and unzips the toolkit into it; later runs reuse the same runtime, so they skip that startup cost.

Look at the output. The first <img> gained loading="lazy". The second one, which already had loading="eager", was left alone. The whitespace, the <p> tag, the <article> wrapper, and every other byte we didn't ask the processor to touch came through unchanged. That property is the entire reason this component exists: it edits HTML in place and leaves everything else byte-for-byte identical, instead of re-serializing the whole document.

Why a cursor, not a DOM

The traditional PHP move here is DOMDocument::loadHTML. That works, but loading 50 KB of post content into a libxml DOM, mutating it, and serializing it back gives you a string that's nearly the same as the input — different whitespace, normalized attribute quoting, occasionally a self-closing tag where there wasn't one before. For email templates and feed readers that compare strings byte-for-byte, that's a bug.
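To see the round-trip problem for yourself, the sketch below parses a fragment with DOMDocument and serializes it straight back without changing anything. It assumes PHP's dom extension is enabled; the sample fragment is made up for illustration, and the exact serialized output depends on your libxml version, so the script only compares the two strings rather than promising a particular result.

```php
<?php
// Illustrative input: note the single-quoted attribute value.
$original = "<p class='intro'>Hello,   world</p>";

$dom = new DOMDocument();
// These flags ask libxml not to wrap the fragment in <html><body>
// or prepend a doctype.
$dom->loadHTML($original, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// No mutation at all: just serialize what was parsed.
$roundTripped = $dom->saveHTML();

echo "Input:  ", $original, "\n";
echo "Output: ", $roundTripped, "\n";
var_dump($roundTripped === $original);
```

Even with zero edits, the serializer is free to choose its own quoting and trailing whitespace, which is why a byte-for-byte comparison against the input is not something a DOM round trip can promise.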

The Tag Processor walks the HTML linearly, records edits as a small list of byte-range replacements, and applies them only when you call get_updated_html(). The HTML you didn't edit comes through bit-identical. The HTML you edited contains exactly your edits, and nothing else.
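The idea of deferred byte-range replacements can be sketched in a few lines of plain PHP. This is not the toolkit's implementation; apply_edits() and the [start, length, replacement] edit shape are hypothetical, and the sketch assumes the edits never overlap.

```php
<?php
// Hypothetical helper, not toolkit API. Each edit is
// [start offset, number of bytes to replace, replacement text].
// Assumes edits do not overlap.
function apply_edits(string $source, array $edits): string {
    // Apply edits in document order, whatever order they were queued in.
    usort($edits, fn(array $a, array $b) => $a[0] <=> $b[0]);

    $output = '';
    $cursor = 0;
    foreach ($edits as [$start, $length, $replacement]) {
        // Copy the untouched bytes between the previous edit and this one.
        $output .= substr($source, $cursor, $start - $cursor);
        $output .= $replacement;
        $cursor = $start + $length;
    }
    // Copy whatever follows the last edit, unchanged.
    return $output . substr($source, $cursor);
}

$html = '<img src="a.jpg"><p>unchanged</p>';

// Insert an attribute right after "<img": offset 4, replacing zero bytes.
$edits = [[4, 0, ' loading="lazy"']];

echo apply_edits($html, $edits), "\n";
// <img loading="lazy" src="a.jpg"><p>unchanged</p>
```

Because untouched spans are copied with substr() rather than parsed and re-emitted, an empty edit list returns the input exactly as it was.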

That model — small, linear, byte-honest — is the toolkit's whole sensibility. Every other component that follows uses some version of it.

Recap

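You can now add an attribute to every matching tag with WP_HTML_Tag_Processor, and you know why it leaves the rest of the HTML untouched where a DOM round trip would not.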

That's the whole shape of the tutorial. Each chapter takes one component, shows you the smallest useful thing it does, and folds the result into a content importer that grows page by page.

In chapter 1 you'll meet the canonical importer's first input — a folder of Markdown posts whose embedded HTML needs cleaning before it ever sees a WordPress database. We'll add lazy loading, rewrite relative URLs, and strip event handlers in a single linear pass.