PHP Toolkit

XML

A streaming, namespace-aware XML processor in pure PHP. Read and modify huge feeds, WXR exports, ePub manifests, and Office Open XML parts without ever loading the document into memory and without depending on libxml2.

composer require wp-php-toolkit/xml

SimpleXMLElement and DOMDocument both need libxml2 and both build a complete in-memory tree. XMLProcessor walks the document forward as a cursor, keeps modifications in a side buffer, and emits the full updated XML with get_updated_xml() only when you ask for it.

This design came from WordPress-scale documents such as WXR exports. A migration may only need to rewrite wp:attachment_url values or bump a feed attribute, so the processor optimizes for targeted cursor edits instead of a full validating XML stack.
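The "side buffer" idea can be pictured with a toy class. This is only a sketch of the general technique and not the library's actual implementation. ToyEditBuffer and its methods are hypothetical names, and the sketch assumes the queued edits never overlap.

```php
<?php
/**
 * Toy illustration only (hypothetical class, not part of the toolkit):
 * queue byte-range replacements against an untouched source string and
 * apply them in a single left-to-right pass when output is requested.
 */
final class ToyEditBuffer {
	private $source;
	private $edits = array();

	public function __construct( string $source ) {
		$this->source = $source;
	}

	// Record an edit; the source string is not modified here.
	public function replace( int $start, int $length, string $text ): void {
		$this->edits[] = array( $start, $length, $text );
	}

	// Apply all queued edits in offset order. Assumes edits do not overlap.
	public function get_updated(): string {
		$edits = $this->edits;
		usort(
			$edits,
			static function ( $a, $b ) {
				return $a[0] <=> $b[0];
			}
		);
		$out = '';
		$at  = 0;
		foreach ( $edits as list( $start, $length, $text ) ) {
			// Copy the unchanged bytes before this edit, then the replacement.
			$out .= substr( $this->source, $at, $start - $at ) . $text;
			$at   = $start + $length;
		}
		// Copy whatever follows the last edit.
		return $out . substr( $this->source, $at );
	}
}

$source = '<book price="29.99"/>';
$buffer = new ToyEditBuffer( $source );
$buffer->replace( strpos( $source, '29.99' ), strlen( '29.99' ), '32.99' );
echo $buffer->get_updated() . "\n";
```

Queuing edits this way means unchanged bytes are copied once and only at output time, which is the property the paragraph above attributes to get_updated_xml().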

Bump every price in a catalog

Find each <book>, read its price attribute, raise it by 10%, write the new value back, and emit the updated document.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$xml = <<<'XML'
<catalog>
<book sku="A1" price="29.99"><title>PHP Internals</title></book>
<book sku="A2" price="14.50"><title>WordPress at Scale</title></book>
</catalog>
XML;

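$p = XMLProcessor::create_from_string( $xml );
while ( $p->next_tag( 'book' ) ) {
	// The empty string as the first argument means "no namespace".
	$old = (float) $p->get_attribute( '', 'price' );
	// Raise by 10% and format with two decimals and no thousands separator.
	$new = number_format( $old * 1.10, 2, '.', '' );
	$p->set_attribute( '', 'price', $new );
}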

echo $p->get_updated_xml();

Read namespaced elements from a WXR export

WXR exports commonly use the wp:, dc:, and content: prefixes, each bound to a namespace name such as http://wordpress.org/export/1.2/. Pass, or compare against, that expanded namespace name rather than the prefix; the processor handles whichever prefix the document actually uses.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$wxr = <<<'XML'
<?xml version="1.0"?>
<rss xmlns:wp="http://wordpress.org/export/1.2/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><item>
<title>Hello World</title>
<dc:creator>admin</dc:creator>
<wp:post_id>42</wp:post_id>
<wp:status>publish</wp:status>
</item></channel></rss>
XML;

$WP = 'http://wordpress.org/export/1.2/';
$DC = 'http://purl.org/dc/elements/1.1/';

$p = XMLProcessor::create_from_string( $wxr );
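while ( $p->next_tag( 'item' ) ) {
	// Walk the tokens inside this <item> until its closing tag.
	while ( $p->next_token() ) {
		if ( $p->is_tag_closer() && 'item' === $p->get_tag_local_name() ) {
			break;
		}
		if ( ! $p->is_tag_opener() ) {
			continue;
		}
		// Identify the element by its namespace name, never by its prefix.
		$ns     = $p->get_tag_namespace();
		$local  = $p->get_tag_local_name();
		$prefix = ( $WP === $ns ) ? 'wp/' : ( ( $DC === $ns ) ? 'dc/' : '' );
		echo "{$prefix}{$local}: ";
		// Advance to the next text node; this assumes every child element contains text.
		while ( $p->next_token() && '#text' !== $p->get_token_name() ) {
			continue;
		}
		echo trim( $p->get_modifiable_text() ) . "\n";
	}
}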
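To make the prefix-versus-namespace point concrete, here is a minimal sketch that binds the same namespace name to a different, made-up prefix ("export") and still finds the element. The helper find_wp_post_ids() is a hypothetical name introduced for illustration. It uses only the processor methods already shown above and assumes they behave as in the previous example.

```php
<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

/**
 * Hypothetical helper: collect the text of every post_id element in the
 * WordPress export namespace, whatever prefix the document binds to it.
 */
function find_wp_post_ids( string $xml ): array {
	$wp_ns = 'http://wordpress.org/export/1.2/';
	$ids   = array();
	$p     = XMLProcessor::create_from_string( $xml );
	while ( $p->next_token() ) {
		if ( ! $p->is_tag_opener() ) {
			continue;
		}
		if ( $wp_ns !== $p->get_tag_namespace() || 'post_id' !== $p->get_tag_local_name() ) {
			continue;
		}
		// Step to the token right after the opener and keep it if it is text.
		if ( $p->next_token() && '#text' === $p->get_token_name() ) {
			$ids[] = trim( $p->get_modifiable_text() );
		}
	}
	return $ids;
}

// Same namespace name as above, bound to the made-up prefix "export".
$xml = <<<'XML'
<?xml version="1.0"?>
<rss xmlns:export="http://wordpress.org/export/1.2/">
<channel><item><export:post_id>42</export:post_id></item></channel></rss>
XML;

print_r( find_wp_post_ids( $xml ) );
```

Because the comparison is made against the namespace name, the same helper should work unchanged on documents that use the conventional wp: prefix.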

Rewrite URLs across an entire WXR export

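Large WXR exports can hold many URLs in <link>, <guid>, and post content. Streaming the file lets you rewrite them without loading the whole XML document into memory. The sample below uses a short inline string for brevity and rewrites every text node that contains the old URL.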

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$wxr = <<<'XML'
<?xml version="1.0"?><rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<wp:base_site_url>https://old.example.com</wp:base_site_url>
<item><link>https://old.example.com/2024/post-1</link>
<guid>https://old.example.com/?p=1</guid></item>
</channel></rss>
XML;

$from = 'https://old.example.com';
$to   = 'https://new.example.com';

$p = XMLProcessor::create_from_string( $wxr );
$rewritten = 0;

while ( $p->next_token() ) {
	// Only text nodes carry URLs in this sample.
	if ( '#text' !== $p->get_token_name() ) {
		continue;
	}
	$text = $p->get_modifiable_text();
	if ( false === strpos( $text, $from ) ) {
		continue;
	}
	// Queue the replacement; get_updated_xml() emits it later.
	$p->set_modifiable_text( str_replace( $from, $to, $text ) );
	$rewritten++;
}

echo "rewrote {$rewritten} text nodes\n\n";
echo $p->get_updated_xml();
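A plain str_replace() also matches longer hostnames that merely start with the old one, such as https://old.example.com.evil.test. If that matters for your data, one option is a boundary-aware replacement. The sketch below is an assumption about what "safe" should mean here, not part of the toolkit: rewrite_base_url() is a hypothetical helper that replaces the old origin only when it is followed by a URL delimiter or the end of the text.

```php
<?php
/**
 * Hypothetical helper: replace $from with $to only where $from ends at a
 * URL boundary (/, ?, #, :, a quote, whitespace, <, or end of string).
 */
function rewrite_base_url( string $text, string $from, string $to ): string {
	$pattern = '~' . preg_quote( $from, '~' ) . '(?=[/?#:"\'\s<]|$)~';
	// A callback keeps "$1" or "\1" inside $to from being read as backreferences.
	$result = preg_replace_callback(
		$pattern,
		static function () use ( $to ) {
			return $to;
		},
		$text
	);
	// preg_replace_callback() returns null on failure; fall back to the input.
	return null === $result ? $text : $result;
}

$from = 'https://old.example.com';
$to   = 'https://new.example.com';

echo rewrite_base_url( 'https://old.example.com/2024/post-1', $from, $to ) . "\n";
echo rewrite_base_url( 'https://old.example.com.evil.test/x', $from, $to ) . "\n";
```

In the loop above, the helper would take the place of the str_replace() call. The strpos() pre-check can stay as a cheap filter, although $rewritten would then also count text nodes in which the stricter match changes nothing.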

Parse OPML to extract feed URLs

OPML is the format that Feedly and many other feed readers use to import and export feed lists. It is flat, attribute-heavy XML, which is exactly what a tag processor handles best.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

use WordPress\XML\XMLProcessor;

$opml = <<<'XML'
<?xml version="1.0"?><opml version="2.0"><head><title>My Feeds</title></head>
<body>
<outline text="Tech"><outline text="Hacker News" type="rss" xmlUrl="https://news.ycombinator.com/rss"/>
<outline text="LWN" type="rss" xmlUrl="https://lwn.net/headlines/rss"/></outline>
<outline text="WordPress" type="rss" xmlUrl="https://wordpress.org/news/feed/"/>
</body></opml>
XML;

$p = XMLProcessor::create_from_string( $opml );
while ( $p->next_tag( 'outline' ) ) {
	$url = $p->get_attribute( '', 'xmlUrl' );
	// Category outlines such as "Tech" have no xmlUrl; skip them.
	if ( null === $url ) {
		continue;
	}
	echo $p->get_attribute( '', 'text' ) . "\t" . $url . "\n";
}

Pitfalls

See also