PHP Toolkit

BlockParser

WordPress core's block parser, packaged as a standalone library. Turn block markup into a structured tree, lint posts for common authoring mistakes, and audit block usage — all without booting WordPress.

composer require wp-php-toolkit/blockparser

Block markup is not plain HTML. A post can contain HTML comments that identify blocks, JSON attributes inside those comments, freeform HTML between blocks, and nested blocks whose rendered HTML is interleaved with parent markup.

This component packages WordPress core's block parser so importers, linters, migration tools, and static analyzers can understand block content without loading WordPress. It deliberately mirrors core behavior: the same array shape, the same blocks with a null blockName for freeform HTML, and the same core block names such as core/paragraph. Code written against this parser keeps working when run inside WordPress, and vice versa.

Reach for it when you need answers about the block tree: which blocks a post uses, which attributes they carry, where nested blocks appear, or whether content violates a rule your project cares about.

What you get back

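WP_Block_Parser::parse() returns an array of blocks. Each block is an associative array with five keys: blockName (a string such as core/paragraph, or null for freeform HTML), attrs (the JSON attributes from the block comment, decoded to an array), innerBlocks (nested blocks in the same shape), innerHTML, and innerContent.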

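innerHTML is the block's own HTML with inner blocks stripped out. innerContent is the interleaved version: an array of HTML strings with a null placeholder wherever an inner block belongs. The placeholders line up with innerBlocks in order: the first null stands for innerBlocks[0], the second for innerBlocks[1], and so on.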
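To make the relationship concrete, the arrays below are written by hand in the documented five-key shape for a Group block wrapping a single Paragraph (an illustration, not captured parser output). The helper `flatten_block_html()` is hypothetical, not part of this package: it rebuilds a block's full HTML by replacing each null in innerContent with the next entry in innerBlocks. That is the same index-walking idea WordPress core's serialize_block() uses, minus the comment delimiters.

```php
<?php
/**
 * Rebuild a block's full HTML (without block comment delimiters) by
 * substituting each null placeholder in innerContent with the matching
 * inner block. Hypothetical helper for illustration.
 */
function flatten_block_html( array $block ): string {
	$html  = '';
	$index = 0;
	foreach ( $block['innerContent'] as $chunk ) {
		if ( is_string( $chunk ) ) {
			$html .= $chunk;
		} else {
			// A null chunk stands for the next inner block, in order.
			$html .= flatten_block_html( $block['innerBlocks'][ $index++ ] );
		}
	}
	return $html;
}

// Hand-written example in the documented five-key shape (not real parser output).
$paragraph = array(
	'blockName'    => 'core/paragraph',
	'attrs'        => array(),
	'innerBlocks'  => array(),
	'innerHTML'    => '<p>Hi</p>',
	'innerContent' => array( '<p>Hi</p>' ),
);
$group = array(
	'blockName'    => 'core/group',
	'attrs'        => array(),
	'innerBlocks'  => array( $paragraph ),
	'innerHTML'    => '<div class="wp-block-group"></div>',
	'innerContent' => array( '<div class="wp-block-group">', null, '</div>' ),
);

echo $group['innerHTML'] . "\n";          // inner block stripped out
echo flatten_block_html( $group ) . "\n"; // inner block restored in place
```

Because each null is consumed in order, innerContent and innerBlocks must stay in sync if you edit a tree: an extra inner block with no matching null is silently skipped, and an extra null with no matching block makes the helper fail.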

Most code starts by checking blockName, then reading attrs or innerHTML. When a post has container blocks such as Group, Columns, or Navigation, look inside innerBlocks too.
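If you find yourself rewriting that traversal in every script, a PHP generator can package it once. The sketch below is an assumption, not part of the package: `walk_blocks()` is a hypothetical helper that yields each named block with its nesting depth, depth-first, so blocks come out in document order. The sample tree fills in only the two keys the walker reads.

```php
<?php
/**
 * Yield every named block in document order (depth-first, pre-order),
 * together with its nesting depth. Freeform (null-named) blocks are skipped.
 * Hypothetical helper, not part of the package API.
 */
function walk_blocks( array $blocks, int $depth = 0 ): Generator {
	foreach ( $blocks as $block ) {
		if ( null !== $block['blockName'] ) {
			yield array( $block, $depth );
		}
		// Descend before moving on to the next sibling.
		yield from walk_blocks( $block['innerBlocks'], $depth + 1 );
	}
}

// Hand-written tree containing only the keys walk_blocks() reads.
$tree = array(
	array(
		'blockName'   => 'core/group',
		'innerBlocks' => array(
			array( 'blockName' => 'core/heading', 'innerBlocks' => array() ),
		),
	),
	array( 'blockName' => null, 'innerBlocks' => array() ), // freeform gap
	array( 'blockName' => 'core/paragraph', 'innerBlocks' => array() ),
);

foreach ( walk_blocks( $tree ) as list( $block, $depth ) ) {
	echo str_repeat( '  ', $depth ) . $block['blockName'] . "\n";
}
```

Because it is a generator, a caller can break out of the foreach as soon as it finds what it needs, and the rest of the tree is never visited.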

Parse a document

The simplest possible use. Pass a string, get back a tree.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:heading {\"level\":2} -->\n<h2>Welcome</h2>\n<!-- /wp:heading -->\n\n"
	. "<!-- wp:paragraph -->\n<p>Hello from the block editor.</p>\n<!-- /wp:paragraph -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );
foreach ( $blocks as $block ) {
	if ( null === $block['blockName'] ) {
		continue;
	}
	echo $block['blockName'] . ': ' . trim( strip_tags( $block['innerHTML'] ) ) . "\n";
}
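The null check is there because the blank line between the two blocks comes back as its own freeform block: blockName is null and innerHTML is "\n\n". Freeform blocks can also hold real HTML, such as content written in the classic editor, so skipping every null block is not always what you want. Below is a hedged sketch of a hypothetical `drop_blank_freeform()` filter that keeps freeform HTML but drops whitespace-only gaps; the input is hand-written in the documented shape.

```php
<?php
/**
 * Remove top-level freeform blocks (blockName null) that contain only
 * whitespace, keeping freeform blocks with real HTML in them.
 * Hypothetical helper for illustration.
 */
function drop_blank_freeform( array $blocks ): array {
	$kept = array_filter(
		$blocks,
		function ( $block ) {
			return null !== $block['blockName'] || '' !== trim( $block['innerHTML'] );
		}
	);
	// Reindex so the result is a plain list again.
	return array_values( $kept );
}

// Hand-written input: a heading, a whitespace-only gap, and classic HTML.
$blocks = array(
	array( 'blockName' => 'core/heading', 'attrs' => array( 'level' => 2 ), 'innerBlocks' => array(), 'innerHTML' => "\n<h2>Welcome</h2>\n", 'innerContent' => array( "\n<h2>Welcome</h2>\n" ) ),
	array( 'blockName' => null, 'attrs' => array(), 'innerBlocks' => array(), 'innerHTML' => "\n\n", 'innerContent' => array( "\n\n" ) ),
	array( 'blockName' => null, 'attrs' => array(), 'innerBlocks' => array(), 'innerHTML' => '<p>Classic content</p>', 'innerContent' => array( '<p>Classic content</p>' ) ),
);

$clean = drop_blank_freeform( $blocks );
echo count( $clean ) . " blocks kept\n";
```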

Count every block type in a post

A common audit task: "How many Paragraph, Image, and Gallery blocks does this post use?" A small queue keeps the example readable while still visiting nested blocks.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:group --><div class=\"wp-block-group\">"
	. "<!-- wp:heading --><h2>Title</h2><!-- /wp:heading -->"
	. "<!-- wp:paragraph --><p>One.</p><!-- /wp:paragraph -->"
	. "<!-- wp:paragraph --><p>Two.</p><!-- /wp:paragraph -->"
	. "<!-- wp:image {\"id\":1} --><figure><img src=\"a.jpg\"/></figure><!-- /wp:image -->"
	. "</div><!-- /wp:group -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

$counts = array();
$queue  = $blocks;

while ( ! empty( $queue ) ) {
	$block = array_shift( $queue );

	if ( null !== $block['blockName'] ) {
		$name            = $block['blockName'];
		$counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1;
	}

	foreach ( $block['innerBlocks'] as $inner_block ) {
		$queue[] = $inner_block;
	}
}

arsort( $counts );
foreach ( $counts as $name => $n ) {
	echo str_pad( (string) $n, 4, ' ', STR_PAD_LEFT ) . '  ' . $name . "\n";
}
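array_shift() renumbers the whole array on every call, so each dequeue above costs time proportional to the queue's length. That is harmless for one post but can add up across a large export. Here is a sketch of the same breadth-first count using PHP's built-in SplQueue, which dequeues without renumbering; `count_block_types()` is a hypothetical name, and the sample tree fills in only the keys it reads.

```php
<?php
/**
 * Count named blocks breadth-first using SplQueue instead of array_shift().
 * Hypothetical helper; reads only blockName and innerBlocks.
 */
function count_block_types( array $blocks ): array {
	$counts = array();
	$queue  = new SplQueue();
	foreach ( $blocks as $block ) {
		$queue->enqueue( $block );
	}

	while ( ! $queue->isEmpty() ) {
		$block = $queue->dequeue();
		$name  = $block['blockName'];
		if ( null !== $name ) {
			$counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1;
		}
		foreach ( $block['innerBlocks'] as $inner_block ) {
			$queue->enqueue( $inner_block );
		}
	}

	// Most frequent block type first.
	arsort( $counts );
	return $counts;
}

// Hand-written tree containing only the keys count_block_types() reads.
$tree = array(
	array(
		'blockName'   => 'core/group',
		'innerBlocks' => array(
			array( 'blockName' => 'core/paragraph', 'innerBlocks' => array() ),
			array( 'blockName' => 'core/paragraph', 'innerBlocks' => array() ),
			array( 'blockName' => 'core/image', 'innerBlocks' => array() ),
		),
	),
	array( 'blockName' => null, 'innerBlocks' => array() ),
);

print_r( count_block_types( $tree ) );
```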

Check whether a post uses a block

Useful for templates, audits, and migrations: answer one yes/no question without caring where the block appears in the tree.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:group --><div class=\"wp-block-group\">"
	. "<!-- wp:buttons --><div class=\"wp-block-buttons\">"
	. "<!-- wp:button --><div class=\"wp-block-button\"><a>Buy now</a></div><!-- /wp:button -->"
	. "</div><!-- /wp:buttons -->"
	. "</div><!-- /wp:group -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

function post_has_block( $blocks, $name ) {
	$queue = $blocks;

	while ( ! empty( $queue ) ) {
		$block = array_shift( $queue );
		if ( $name === $block['blockName'] ) {
			return true;
		}

		foreach ( $block['innerBlocks'] as $inner_block ) {
			$queue[] = $inner_block;
		}
	}

	return false;
}

echo post_has_block( $blocks, 'core/button' ) ? "has button\n" : "missing button\n";
echo post_has_block( $blocks, 'core/gallery' ) ? "has gallery\n" : "missing gallery\n";
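Calling post_has_block() once per name walks the tree once per name. When a template requires several blocks, a single pass can report everything that is missing. `missing_blocks()` below is a hypothetical helper sketched for illustration; it uses a stack instead of a queue because visiting order does not matter for a yes/no answer, and it stops as soon as every required name has been seen.

```php
<?php
/**
 * Return the names from $required that appear nowhere in the tree.
 * Walks the tree once; hypothetical helper for illustration.
 */
function missing_blocks( array $blocks, array $required ): array {
	$remaining = array_fill_keys( $required, true );
	$stack     = $blocks;

	while ( ! empty( $stack ) && ! empty( $remaining ) ) {
		$block = array_pop( $stack );
		// Stop tracking a name as soon as it is found (null names are a no-op).
		unset( $remaining[ (string) $block['blockName'] ] );
		foreach ( $block['innerBlocks'] as $inner_block ) {
			$stack[] = $inner_block;
		}
	}

	return array_keys( $remaining );
}

// Hand-written tree containing only the keys missing_blocks() reads.
$tree = array(
	array(
		'blockName'   => 'core/group',
		'innerBlocks' => array(
			array(
				'blockName'   => 'core/buttons',
				'innerBlocks' => array(
					array( 'blockName' => 'core/button', 'innerBlocks' => array() ),
				),
			),
		),
	),
);

print_r( missing_blocks( $tree, array( 'core/button', 'core/gallery', 'core/heading' ) ) );
```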

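Lint headings for hierarchy mistakes

Skipping heading levels (for example, jumping from H2 straight to H4) is a widely flagged accessibility problem. The helper below recurses depth-first, so headings come out in document order, including headings nested inside Group, Column, and Cover blocks. A breadth-first queue like the one in the counting example would list every nested heading after all top-level ones.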

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:heading -->\n<h2>Intro</h2>\n<!-- /wp:heading -->"
	. "<!-- wp:heading {\"level\":4} -->\n<h4>Subsection</h4>\n<!-- /wp:heading -->"
	. "<!-- wp:heading {\"level\":3} -->\n<h3>Body</h3>\n<!-- /wp:heading -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

function collect_headings( $blocks, &$headings ) {
	foreach ( $blocks as $block ) {
		if ( 'core/heading' === $block['blockName'] ) {
			$headings[] = array(
				'level' => isset( $block['attrs']['level'] ) ? (int) $block['attrs']['level'] : 2,
				'text'  => trim( strip_tags( $block['innerHTML'] ) ),
			);
		}

		collect_headings( $block['innerBlocks'], $headings );
	}
}

$headings = array();
collect_headings( $blocks, $headings );

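// Assumes the theme prints the post title as the page's H1, so a first heading of H3 or deeper counts as a skip.
$last = 1;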
foreach ( $headings as $heading ) {
	$level = $heading['level'];
	$label = $heading['text'];

	if ( $level > $last + 1 ) {
		echo "WARN {$label}: jumped from H{$last} to H{$level}\n";
	} else {
		echo "ok   {$label}: H{$level}\n";
	}
	$last = $level;
}
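The labels above come from trim( strip_tags( $block['innerHTML'] ) ). strip_tags() removes tags but leaves character references alone, so a heading saved as <h2>Q&amp;A</h2> prints as Q&amp;A. A hypothetical `block_text()` helper, sketched here as an assumption rather than package API, also decodes entities with html_entity_decode() and collapses the newlines that block markup leaves around content.

```php
<?php
/**
 * Plain-text label for a block: strip tags, decode character references,
 * collapse runs of whitespace. Hypothetical helper for illustration.
 */
function block_text( string $html ): string {
	$text = html_entity_decode( strip_tags( $html ), ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	// Collapse newlines and repeated spaces left behind by block markup.
	return trim( preg_replace( '/\s+/u', ' ', $text ) );
}

echo block_text( "\n<h2>Q&amp;A</h2>\n" ) . "\n";
echo block_text( '<h3>Caf&eacute; <em>hours</em></h3>' ) . "\n";
```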

Find all instances of a custom block

When auditing an export for a block your plugin owns, collect every match and print the fields a human cares about.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = "<!-- wp:paragraph --><p>Reviews</p><!-- /wp:paragraph -->"
	. "<!-- wp:my-plugin/testimonial {\"author\":\"Jane\",\"rating\":5} -->"
	. "<blockquote>Loved it.</blockquote>"
	. "<!-- /wp:my-plugin/testimonial -->"
	. "<!-- wp:my-plugin/testimonial {\"author\":\"Joe\",\"rating\":4} -->"
	. "<blockquote>Pretty good.</blockquote>"
	. "<!-- /wp:my-plugin/testimonial -->";

$blocks = ( new WP_Block_Parser() )->parse( $document );

function find_blocks_by_name( $blocks, $name, &$matches ) {
	foreach ( $blocks as $block ) {
		if ( $name === $block['blockName'] ) {
			$matches[] = $block;
		}

		find_blocks_by_name( $block['innerBlocks'], $name, $matches );
	}
}

$testimonials = array();
find_blocks_by_name( $blocks, 'my-plugin/testimonial', $testimonials );

foreach ( $testimonials as $i => $b ) {
	echo ( $i + 1 ) . '. ' . $b['attrs']['author'] . ' (' . $b['attrs']['rating'] . '/5): '
		. trim( strip_tags( $b['innerHTML'] ) ) . "\n";
}
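attrs contains only what was serialized into the block comment's JSON. The parser does not know a block's registered defaults, and the block editor leaves an attribute out of the comment when it equals its default, so a missing key usually means "default", not "broken". Reading $b['attrs']['rating'] directly on such a block raises an undefined-key warning (a notice on PHP 7). Here is a sketch of a hypothetical `block_attr()` accessor with an explicit fallback, run against hand-written testimonials where the second one omits rating.

```php
<?php
/**
 * Read one attribute from a parsed block, falling back to $default when the
 * key is absent from the comment JSON. Hypothetical helper for illustration.
 */
function block_attr( array $block, string $key, $default = null ) {
	return array_key_exists( $key, $block['attrs'] ) ? $block['attrs'][ $key ] : $default;
}

// Hand-written blocks in the documented shape; the second omits "rating".
$testimonials = array(
	array(
		'blockName'    => 'my-plugin/testimonial',
		'attrs'        => array( 'author' => 'Jane', 'rating' => 5 ),
		'innerBlocks'  => array(),
		'innerHTML'    => '<blockquote>Loved it.</blockquote>',
		'innerContent' => array( '<blockquote>Loved it.</blockquote>' ),
	),
	array(
		'blockName'    => 'my-plugin/testimonial',
		'attrs'        => array( 'author' => 'Sam' ),
		'innerBlocks'  => array(),
		'innerHTML'    => '<blockquote>No stars given.</blockquote>',
		'innerContent' => array( '<blockquote>No stars given.</blockquote>' ),
	),
);

foreach ( $testimonials as $b ) {
	$rating = block_attr( $b, 'rating' );
	echo block_attr( $b, 'author', 'Anonymous' ) . ': '
		. ( null === $rating ? 'unrated' : $rating . '/5' ) . "\n";
}
```

array_key_exists() is used rather than isset() so that an attribute explicitly saved as null comes back as null instead of being replaced by the fallback.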

Detect blocks with stale embed URLs

A real-world content audit: find every core/embed whose URL points at a domain you have retired.

<?php
require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';

$document = <<<'HTML'
<!-- wp:embed {"url":"https://twitter.com/wordpress/status/1","providerNameSlug":"twitter"} /-->
<!-- wp:embed {"url":"https://youtube.com/watch?v=abc","providerNameSlug":"youtube"} /-->
<!-- wp:embed {"url":"https://vine.co/v/xyz","providerNameSlug":"vine"} /-->
HTML;

$retired = array( 'vine.co', 'plus.google.com' );

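// Walk the whole tree so embeds nested in Group, Columns, or other container blocks are checked too.
$queue = ( new WP_Block_Parser() )->parse( $document );
while ( ! empty( $queue ) ) {
	$b = array_shift( $queue );
	foreach ( $b['innerBlocks'] as $inner_block ) {
		$queue[] = $inner_block;
	}
	if ( 'core/embed' !== $b['blockName'] ) {
		continue;
	}
	$url  = isset( $b['attrs']['url'] ) ? $b['attrs']['url'] : '';
	$host = parse_url( $url, PHP_URL_HOST );
	$bad  = $host && in_array( $host, $retired, true );
	echo ( $bad ? 'STALE  ' : 'ok     ' ) . $url . "\n";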
}
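The in_array() check compares hosts exactly, so an embed saved as https://www.vine.co/v/xyz would be reported as ok. Below is a sketch of a hypothetical `is_retired_host()` that also treats subdomains as retired. It lowercases the host and accepts either an exact match or a ".domain" suffix; the leading dot keeps notvine.co from matching vine.co.

```php
<?php
/**
 * True when $url's host is a retired domain or a subdomain of one.
 * Hypothetical helper for illustration.
 */
function is_retired_host( string $url, array $retired ): bool {
	$host = parse_url( $url, PHP_URL_HOST );
	if ( ! is_string( $host ) || '' === $host ) {
		return false;
	}
	// Hosts are case-insensitive; also ignore a trailing dot (fully qualified form).
	$host = strtolower( rtrim( $host, '.' ) );

	foreach ( $retired as $domain ) {
		$domain = strtolower( $domain );
		// Exact match, or ends with ".domain" so "notvine.co" does not match "vine.co".
		if ( $host === $domain || substr( $host, -strlen( $domain ) - 1 ) === '.' . $domain ) {
			return true;
		}
	}
	return false;
}

$retired = array( 'vine.co', 'plus.google.com' );
$urls    = array( 'https://vine.co/v/xyz', 'https://www.vine.co/v/xyz', 'https://notvine.co/v/xyz', 'https://youtube.com/watch?v=abc' );

foreach ( $urls as $url ) {
	echo ( is_retired_host( $url, $retired ) ? 'STALE  ' : 'ok     ' ) . $url . "\n";
}
```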

Pitfalls

See also