lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It's also very fast and memory friendly, just so you know. For an introduction and further ...
The larger purpose of this script is to migrate content from the Atavist publishing platform to WordPress. The problem is that Atavist only exports to JSON, which has an array of HTML content split ...