XML is a ubiquitous format for data transfer, configuration, and much more. Processing it can be easy, but it can also be a pain. As always with Perl, there is more than one way to do it… So I’ve created a framework to test and compare some of the available modules.
The modules I considered were:
XML::Simple
XML::Smart
XML::Parser (using Tree style)
XML::Twig
My tests take a given XML file (a list of films as generated by Mediathek), parse out the individual film entries, and print them to a CSV file. I don’t actually need this, but it’s typical of what you might need to do when working with XML, and it allows good comparisons across both small and huge files.
Considerations:
The main difference between the clever modules (XML::Simple and XML::Smart) and the more grass-roots ones (XML::Parser and XML::Twig) is how they work through the XML. The clever ones read in the whole document, analyse it, and build a hash/array representation of its elements. That gives you an easy-to-use structure to work with afterwards, but it also means holding the entire document in memory at once. This is not a problem for small files, but it quickly becomes dangerous, because the process will use 10-20 times the file size in memory: a 60MB file can easily consume 1GB!
XML::Twig, by contrast, works through the file element by element, so its memory usage is very controllable: it is possible to parse hundreds of megabytes without the perl process growing beyond 12MB.
On the other hand, the clever modules are much easier to use, and you don’t need to know anything about your XML to process it. That makes them much more attractive for small files (e.g. configuration).
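To make the element-by-element approach concrete, here is a minimal sketch of how XML::Twig can turn film entries into CSV lines. It is not the code from my tests: the element names (`films`, `film`, `title`, `channel`) and the helpers `csv_field` and `films_to_csv` are assumptions made up for illustration, and the real Mediathek format may look quite different.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input: the real Mediathek element names may differ.
my $xml = <<'XML';
<films>
  <film><title>Metropolis</title><channel>ARTE</channel></film>
  <film><title>Das Boot</title><channel>ARD</channel></film>
</films>
XML

# Quote one CSV field, doubling any embedded double quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Work through the XML one <film> at a time and return one CSV line each.
sub films_to_csv {
    my ($xml_string) = @_;
    my @lines;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a complete <film> element has been parsed.
            film => sub {
                my ( $t, $film ) = @_;
                push @lines, join ',',
                    map { csv_field( $film->first_child_text($_) ) }
                    qw(title channel);
                $t->purge;    # release everything parsed so far
            },
        },
    );
    $twig->parse($xml_string);    # for a file on disk, use parsefile($path)
    return @lines;
}

print "$_\n" for films_to_csv($xml);
```

The call to `purge` inside the handler is what keeps memory flat: once a film has been written out, the twig no longer holds on to it. For a really huge file you would print each line straight away instead of collecting them in an array.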
For the details of the implementation, please have a look at the code examples:
git clone git@github.com:robin13/rcl-ironman.git
My tests showed these relationships, though your mileage might vary – it depends a lot on the depth/complexity, and variance in size (big on Mondays, small on Thursdays?) of the XML you are dealing with.
By Speed (seconds to process a 66MB file)
XML::Parser   26.5
XML::Simple   76.0
XML::Twig    132.8
XML::Smart   394.9
By Usability (subjective ease of implementation)
XML::Simple   Easy
XML::Twig     Usable
XML::Parser   Complex
XML::Smart    Not so smart...
By Memory consumption (system memory in kB used for a 66MB file)
XML::Twig         972
XML::Parser    506532 (using Tree style)
XML::Simple    628268
XML::Smart    1336604
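To relate these numbers to the file size, the following sketch divides each memory figure by the size of the input. It assumes 1MB = 1024kB; `memory_ratio` is just a helper name chosen for this example.

```perl
use strict;
use warnings;

# Size of the 66MB test file in kB, assuming 1MB = 1024kB.
my $file_kb = 66 * 1024;

# Memory figures (kB) taken from the table above.
my %memory_kb = (
    'XML::Twig'   => 972,
    'XML::Parser' => 506532,
    'XML::Simple' => 628268,
    'XML::Smart'  => 1336604,
);

# Memory used as a multiple of the input file size.
sub memory_ratio {
    my ($kb) = @_;
    return $kb / $file_kb;
}

# Print the modules from least to most memory-hungry.
for my $module ( sort { $memory_kb{$a} <=> $memory_kb{$b} } keys %memory_kb ) {
    printf "%-12s %5.1fx\n", $module, memory_ratio( $memory_kb{$module} );
}
```

XML::Simple comes out at roughly 9 times the file size and XML::Smart at roughly 20 times, which is in line with the 10-20 times rule of thumb.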
To summarise:
If you want high performance: XML::Parser
If you want relatively easy, memory efficient parsing of huge files: XML::Twig
If you want easy-to-implement for small files: XML::Simple
If you want to have a bad deal: XML::Smart
What I left out:
Probably a lot of usable modules… If you know of something important, please comment!
Input/Output: XML::Simple and XML::Parser can output XML too. That might be important for you (e.g. config files)!
My code is proof of concept, not highly optimised: feel free to improve, or even add other modules to the tests. Git makes that easy! 🙂
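As a footnote to the input/output point above, here is a small sketch of a config-file round trip with XML::Simple: read the XML in, change a value, write it back out. The `config`, `host` and `port` names are made up for illustration and are not part of my tests.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Hypothetical configuration document.
my $xml = '<config><host>localhost</host><port>8080</port></config>';

# XMLin reads the whole document into a nested Perl structure.
# By default the root element is dropped, leaving a hash of host and port.
my $config = XMLin($xml);

# Change a value as you would in any other hash.
$config->{port} = 9090;

# XMLout writes the structure back as XML. NoAttr => 1 asks for nested
# elements rather than attributes; RootName names the root element.
my $out = XMLout( $config, RootName => 'config', NoAttr => 1 );

print $out;
```

This is the clever approach in a nutshell: no handlers and no knowledge of the document layout needed up front, at the cost of holding everything in memory.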