Recently, I found myself having to deal with XML files: specifically, PSI MI XML version 2.5, as used by the MINT and IntAct databases. As a relative novice at parsing this kind of XML, I found it pretty painful. Normally I’d look to BioPerl, but its Bio::Graph modules are rather far from “production” (for me, they work only on a small range of PSI MI version 1 files).
I highly recommend the O’Reilly XML.com site: it has lots of tutorials and introductory material that should point you in the right direction. Eventually, of course, you’ll have to settle on a module of choice. For me, XML::Twig was overkill and XML::Simple too simple, so I ended up with XML::SimpleObject, which works for me. I still find it hard to get my head around XML parsing. One Perl head argues that the Perl mentality and the XML mentality don’t sit well together, and I’m inclined to agree. “Why would any freedom-loving Perl poet submit to this insanity?” he asks, in relation to the DOM.
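For anyone wondering what the XML::SimpleObject route looks like, here is a minimal sketch rather than my actual script. It assumes the usual PSI MI 2.5 nesting (entrySet, entry, interactionList, interaction), and the inline sample document, its ids and its short labels are made up for illustration; a real MINT or IntAct file carries far more elements than this.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;

# Hypothetical, heavily trimmed PSI MI 2.5 fragment (illustration only).
my $xml = <<'XML';
<entrySet level="2" version="5">
  <entry>
    <interactionList>
      <interaction id="1">
        <names><shortLabel>mdm2-p53</shortLabel></names>
      </interaction>
      <interaction id="2">
        <names><shortLabel>brca1-bard1</shortLabel></names>
      </interaction>
    </interactionList>
  </entry>
</entrySet>
XML

# XML::SimpleObject wraps the tree that XML::Parser builds in "Tree" style.
my $parser = XML::Parser->new(ErrorContext => 2, Style => 'Tree');
my $doc    = XML::SimpleObject->new($parser->parse($xml));

# Walk down from the root element to the list of interactions.
my $list = $doc->child('entrySet')->child('entry')->child('interactionList');

# Collect "id <TAB> shortLabel" for every interaction element.
my @rows;
for my $interaction ($list->children('interaction')) {
    my $id    = $interaction->attribute('id');
    my $label = $interaction->child('names')->child('shortLabel')->value;
    push @rows, "$id\t$label";
}

print "$_\n" for @rows;
```

To read a file instead of a string, XML::Parser's parsefile method can replace parse. Note that this sketch only looks at the first entry; a file with several entry elements would need a loop over children('entry') as well.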