The power of XSL Transforms

at 10:46 am

I began working on ZSDKP a few weeks ago and figured out that a key feature of the application would be its ability to scrape items from other sites like Wowguru and Allakhazam. The basic premise is that these sites provide their item XML data and I can just parse it to inject into my database. I would have loved nothing more than to use Wowhead but they do not give you a link (that I can find) to their XML data.

However, I quickly ran into a problem: I wanted to use one parsing algorithm for all sites, one that would create each item using my own field names. The catch is that Wowguru and Allakhazam use different XML structures from each other, and both differ from the schema I use.

Enter: XSL Transforms!

The Problem

So, here are a couple of examples of what I am faced with.

Wowguru uses this type of XML structure:

http://pastebin.com/fc8178c5

Allakhazam uses this type of XML structure:

http://pastebin.com/f5ed2fc7a

As you can see, there are a number of tags that are different, unused, or extra. Parsing each of these with hand-written code for every site would be an absolute nightmare. Compounding the problem is the fact that I want my XML files to look like:

http://pastebin.com/f79430635

The Solution

As I was pondering how to deal with this problem, I asked for ideas on a computer forum I frequent. It was pointed out to me that XSL transforms would do the job; all I needed to do was set up the templates.
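To give a flavor of what such a template looks like, here is a minimal sketch. It is not one of my real transforms: the source element names (wowitem, itemname, quality) and the target names (item, name, rarity) are made up purely for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical source document:
       <wowitem><itemname>Some Sword</itemname><quality>Epic</quality><junk/></wowitem> -->
  <xsl:template match="/wowitem">
    <!-- Emit only the fields we care about, under our own names.
         Anything not selected here (such as <junk/>) is simply dropped. -->
    <item>
      <name><xsl:value-of select="itemname"/></name>
      <rarity><xsl:value-of select="quality"/></rarity>
    </item>
  </xsl:template>
</xsl:stylesheet>
```

The idea is that each source site gets its own small stylesheet like this one, and every stylesheet produces the same output structure, so the parser downstream only ever has to understand one format.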

Let me state something now: whoever came up with this is a genius, because he/she saved me countless hours of pain and anguish. Now, with a basic XSL template, I can totally reformat both Wowguru’s and Allakhazam’s XML to match any structure I want. So, with two small templates and a couple of hours of work, I had two fully functioning transforms. Here is a copy of the Wowguru transform:

http://pastebin.com/f74baab68

As you can see, with some manipulation of the data being extracted from the original XML file, I can realign it to match my desired format. The Allakhazam implementation is far more impressive because it uses recursive calls to sort out class restrictions. Allakhazam stores class restrictions in a number system akin to Unix file permissions, while my format uses straight names, so I needed to break the number down and translate it into a list of names.
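The recursive idea can be sketched roughly as follows. This is an illustration, not the actual Allakhazam transform: the input element (classes), the bit positions, and the class names assigned to them are all assumptions. XSLT 1.0 has no bitwise operators, so the sketch tests the lowest bit with mod and shifts with div.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input: <item><classes>5</classes></item> -->
  <xsl:template match="/item">
    <item>
      <classes>
        <xsl:call-template name="decode-classes">
          <xsl:with-param name="mask" select="number(classes)"/>
        </xsl:call-template>
      </classes>
    </item>
  </xsl:template>

  <!-- Walks the mask one bit at a time. Each call handles the lowest bit,
       then calls itself with the mask halved and the bit position advanced. -->
  <xsl:template name="decode-classes">
    <xsl:param name="mask"/>
    <xsl:param name="bit" select="1"/>
    <xsl:if test="$mask &gt; 0">
      <xsl:if test="$mask mod 2 = 1">
        <class>
          <xsl:choose>
            <!-- Assumed bit-to-name mapping, for illustration only -->
            <xsl:when test="$bit = 1">Warrior</xsl:when>
            <xsl:when test="$bit = 2">Paladin</xsl:when>
            <xsl:when test="$bit = 3">Hunter</xsl:when>
            <xsl:otherwise>Unknown</xsl:otherwise>
          </xsl:choose>
        </class>
      </xsl:if>
      <xsl:call-template name="decode-classes">
        <xsl:with-param name="mask" select="floor($mask div 2)"/>
        <xsl:with-param name="bit" select="$bit + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the assumed mapping, a mask of 5 (binary 101) has bits 1 and 3 set, so the output lists Warrior and Hunter. Note that nothing is ever reassigned: every recursive call receives fresh values for mask and bit.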

Conclusion

If you have XML files that need to be parsed and returned in a new format, don’t bother agonizing over other methods. Get your butt in gear and use XSL to get the job done on the quick. Most web server environments ship with XSLT support these days, so there is usually little extra setup needed. You will be causing yourself undue headaches if you try anything else. A word to the wise though: XSL is a declarative language, not a procedural one. That is, once a variable is bound there is no way to reassign its value. Any running value that changes as you go has to be handled through recursion, by calling the same template again with a modified parameter.

If you know how to program in languages such as LISP or its derivatives, this shouldn’t be much of an issue.
