ADIF Parser - adding ADIX support
This is the second part of the post “ADIF Parser in Python”, so if you have not read that post you should start there. This post picks up where that one left off and adds ADIX support to our parser in a manner that requires no change to the original parser (though we make a few for convenience). It also requires no change to the actual ADIF validation or import processes.
The code presented here would greatly benefit from being a class or using nested functions, as that would eliminate the horrible use of globals, but I wanted to keep a simple procedural style that is easy to understand and follow. Those who use OO or understand nested functions can easily alter this code to use those paradigms. The complete source for the resulting parser is linked at the end of this post.
Our goals for this post are:
- Add ADIX (XML-based ADIF) parsing to our existing parser
- Change no code (or as little as possible) of our original parser
- Share all validation and import processes between the two formats
The first thing we need to do is add an import for the fast Expat-based XML parser to the top of our file:
    import xml.parsers.expat
The Expat XML parser is an event-based parser. When a new tag is found, it executes a function of your choice. When character data is found, it again executes a function of your choice, and finally, when a tag ends, yes, it again executes a function of your choice. So, let’s start by creating those three functions.
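To make the event model concrete, here is a minimal sketch that only records which handler fires and with what payload. The names onStart, onEnd, onText and events are made up for this illustration and are not part of our parser.

```python
import xml.parsers.expat

events = []  # (event, payload) tuples, in the order Expat fires them

def onStart(name, attrs):
    # Called when an opening tag such as <call> is found.
    events.append(("start", name))

def onEnd(name):
    # Called when a closing tag such as </call> is found.
    events.append(("end", name))

def onText(data):
    # Called for character data found between tags.
    events.append(("text", data))

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = onStart
p.EndElementHandler = onEnd
p.CharacterDataHandler = onText
p.Parse("<call>AA1A</call>", True)  # True marks this as the final chunk

for event in events:
    print(event)
```

For this tiny input the handlers fire in document order: the start of call, then its text, then the end of call.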
    def startElement(name, attrs):
        global fieldName
        # Remember the current tag name, lower-cased, for charData to use.
        fieldName = name.lower()
Here is a dirty use of a global, fieldName. This is necessary in our simple procedural style because the character data function does not get the tag name passed to it. In this case, when the parser finds a tag, say <call>AA1A</call>, this startElement function will be called with a name value of “call” and attrs of … an empty dictionary. We save this name to our global fieldName variable as lower case, so it matches our ADI parser and its use of all lower case field names exactly (remember, it converts all field names to lower case as well).
    def charData(data):
        global rec, fieldName
        # Trailing whitespace is never part of a field value.
        data = data.rstrip()
        if len(data) > 0 and fieldName is not None:
            rec[fieldName] = data
This charData function is executed for any character data found between tags. We strip the trailing whitespace characters off as they are not necessary, and we check that there is actually content left. For example, given <record><call>…</call></record>…, the charData function can be called for whitespace inside the record tag as well as for the text inside the call tag, even though the record tag has no character data of its own. Once stripped of whitespace, that data has zero length, so we skip doing anything with it.
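The whitespace case is easy to see with a small experiment. This sketch, using a made-up recordText handler and a hand-written sample string, collects every string Expat passes to the character data handler and then applies the same rstrip test that charData uses.

```python
import xml.parsers.expat

seen = []  # every raw string handed to the character data handler

def recordText(data):
    seen.append(data)

p = xml.parsers.expat.ParserCreate()
p.CharacterDataHandler = recordText
p.Parse("<record>\n  <call>AA1A</call>\n</record>", True)

# Apply the same filter charData uses: drop anything that is empty
# once trailing whitespace has been stripped.
kept = [d for d in seen if len(d.rstrip()) > 0]

print(seen)  # includes the newlines and indentation between the tags
print(kept)  # only the real field value survives
```

How Expat splits the whitespace into separate calls is up to the parser, so the sketch makes no assumption about it; only the filtered result matters.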
    def endElement(name):
        global recs, rec
        # Compare case-insensitively, just as startElement lower-cases names.
        if name.lower() == "record":
            adifFixup(rec)
            recs.append(rec)
            rec = dict()
This function is called when an end tag is found, e.g. </call>. We do nothing unless the tag is ending a record. In that case we call our common adifFixup function (which contains our common validation and harmonization), append the data we have just stored in the rec dictionary to our list of records, recs, and start a fresh rec dictionary for the next record.
Now that we have our three essential Expat functions defined, we can start in on the actual parsing of the XML file. Our function to parse the ADI file was called adiParse, so we are going to name this one adixParse.
    def adixParse(raw):
        global fieldName, rec, recs
        # Reset the shared state before each parse.
        fieldName = None
        rec = dict()
        recs = []
        p = xml.parsers.expat.ParserCreate()
        p.StartElementHandler = startElement
        p.EndElementHandler = endElement
        p.CharacterDataHandler = charData
        # True tells Expat this is the final (and only) chunk of data.
        p.Parse(raw, True)
        return recs
This function is pretty self-explanatory. We:
- Provide default values for our global data collection variables fieldName, rec and recs
- Create the Expat XML parser
- Assign the critical event handler functions
- Trigger the parse method of the Expat XML parser
- Return the resulting recs variable (as though it weren’t globally accessible)
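For those curious about the nested function approach mentioned at the start, here is one possible sketch of it, not the version used in this post. The names adixParseNested and adifFixupStub are invented for this example, and the stub merely stands in for the real adifFixup from the first post. It also assumes record tags should be matched case-insensitively.

```python
import xml.parsers.expat

def adifFixupStub(rec):
    # Hypothetical stand-in for the adifFixup function from the first post.
    return rec

def adixParseNested(raw, fixup=adifFixupStub):
    recs = []
    # A mutable holder replaces the fieldName and rec globals.
    state = {"field": None, "rec": {}}

    def startElement(name, attrs):
        state["field"] = name.lower()

    def charData(data):
        data = data.rstrip()
        if len(data) > 0 and state["field"] is not None:
            state["rec"][state["field"]] = data

    def endElement(name):
        if name.lower() == "record":
            fixup(state["rec"])
            recs.append(state["rec"])
            state["rec"] = {}

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = startElement
    p.EndElementHandler = endElement
    p.CharacterDataHandler = charData
    p.Parse(raw, True)
    return recs

sample = ("<adx><records><record><call>AA1A</call>"
          "<band>20m</band></record></records></adx>")
print(adixParseNested(sample))
```

Because all state lives inside the function call, two parses can never trample each other's data, which is the main thing the globals cost us.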
Now, if you remember our adiParse function, we passed in the file name, not the actual ADIF content. That is the only thing I’ve changed this time around. I did this because we will provide a generic parse function that automatically determines what kind of file is being parsed. Thus, you only need to know one function’s name for any type of ADIF file, whether it be an ADI, ADIX or some other file format adopted later on. Before we get to that function, let’s change our old adiParse function so that it no longer reads the file but has the contents passed to it. Our old function began with:
    def adiParse(filename):
        raw = readfile(filename)
        # Find the EOH, in this simple example we are skipping
        # header parsing.
Let’s change that to read:
    def adiParse(raw):
        # Find the EOH, in this simple example we are skipping
        # header parsing.
Notice the parameter name changed from filename to raw and the raw = readfile(filename) line was removed. At this point, you can also remove the entire readfile function that we created in the first version. It will no longer be used. I created it in anticipation of both adiParse and adixParse using it but I changed things around a bit making it easier for everyone.
Ok, the final piece of the puzzle is the adifParse function, the one that you will call to do any work, whether on an ADI or an ADIX file. The adifParse function reads the file’s content and looks for an XML signature; if one is found it executes adixParse, otherwise it executes adiParse. Here it is:
    def adifParse(filename):
        # Read the whole file; the with block closes it for us.
        with open(filename, 'r') as fh:
            content = fh.read()
        # Look for an XML signature to tell ADIX apart from ADI.
        isXml = content.find("<?xml")
        if isXml > -1:
            return adixParse(content)
        else:
            return adiParse(content)
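One caveat with searching the whole file is that an ADI file could, in principle, contain the text "<?xml" inside a field such as a comment. A slightly stricter check, sketched here as a hypothetical looksLikeAdix helper that is not part of the original code, only inspects the start of the content. It assumes Python 3 strings and that an ADIX file begins with either an XML declaration or an ADX root element.

```python
def looksLikeAdix(content):
    # Hypothetical helper: skip a byte order mark and leading whitespace,
    # then inspect only the first few characters.
    head = content.lstrip("\ufeff \t\r\n")[:5].lower()
    # Assumption: ADIX starts with an XML declaration or an <ADX> root tag.
    return head.startswith("<?xml") or head.startswith("<adx")

adix = '<?xml version="1.0"?><ADX><RECORDS></RECORDS></ADX>'
adi = "<comment:5><?xml<eor>"

print(looksLikeAdix(adix))
print(looksLikeAdix(adi))
print(adi.find("<?xml") > -1)
```

The last line shows why this matters: a plain find reports a match on the ADI sample, while the stricter check does not.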
Oh, we need to change our main code as well. Right now it calls adiParse directly. It currently looks like:
    recs = adiParse(sys.argv[1])
    for rec in recs:
        print(rec)
Simply change the call to adiParse to be adifParse. You can then pass an old ADIF file (even 1.x!), a new ADIF file (with Unicode fields) or even an ADIX file to your program:
    recs = adifParse(sys.argv[1])
    for rec in recs:
        print(rec)
I hope that you have enjoyed this little exercise, and what I really hope is that you have seen how easy it is to support both formats, with full Unicode support, without doing any duplicate work. By my count it’s 94 total lines of code for the entire parser, with Unicode supported in both ADIF and ADIX. In the end we have an ADIF system that is completely harmonized and beautiful in its implementation and use, both for the programmer and the end user.
Complete code and example files for this project can be found at: http://www.kb8lfa.com/adif/pyadif/.