Right now the ADIF group is making one of its most important decisions ever: whether to extend ADIF with an XML-based format that carries new features, possibly leaving the existing ADIF format unchanged (new features would be added only to the new format). The feature that triggered this debate is the addition of Unicode to ADIF. A working proposal exists that simply adds new field names to the existing ADIF format. This blog post walks through a simple ADIF parser written in Python and shows how to extend it to handle Unicode data.

Python is not my primary language. I chose it because it’s readily available on most platforms and easy to understand. This parser was written for this blog post; it was not written to be copied and pasted into a production environment. It has no error checking and no real validation. Those topics aren’t needed to show how to extend the parser for Unicode data, so they would only add complexity and obscure the real point. I won’t be explaining Python constructs or methods; I assume you can read and understand this code.

The entire sample is only 49 lines of code (excluding comments), but the commentary in this post makes it look much longer.

To get started we need two Python imports. The first is “sys”, which gives us access to command-line arguments; the second is “re”, the regular expression library, which lets us do case-insensitive searches (don’t worry, no real regex syntax is used!).

import sys
import re
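The only pattern the parser searches for is the literal text <eoh>; re.IGNORECASE is what lets it also match <EOH>, <Eoh> and so on. Here is a minimal sketch of that lookup as a standalone helper (skip_header is a hypothetical name, not part of the parser in this post):

```python
import re

def skip_header(raw):
    """Return the index just past the <eoh> marker, or 0 if there is none.

    re.IGNORECASE lets the literal pattern match <eoh>, <EOH>, <Eoh>, ...
    """
    m = re.search('<eoh>', raw, re.IGNORECASE)
    return m.end() if m is not None else 0

print(skip_header('Log<EOH><call:4>AA1A<eor>'))   # 8
print(skip_header('<call:4>AA1A<eor>'))           # 0
```

An alternative is raw.lower().find('<eoh>'), but str.lower() can change the length of a few Unicode strings (for example 'İ'.lower() is two code points), so positions found in the lowered copy may not line up with the original text once Unicode data is involved.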

Looking ahead to future posts about reading different file formats, I decided to create a quick helper function that reads the entire contents of a file into memory. Most ADIF files are not large enough to be a concern on a modern computer, but if your program processes huge ADIF files, you’ll want to read one record at a time to save memory.

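def readfile(filename):
    # Read the whole file as text. UTF-8 is assumed here so that Unicode
    # field values are decoded into characters; adjust if your files differ.
    with open(filename, 'r', encoding='utf-8') as fh:
        return fh.read()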
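For very large files, here is a minimal sketch of reading one record at a time instead of the whole file. iter_record_text is a hypothetical helper, not part of this parser. It assumes the file is UTF-8 and that the text <eor> never appears inside field data. That holds when ‘<’ in content is escaped as &lt; (as in the Unicode proposal), but it is not guaranteed for ordinary ADIF, where data is delimited only by its length.

```python
import re

EOR = re.compile('<eor>', re.IGNORECASE)

def iter_record_text(filename, chunk_size=64 * 1024):
    """Yield the raw text of each record, ending with its <eor> marker.

    Hypothetical helper. Assumes UTF-8 input and that '<eor>' never
    occurs inside field data. The first item also contains the header.
    """
    buf = ''
    with open(filename, 'r', encoding='utf-8') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            # Emit every complete record currently held in the buffer;
            # a marker split across two chunks is found after the next read.
            m = EOR.search(buf)
            while m is not None:
                yield buf[:m.end()]
                buf = buf[m.end():]
                m = EOR.search(buf)
    # Text after the last <eor> is not a complete record and is dropped.
```

Only one chunk plus one partial record is held in memory at a time; each yielded string could then be scanned for fields the same way the parser scans the whole file.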

In our sample, the parser reads each record into a dictionary (with lowercase keys). We then pass that dictionary to a format-independent procedure that cleans the record up. For example, if band was specified but band_rx was not, assume band_rx is the same as band. Other work belongs here too, such as DXCC lookups and distance calculations, but that’s not the purpose of this discussion, so we’ve included just a small “fixup” procedure.

def adifFixup(rec):
    # Receive-side fields default to their transmit-side values
    if 'band' in rec and 'band_rx' not in rec:
        rec['band_rx'] = rec['band']
    if 'freq' in rec and 'freq_rx' not in rec:
        rec['freq_rx'] = rec['freq']
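If the list of defaults grows, a table-driven version keeps each rule on one line. This is a sketch of an alternative, not the post’s code; RX_DEFAULTS and adif_fixup_table are hypothetical names. It relies on dict.setdefault, which only fills a key that is missing.

```python
# Hypothetical table: receive-side field -> field it defaults to
RX_DEFAULTS = {
    'band_rx': 'band',
    'freq_rx': 'freq',
}

def adif_fixup_table(rec):
    """Fill missing receive-side fields from their transmit-side values."""
    for rx_field, tx_field in RX_DEFAULTS.items():
        if tx_field in rec:
            # setdefault leaves an existing band_rx/freq_rx untouched
            rec.setdefault(rx_field, rec[tx_field])
    return rec

print(adif_fixup_table({'call': 'AA1A', 'band': '20m', 'freq_rx': '14.074'}))
# {'call': 'AA1A', 'band': '20m', 'freq_rx': '14.074', 'band_rx': '20m'}
```

Adding another default then means adding one entry to the table rather than another if statement.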

Now we get to the real deal: the parser. The basic flow is easy. We track our position in the file using the ‘pos’ variable.

  1. First, we skip the header; header parsing isn’t needed for this simple example.

  2. Second, we use the string’s find method to locate the start of a field definition, ‘<’, and from that position on, the end of the definition, ‘>’.

  3. Third, we split the field definition on the ‘:’ character, giving a list of up to three elements [name, length, type]. The name is converted to lowercase for ease of use later on.

  4. Fourth, depending on the field definition, we either read the field’s content (when a length is present) or append the collected record to our record list (when the name is ‘eor’).
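To make step 3 concrete, here is what splitting a few field definitions on ‘:’ produces. parse_field_def is a hypothetical helper for illustration; the parser itself just indexes the list directly. In ADIF the optional third element is a data-type indicator, such as D for a date.

```python
def parse_field_def(definition):
    """Split the text between '<' and '>' into (name, length, type).

    Hypothetical helper: missing parts come back as None.
    """
    parts = definition.split(':')
    name = parts[0].lower()                          # field names are case-insensitive
    length = int(parts[1]) if len(parts) > 1 else None
    dtype = parts[2] if len(parts) > 2 else None     # e.g. 'D' for a date
    return name, length, dtype

print(parse_field_def('CALL:4'))         # ('call', 4, None)
print(parse_field_def('qso_date:8:D'))   # ('qso_date', 8, 'D')
print(parse_field_def('EOR'))            # ('eor', None, None)
```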

def adiParse(filename):
    raw = readfile(filename)

    # Find the EOH; in this simple example we skip header parsing.
    pos = 0
    m = re.search('<eoh>', raw, re.IGNORECASE)
    if m is not None:
        # Start parsing our ADIF file after the <eoh> marker
        pos = m.end()

    recs = []
    rec = dict()
    while True:
        # Find our next field definition <...>
        pos = raw.find('<', pos)
        if pos == -1:
            break
        endPos = raw.find('>', pos)
        if endPos == -1:
            break              # unterminated field definition; stop here

        # Split to get individual field elements out
        fieldDef = raw[pos + 1:endPos].split(':')
        fieldName = fieldDef[0].lower()
        if fieldName == 'eor':
            adifFixup(rec)     # fill in information from lookups
            recs.append(rec)   # append this record to our records list
            rec = dict()       # start a new record
            pos = endPos + 1
        elif len(fieldDef) > 1:
            # We have a field definition with a length; get its
            # length and then assign the value to the dictionary
            fieldLen = int(fieldDef[1])
            rec[fieldName] = raw[endPos + 1:endPos + fieldLen + 1]
            # Skip past the data so a '<' inside it isn't mistaken
            # for the start of the next field definition
            pos = endPos + fieldLen + 1
        else:
            pos = endPos + 1
    return recs

Now, let’s add a bit of code to test the parser. It accepts an ADIF file name on the command line and prints each record’s dictionary, for example {'call': 'AA1A', 'band': '20m', …}

recs = adiParse(sys.argv[1])
for rec in recs:
    print(rec)

So, at this point we have a fully functional ADIF parser and sample test program. If this were an actual program you would loop through the records importing them into a database or consuming the data in some other form.
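As a sketch of the “import into a database” step, the following loads parsed records into SQLite using Python’s built-in sqlite3 module. The table name and its columns (qso: callsign, band, freq, qso_date) are assumptions for illustration, not part of ADIF or this post; fields missing from a record are stored as NULL.

```python
import sqlite3

def store_records(recs, db_path=':memory:'):
    """Insert parsed record dictionaries into a hypothetical 'qso' table."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS qso '
                 '(callsign TEXT, band TEXT, freq TEXT, qso_date TEXT)')
    # Named placeholders read values from a dict; .get() supplies None
    # (stored as NULL) for fields a record doesn't have.
    rows = ({'callsign': r.get('call'), 'band': r.get('band'),
             'freq': r.get('freq'), 'qso_date': r.get('qso_date')}
            for r in recs)
    conn.executemany('INSERT INTO qso VALUES (:callsign, :band, :freq, :qso_date)',
                     rows)
    conn.commit()
    return conn

conn = store_records([{'call': 'AA1A', 'band': '20m'}])
print(conn.execute('SELECT callsign, band, freq FROM qso').fetchall())
# [('AA1A', '20m', None)]
```

Passing a file path instead of ':memory:' would keep the log on disk between runs.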

Now for the fun part: let’s extend it to accept Unicode data! This exercise shows how hard the task actually is. We need to change just one line:

rec[fieldName] = raw[endPos + 1:endPos + fieldLen + 1]

so that it replaces any &lt; in the field’s content with the actual < character. To do this, replace the line above with:

rec[fieldName] = raw[endPos + 1:endPos + fieldLen + 1].replace('&lt;', '<')
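The same rule works in reverse when writing. Here is a minimal sketch of a field writer (format_field is a hypothetical name) that escapes ‘<’ as &lt; and then takes the length of the escaped text. That length convention is an assumption inferred from the reader above, which slices fieldLen characters first and unescapes afterwards; check the proposal text before relying on it.

```python
def format_field(name, value):
    """Write one ADIF field, escaping '<' so it can't start a new tag.

    Assumes the length counts the characters as written (after escaping),
    matching a reader that slices first and unescapes afterwards.
    """
    escaped = value.replace('<', '&lt;')
    return '<%s:%d>%s' % (name, len(escaped), escaped)

print(format_field('comment', 'a<b'))   # <comment:6>a&lt;b
print(format_field('name', 'Jürgen'))   # <name:6>Jürgen
```

One case this sketch does not address: a value that already contains the literal text &lt; would be turned into < by the reader.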

Your parser can now parse any Unicode field as defined by the “Extend ADIF with Unicode” proposal. That’s it! This does assume that your language supports Unicode strings and that the file is decoded into characters before the field lengths are applied. If not, you will have additional work on your hands under either proposal (Extend ADIF with Unicode or ADI/ADIX). To try it, download the file http://www.kb8lfa.com/adif/ADIF_withUnicode.txt and pass it on the command line. Look at the contents of the file in your web browser: the field lengths count only the visible Unicode characters, not bytes. This makes Unicode fields just as easy to edit as any other ADIF data. See the screenshot of this file being edited in Notepad on Windows: ADIF Extended with Unicode - Editing in Notepad.
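The difference between counting characters and counting bytes is easy to see in Python 3. A decoded string is sliced by code points, so a field length of 6 picks up the whole name, while the same slice over the raw UTF-8 bytes cuts it short. The sample value below is made up for illustration.

```python
raw_text = '<name:6>Jürgen<eor>'   # hypothetical record fragment
raw_bytes = raw_text.encode('utf-8')

start = raw_text.index('>') + 1     # first character of the field data
print(raw_text[start:start + 6])    # Jürgen  (6 code points)

# 'ü' takes two bytes in UTF-8, so 6 bytes stop one character early
print(raw_bytes[start:start + 6].decode('utf-8'))    # Jürge
print(len('Jürgen'), len('Jürgen'.encode('utf-8')))  # 6 7
```

Python’s len counts code points. For text written with combining accents, one visible character can be more than one code point, so “visible characters” and code points coincide only for precomposed text like this example.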

This parser took 22 minutes to write and 1 minute to extend to support the “Extend ADIF with Unicode” proposal. Presumably you already have an ADIF parser written (with proper error handling), so all that’s left is to decode &lt; to the character <.

I hope you’ve enjoyed this little tutorial. Soon I’ll show how to extend this parser (without changing any existing parsing code) to support another format, for example ADIX. You’ll see how easy it is to implement both proposals and how fully they complement each other.