Joe Ganley

I make software and sometimes other things.


Due to Blogger's termination of support for FTP, this blog is no longer active.

It is possible that some links from here, particularly those within the site, are now broken. If you encounter one of those, your best bet is to go to the new front page and hunt for it from there.


Scotland, PA

Parsing postal addresses is surprisingly difficult. The USPS document that describes how to properly format US addresses is over 200 pages long. How many types of road (Street, Court, Avenue, etc.) do you suppose are in that document? I once tried to list all of the ones I could think of, and came up with a couple of dozen. That document contains over 200, including some strange ones such as Loaf and Stravenue. Further complicating matters, many names can serve more than one purpose depending on where they appear in an address; for example, according to the 2000 census data, there are at least 68 cities in the US whose names are the names of states (including 25 cities named Washington). And of course, US rules don't apply in other countries, each of which has its own rules. Supposing you had a good parser for US addresses, you'd like to be able to determine whether an address is in the US. One way might be to just search for the names of other countries in your string. However, again from the census data, there are at least 85 cities in the US whose names are the names of countries elsewhere in the world (including, strangely, 17 cities named Lebanon).
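To make the country-name trap concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the function names are mine, and the two lookup sets are tiny samples rather than the real census or USPS lists. It compares the naive "search for a country name" test with one that also looks at where the name sits in the address.

```python
import re

# Tiny illustrative samples (assumptions for this sketch, not complete lists).
COUNTRY_NAMES = {"scotland", "lebanon", "peru", "cuba"}
US_STATE_ABBREVIATIONS = {"PA", "NH", "IN", "NY"}


def naive_is_foreign(address: str) -> bool:
    """Call an address non-US if any word in it is a country name."""
    words = re.findall(r"[a-z]+", address.lower())
    return any(word in COUNTRY_NAMES for word in words)


def position_aware_is_foreign(address: str) -> bool:
    """Call an address non-US only if its last comma-separated part is a
    country name. A last part that starts with a US state abbreviation
    (optionally followed by a ZIP code) is treated as a US address."""
    parts = [part.strip() for part in address.split(",") if part.strip()]
    if not parts:
        return False
    last = parts[-1]
    tokens = last.split()
    if tokens[0].upper() in US_STATE_ABBREVIATIONS:
        return False
    return last.lower() in COUNTRY_NAMES


for address in ("Scotland, PA",
                "Lebanon, NH 03766",
                "12 High Street, Edinburgh, Scotland"):
    print(address, naive_is_foreign(address), position_aware_is_foreign(address))
```

The naive test flags all three addresses as foreign, while the position-aware one flags only the Edinburgh address. Even this version is nowhere near a real solution: it assumes comma-separated parts and two-letter state abbreviations, and it ignores multi-word country names entirely.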

It's one of those problems where you can throw together a 75% solution in a few hours, but from there each subsequent level of improvement gets increasingly difficult. It's no surprise, then, that the address parsing software out there is either really expensive, or doesn't work very well, or both. Writing a good open-source address-parsing package is on the long list of projects I hope to get to someday.
