Thiessen Polygons and Listing Neighboring Features
I was helping someone with a project recently that I thought would be straightforward but turned out to be rather complex. We had a list of about 10,000 addresses that had to be plotted as coordinates, and then we needed to create Thiessen or Voroni polygons for each point to create market areas. Lastly we needed to generate an adjacency table or list of neighbors; for every polygon list all the neighboring polygons.
For step one I turned to the USC Geocoding service to geocode the addresses; I became a partner a ways back so I could batch geocode datasets for students and faculty on my campus. Once I had coordinates I plotted them in ArcGIS 10 (and learned that the Add XY data feature had been moved to File > Add Data > Add XY Data). Step 2 seemed easy enough; in Arc you go to ArcToolbox > Analysis Tools > Proximity > Create Thiessen Polygons. This creates a polygon for each point and assigns the attributes of each point to the polygon.
I hit a snag with Step 3 – Arc didn’t have a tool for generating the adjacency table. After a thorough search of the ESRI and Stack Exchange forums, I stumbled on the Find Adjacent Features Script by Ken Buja which did exactly what I wanted in ArcGIS 9.2 and 9.3, but not in 10. I had used this script before on a previous project, but I’ve since upgraded and can’t go back. So I searched some more until I found the Find Adjacent & Neighboring Polygons Tool by cmaene. I was able to add this custom toolbox directly to ArcToolbox, and it did exactly what I wanted in ArcGIS 10. I get to select the unique identifying field, and for every ID I get a list of the IDs of the neighboring polygons in a text file (just like Ken’s tool). This tool also had the option of saving the list of neighbors for each feature directly in the attribute table of a shapefile (which is only OK for small files with few neighbors; fields longer than 254 characters get truncated), and it gave you the option of listing neighbors to the next degree (a list of all the neighbor’s neighbors).
Everything seemed to run fine, so I re-ran the tool on a second set of Thiessen polygons that I had clipped with an outline of the US to create something more geographically realistic (so polygons that share a boundary only in the ocean or across the Great Lakes are not considered neighbors).
THEN – TROUBLE. I took some samples of the output table and checked the neighbors of a few features visually in Arc. I discovered two problems. First, I was missing about a thousand records or so in the output. When I geocoded them I couldn’t get a street-level address match for every record; the worse case scenario was a plot to the ZCTA / ZIP code centroid for the address, which was an acceptable level of accuracy for this project. The problem is that if there are many point features plotted to the same coordinate (because they share the same ZIP), a polygon was created for one feature and the overlapping ones fell away (you can’t have overlapping Thiessen polygons). Fortunately this also wasn’t an issue for the person I was helping; we just needed to join the output table back to the master one to track which ones fell out and live with the result.
The bigger problem was the output was wrong. I discovered that the neighbor list for most of the features I checked, especially polygons that had borders on the outer edge of the space, had incomplete lists; each feature had several (and in some cases, all) neighbors missing. Instead of using a shapefile of Thiessen’s I tried running the tool on polygons that I generated as feature classes within an Arc geodatabase, and got the same output. For the heck of it I tried dissolving all the Thiessen’s into one big polygon, and when I did that I noticed that I had orphaned lines and small gaps in what should have been one big, solid rectangle. I tried checking the geometry of the polygons and there were tons of problems. This led me to conclude that Arc did a lousy job when constructing the topology of the polygons, and the neighbor tool was giving me bad output as a result.
Since I’ve been working more with GRASS, I remembered that GRASS vectors have strict topology rules, where features have shared boundaries (instead of redundant overlapping ones). So I imported my points layer from a shapefile into GRASS and then used the v.voroni tool to create the polygons. The geometry looked sound, the attributes of each point were assigned to a polygon, and for overlapping points one polygon was created and attributes of the shared points were dumped. I exported the polygons out as a shapefile and brought them back into Arc, ran the Find Adjacent & Neighboring Polygons tool, spot checked the neighbors of some features, and voila! The output was good. I clipped these polygons with my US outline, ran the tool again, and everything checked out.
Morals of this story? When geocoding addresses consider how the accuracy of the results will impact your project. If a tool or feature doesn’t exist assume that someone else has encountered the same problem and search for solutions. Never blindly accept output; take a sample and do manual checks. If one tool or piece of software doesn’t work, try exporting your data out to something else that will. Open source software and Creative Commons tools can save the day!
Footnote – apparently it’s possible to create lists of adjacent polygons in GRASS using the sides option in v.to.db, although it isn’t clear to me how this is accomplished; the documentation talks about categories of areas on the right and left of a boundary, but not on all sides of an area. Since I already had a working solution I didn’t investigate further.