A brief note – I’ve updated and replaced the country centroids file that I was previously hosting. I extracted data with geographic centroids in latitude and longitude for each country and dependency in the world using extracts from the NGA’s GNS and the USGS GNIS. Data is current as of Feb 2012, with long and short names for countries and two letter alpha FIPS and ISO codes for identification and attribute linking. Available for download on the Resources page.
I recently received my first question from someone who wanted to compare 2005-2007 ACS data with 2008-2010. With the release of the latter, we can make historical comparisons with the three year data for the first time since we have estimates that don’t overlap. We should be able to make some interesting comparisons, since the first set covers the real estate boom years (remember those?) and the second covers the Great Recession. One resource that makes such comparisons relatively painless is over at the Missouri Census Data Center. They’ve put together a really clean and simple interface called the ACS Trends Menu, which allows you to select either two one period estimates or two three period estimates and compare them for several different census geographies – states, counties, MCDs, places, metros, Congressional Districts, PUMAs, and a few others – for the entire US (not just Missouri). The end result is a profile that groups data into the Economic, Demographic, Social, and Housing categories that the Census uses for its Demographic Profile tables. The calculations for change and percent change for the estimates and margins of error are done for you.
Downloading the data is not as straightforward – the links to extract it just brought me some error messages, so it’s still a work in progress. Until then, a simple copy and paste into your spreadsheet of choice will work fine.
If you like the interface, they’ve created separate ones for downloading profiles from any of the ACS periods or from the 2010 Census. The difference here is that you’re looking at one time frame; not across time periods. The interface and the output are the same, but in these menus you can compare four different geographies at once in one profile. Unlike the Trends reports, both the ACS and 2010 Census profiles have easy, clear cut ways to download the profiles as a PDF or a spreadsheet. If you’re happy with data in a profile format and want an interface that’s a little less confusing to navigate than the American Factfinder, these are all great alternatives (and if you’re building web applications these profiles are MUCH easier to work with – you can easily build permanent links or generate them on the fly).
The US Census Bureau also recently put together a great resource called the Guide to State and Local Census Geography. They provide a census geography overview of each state: 2010 population, land area, bordering states, year of entry into the union, population centroids, and a description of how local government is organized in the state (e.g. whether it has minor civil divisions or only incorporated cities and unincorporated land). You get counts for every type of geography – how many counties, tracts, ZCTAs, and so on – AND best of all you can download all of this data directly in tab delimited files. Need a list of every county subdivision in a state, with codes, land area, and coordinates? No problem – it’s all there.
I was helping someone with a project recently that I thought would be straightforward but turned out to be rather complex. We had a list of about 10,000 addresses that had to be plotted as coordinates, and then we needed to create Thiessen or Voronoi polygons for each point to create market areas. Lastly we needed to generate an adjacency table or list of neighbors: for every polygon, list all the neighboring polygons.
For step one I turned to the USC Geocoding service to geocode the addresses; I became a partner a ways back so I could batch geocode datasets for students and faculty on my campus. Once I had coordinates I plotted them in ArcGIS 10 (and learned that the Add XY data feature had been moved to File > Add Data > Add XY Data). Step 2 seemed easy enough; in Arc you go to ArcToolbox > Analysis Tools > Proximity > Create Thiessen Polygons. This creates a polygon for each point and assigns the attributes of each point to the polygon.
I hit a snag with Step 3 – Arc didn’t have a tool for generating the adjacency table. After a thorough search of the ESRI and Stack Exchange forums, I stumbled on the Find Adjacent Features Script by Ken Buja which did exactly what I wanted in ArcGIS 9.2 and 9.3, but not in 10. I had used this script before on a previous project, but I’ve since upgraded and can’t go back. So I searched some more until I found the Find Adjacent & Neighboring Polygons Tool by cmaene. I was able to add this custom toolbox directly to ArcToolbox, and it did exactly what I wanted in ArcGIS 10. I get to select the unique identifying field, and for every ID I get a list of the IDs of the neighboring polygons in a text file (just like Ken’s tool). This tool also had the option of saving the list of neighbors for each feature directly in the attribute table of a shapefile (which is only OK for small files with few neighbors; fields longer than 254 characters get truncated), and it gave you the option of listing neighbors to the next degree (a list of all the neighbor’s neighbors).
Everything seemed to run fine, so I re-ran the tool on a second set of Thiessen polygons that I had clipped with an outline of the US to create something more geographically realistic (so polygons that share a boundary only in the ocean or across the Great Lakes are not considered neighbors).
THEN – TROUBLE. I took some samples of the output table and checked the neighbors of a few features visually in Arc. I discovered two problems. First, I was missing about a thousand records or so in the output. When I geocoded them I couldn’t get a street-level address match for every record; the worst-case scenario was a plot to the ZCTA / ZIP code centroid for the address, which was an acceptable level of accuracy for this project. The problem is that if there are many point features plotted to the same coordinate (because they share the same ZIP), a polygon was created for one feature and the overlapping ones fell away (you can’t have overlapping Thiessen polygons). Fortunately this wasn’t an issue for the person I was helping; we just needed to join the output table back to the master one to track which ones fell out and live with the result.
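Tracking which records fell out can also be done before building any polygons, by grouping the geocoded points on their coordinates. Here is a minimal sketch of that idea; the record layout `(id, lon, lat)`, the function names, and the rule that the first record at a coordinate is the one that keeps its polygon are all my assumptions for illustration, since which point survives depends on the tool.

```python
from collections import defaultdict

def group_by_coordinate(records, precision=6):
    """Group record IDs by their rounded (lon, lat) pair."""
    groups = defaultdict(list)
    for rec_id, lon, lat in records:
        key = (round(lon, precision), round(lat, precision))
        groups[key].append(rec_id)
    return groups

def dropped_candidates(records, precision=6):
    """Return IDs that share a coordinate with an earlier record.

    Assumption: the first record at a coordinate keeps the polygon;
    the rest are the ones likely to fall out.
    """
    dropped = []
    for ids in group_by_coordinate(records, precision).values():
        dropped.extend(ids[1:])
    return dropped

# Hypothetical sample: A and B were both geocoded to the same ZIP centroid.
sample = [
    ("A", -73.8800, 40.8492),
    ("B", -73.8800, 40.8492),
    ("C", -74.0060, 40.7128),
]
print(dropped_candidates(sample))
```

Joining this list back to the master table gives the same bookkeeping as comparing the tool output to the input, just earlier in the process.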
The bigger problem was that the output was wrong. Most of the features I checked, especially polygons on the outer edge of the space, had incomplete neighbor lists; each feature had several (and in some cases, all) neighbors missing. Instead of using a shapefile of Thiessen polygons I tried running the tool on polygons that I generated as feature classes within an Arc geodatabase, and got the same output. For the heck of it I tried dissolving all the Thiessen polygons into one big polygon, and when I did that I noticed that I had orphaned lines and small gaps in what should have been one big, solid rectangle. I tried checking the geometry of the polygons and there were tons of problems. This led me to conclude that Arc did a lousy job when constructing the topology of the polygons, and the neighbor tool was giving me bad output as a result.
Since I’ve been working more with GRASS, I remembered that GRASS vectors have strict topology rules, where features have shared boundaries (instead of redundant overlapping ones). So I imported my points layer from a shapefile into GRASS and then used the v.voronoi tool to create the polygons. The geometry looked sound, the attributes of each point were assigned to a polygon, and for overlapping points one polygon was created and attributes of the shared points were dumped. I exported the polygons out as a shapefile and brought them back into Arc, ran the Find Adjacent & Neighboring Polygons tool, spot checked the neighbors of some features, and voila! The output was good. I clipped these polygons with my US outline, ran the tool again, and everything checked out.
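To see why clean topology matters for a neighbor list, here is a rough sketch of one way to build an adjacency table: two polygons count as neighbors when they share an identical edge. This is not how either of the tools mentioned above works internally (I don't know their logic); the function names and the toy squares are made up for illustration, and the approach only works when shared boundaries use exactly the same vertices, which is the condition the broken polygons failed.

```python
from collections import defaultdict

def edges(ring):
    """Yield each edge of a ring as an order-independent key."""
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if a != b:  # skip the zero-length edge of an explicitly closed ring
            yield frozenset((a, b))

def adjacency(polygons):
    """Map each polygon ID to a sorted list of IDs that share an edge with it."""
    owners = defaultdict(set)
    for pid, ring in polygons.items():
        for edge in edges(ring):
            owners[edge].add(pid)
    neighbors = {pid: set() for pid in polygons}
    for pids in owners.values():
        for pid in pids:
            neighbors[pid] |= pids - {pid}
    return {pid: sorted(found) for pid, found in neighbors.items()}

# Hypothetical data: three unit squares in a row.
squares = {
    1: [(0, 0), (1, 0), (1, 1), (0, 1)],
    2: [(1, 0), (2, 0), (2, 1), (1, 1)],
    3: [(2, 0), (3, 0), (3, 1), (2, 1)],
}
print(adjacency(squares))
```

If one polygon's boundary has an extra vertex or a tiny gap along a shared side, the edge keys no longer match and the neighbor silently goes missing, which is the same kind of symptom I saw in the bad output.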
Morals of this story? When geocoding addresses consider how the accuracy of the results will impact your project. If a tool or feature doesn’t exist assume that someone else has encountered the same problem and search for solutions. Never blindly accept output; take a sample and do manual checks. If one tool or piece of software doesn’t work, try exporting your data out to something else that will. Open source software and Creative Commons tools can save the day!
Footnote – apparently it’s possible to create lists of adjacent polygons in GRASS using the sides option in v.to.db, although it isn’t clear to me how this is accomplished; the documentation talks about categories of areas on the right and left of a boundary, but not on all sides of an area. Since I already had a working solution I didn’t investigate further.
A little while ago I posted a text file with geographic centroids (centers) for each of the world’s countries. The reason why I put this together was that I wanted to test the data defined labeling features in QGIS. While automatic labeling in QGIS isn’t so hot (overlapping labels, multiple labels for each polygon), there are some powerful features for storing and referencing columns for annotation within the attribute table of shapefiles. One of the neat features is the ability to place labels based on coordinates stored in the attribute table.
The first step was to take the centroids file and join it to a shapefile of the world’s countries based on a common ID field, in this case FIPS country codes. QGIS doesn’t support table joins directly, but you can accomplish this with a good plugin called fTools, which includes a lot of additional and useful features. The instructions for getting fTools up and running are available on the fTools website; the installation doesn’t require you to download any files, you just handle everything through the QGIS plugin manager (if you have trouble seeing the plugin manager or getting fTools to appear, check to make sure that you have Python installed on your machine). Once fTools is up and running, you’ll see a Tools dropdown menu next to your other menus – drop it down, select data management tools and join attribute tables. You’ll get a dialog box asking which shapefile and field you want to join and which shapefile or table you want to join to it. The plugin only supports joins from other shapefiles and dbf tables, so you have to save the country centroids text file as a dbf before you do the join (you can do this in Calc or a pre-2007 version of Excel). These aren’t dynamic joins; fTools will create a new shapefile with the table fields attached.
Once the join is complete, you can add the new shapefile with the new fields, click on the layer, and navigate to the labels tab. Hit the checkbox to turn the labels on, select the field that contains the label in the dropdown box at the top, then select data defined position from the menu below. You’ll see a new series of dropdowns on the right, and you can select your longitude column for the X coordinate and latitude column for the Y coordinate. Hit OK, and voila! You’ll have labels that are centered in the middle of each country.
Of course, the label placement will not be perfect in every case. There will be label overlap in areas with small countries, areas with many countries clustered together, and with countries that have long names. The scale and size of the font will also be a factor, and placing the country name in the center is not always ideal for small island nations. However, you can easily change the label placement by going into an edit mode and changing the coordinates in the attribute table to get optimal placement. You can mouse over the map and use the coordinate information that’s displayed beside the scale in the lower right-hand corner of the window to determine which coordinates work best for a given situation. If you produce several maps at the same area and scale, you can use the same settings over and over again. You can also globally change the placement of all the labels using some of the other label options, such as placing all labels above or to the top-right of the centroid.
Now in order for all of this to work, the coordinates in the country centroid file must be in the same coordinate system as the shapefile. Since the country centroid file uses basic latitude and longitude, I was able to do this with a shapefile that was in the basic WGS 84 geographic coordinate system. If you’re using a different geographic coordinate system or a projected coordinate system, you’ll have to convert the coordinates in the centroid file to match that system. I haven’t delved into this too deeply yet, but there are a number of free tools that you can download that should do this – one of them is called GEOTRANS, and it’s available for free download from the NGA. It can handle batch transformations of coordinate data stored in text files, and supports conversions to several different geographic and projected systems.
I just added a new resource and updated another one on the resources page. I put together a file that contains the centroids (geographic centers) of all of the countries in the world, plus a few territories and dependencies. The centroids are in latitude and longitude coordinates based on WGS 84 in two formats: decimal degrees and degrees / minutes / seconds. It’s a tab delimited text file that you can open or import into any spreadsheet or database program. Each record is uniquely identified by a FIPS 10 code.
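If you need to move between the two coordinate formats yourself, the arithmetic is simple: minutes are sixtieths of a degree, seconds are sixtieths of a minute, and south and west are negative. A small sketch follows; the function names and the hemisphere-letter convention are my own choices for illustration, not a description of how the file was built.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees / minutes / seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative in decimal degrees.
    return -value if hemisphere.upper() in ("S", "W") else value

def decimal_to_dms(value, is_latitude):
    """Convert signed decimal degrees to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    total_seconds = abs(value) * 3600
    degrees = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = total_seconds - degrees * 3600 - minutes * 60
    return degrees, minutes, seconds, hemisphere

# Illustrative values only, not taken from the centroids file.
print(dms_to_decimal(40, 50, 57.12, "N"))
print(decimal_to_dms(-73.88, is_latitude=False))
```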
I downloaded most of the data from the NGA’s GEOnet Names Server (GNS). I blogged about the GNS a while back, pointing out that you could query this gazetteer for individual places or you could download files that have all the features for each country in the world. While it took some time to figure out, you can actually take a middle road and query the database for specific categories of features that you can download. I used the text-based search, where the links on the left side of the screen open different input boxes that you can use to query or exclude data. I managed to query top-level administrative units (countries) and to exclude most variant country names. After I downloaded the file, I still had to go in and do some clean-up, and I had to go back and get countries I missed by hand – these were mostly dependencies and territories that were excluded based on the search I did (Greenland, French Guiana, Netherlands Antilles, and a number of others).
Then I realized that the GNS excludes the United States and all of its territories. So, I went over to the USGS Geographic Names Information System (GNIS) and grabbed the data for the US territories. The GNIS is simpler to navigate and you can download records pretty easily. They didn’t have a record for the United States as a whole, so I had to go over to the Census Bureau to get coordinates for the US centroid.
I brought all of these records into one file and placed it on the resources page for download, along with some metadata to describe it. Why would you want to use this stuff? You can use it for basic distance calculations, or as an annotated label field for label placement in GIS. More about that in my next post.
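For the distance calculations, a great-circle formula on latitude and longitude is usually enough at the country scale. Below is a minimal sketch using the haversine formula; it treats the earth as a sphere with a mean radius of 6371 km, which is an approximation, and the function name is mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Two made-up centroid pairs, just to show the call.
print(round(haversine_km(51.0, 10.0, 36.0, 138.0), 1))
```

Keep in mind that a centroid-to-centroid distance says nothing about how close two countries' borders are, so it is only a rough measure.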
I also updated the country code cross-reference file that I took from the CIA World Factbook. You can use this as a bridge table to relate tables that use different identifiers. So if you wanted to join the FIPS-based centroid file to an ISO-based shapefile of countries, you can join the centroids to the bridge first based on FIPS, and then join that new table to the shapefile based on ISO.
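Here is a rough sketch of that two-step join using plain dictionaries. The function name, the table layouts, and the coordinate values are made up for illustration (the coordinates are rough placeholders, not values from the centroids file); only the FIPS and ISO codes for Germany and Japan are real.

```python
def join_via_bridge(centroids, bridge, iso_features):
    """Join FIPS-keyed centroids to ISO-keyed features through a bridge table.

    centroids:    {fips_code: (lat, lon)}
    bridge:       {fips_code: iso_code}
    iso_features: {iso_code: feature_name}
    Returns (joined rows keyed by ISO code, list of FIPS codes that did not match).
    """
    joined, unmatched = {}, []
    for fips, (lat, lon) in centroids.items():
        iso = bridge.get(fips)
        if iso is None or iso not in iso_features:
            unmatched.append(fips)
            continue
        joined[iso] = {"name": iso_features[iso], "fips": fips, "lat": lat, "lon": lon}
    return joined, unmatched

# Hypothetical sample tables; "ZZ" is a placeholder code with no bridge entry.
centroids = {"GM": (51.0, 10.0), "JA": (36.0, 138.0), "ZZ": (10.0, 10.0)}
bridge = {"GM": "DE", "JA": "JP"}
iso_features = {"DE": "Germany", "JP": "Japan"}

joined, unmatched = join_via_bridge(centroids, bridge, iso_features)
print(joined["DE"]["name"], unmatched)
```

Keeping the list of unmatched codes is worth the extra line, since dependencies and territories are the records most likely to be coded differently between sources.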
Sorry that November has been another crummy month for posts. Here’s one that I’ve been meaning to write for quite awhile.
While there is a lot of free GIS data out there, one of the black holes is business data. Specifically, if you want to plot all of the businesses in one industry or all of the branches or locations of one company, where do you get the data? I’ve found that, if you need a comprehensive resource, this is one of those datasets that you have to pay for.
At our library we subscribe to a great business directory called ReferenceUSA, which is produced by a company called InfoUSA. Their directories of American and Canadian businesses are extremely comprehensive and cover every business large and small. They also have an international directory that has mid-size to large businesses. You can generate lists of businesses using several criteria and filters.
For places, you can specify the entire country, states, counties, places, or ZIP codes. You can generate lists based on company names, keywords, or NAICS codes to grab all of the businesses in one industry. Once you have your list, you can click on each individual business to get a detailed profile. For GIS purposes, you’ll want to use the download option. Depending on your subscription, you’ll be able to download only a certain number of records at a time (we can get 25 records per download). Just download as a csv file, save, open in a spreadsheet, then start downloading subsequent batches and copying and pasting records into a master file.
When you go to download, you’ll be prompted to choose basic, detailed, or custom. Basic isn’t going to cut it, as it’s missing the key fields – latitude and longitude coordinates. Choose the detailed option to get all of the fields. The custom option has some bugs – you’ll get lat and long without decimal places and some of the data for fields will be missing. Once you have all of the detailed records, you can delete a lot of the unnecessary fields. You’ll want to, as many of the field headings are not database friendly – many are long and contain spaces, which will cause problems when you go to import the table into GIS. So be sure to delete any that you don’t need and fix the ones you do need.
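Fixing the headings by hand is fine for a handful of fields, but it can also be scripted. The sketch below lowercases each heading, swaps spaces and punctuation for underscores, and trims to 10 characters, which is the field name limit in a shapefile's dbf table. The function names and the sample headings are hypothetical; I'm not reproducing the actual ReferenceUSA field list here.

```python
import re

def clean_header(name, max_len=10):
    """Make one field heading database friendly."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_").lower()
    if not cleaned or cleaned[0].isdigit():
        cleaned = "f_" + cleaned  # field names should not start with a digit
    return cleaned[:max_len]

def clean_headers(names, max_len=10):
    """Clean a list of headings, numbering any that collide after trimming."""
    result = []
    for name in names:
        base = clean_header(name, max_len)
        candidate, n = base, 0
        while candidate in result:
            n += 1
            candidate = base[: max_len - len(str(n))] + str(n)
        result.append(candidate)
    return result

# Hypothetical headings for illustration.
print(clean_headers(["Company Name", "Latitude", "Employee Size (Location)", "Employee Size (Total)"]))
```

Trimming to 10 characters is only needed if the table is headed for a shapefile; for other formats you could pass a larger `max_len`.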
Once you have your table ready, add it to your favorite GIS program. In ArcGIS you can use the Add XY Data feature to plot the points and turn them into a shapefile. Remember to specify the X coordinate as your longitude field and the Y coordinate as latitude, and define your geographic coordinate system as WGS 84. Once you plot them, right click on the feature in the Table of Contents and export them out as a shapefile so you have a permanent layer (see my previous XY post for more details). You can map the businesses as regular old points, or make some graduated symbols based on some of the attributes, like sales or total employees (ReferenceUSA doesn’t provide the exact data, but identifies a range, e.g. 1 to 10 employees, 11 to 25, etc).
Most of the open source alternatives also have a tool or plugin that allows you to plot XY data. Of course, the data does include address fields if you wanted to geocode your points rather than plot XY (but plotting XY is a million times easier and doesn’t require downloading huge street network files).
The good news here is that if you’re not affiliated with a university, you can probably get access to this db from a large public library, as many will have a subscription to a business directory as a matter of course. If they don’t have RefUSA they may have an alternative like the D and B Million Dollar Database. It’s another business directory that allows you to download XY data for businesses, but it is not nearly as comprehensive.
Sorry that October has obviously been a pretty weak month for posts. I’ve been driven to distraction lately and haven’t done much GIS related work.
I was working on a project this week that involved manipulating data tables, so I thought I’d share a couple tips here. A number of months ago I wrote a post about manipulating FIPS codes and text-based ID fields. But what if you have to manipulate numeric fields? Adding decimal places, zeros, etc? The answer is – math!
In one field, I had a population figure from the 1970 Census that had been rounded to the hundreds place, so it was listed like this:
Bronx   14718
I wanted to make this a little more explicit by adding the appropriate zeros, so in Excel (or Calc if you prefer) I created a formula to multiply this by 100 =(c2*100) to get the full number with zeros:
Bronx   1471800
I also had fields with latitude and longitude coordinates in decimal degrees, but they lacked decimal points. The longitude field also lacked the minus sign, which means if we plotted the points they would end up in Asia instead of North America (longitude east of the dateline and west of the prime meridian is notated as negative in decimal degrees, as is latitude south of the equator). I knew from the metadata that each coordinate pair was precise to four decimal places, and I knew all of my points were in North America. So I created a formula where I took the latitude and divided by ten thousand =(c3/10000) and took the longitude, divided by ten thousand and multiplied by -1 =((c4/10000)*-1). Here’s the before and after:
Bronx   408492   738800
Bronx   40.8492   -73.8800
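The same math works outside of a spreadsheet. Here is a small sketch that mirrors the formulas above; the function names are mine, and the negate flag bakes in the assumption that every point is in the western hemisphere, which was true for my data but won't be for everyone's.

```python
def restore_rounded(value, factor=100):
    """Restore a figure that was stored in hundreds, like =(c2*100)."""
    return value * factor

def rescale_coordinate(raw, decimals=4, negate=False):
    """Insert the missing decimal point, like =(c3/10000).

    Set negate=True for longitudes that lost their minus sign
    (assumes every point is west of the prime meridian).
    """
    value = raw / 10 ** decimals
    return -value if negate else value

population = restore_rounded(14718)
lat = rescale_coordinate(408492)
lon = rescale_coordinate(738800, negate=True)
print(f"Bronx   {population}   {lat:.4f}   {lon:.4f}")
```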
Some of this may seem pretty obvious, but if you’re used to working with text-based ID fields all of the time (like I am), it’s easy to forget that all you need is simple math to fix number fields.
The last step I took was to check for null values. A few of my data points had 0,0 listed for lat and long, because coordinate data was missing for those particular places. The problem is that 0 IS a value! If we plotted this data, these points would show up where the equator and prime meridian meet below western Africa. You have to represent “no data” as a blank value or null, and not as a zero. I fixed those, plotted, and was good to go.
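If you're cleaning the table in a script instead of a spreadsheet, the same fix is a short loop. This sketch treats a row as missing only when both latitude and longitude are exactly zero, which was safe for my North American points but is an assumption you should check against your own data; the row layout and the second place name are hypothetical.

```python
def blank_missing(rows):
    """Replace 0,0 coordinate pairs with None so they read as 'no data'."""
    cleaned = []
    for name, lat, lon in rows:
        if lat == 0 and lon == 0:
            lat, lon = None, None  # null, not zero
        cleaned.append((name, lat, lon))
    return cleaned

rows = [
    ("Bronx", 40.8492, -73.8800),
    ("Unknown place", 0, 0),  # hypothetical record with missing coordinates
]
for row in blank_missing(rows):
    print(row)
```

When the cleaned rows are written back out, the None values should become empty cells rather than the text "None", so check how your export step handles them.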