A brief note – I’ve updated and replaced the country centroids file that I was previously hosting. I extracted data with geographic centroids in latitude and longitude for each country and dependency in the world using extracts from the NGA’s GNS and the USGS GNIS. Data is current as of Feb 2012, with long and short names for countries and two letter alpha FIPS and ISO codes for identification and attribute linking. Available for download on the Resources page.
Posts Tagged ‘country codes’
I’d downloaded some data from the UNdata website and cleaned it up so I could use it for my class, and thought I’d share some tips here. In many cases when you download data from UNdata you get multiple records for each country; one record for each year for each data point. In order to bring this data into GIS, I needed to re-arrange it to move the years from rows to columns, so that I’d have one record for each country with multiple columns for years.
You can do this in Excel using a pivot table, but since I was working off of my Linux notebook, I accomplished this using the Data Pilot tool in Open Office 3.0’s Calc spreadsheet. Here’s what I did:
- I opened the csv file I downloaded from UNdata in Calc, and accepted the defaults on the text import screen. Once it was imported, I saved the file in a spreadsheet format – you can use Calc’s odt format, or you can save it in Excel xls format if you need to use Excel later. But you have to get out of the csv format – Calc crashed on me a couple of times when I was running the Pilot and creating multiple sheets in the csv.
- I went up to the Data menu, selected Data Pilot, and Start, which opens the Data Pilot Menu. I clicked the More button to see the full range of options.
- Then it was a simple matter of dragging the field names into the right places. I dragged the country code and name fields into the rows box, the year into the column box (as I wanted to move years to columns), and the actual data field into the values box. Under the options listed under More, I changed the Results drop down box to save the table in a new sheet, and I unchecked all of the boxes listed below (for adding filters, creating totals rows and columns, etc). Then clicked OK.
- Voila! I had my newly formatted table, with one row for each country and one column for each year. But since I’ll be bringing this data into GIS (and will have to save the data in DBF format as I want my students to bring it into QGIS), I need to make sure that my data doesn’t have any funky formatting that may mess up joining my data to a shapefile. So I added a blank worksheet, copied my new pilot table, and did a Paste Special into the blank worksheet and pasted only text and numbers – with formatting, formulas, and anything else funky left out.
- Once I had my plain, reformatted data in my new sheet, I deleted the top row (which had labels for Sum Value and Year) so I’d be left with only one header row, and I changed the field names to something more database friendly (truncating names and removing spaces). Lastly, I deleted the original data sheet and the formatted data pilot table sheet, so I was left with just the final copy.
That’s it! Sort of. Since I now have the year’s in columns, I could create a few calculated fields to show change over time.
But the last piece will be dealing with the country codes. To get a data table with codes from the UNdata website, you have to choose an Add Columns option from their data browser page before you download, as you don’t get the country codes by default. Then, the codes you get could be anything. Since these data tables are coming from dozens of different organizations, agencies, and bureaus within the UN, the country codes will vary based on what that agency did. In some cases I’ve downloaded data that had the ISO two-digit alpha codes, and in other cases I had three digit numerical ISO codes (stored incorrectly as numbers, so leading zeros were dropped).
Most of the tables I’ve been downloading come from from the World Health Organization (WHO), and came with no standardized country codes. Instead, the codes are sequential numbers assigned to the countries in alphabetical order from 1 to 193. Doh! Then, if a new country gets added they tacked on the next available number regardless of the alphabet – so the country of Montenegro is assigned 194, after Zambia which is 193. Typically, data from countries that are not UN members or observers (like Liechtenstein and Vatican City) and are dependencies (Greenland, Falkland Islands, French Polynesia, etc) are not included in the data sets.
So, I’ll be typing in ISO alpha two codes into one of my data tables and will end up with a table that connects their sequential number system to ISO. Then I can bring this bridge and all of the other tables into a database, relate them to the bridge based on the sequential number, and create new tables out of them that have ISO numbers, so I can join them to my GIS file based on ISO. Or I guess I could add the sequential number field to my countries shapefile and join each table to it based on the sequential number.
Anyway – happy Data Piloting (or Pivoting, if you prefer).
I’ve created a bridge table to relate various country codes: FIPS 10, the three versions of ISO 3166 (two alpha, three alpha, and three numeric), NATO, and the internet country codes. Ok, ok, I didn’t create it, the CIA World Factbook did and has it listed in their appendix, but in HTML. I just copied it and made it database friendly: fixed column headings, removed dashes for nulls, removed trailing and leading spaces, converted numeric codes to text while preserving zeros, and saved it in a tab-delimited text file. You can download it from the Resources page.
If you open it in Excel, Excel annoyingly converts the three digit numeric ISO code back to a number and drops the leading zeros. You can fix this using a trick I illustrated in a previous post, or import it into Calc or Access instead. You’ll be able to designate that field as text during the import process. If you stash the table in a geodatabase, you will be able to relate features and tables that use different codes through this bridge table.
I was also searching for the ISO 3166-2 codes for 1st level subdivisions within countries (like states in the US or provinces in Canada), but had trouble finding anything official as I think they may be copyrighted. I eventually found one source that claims that they received permission to post the codes, so you can take a look there. I also stumbled across a good reference source called Statoids, which gives you background information, lists, and codes for subdivisions on a country by country basis. I’ve added both of these to the Links – Resources page.