Working With Lat Long, Numeric Data

Sorry that October has obviously been a pretty weak month for posts. I’ve been driven to distraction lately and haven’t done much GIS related work.

I was working on a project this week that involved manipulating data tables, so I thought I’d share a couple tips here. A number of months ago I wrote a post about manipulating FIPS codes and text-based ID fields. But what if you have to manipulate numeric fields? Adding decimal places, zeros, etc? The answer is - math!

In one field, I had a population figure from the 1970 Census that had been rounded to the hundreds place, so it was listed like this:

Bronx    14718

I wanted to make this a little more explicit by adding the appropriate zeros, so in Excel (or Calc if you prefer) I created a formula to multiply this by 100 =(c2*100) to get the full number with zeros:

Bronx    1471800

I also had fields with latitude and longitude coordinates in decimal degrees, but they lacked decimal points. The longitude field also lacked the minus sign, which means if we plotted the points they would end up in Asia instead of North America (longitude east of the dateline and west of the prime meridian is notated as negative in decimal degrees, as is latitude south of the equator). I knew from the metadata that each coordinate pair was precise to four decimal places, and I knew all of my points were in North America. So I created a formula where I took the latitude and divided by ten thousand =(c3/10000) and took the longitude, divided by ten thousand and multiplied by -1 =((c4/10000)*-1). Here’s the before and after:

Bronx    408492    738800

Bronx    40.8492    -73.8800

Some of this may seem pretty obvious, but if you’re used to working with text-based ID fields all of the time (like I am), it’s easy to forget that all you need is simple math to fix number fields.

The last step I took was to check for null values. A few of my data points had 0,0 listed for lat and long, because coordinate data was missing for those particular places. The problem is that 0 IS a value! If we plotted this data, these points would show up where the equator and prime meridian meet below western Africa. You have to represent “no data” as a blank value or null, and not as a zero. I fixed those, plotted, and was good to go.

Open Source GIS Wrap-up

I’ve been on an open source GIS tear this month, so in this post I’ll wrap up some odds and ends:

  • There is a project called Sextante, which is essentially an open source ArcToolbox for gvSIG. It adds a lot of geoprocessing and analysis functions and is pretty easy to install. There are 200 + tools in the box, but for some reason not all of them are active. I’m not sure why this is the case, but haven’t poked around much to find out.
  • There are also a number of extra plugins for QGIS that are available through the QGIS wiki under PluginRepository; they include plugins that add more symbolization and that make table joins possible. Haven’t had a chance to try this yet either, but it sounds like these extras could make QGIS a lot more viable as a thematic mapping option.
  • I found out about the QGIS plugins from this article, which offers a good overview of QGIS. The article also discusses one of the other shortcomings of open source GIS - the lack of a support for a simple, desktop geodatabase similar to the Microsoft Access personal geodatabases. PostGIS is certainly powerful and there has been a lot written about it, but a server based geodatabase is not always the best solution, particularly for small, stand-alone projects. There is a cool project called Spatiallite, where someone has created geographically enabled SQLite databases (which are small, stand alone dbs). You can export shapefiles to them, or simply view and edit the attributes in a shapefile via a virtual connection. Based on what I’ve looked at thus far, you can access SQlite databases directly in GRASS and when using GRASS datasets via QGIS, but I haven’t been able to connect to a SQlite db with the other software I’ve looked at - it’s just not supported yet.
  • In researching open source GIS, I’ve looked at a book specifically on GRASS, Open Source GIS: A Grass Approach, as well as two books on web mapping (GIS for Web Developers: Adding ‘Where’ to Your Web Applications and Web Mapping Illustrated: Using Open Source GIS Toolkits)which cover GDAL and OGR, QGIS, GIS servers, PostGIS and PostgreSQL, and a few other tools. There is a book that’s recently been published that focusses specifically on Open Source Desktop GIS - Desktop GIS: Mapping the Planet with Open Source Tools. I pre-ordered a copy on Amazon that was supposed to ship in Mid September, but is now being delayed until late October. Based on the table of contents it looks pretty thorough and covers many of the choices I listed in my previous post, and I’m looking forward to its arrival. Sadly, it does not appear to cover gvSIG, which was my favorite option thus far.

Why Consider ArcGIS Alternatives?

Last week I shared my adventures evaluating open source software. Why bother looking at alternatives to ArcGIS? There are significant barriers of entry to ArcGIS. Whenever I give an introductory GIS presentation to anyone, I inevitably have to answer the question of “How can I get access to this software?” Inevitably, the answer is you have to spend a lot of money, or if your institution already has a subscription, you need to go through a lengthy process to get access.

  • Price. A single, stand-alone copy of ArcView costs $1500. Not only is that prohibitively expensive for me, it’s impossible for students. Which means that students who are taking a GIS class have to use the software in a computer lab on campus to complete assignments. This is not always convenient for many students, and is particularly problematic where I work since we are primarily a commuter campus.
  • License limitations. If you’re running Arc through a central license server, PCs have to be connected to the server through a hardwired connection - no wireless. Our library has a laptop checkout program for students which would give students an alternative to using a computer lab. But not being able to install the software on a laptop eliminates this possibility. It also makes it a pain for me to give presentations, as I always have to make sure that the room I’ll be presenting in has the software. My short term solution is to use an eval copy on a laptop. You can purchases USB keys that have the license info on them, but if you work in a large, complex academic or government setting, getting one can be a challenge. And every year we have to go through the process of getting the license renewed.
  • Installation and Bugs. As Arc users know, installation can be time consuming, particularly since you can’t have two versions of Arc installed concurrently - you have to uninstall one before installing the new one. And how many service packs have been issued for version 9.2? Six. IT people love it when they have to install fixes in a dozen labs / classrooms in the middle of a semester, particularly when they have to do it 5 or 6 times a year. In reality, we skip several service packs and live with the bugs.
  • Forced Obsolescence. This is particularly aggravating. Every year or two, we all have to go through the ritual of making an upgrade, which involves time consuming un-installation and installation. And you need to make sure that different branches of your organization that use GIS are on the same page, otherwise you’ll run into incompatibility issues (like when mxd files created in version  9.2 don’t work in 9.1).
  • Cross platform. I run a linux box at home and occasionally would like to take my work with me. There are a number of students and faculty members at my school who are ardent Mac users. But ArcGIS runs only on Windows.

The open source alternatives are free, easy to install (usually), can be installed anywhere without restrictions, the software doesn’t expire, and upgrades are a rather simple affair. The obvious downside is that none of them have the power, scope, or usability that ArcGIS has. At least, not yet.

Open Source GIS for Thematic Mapping

I’ve been exploring the open source GIS alternatives, and have been pretty overwhelmed by the number of choices. For an overview of what’s out there, you can check out The State of Open Source GIS (a large pdf) from Refractions Research, and a series of comparison tables assembled by a geography prof at the Univ of Calgary. You can also search the web for “Open Source GIS”, and you’ll find a number of blogs, forums, lists, and sites that cover it in some detail.

There are a lot of alternatives, and many of them are geared to a particular purpose: raster vs vector, viewer vs map making vs analysis, etc. I’m looking for something that’s cross-platform that I can use for vector-based thematic mapping, and something that I can easily introduce and teach to novices. I need software that allows me to: work with common formats like shapefiles, transform projections, add data tables and join them to shapefiles, symbolize data with a good selection of color schemes, classify data with several methods including natural breaks, add labels, and produce maps as pdfs, images, or in print. Preferably, I want something that has strong map layout capabilities. I don’t want to use a graphic design package for final map creation.

I’ve looked at five options that are all great, but there isn’t one that covers everything I’m looking for :

  • GRASSGRASS - an established and powerful GIS. Setting up the environment for working with files takes some getting used to (you can’t simply open a window and start adding files), and the native graphic interface is complex. GRASS only works with it’s own native file formats, so you have to import everything into that format first. GRASS was really designed for analysis and modeling (tasks for which it excels), but is not the best choice for basic thematic mapping, or for novices.
  • QGISQGIS - one of the strongest attributes of QGIS is that it harnesses the power of GRASS in a more user friendly environment. If you use the GRASS plug-in and work with GRASS datasets, you can do table joins and have access to several classification methods. If you use QGIS on it’s own (working with shapefiles or raster images), you won’t have these capabilities. It seems that projection transformations are limited to a certain subset of projections, and common global thematic map projections like Robisnon or Winkel Tripel are missing (you would have to create them manually). While QGIS does have a print layout screen, I found that it was a little clunkly to use. Color schemes are limited, and labeling isn’t good - it automatically labels every single polygon in a multi-part layer, and the only way to turn it off is via a hack. QGIS is a great viewer (particularly for rasters) and is a good alternative GRASS front end, but probably won’t be your choice for making high-quality thematic maps.
  • gvSIGgvSIG - billed as an alternative to the old ArcView 3.x, gvSIG lives up to this reputation. A map project has separate, defined areas for data views, maps, and data tables. You can do projection transformations and it does support EPSG and ESRI, but the process is a little confusing. The map window has a default projection, but when you add a layer it uses the layer’s projection but keeps the windows projection, and it’s difficult to figure out what projection the layer is in (if this makes any sense!) You can do table joins, and there is a good selection of color schemes for classification and several classification methods. It is the only one that I’ve seen that has natural breaks as an option. It also has the best map layout compared to the other software I’ve looked at and you can export maps out to a number of formats. The biggest weakness is labeling, which is very rudimentary. It doesn’t place labels in the center of polygons, but offsets them slightly (appropriate for points but not polygons), and has no conflict detection. It does allow you to create annotation layers, and improvements are in the works for the next version. Since the software was created in Spain, you’ll occasionally find a menu here or a button there that was missed in translation to English.
  • udiguDIG - the windows and toolbars are not as GIS-like as the other software options, which makes findings things a little tougher. Udig has good projection transformation support, great selection of color schemes for symbolization, and excellent label placement (the best, by far), with conflict detection and different placement options. Like QGIS, the map layout screen is located under the print option. The templates are rudimentary but are easy to use and get the job done. The detractors here are data classification (natural breaks is not a choice) and table joins - there is no option for adding and joining attribute tables to spatial files whatsoever.
  • openJUMPOpenJUMP - has a great interface, particularly for working with attribute tables, good selection of color schemes for symbolization and good label placement features. Table joins are supported for text files (no DBFs). Equal Intervals is the only data classification method available (no natural breaks), and projection transformation is only available via a plug-in. The bigger issue is that there is no print option or map layout. These are available through a plugin as well, but I haven’t tried installing it yet. Plugin installation requires altering or over-writing some of the program files, and I was dubious to try.

There are always work-arounds to fill in the features that are missing. For file and projection transformations, you can always use the GDAL / OGR tools, which I would recommend (although annoyingly I can’t seem to get the projection transformations to Robinson or Winkel to work). The NCEAS at UC Santa Barbara has a nice wiki with examples of commands. Table joins can be accomplished outside the GIS using a database package. You could use a spreadsheet or stats package to figure out break points for data, and change the breaks manually in the GIS. For labels, you can export labels out as annotation, or you can convert polygons to points and use the points layer as a label layer (just make the points invisible but turn the labeling on). Then you can edit that file and move the labels around to get them in the right position. If you make lots of thematic maps for the same area, you can use the same label file over and over again.

This isn’t an exhaustive overview and I haven’t created a consistent procedure for testing all the options. There are a few other elements that I would also want to explore (How good is the support for legends? Can you normalize data or calculate new attribute fields? Can you add a graticule? Change the background color for a view? Convert a table of XY coordinates to a point layer? Is there a geoprocessing tool for generalizing layers?)

If pressed, which option would I choose? I’m inclined to go with gvSIG, which really reminds me of the old ArcView, and would hope that label placement improves with the next version - each of these software packages are constantly improving works in progress. Perhaps it’s best to regard these software options as different tools in one toolbox. If I need to make a thematic map of the world, showing population by country with only a few labels, then I’ll go with gvSIG, where I can easily do table joins, use natural breaks, and have lots of colors at my disposal. If I need to make basic reference maps, say ZIP codes of the NYC metro area, then I’ll go with uDig, as I’ll be able to quickly label them all and still have good color scheme choices. If I need to do geoprocessing or analysis, I’ll also have to evaluate the options - maybe look at plugins, or hunker down and learn GRASS.

In the end, I’m certainly grateful that there are solid, open, and free choices out there and that people are freely giving their time and talent to everyone’s benefit. Why consider these alternatives? I’ll cover that in my next post.

Red States / Blue States

It’s been a busy summer - I’ve spent a good chunk of it working on an election mapping project. The library wanted to create a resource for students to use for the upcoming 2008 presidential election. Here it is:

Red States, Blue States: Electoral Strategy Behind the Map

A few of the procedures and issues I encountered while working on the project became fodder for a number of posts to this blog, so I thought I’d share the end result.

I’ve also been assembling data and pages for a server I’ve been given to provide GIS data at Baruch, and have been investigating open source alternatives to ArcGIS. More on that later…

GIS Data: UNSDI, gData, CEGRP, AIMS

I’ve stumbled across a few good sites for GIS data lately. Check these out:

UNSDI-NCO: The United Nations Spatial Data Infrastructure site, maintained by the Netherlands Coordination Office. They have many global datasets as well as country-specific ones, often for developing countries where data is hard to come by. Includes boundaries, roads, infrastructure, and natural features. Click on the Datasets link under the Categories menu to see the list, then click on the feature of you choice. You’ll have to scroll through the metadata to the Distribution Info element to get to a download link. Not all of the datasets are available for public download.

gData: This site is housed at Berkeley as part of the Biogeomancer Project, whose goal is to share data on biodiversity. You can download boundaries, hydrography, infrastructure, topography, and climate data in vector and raster formats for any country in the world. The data is aggregated, and in some cases improved, from many public sources. Administrative boundaries include 1st, 2nd, and often 3rd level divisions. A great, comprehensive source.

CEGRP: China Earthquake Geosptial Research Portal, housed at Harvard. The goal of the site is to gather and distribute geospatial data in response to the earthquake that hit Sichuan China in May 2008. Vector and raster layers for all of China and for this particular region where the earthquake hit.

AIMS: Afghanistan Information Management Services. A non-profit group located in Afghanistan that has created and maintains a geospatial infrastructure to support the government. Vector datasets for the entire country and the city of Kabul are available for download. They also offer a number of static pdf maps.

FIPS and ISO Country Code Table

I’ve created a bridge table to relate various country codes: FIPS 10, the three versions of ISO 3166 (two alpha, three alpha, and three numeric), NATO, and the internet country codes. Ok, ok, I didn’t create it, the CIA World Factbook did and has it listed in their appendix, but in HTML. I just copied it and made it database friendly: fixed column headings, removed dashes for nulls, removed trailing and leading spaces, converted numeric codes to text while preserving zeros, and saved it in a tab-delimited text file. You can download it from the Resources page.

If you open it in Excel, Excel annoyingly converts the three digit numeric ISO code back to a number and drops the leading zeros. You can fix this using a trick I illustrated in a previous post, or import it into Calc or Access instead. You’ll be able to designate that field as text during the import process. If you stash the table in a geodatabase, you will be able to relate features and tables that use different codes through this bridge table.

I was also searching for the ISO 3166-2 codes for 1st level subdivisions within countries (like states in the US or provinces in Canada), but had trouble finding anything official as I think they may be copyrighted. I eventually found one source that claims that they received permission to post the codes, so you can take a look there. I also stumbled across a good reference source called Statoids, which gives you background information, lists, and codes for subdivisions on a country by country basis. I’ve added both of these to the Links - Resources page.

Adding Long / Lat XY Data to ArcMap

Here’s a tutorial I’ve been meaning to write: adding a table of longitude and latitude coordinates to ArcMap and turning them into features. For this example, I’ll be using place names from the GEOnet Names Server country files. The US National Geospatial Intelligence Agency has a pretty extensive list of geographic features for each country, with coordinates in many formats, including longitude and latitude in decimal degrees. I’ll use Botswana in southern Africa as an example, as it has a small record set and because I have some admin boundaries handy that I’ve downloaded from SAHIMS.

  • Download the file from the GNS and unzip it. It is a tab-delimited text file. If you like, you can open it in Excel or another spreadsheet to see what it looks like. This works fine for this example, but won’t work for larger or more populated countries because the files will exceed the maximum number of records that a spreadsheet can handle (65k). You’ll need to import the file into a database (Access for example) if you want to take a look in those cases. In either event, you’ll be able to add the text file directly to ArcMap, so no worries.
  • Add XY Data Open ArcMap and under the Tools menu, select Add XY Data. In the dialog box, you’ll select the file that contains your XY coordinates. Choose the text file you’ve downloaded. ArcMap will then search through the fields and look for appropriate ones to add as X and Y fields. In this case, it should correctly choose LONG for X and LAT for Y. If Arc couldn’t figure it out, you would have to specify which columns have the coordinates. Longitude is ALWAYS the X coordinate, and Latitude is ALWAYS the Y. Finally, you’ll select a projection. Choose the standard geographic coordinate system WGS 1984, which is usually a safe bet when adding long/lat data from most sources.
  • Add XY Dialog BoxHit OK, and Arc will plot the coordinates (after you click through the warning message). In this example, it looks like there is one wayward point, way to the north. When you see something like this, it often means that one of the coordinates is missing a minus sign: latitudes below the equator are negative, as are longitudes east of the international date line and west of the prime meridian. If you use the identity tool, you’ll see that the minus sign for latitude for this wayward point is missing. The easiest thing to do would be to go back into the text file, edit it, and add it to ArcMap again.
  • Even though Arc has plotted the points, they still don’t exist as features (remember the warning message? That’s essentially what it was saying). Select the plotted points in the Table of Contents, right-click, select Data, and select Export. Export the points out as a new shapefile or a feature class in a geodatabase. Then add the new features to the map.
  • At this point, it may be helpful to have a frame of reference for all of these points. Get your hands on some administrative layers, like country boundaries. I downloaded the outline of Botswana from SAHIMS. This step usually requires projecting and reprojecting, as you’ll need to get your points layer to match the projection of the other files you’re working with. I always use the ArcToolbox within ArcCatalog to fiddle with projections and then add the finished files to a new, blank map in ArcMap. In my case, the Botswana boundary was undefined - I had to consult the metadata from their website to figure out what the projection is (NAD 1927) and then define it using the ArcToolbox (Data Management Tools, Projections and Transformations, Define Projection). Then, I had to convert the Botswana points layer from WGS 1984 to match the boundary’s NAD 1927 projection (using Data Management Tools, Projections and Transformations, Feature, Project).
  • Plotted points with boundaryAdd the projected boundary and reprojected points to your map. Many of these points are point features (villages, towns, farms, mountain peaks), while others represent the geographic centers of lines (roads, rivers) or areas (administrative areas, parks, reserves). You’ll probably want to extract certain kinds of features. At this point, you’ll want to take a look at the attribute table for the points file and consult the NGS description for the names files. The description will tell you what each of the data columns represents and what all of the codes mean. The FC field will come in quite handy here, as it designates categories for each feature. So if we wanted to extract populated places, under the Selection Menu in ArcMap we could do a Select by Attribute where the field FC is equal to P, which is the code for populated place features. Once they are selected, you can do a Data, Export to create a new shapefile with just those features.
  • Alternatives do abound here. If you prefer, you could do a lot of the work of editing and creating feature subsets within a geodatabase. You can also follow these same, general procedures using open source tools (I believe that QGIS has a tool for adding XY data). And while we’re discussing a specific example here, the same basic steps would apply for any XY dataset.

Searching for Foreign Census Data

I’ve been looking for census data for various countries, and have visited the usual suspects that aggregate this data - the CIA World Factbook and the United Nations Population Information Network. Other supra-national orgs like the IMF and World Bank also create and compile this info. These are fine sources, particularly if your goal is to look at basic data for several (or all) countries. But if you are studying or writing about one country in particular, it may seem odd to cite the UN, and even odder to cite the CIA. It would be better to go right to the source - the chief statistical agency in that particular country. In all likelihood, this agency would also have more in-depth stats than the aggregators.

But - where is the source? Rather than be left to the mercy of google, where you’ll uncover the obvious suspects and lots of commercial sites and joe-schmoes who republished some data from last decade, visit the US Census Bureau’s list of foreign statistical agencies, which will lead you right to the source.

Assuming you can find some pages with some data (census data isn’t public domain in every country and isn’t necessarily online for free, or at all, in which case you may need to go with some of the aggregate sources), the next obstacle will be overcoming the language barrier. Many countries will publish pages in several languages, including English. Some may publish only limited info in English, or no info in English at all. If you don’t read the lingua franca, you can try a translating tool like Babblefish or the Google Language Tool to translate the page for you. The translation may not be perfect, but it should be good enough where you can figure out what you need (although if the language you are translating doesn’t use the Roman alphabet and Arabic numerals - i.e. 1,2,3 etc, you may have some trouble).

The toughest obstacle to overcome may be the organizational barrier. If you are familiar with the US Census Bureau, you’ll know that it’s a large and complex organization with many subdivisions and datasets (decennial census, acs, population estimates, etc). And despite it’s enormity, it doesn’t collect all socio-economic data (religious affiliation) and may not be the best source for all data (current labor force stats). Well - other countries are just as complicated, so be wary!

Another strategy would be to visit Wikipedia - not to cite as a source, but to find what sources they use. You’ll find many country specific articles that cite the CIA Factbook or the UN, but some of the more detailed and well written ones do cite reports written by the statistical agencies for the country in question, often with a link to the page or report. If you have access to some library databases, like Gale Virtual Reference, they will (usually) cite sound references as well. Happy hunting!

Heading Cross Country with Google Maps

I’m flying out to Seattle tonight, and will eventually be driving back to New York, after side trips to Vancouver BC, Olympia, and San Fran. When I moved to Seattle for grad school back in 2005, I used a Rand McNally road atlas to plot time and distances between stops. This time around I used Google Maps, which made it much easier to fiddle around with spacing the stops out. You can check out my plan (for Seattle to New York via San Fran) here. Place names and Long / Lat coordinates get passed through the url, making it easy to hack together your own map.

I’m still taking that road atlas with me though!