Archive for the ‘Data Processing’ Category

SpatiaLite and QGIS: Loading, Joining, Mapping Shapefiles and Tables

Saturday, January 30th, 2010

I stuck with with the Long Term Support Version of QGIS (1.02) last semester while I was teaching, but now I finally have had a chance to experiment with the latest version (1.4) which has a lot of great new features including: improved symbolization, labeling, print layouts, and support for SpatiaLite – a personal (single file) geodaatbase based on SQLite. For a summary of the new QGIS features check out the QGIS blog and this developer’s blog, and for an overview of SpatialLite you can go to the official docs page and this tutorial. The latter will show you the obvious strengths of SpatialLite – the ability to store features and attributes in one container, with the ability to run standard SQL and spatial queries on both. Since that’s covered pretty well, I thought I’d run through a basic operation – how do you load a shapefile and an attribute table in SpatialLite, join them, connect to the database in QGIS and map the data. I’m using the SpatialLite GUI, but for those more inclined you could use the command line tool instead.

Loading shapefile in SpatialiteFire up the GUI, and create a new, empty geodatabase under the File menu.Once we have a container, we can hit the load a shapefile button. I have a census PUMA layer for NYC that I’ve formatted by erasing water features. Click load, go to the path, give the file a nice brief name, and specify the SRID – the EPSG code that specifies what coordinate system my shapefile is in. In this case, it’s 4269 as the layer is in NAD83 (you can check your files by opening the prj file in a text editor or by using the OGR tools).

Table viewOnce it’s loaded, you can expand the listing in the table of contents to see all the field names of the feature, and you can right click on it and choose the edit option to see all of the data in the attribute table.

Next we can load a data table. I have a 2006-2008 ACS census table in tab-delimited text format that I’ve pre-formatted. The table has the number of workers (labor force age 16+) and number of workers who commuted to work via the subway for every PUMA in the State of New York (it’s faster to download the whole state and filter out the city PUMAs later). Hit the load txt/csv button, specify a path, a new table name (subway_commuters), the delimiter used, and load the table. It’s given a different icon in the table of contents (toc), to distinguish a regular data table from a feature class.

spatlite4The next step is to join them together; I already insured that they both share a common, unique identifier; a FIPS code that has a state and PUMA code. If I run a standard SELECT query I can join the tables in a temporary view – but that’s not what I want. I can save the query as a view, but I won’t be able to access the view within QGIS (at least not with this current stable version of SpatialLite, 2.31). What we have to do here is create a brand new table that combines both the puma feature class and the subway commuter table (referred to in Microsoft Access land as a Make Table Query). Here’s the SQL that we type in the command window:

CREATE TABLE pumas_nyc_subcom AS
SELECT *
FROM nyc_pumas, sub_commuters
WHERE PUMA5ID00=GEO_ID2

Execute the query, and we get a message that an empty results set was generated. Uh, ok. But then if we select the database path at the top of the TOC , right-click, and refresh, we’ll see our new combined table, pumas_nyc_subcom, and we can expand it and take a look at the data. The join worked, but we’re not done yet. Right now this is just a regular old data table (notice the icon?) We have to turn this into a feature class next.

Joined and created feature classExpand the fields for the new table in the TOC, select the Geometry field, right click, and check the geometry. We’ll see that it’s MULTIPOLYGON geometry, the projection is still NAD83, and there are 55 features (the non-NYC PUMAS were filtered out during the join, leaving us just with NYC data). Right click on Geometry again, choose the option to Recover Geometry. Specify the geometry type and the SRID, run, refresh the database, and success. A little globe appears next to pumas_nyc_subcom, indicating that it’s now a feature class.

spatlite7

QGIS Spatialite connection interfaceAt this point we can fire up QGIS. In the toolbar for versions post 1.02, there should be a connect to SpatialLite button. Hit connect, add a New database, and browse to get to it. Once it’s loaded, then we can hit connect to connect to it, and we’ll be able to see all feature classes (but NOT data tables, which is why we had to go through the join). Select pumas_nyc_subcom, which has features and data, and click add.

As with any GIS, now we have to symbolize the features to map the subway commuters. Right click on the layer in the table of contents, select properties, and you’ll get to the recently redesigned properties menu. Go to Symbology, map the subway commuters field by graduated values, change some colors, and voila, a map!

QGIS map with data and new labelsThe developers are still experimenting with improvements – there’s a button in the upper right-hand corner of the symbology tab that asks you if you want to try the New Symbology – this is a new layout, with the introduction of graduated color palettes. It’s pretty slick, but still a work in progress (color ranges are assigned from dark to light, with the lowest values getting the darkest color; the opposite of cartographic convention). The same label properties are there too, but you can experiment with the improved labeling engine under the Plugins menu. The automatic placement of labels is vastly improved.

Mapping totals for subway commuters isn’t as interesting as mapping the percentage of commuters in each PUMA who ride the subway. So I’ll share my experiments working with calculated fields (in SpatialLite and QGIS) in my next post.

UNdata Processing, Calc Data Pilot

Sunday, September 27th, 2009

I’d downloaded some data from the UNdata website and cleaned it up so I could use it for my class, and thought I’d share some tips here. In many cases when you download data from UNdata you get multiple records for each country; one record for each year for each data point. In order to bring this data into GIS, I needed to re-arrange it to move the years from rows to columns, so that I’d have one record for each country with multiple columns for years.

You can do this in Excel using a pivot table, but since I was working off of my Linux notebook, I accomplished this using the Data Pilot tool in Open Office 3.0′s Calc spreadsheet. Here’s what I did:

  • screenshot1I opened the csv file I downloaded from UNdata in Calc, and accepted the defaults on the text import screen. Once it was imported, I saved the file in a spreadsheet format – you can use Calc’s odt format, or you can save it in Excel xls format if you need to use Excel later. But you have to get out of the csv format – Calc crashed on me a couple of times when I was running the Pilot and creating multiple sheets in the csv.
  • I went up to the Data menu, selected Data Pilot, and Start, which opens the Data Pilot Menu. I clicked the More button to see the full range of options.
  • screenshot3Then it was a simple matter of dragging the field names into the right places. I dragged the country code and name fields into the rows box, the year into the column box (as I wanted to move years to columns), and the actual data field into the values box. Under the options listed under More, I changed the Results drop down box to save the table in a new sheet, and I unchecked all of the boxes listed below (for adding filters, creating totals rows and columns, etc). Then clicked OK.
  • screenshot4Voila! I had my newly formatted table, with one row for each country and one column for each year. But since I’ll be bringing this data into GIS (and will have to save the data in DBF format as I want my students to bring it into QGIS), I need to make sure that my data doesn’t have any funky formatting that may mess up joining my data to a shapefile. So I added a blank worksheet, copied my new pilot table, and did a Paste Special into the blank worksheet and pasted only text and numbers – with formatting, formulas, and anything else funky left out.
  • screenshot5Once I had my plain, reformatted data in my new sheet, I deleted the top row (which had labels for Sum Value and Year) so I’d be left with only one header row, and I changed the field names to something more database friendly (truncating names and removing spaces). Lastly, I deleted the original data sheet and the formatted data pilot table sheet, so I was left with just the final copy.

That’s it! Sort of. Since I now have the year’s in columns, I could create a few calculated fields to show change over time.

But the last piece will be dealing with the country codes. To get a data table with codes from the UNdata website, you have to choose an Add Columns option from their data browser page before you download, as you don’t get the country codes by default. Then, the codes you get could be anything. Since these data tables are coming from dozens of different organizations, agencies, and bureaus within the UN, the country codes will vary based on what that agency did. In some cases I’ve downloaded data that had the ISO two-digit alpha codes, and in other cases I had three digit numerical ISO codes (stored incorrectly as numbers, so leading zeros were dropped).

Most of the tables I’ve been downloading come from from the World Health Organization (WHO), and came with no standardized country codes. Instead, the codes are sequential numbers assigned to the countries in alphabetical order from 1 to 193. Doh! Then, if a new country gets added they tacked on the next available number regardless of the alphabet – so the country of Montenegro is assigned 194, after Zambia which is 193. Typically, data from countries that are not UN members or observers (like Liechtenstein and Vatican City) and are dependencies (Greenland, Falkland Islands, French Polynesia, etc) are not included in the data sets.

So, I’ll be typing in ISO alpha two codes into one of my data tables and will end up with a table that connects their sequential number system to ISO. Then I can bring this bridge and all of the other tables into a database, relate them to the bridge based on the sequential number, and create new tables out of them that have ISO numbers, so I can join them to my GIS file based on ISO. Or I guess I could add the sequential number field to my countries shapefile and join each table to it based on the sequential number.

Anyway – happy Data Piloting (or Pivoting, if you prefer).

Formulas for Working With Census ACS Data in Excel / Calc

Friday, June 26th, 2009

After downloading US census data, you often need to reformat it before using it. It’s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty straightforward if you’re working with decennial census data, but data from the American Community Survey (ACS) is a little trickier to deal with since you’re working with estimates that have a margin of error. When creating new data, you also have to calculate what the margin of error is for your derived numbers. I’ll walk through some examples of how you would do this in a spreadsheet (the formulas below will work in either Excel or Calc).

Creating an Aggregate

We’ll use the following data in our example:

screenshot1

We have the total population of people three years and older who are enrolled in school, and a breakdown of this population enrolled in grades 1 through 4 and grades 5 through 8 in a few counties in New York, with margins of error for each data point. Our data is from the 3 year averaged 2005-2007 American Community Survey.

Let’s say we want to create a total for students who are enrolled in grades 1 through 8 for each county. We create a new column and sum the estimates for each county with the formula e3+g3, or sum(e3:g3).

To calculate a margin or error (MOE) for our grade 1 to 8 data, first we have to use the find and replace command to get rid of the “+/-” signs in the MOE column, so our spreadsheet will treat our values as numbers and not text (this is an issue if you downloaded the data as an Excel file – if you download a txt file the +/- is not included). Depending on the dataset you’re working with, you may also need to replace dashes, which represent data that was null or not estimated.

Once the data is cleaned up, we can insert a new column with this formula:

=SQRT((F3^2)+(H3^2))

This calculates our new margin of error by squaring the moes for each of our data points, summing the results together, and taking the square root of that sum. In other words,

=SQRT((MOE1^2)+(MOE2^2))

Once that’s done, you may want to round the new MOE to a whole number.

Creating a Percent Total

Let’s calculate the percentage of the population 3 years and older enrolled in school that are in grades 1 through 8. Based on what we have thus far (I hid the columns E,F,G, and H for grades 1-4 and 5-8 in this screenshot, as we don’t need them):

screenshot-2

We insert a new column where we divide our subgroup by the total, as you would expect – I3/C3. In the next column we insert the following formula to create a MOE for our new percent total:

=(SQRT((J3^2)-((K3^2)*(D3^2))))/C3

This one’s a little weightier than our last formula. We’re taking the square of our percent total (K3) and the square of the MOE of the total population (D3), multiplying them together, then subtracting that number from the square of the MOE of our subgroup (J3). Then we take the square root of the whole thing, then divide it by our total population (C3). If you’re saying – HUH? Maybe this is clearer:

=(SQRT((MOEsubset^2)-((PercentTotal^2)*(MOEtotalpop^2))))/TotalPop

Finally, we have something like this:

screenshot-3

Based on our data, we can say things like “There were approximately 30,556 students enrolled in 1st through 8th grade per year in Dutchess County, NY between 2005 and 2007, plus or minus 1,184 students. An estimated 37% of the population enrolled in school in the county was in the 1st through 8th grade, plus or minus 1%.” The ACS estimates have a 90% confidence interval.

Wrap Up

In this example we worked with aggregating and calculating percentages based on characteristics. We could also use these same formulas to aggregate data by geography, if we wanted to add the characteristics for all the counties together.

For the full documentation on working with ACS data, take a look at the appendix in the Census’ ACS Compass Guide, What General Data Users Need to Know. It provides you with the formulas in their proper statistical notation (for those of you more mathematically inclined than I) and includes formulas for calculating other kinds of numbers, such as ratios and percent change. It does provide you with worked-through examples, but not with spreadsheet formulas. I used their examples when I created formulas the first time around, so I could compare my formula results to their examples to insure that I was getting it right. I’d strongly recommend doing that before you start plugging away with your own data – one misplaced parentheses and you could end up with a different (and incorrect) result.

Creating a New Shapefile in ArcGIS: Part II

Friday, May 15th, 2009

In my previous post I gave an overview of how to create a shapefile from scratch, where we created a point layer to identify places and neighborhoods in NYC. In this post, I’ll pick up where we left off.

Whenever you create new features in a shapefile, ArcGIS automatically adds a couple of fields, including an auto-number ID field that uniquely identifies each feature. This was sufficient for our example as the 291 place names we were working with do not have a standard ID number that represents them. If we were creating features that did have a recognized ID number or code, we certainly would want to add an additional field to hold that number. This would allow us to share and relate our data to other datasets that use that conventional ID. For example, if we had a layer with the 50 states, we would want to have a FIPS number or the two digit postal code for each state in the attribute table, so we could relate our states feature to the zillions of other state-based data tables out there that also use these codes.

It’s also helpful to add other identifiers to relate our place names to some larger geographic area. Why? Let’s say we want to filter our neighborhoods by borough – perhaps we just want to label neighborhoods in Manhattan or calculate distances only between places that appear in the Bronx. It would be useful to have a borough code or some other code associated with each of our place names for running queries.

scrnshot6As it turns out, the City of New York does use a standardized system of three digit codes to identify all boroughs and community districts in the city. In our example, the code for Manhattan Community District 12, which contains Inwood and Washington Heights, is 112. The first digit identifies the borugh and the second two digits identify the district. It would be a good idea to assign each of our neighborhoods this district code, so we could filter our features by either borough or district.

When we create each feature, we could manually type in the code in it’s own field just like we added the neighborhood names, but that would be rather tedious – and unnecessary. A better choice would be to do a spatial join. Whereas a “regular” join allows us to join attribute tables based on a common ID field, a spatial join allows us to assign attributes to one layer based on their geographical relationship to another layer.

scrnshot7In the Table of Contents, right click on the neighborhoods layer and choose Joins and Relates – Joins. We’ll get the familiar Join dialog box. However, if you hit the first drop down box that says Join Attributes From a Table and choose Join Data Based on Spatial Location, we’ll get the options for doing a spatial join. Choose the community districts as the layer to join to the neighborhoods, and since we’re joining points to polygons we’ll choose parameters that are relevant for relating these two features. In this case, give each point (neighborhood / place) the attributes of the polygons (districts) that it falls inside. ArcGIS will create a new point layer with the joined fields when you hit OK. Open the attribute table of the new point layer, and you’ll see the additional fields, including the community district numbers. You’ll also get some rather useless fields from the district layer, like the length and area of each district, which you can safely delete.

So instead of tediously entering these numbers by hand for each neighborhood, we simply run the spatial join process once (after we’ve finished adding the points for all 291 neighborhoods) and the IDs are automatically added.

Creating a New Shapefile in ArcGIS: Part I

Thursday, May 14th, 2009

I’m working with a grad student who needs to create a new shapefile from scratch, and thought I’d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods in New York City. Since many of these neighborhoods are place names without finite boundaries, we’ll have to use various sources (NYC Planning map and Rand McNally street maps) to pinpoint the relative center of each neighborhood.

These points will be used for labeling each neighborhood. In this case, creating a new, georeferenced layer is preferable to creating 291 text labels on a map that are not tied to geography in any way.

  • The first step is to download some layers from the NYC Department of Planning to use for reference, such as a layer for boroughs and community districts. Community districts are used by the city to approximate neighborhoods. Many of the neighborhoods that we are trying to plot are, in many cases, smaller areas or places within these boundaries.
  • scrnshot1Next, open ArcCatalog and create a folder to store the data. Then, right click on the folder in the table of contents and select New – Shapefile. In the Create New Shapefile window, we give the shapefile a name, select Point as the feature type, and hit Edit to change the
    coordinate system. In the Spatial Reference Properties menu, we’ll import a coordinate system from one of the files we downloaded from NYC Planning, which uses New York State Plane for Long Island. Click OK and OK again, and we’ll have a new shapefile.
  • scrnshot2Right now, our new shapefile isn’t very exciting because it’s empty – you can preview it in the catalog to see for yourself. If you preview the table, you’ll see that Arc created three fields – FID, Shape, and ID, which it will automatically fill in when we start creating features. Before we do that, we’ll have to add an additional column to store the name of the neighborhood. To do that, open ArcMap and add the neighborhood layer to the map. Then, right click on the layer in the Table of Contents and open the attribute table. Hit the Options button and choose Add Field. In the Add Field menu, name the new field, choose Text as the type, and change the length to 80 (in case we have some neighborhoods with long names). Hit OK, and you’ll have a new field.
  • scrnshot3Let’s add our reference layers next. Hit the Add Data button (or File – Add Data), and add the borough boundaries and community districts (if you don’t see anything after you add them, right click on one of these layers and choose Zoom to Layer). Go into the symbology tab for each layer and change their display to make the areas appear more distinctive. Make sure your neighborhood layer is on top of your other layers.
  • Now it’s time to start plotting neighborhoods. Go to the Selection menu – Set selectable Layers, and turn off all the layers except the neighborhood layer. Then, use the dropdown on the Editor Toolbar and Select Start Editing (if you don’t see the Editor Toolbar, make sure it’s activated by going to View – Toolbars and select it). scrnshot4On the Editor Toolbar, make sure the Create New Feature task is activated and that the target layer is the neighborhood layer, and not any of the reference layers. Zoom in to the top of Manhattan. With the Pencil tool selected in the toolbar, and using your sources (NYC planning map, Rand McNally street map, whatever), click on the map to approximate where the center of the Inwood neighborhood would be. A blue dot should appear on the map. Then right-click on the neighborhoods layer in the Table of Contents and open the attributes table. You’ll see a brand new record for your new dot. Click in the empty field for Name, type in the name of the neighborhood, and press enter.
  • That’s the process! Next, locate the area for Washington Heights and click on the map to create the point for that neighborhood. The new dot will appear hi-lighted, while the previous dot for Inwood will now appear as a regular point symbol. Now it’s just a matter of plugging away. Make sure to occasionally save your edits by clicking Editor and choosing Save Edits. If you make a mistake, you can delete a feature by selecting the Select Feature tool in the regular tool bar (white arrow with a blue and white feature box next to it), select the particular point, and hit the delete key. If you’re having trouble pinpointing the right location for the neighborhood, try downloading additional reference layers to guide you. The NYC DOITT also has a page with GIS layers for the city with features like parks and streets that may be helpful. When you’re finished editing, choose Stop Editing under the Editor Toolbar.

    scrnshot5

  • The ultimate goal of this exercise was to get neighborhood labels to appear without the actual point. To accomplish this, change the point symbol for the neighborhood to nothing by going into the Symbology tab for the layer and reducing the fill to no color, the outline to nothing, and the size to zero. Then open the Labels tab under the Properties menu, turn labels on using the name field as the label field, select Placement Properties and choose the setting to place the labels on top of the point, hit ok, and voila! Perfectly centered neighborhood names that are part of a georeferenced layer.

This covers the basics. In the next post, I’ll go a little further and discuss adding additional fields to the new file, without having to type them in manually.

Transform Projections with GDAL / OGR

Tuesday, April 14th, 2009

The GDAL / OGR tools are an open source, cross platform, command-line toolkit that can be used for viewing GIS metadata, performing attribute queries, and converting file formats, among other things. It can also be used for transforming coordinate systems and projections for GIS files. I’ll demonstrate in this brief tutorial how to accomplish this using the OGR tools, which are for vector based GIS. The raster based GDAL tools work in a similar fashion.

Viewing basic coordinate system / projection info:

ogrinfo -al -so world_wgs.shp

Where ogrinfo is the name of the tool, -al is a switch to get detailed info about the layer, -so is a switch to display summary info, and world_wgs.ship is the name of our file. Run that command and we’ll get something that looks like this, with info about the features, coordinate system, and attribute fields of our shapefile:

INFO: Open of `world_wgs.shp’
using driver `ESRI Shapefile’ successful.

Layer name: world_wgs
Geometry: Polygon
Feature Count: 243
Extent: (-179.808664, -89.677397) – (179.808664, 83.435942)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
DATUM["WGS_1984",
SPHEROID["WGS_1984",6378137,298.257223563]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]]
CNTRY_NAME: String (254.0)
FIPS_CNT_1: String (254.0)
ISO_2DIGIT: String (254.0)
ISO_3DIGIT: String (254.0)
STATUS: String (254.0)
COLORMAP: Real (18.6)
CONTINENT: String (254.0)
UN_CONTINE: String (254.0)
REGION: String (254.0)
UN_REGION: String (254.0)

Convert coordinate systems supported by EPSG

GDAL / OGR and most of the open source GIS software supports projections and coordinate systems that are part of the EPSG library. If you want to do a conversion between two coordinate systems and they are both supported by EPSG, you just have to reference the EPSG code that’s used to identity the system that you want to project to. You can look up codes using spatialreference.org.

Let’s say we want to convert our shapefile that’s in WGS 84 (common lat and long) to NAD 83 (used frequently in North America):

ogr2ogr -t_srs EPSG:4269 world_new.shp world_wgs.shp

Where ogr2ogr is the name of the tool, -t_srs is the command for transforming from one coordinate system to the other, EPSG:4269 is the code that identifies the coordinate system we want the new file to have – NAD83, world_new.shp is the name of the output file that will have the new projection that we want, and world_wgs.shp is our input file. If you run the command and get no error message, you’re in good shape. Just run the ogrinfo command on the new file to verify that it’s been re-projected.

Convert coordinate system not supported by EPSG

The EPSG library is extensive, but doesn’t contain everything, particularly some global and continental map projections. GDAL / OGR can still do the job, but you’ll have to provide the tool with the proper frame of reference since the EPSG library doesn’t have the info. Let’s say we want to project our WGS file to the Robinson Projection, which is not part of EPSG.

First, go back to spatialreference.org and search for Robinson. Its ID code is ESRI 54030 – not part of the EPSG library. Click on the link for the projection to open its window. You’ll be able to look at the projection data in a number of standard file formats. Select OGC_WKT from the list, and it will open the text in a new window, showing you the parameters of that projection. In your browser, go up to file, save as, and save the file as robinson_ogcwkt.txt in the same directory as the shapefile you want to reproject.

Now that you have the projection info stored in the text file, run the following command to make the conversion:

ogr2ogr -t_srs robisnon_ogcwkt.txt world_rob.shp world_wgs.shp

It’s the same command as our previous one, except that you’re referencing the text file with your data instead of an EPSG code.

Define an undefined coordinate system

If you run the ogrinfo command and your coordinate system is undefined, you should define it before doing anything else, and you must define an undefined projection before converting to another projection. Look at the metadata that came with you file or go back to the source to figure out what it is. For example the US Census Bureau Generalized Cartographic Boundary Files for 2000 are in NAD83 according to their metadata, but the files lack a projection definition.

To define one, use the following command:

ogr2ogr -a_srs EPSG:4269 states_nad83.shp states_unknown.shp

The only difference here is the -a_srs command is used to assign a coordinate system to a file – the rest of the parameters are the same. If you’re defining a non-EPSG projection, use the same method from the previous example – download a definition file from spatialreference.org and use the file name in place of the EPSG code.

More help and where to download:

UC Santa Barbara NCEAS and the UC Davis Soil Lab both have short tutorials and sample commands of GDAL / OGR.

If you want to thumb through the world’s map projections, the folks at radicalcartography have a nice projection reference page with visuals and brief descriptions.

Visit the GDAL / OGR page for downloading, or if you’re a Windows or Mac user, you can download QGIS and GDAL / OGR together from the QGIS download page. Linux users can get GDAL / OGR via your package handler – depending on your distro, you may have it already.

QGIS: Data Defined Labeling and Table Joins

Saturday, March 7th, 2009

A little while ago I posted a text file with geographic centroids (centers) for each of the world’s countries. The reason why I put this together was that I wanted to test the data defined labeling features in QGIS. While automatic labeling in QGIS isn’t so hot (overlapping labels, multiple lables for each polygon), there are some powerful features for storing and referencing columns for annotation within the attribute table of shapefiles. One of the neat features is the ability to place labels based on coordinates stored in the attribute table.

The first step was to take the centroids file and join in to a shapefile of the worlds countries based on a common ID field, in this case FIPS country codes. QGIS doesn’t support table joins directly, but you can accomplish this with a good plugin called fTools, which includes a lot of additional and useful features. The instructions for getting fTools up and running are available on the fTools website; the installation doesn’t require you to download any files, you just handle everything through the QGIS plugin manager (if you have trouble seeing the plugin manager or getting fTools to appear, check to make sure that you have python installed on your machine). Once fTools is up and running, you’ll see a Tools dropdown menu next to your other menus – drop it down, select data management tools and join attribute tables. You’ll get a dialog box asking which shapefile and field you want to join and which shapefile or table you want to join to it. The plugin only supports joins from other shapefiles and dbf tables, so you have to save the save the country centroids text file as a dbf before you do the join (you can do this in Calc or a pre-2007 version of Excel). These aren’t dynamic joins; fTools will create a new shapefile with the table fields attached.

Once the join is complete, you can add the new shapefile with the new fields, click on the layer, and navigate to the labels tab. Hit the checkbox to turn the labels on, select the field that contains the label in the dropdown box at the top, then select data defined position from the menu below. You’ll see a new series of dropdowns on the right, and you can select your longitude column for the X coordinate and latitude column for the Y coordinate. Hit OK, and voila! You’ll have labels that are centered in the middle of each country.

Of course, the label placement will not be perfect in every case. There will be label overlap in areas with small countries, areas with many countries clustered together, and with countries that have long names. The scale and size of the font will also be a factor, and placing the country name in the center is not always ideal for small island nations. However, you can easily change the label placement by going into an edit mode and changing the coordinates in the attribute table to get optimal placement. You can mouse over the map and use the coordinate information that’s displayed beside the scale in the lower right-hand corner of the window to determine which coordinates are most optimal for a given situation. If you produce several maps at the same area and scale, you can use the same settings over and over again. You can also globally change the placement of all the labels using some of the other label options, such as placing all labels above or to the top-right of the centroid.

Now in order for all of this to work, the coordinates in the country centroid file must be in the same coordinate system as the shapefile. Since the country centroid file uses basic latitude and longitude, I was able to do this with a shapefile that was in the basic WGS 84 geographic coordinate system. If you’re using a different geographic coordinate system or a projected coordinate system, you’ll have to convert the coordinates in the centroid file to match that system. I haven’t delved into this too deeply yet, but there are a number of free tools that you can download that should do this – one of them is called GEOTRANS, and it’s available for free download from the NGA. It can handle batch transformations of coordinate data stored in text files, and supports conversions to several different geographic and projected systems.

QGIS Label Placement With XY Coordinates

QGIS Label Placement With XY Coordinates

Working With Lat Long, Numeric Data

Friday, October 24th, 2008

Sorry that October has obviously been a pretty weak month for posts. I’ve been driven to distraction lately and haven’t done much GIS related work.

I was working on a project this week that involved manipulating data tables, so I thought I’d share a couple tips here. A number of months ago I wrote a post about manipulating FIPS codes and text-based ID fields. But what if you have to manipulate numeric fields? Adding decimal places, zeros, etc? The answer is – math!

In one field, I had a population figure from the 1970 Census that had been rounded to the hundreds place, so it was listed like this:

Bronx    14718

I wanted to make this a little more explicit by adding the appropriate zeros, so in Excel (or Calc if you prefer) I created a formula to multiply this by 100 =(c2*100) to get the full number with zeros:

Bronx    1471800

I also had fields with latitude and longitude coordinates in decimal degrees, but they lacked decimal points. The longitude field also lacked the minus sign, which means if we plotted the points they would end up in Asia instead of North America (longitude east of the dateline and west of the prime meridian is notated as negative in decimal degrees, as is latitude south of the equator). I knew from the metadata that each coordinate pair was precise to four decimal places, and I knew all of my points were in North America. So I created a formula where I took the latitude and divided by ten thousand =(c3/10000) and took the longitude, divided by ten thousand and multiplied by -1 =((c4/10000)*-1). Here’s the before and after:

Bronx    408492    738800

Bronx    40.8492    -73.8800

Some of this may seem pretty obvious, but if you’re used to working with text-based ID fields all of the time (like I am), it’s easy to forget that all you need is simple math to fix number fields.

The last step I took was to check for null values. A few of my data points had 0,0 listed for lat and long, because coordinate data was missing for those particular places. The problem is that 0 IS a value! If we plotted this data, these points would show up where the equator and prime meridian meet below western Africa. You have to represent “no data” as a blank value or null, and not as a zero. I fixed those, plotted, and was good to go.

FIPS and ISO Country Code Table

Tuesday, July 22nd, 2008

I’ve created a bridge table to relate various country codes: FIPS 10, the three versions of ISO 3166 (two alpha, three alpha, and three numeric), NATO, and the internet country codes. Ok, ok, I didn’t create it, the CIA World Factbook did and has it listed in their appendix, but in HTML. I just copied it and made it database friendly: fixed column headings, removed dashes for nulls, removed trailing and leading spaces, converted numeric codes to text while preserving zeros, and saved it in a tab-delimited text file. You can download it from the Resources page.

If you open it in Excel, Excel annoyingly converts the three digit numeric ISO code back to a number and drops the leading zeros. You can fix this using a trick I illustrated in a previous post, or import it into Calc or Access instead. You’ll be able to designate that field as text during the import process. If you stash the table in a geodatabase, you will be able to relate features and tables that use different codes through this bridge table.

I was also searching for the ISO 3166-2 codes for 1st level subdivisions within countries (like states in the US or provinces in Canada), but had trouble finding anything official as I think they may be copyrighted. I eventually found one source that claims that they received permission to post the codes, so you can take a look there. I also stumbled across a good reference source called Statoids, which gives you background information, lists, and codes for subdivisions on a country by country basis. I’ve added both of these to the Links – Resources page.

Adding Long / Lat XY Data to ArcMap

Monday, July 7th, 2008

Here’s a tutorial I’ve been meaning to write: adding a table of longitude and latitude coordinates to ArcMap and turning them into features. For this example, I’ll be using place names from the GEOnet Names Server country files. The US National Geospatial Intelligence Agency has a pretty extensive list of geographic features for each country, with coordinates in many formats, including longitude and latitude in decimal degrees. I’ll use Botswana in southern Africa as an example, as it has a small record set and because I have some admin boundaries handy that I’ve downloaded from SAHIMS.

  • Download the file from the GNS and unzip it. It is a tab-delimited text file. If you like, you can open it in Excel or another spreadsheet to see what it looks like. This works fine for this example, but won’t work for larger or more populated countries because the files will exceed the maximum number of records that a spreadsheet can handle (65k). You’ll need to import the file into a database (Access for example) if you want to take a look in those cases. In either event, you’ll be able to add the text file directly to ArcMap, so no worries.
  • Add XY Data Open ArcMap and under the Tools menu, select Add XY Data. In the dialog box, you’ll select the file that contains your XY coordinates. Choose the text file you’ve downloaded. ArcMap will then search through the fields and look for appropriate ones to add as X and Y fields. In this case, it should correctly choose LONG for X and LAT for Y. If Arc couldn’t figure it out, you would have to specify which columns have the coordinates. Longitude is ALWAYS the X coordinate, and Latitude is ALWAYS the Y. Finally, you’ll select a projection. Choose the standard geographic coordinate system WGS 1984, which is usually a safe bet when adding long/lat data from most sources.
  • Add XY Dialog BoxHit OK, and Arc will plot the coordinates (after you click through the warning message). In this example, it looks like there is one wayward point, way to the north. When you see something like this, it often means that one of the coordinates is missing a minus sign: latitudes below the equator are negative, as are longitudes east of the international date line and west of the prime meridian. If you use the identity tool, you’ll see that the minus sign for latitude for this wayward point is missing. The easiest thing to do would be to go back into the text file, edit it, and add it to ArcMap again.
  • Even though Arc has plotted the points, they still don’t exist as features (remember the warning message? That’s essentially what it was saying). Select the plotted points in the Table of Contents, right-click, select Data, and select Export. Export the points out as a new shapefile or a feature class in a geodatabase. Then add the new features to the map.
  • At this point, it may be helpful to have a frame of reference for all of these points. Get your hands on some administrative layers, like country boundaries. I downloaded the outline of Botswana from SAHIMS. This step usually requires projecting and reprojecting, as you’ll need to get your points layer to match the projection of the other files you’re working with. I always use the ArcToolbox within ArcCatalog to fiddle with projections and then add the finished files to a new, blank map in ArcMap. In my case, the Botswana boundary was undefined – I had to consult the metadata from their website to figure out what the projection is (NAD 1927) and then define it using the ArcToolbox (Data Management Tools, Projections and Transformations, Define Projection). Then, I had to convert the Botswana points layer from WGS 1984 to match the boundary’s NAD 1927 projection (using Data Management Tools, Projections and Transformations, Feature, Project).
  • Plotted points with boundaryAdd the projected boundary and reprojected points to your map. Many of these points are point features (villages, towns, farms, mountain peaks), while others represent the geographic centers of lines (roads, rivers) or areas (administrative areas, parks, reserves). You’ll probably want to extract certain kinds of features. At this point, you’ll want to take a look at the attribute table for the points file and consult the NGS description for the names files. The description will tell you what each of the data columns represents and what all of the codes mean. The FC field will come in quite handy here, as it designates categories for each feature. So if we wanted to extract populated places, under the Selection Menu in ArcMap we could do a Select by Attribute where the field FC is equal to P, which is the code for populated place features. Once they are selected, you can do a Data, Export to create a new shapefile with just those features.
  • Alternatives do abound here. If you prefer, you could do a lot of the work of editing and creating feature subsets within a geodatabase. You can also follow these same, general procedures using open source tools (I believe that QGIS has a tool for adding XY data). And while we’re discussing a specific example here, the same basic steps would apply for any XY dataset.

Copyright © 2012 Gothos. All Rights Reserved.
No computers were harmed in the 0.400 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.