Archive for the ‘Resources’ Category

New Version of Introductory GIS Tutorial Now Available

Sunday, October 7th, 2012

The latest version of my Introduction to GIS tutorial using QGIS is now available. I’ve completely revised how it’s organized and presented; I wrote the first two manuals in HTML, since I wanted something that gave me flexibility with inserting many images into a large document (word processors are notoriously poor at this). Over the summer I learned how to use LaTeX, and the result for this 3rd edition is an infinitely better document, for classroom use or self-study.

I also updated the manual for use with QGIS 1.8. I’m thinking that the addition of the Data Browser and the ability to simply select the CRS of the current layer or project when you’re doing a Save As (rather than having to select the CRS from the master list) will save a lot of valuable time in class. With every operation that we perform we’re constantly creating new files as the result of selections and geoprocessing, and I always lose a few people each time we’re forced to crawl through the file system to add new layers we’ve created. These simple changes should speed things up. I’ve updated the manual throughout to reflect these changes, and have also updated the datasets to reflect what’s currently available. I provide a summary of the most salient changes in the introduction.

American Factfinder Tutorial & Census Geography Updates

Monday, July 23rd, 2012

I’ve been enmeshed in the census lately, as I’ve been writing a paper about the American Community Survey. Here are a few things to share:

  • Since I frequently receive questions about how to use the American Factfinder, I’ve created a brief tutorial with screenshots demonstrating a few ways to navigate it. I illustrate how to download a profile for a single census tract from the American Community Survey, and how to download a table for all ZIP Code Tabulation Areas (ZCTAs) in a county using the 2010 Census.
  • New boundaries for PUMAs based on 2010 census geography have been released; they’re not available from the TIGER web-based interface yet, but you can get state-based files from the FTP site. I’ve downloaded the boundaries for New York and there are small changes here and there from the 2000 Census boundaries; not surprising, as PUMAs are built from tracts and tract boundaries have changed. One big bonus is that PUMAs now have names associated with them, based on local government suggestions. In NY State they either take the name of counties with some directional element (east, central, south, etc.), or the name of MCDs that are contained within them. In NYC they’ve been given the names of community districts.
  • I’ve done some digging through the FAQs at https://askacs.census.gov/ and discovered that the census is going to stick with the old 2000 PUMA boundaries for the next release of the American Community Survey – the 2011 ACS will be released at the end of this year. 2010 PUMAs won’t be used until the 2012 ACS, to be released at the end of 2013.
  • Urban Areas are the other holdovers in the ACS that use 2000 vintage boundaries. The ACS will also transition to the 2010 boundaries for urban areas in the 2012 ACS.
  • In the course of my digging I discovered that the census will begin including ZCTA-level data as part of the 5-year ACS estimates, beginning with the 2011 release this year. 2010 ZCTA boundaries are already available, and 2010 Census data has already been released for ZCTAs. The ACS will use the 2010 vintage ZCTAs for each release until they’re redrawn for 2020.

Country Centroids File Updated

Monday, February 13th, 2012

A brief note – I’ve updated and replaced the country centroids file that I was previously hosting. I extracted geographic centroids in latitude and longitude for each country and dependency in the world using extracts from the NGA’s GNS and the USGS GNIS. Data is current as of Feb 2012, with long and short names for countries and two-letter alpha FIPS and ISO codes for identification and attribute linking. It’s available for download on the Resources page.

ACS Trend Reports and Census Geography Guide

Sunday, February 12th, 2012

I recently received my first question from someone who wanted to compare 2005-2007 ACS data with 2008-2010. With the release of the latter, we can make historical comparisons with the three-year data for the first time, since we now have estimates that don’t overlap. We should be able to make some interesting comparisons, since the first set covers the real estate boom years (remember those?) and the second covers the Great Recession. One resource that makes such comparisons relatively painless is over at the Missouri Census Data Center. They’ve put together a really clean and simple interface called the ACS Trends Menu, which allows you to select either two one-year estimates or two three-year estimates and compare them for several different census geographies – states, counties, MCDs, places, metros, Congressional Districts, PUMAs, and a few others – for the entire US (not just Missouri). The end result is a profile that groups data into the Economic, Demographic, Social, and Housing categories that the Census uses for its Demographic Profile tables. The calculations for change and percent change for the estimates and margins of error are done for you.
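For reference, the Census Bureau’s standard approximation for the margin of error of a difference between two independent estimates – presumably what these reports apply behind the scenes – is:

MOE_{X-Y} = \sqrt{MOE_X^2 + MOE_Y^2}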

Downloading the data is not as straightforward – the links to extract it just brought me some error messages, so it’s still a work in progress. Until then, a simple copy and paste into your spreadsheet of choice will work fine.


If you like the interface, they’ve created separate ones for downloading profiles from any of the ACS periods or from the 2010 Census. The difference here is that you’re looking at one time frame, not a comparison across time periods. The interface and the output are the same, but in these menus you can compare four different geographies at once in one profile. Unlike the Trends reports, both the ACS and 2010 Census profiles have easy, clear-cut ways to download the profiles as a PDF or a spreadsheet. If you’re happy with data in a profile format and want an interface that’s a little less confusing to navigate than the American Factfinder, these are all great alternatives (and if you’re building web applications these profiles are MUCH easier to work with – you can easily build permanent links or generate them on the fly).

The US Census Bureau also recently put together a great resource called the Guide to State and Local Census Geography. They provide a census geography overview of each state: 2010 population, land area, bordering states, year of entry into the union, population centroids, and a description of how local government is organized in the state (i.e. do they have minor civil divisions, or only incorporated cities and unincorporated land, etc.). You get counts for every type of geography – how many counties, tracts, ZCTAs, and so on – and, best of all, you can download all of this data directly in tab-delimited files. Need a list of every county subdivision in a state, with codes, land area, and coordinates? No problem – it’s all there.

Thiessen Polygons and Listing Neighboring Features

Monday, January 2nd, 2012

I was helping someone with a project recently that I thought would be straightforward but turned out to be rather complex. We had a list of about 10,000 addresses that had to be plotted as coordinates, and then we needed to create Thiessen or Voronoi polygons for each point to create market areas. Lastly, we needed to generate an adjacency table or list of neighbors: for every polygon, list all the neighboring polygons.

For step one I turned to the USC Geocoding service to geocode the addresses; I became a partner a while back so I could batch geocode datasets for students and faculty on my campus. Once I had coordinates I plotted them in ArcGIS 10 (and learned that the Add XY Data feature had been moved to File > Add Data > Add XY Data). Step 2 seemed easy enough; in Arc you go to ArcToolbox > Analysis Tools > Proximity > Create Thiessen Polygons. This creates a polygon for each point and assigns the attributes of each point to the polygon.

I hit a snag with Step 3 – Arc didn’t have a tool for generating the adjacency table. After a thorough search of the ESRI and Stack Exchange forums, I stumbled on the Find Adjacent Features Script by Ken Buja, which did exactly what I wanted in ArcGIS 9.2 and 9.3, but not in 10. I had used this script before on a previous project, but I’ve since upgraded and can’t go back. So I searched some more until I found the Find Adjacent & Neighboring Polygons Tool by cmaene. I was able to add this custom toolbox directly to ArcToolbox, and it did exactly what I wanted in ArcGIS 10. I get to select the unique identifying field, and for every ID I get a list of the IDs of the neighboring polygons in a text file (just like Ken’s tool). This tool also has options for saving the list of neighbors for each feature directly in the attribute table of a shapefile (which is only OK for small files with few neighbors; fields longer than 254 characters get truncated) and for listing neighbors to the next degree (a list of all the neighbors’ neighbors).

Everything seemed to run fine, so I re-ran the tool on a second set of Thiessen polygons that I had clipped with an outline of the US to create something more geographically realistic (so polygons that share a boundary only in the ocean or across the Great Lakes are not considered neighbors).

THEN – TROUBLE. I took some samples of the output table and checked the neighbors of a few features visually in Arc. I discovered two problems. First, I was missing about a thousand records in the output. When I geocoded them I couldn’t get a street-level address match for every record; the worst-case scenario was a plot to the ZCTA / ZIP code centroid for the address, which was an acceptable level of accuracy for this project. The problem is that if many point features are plotted at the same coordinate (because they share the same ZIP), a polygon was created for one feature and the overlapping ones fell away (you can’t have overlapping Thiessen polygons). Fortunately this wasn’t an issue for the person I was helping; we just needed to join the output table back to the master one to track which ones fell out, and live with the result.

The bigger problem was that the output was wrong. I discovered that most of the features I checked, especially polygons that had borders on the outer edge of the space, had incomplete neighbor lists; each feature had several (and in some cases, all) neighbors missing. Instead of using a shapefile of Thiessens I tried running the tool on polygons that I generated as feature classes within an Arc geodatabase, and got the same output. For the heck of it I tried dissolving all the Thiessens into one big polygon, and when I did that I noticed that I had orphaned lines and small gaps in what should have been one big, solid rectangle. I checked the geometry of the polygons and there were tons of problems. This led me to conclude that Arc did a lousy job when constructing the topology of the polygons, and the neighbor tool was giving me bad output as a result.

Since I’ve been working more with GRASS, I remembered that GRASS vectors have strict topology rules, where features have shared boundaries (instead of redundant overlapping ones). So I imported my points layer from a shapefile into GRASS and then used the v.voronoi tool to create the polygons. The geometry looked sound, the attributes of each point were assigned to a polygon, and for overlapping points one polygon was created and the attributes of the shared points were dropped. I exported the polygons out as a shapefile and brought them back into Arc, ran the Find Adjacent & Neighboring Polygons tool, spot-checked the neighbors of some features, and voila! The output was good. I clipped these polygons with my US outline, ran the tool again, and everything checked out.
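For anyone who wants to try the same detour, a minimal sketch of the GRASS side of the workflow looks something like this – the file and map names are hypothetical, and the parameter names assume the GRASS 6 series I was using:

# import the geocoded points from a shapefile
v.in.ogr dsn=geocoded_points.shp output=market_points
# build topologically clean Voronoi (Thiessen) polygons from the points
v.voronoi input=market_points output=market_areas
# export the polygons back out to a shapefile for use in Arc
v.out.ogr input=market_areas type=area dsn=market_areas format=ESRI_Shapefile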

Morals of this story? When geocoding addresses, consider how the accuracy of the results will impact your project. If a tool or feature doesn’t exist, assume that someone else has encountered the same problem and search for solutions. Never blindly accept output; take a sample and do manual checks. If one tool or piece of software doesn’t work, try exporting your data to something else that will. Open source software and Creative Commons tools can save the day!

Footnote – apparently it’s possible to create lists of adjacent polygons in GRASS using the sides option in v.to.db, although it isn’t clear to me how this is accomplished; the documentation talks about categories of areas on the right and left of a boundary, but not on all sides of an area. Since I already had a working solution I didn’t investigate further.
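Based on my reading of the v.to.db documentation, I think the recipe would be something like the sketch below (untested on my part; market_areas is a hypothetical polygon vector). The idea is to copy categories onto the boundaries in a second layer and then record the categories of the areas to the left and right of each boundary – every left/right pair is then a pair of adjacent polygons:

# give each boundary its own category in a second layer
v.category input=market_areas output=market_bounds layer=2 type=boundary option=add
# add a table to hold the left/right area categories
v.db.addtable map=market_bounds layer=2 columns="left integer, right integer"
# record the categories of the areas on either side of each boundary
v.to.db map=market_bounds layer=2 type=boundary option=sides columns=left,right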

Goings on at FOSS4G 2011

Thursday, September 15th, 2011

I’m at FOSS4G in Denver this week (the Free and Open Source Software for Geospatial conference) and have learned a few things (eventually all presentations, with audio and visuals of the slides, will be available online):

  • There will be a QGIS update, version 1.7.1, sometime this month; it’s a minor release that will fix a few bugs. Some future version of QGIS will include a Data Browser (think ArcCatalog).
  • For folks who have asked me how they can get more cartographic production power out of QGIS, Inkscape looks like a good option – folks at UC Davis have been experimenting with it with some success.
  • Learned about a documentation system for open source (or any) projects called Sphinx; documents are stored as reStructuredText files, with some Python scripts that link them together and provide formatting for output and display.
  • Got a great, clear, concise overview of what’s involved with an open source web mapping stack.
  • There’s a study at Idaho State (affiliated with the group of folks there that created MapWindow) that attempted to define the core functions of GIS based on a survey of GIS users. You can view their data by contacting the project lead.
  • Educators at a community college in Arizona are experimenting with an open source raster program called Opticks; a viable alternative to more expensive packages like ERDAS and IDRISI.
  • There are some new Python libraries you can use to create and mine KML data.
  • The FCC used a clever method for collapsing / aggregating US Census geography from the block level to create their Broadband Map.
  • While I’ve heard of and poked around the OpenStreetMap project, I never realized that many of the users were contributing by walking, cycling, and driving around with GPS units, then uploading their tracks to create and update road networks around the world. They also use some free datasets (like the Census TIGER files and equivalents from other countries) to augment and provide a frame of reference for their systems.
  • Data in the UK is finally opening up some more, and demand for products from the Ordnance Survey has been off the charts.
  • My presentation on using QGIS in an Academic library went pretty well, and I was pleased to discover I’m not the only GIS librarian at the conference! I’ve met folks from Ontario, Alberta, and Kansas.

ZIP Code KML Map for NYC Census Data

Saturday, September 10th, 2011

With the release of both the 2010 Census profiles for ZCTAs (ZIP Code Tabulation Areas) and the TIGER line files for 2010 Census geographies, I created another Google Map finding aid for NYC neighborhood data by ZIP code (I previously created one for PUMAs with American Community Survey data). Once again I used the Export to KML plugin that was created for ArcGIS. This allowed me to use the TIGER shapefile in ArcGIS to create the map I wanted and then export it as a KML, while using fields in the attribute table of each feature to insert the ZCTA number into stable links for the census profiles, automatically generating unique URLs for each feature. Click on a ZCTA in the map, and then click on a link to open a profile directly from the new American Factfinder.

There were two new obstacles I had to contend with this time. The first was that my department has finally migrated from Windows XP to Windows 7, and I upgraded from ArcGIS 9.3 to 10. I had to reinstall the Export to KML plugin (version 2.5.5) and ran into trouble; fortunately all the work-arounds were included in the plugin’s documentation. I don’t have administrator rights on my machine, so I had to have someone install the plugin as an administrator; this included running the initial setup file AND running Arc as an administrator while adding and turning on the plugin. That was straightforward, but when I ran it the first time I got an error message – there’s a particular Windows dll or ocx file that the plugin needs and it was missing (presumably something that was included in XP but not in 7). I downloaded the necessary file, and with administrator rights moved it into the system32 folder and registered the file via the command line. After that I was good to go.

The second issue was with the Census Bureau’s new American Factfinder. With the old Factfinder the URLs that were generated as you built and accessed tables were static, and you could simply save and bookmark them. Not the case in the new Factfinder; you can bookmark some basic tables but most of them are “too complex to bookmark”; you can save and download queries from the online app but that’s it. After some digging I found a CB document that tells you how you can create deep links to any query you run and table you create. The URL consists of a fixed series of codes that identify the dataset, year, table, and geography. So this link:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US10010

tells us that we’re getting a table from version 1.0 of the American Factfinder in English. It’s from the Decennial Census, 2010 Demographic Profiles, Demographic Profile Table 1, for ZCTA 10010 (860 is the summary level code that indicates we’re looking at ZCTAs). So for the plugin to create the links, I just included this URL, but for the last five digits I specified the attribute from the ZCTA shapefile that held the ZCTA code. So when the plugin creates the KML, each KML feature has a link generated that is specific to it:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US[ZCTA5CE10]
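If you ever wanted to generate the same links outside of the plugin – say, for a batch of ZCTAs in a script – a trivial shell sketch (with made-up ZCTA codes) would be:

# print a deep link for each ZCTA code in the list
for zcta in 10010 10463 11370; do
  echo "http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US${zcta}"
done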

You can see this previous post for details on how the Export to KML plugin works.

For now, the 2010 and 2000 Census are in the new American Factfinder. The American Community Survey, the Economic Census, population estimates, and a few other datasets are still in the older, legacy Factfinder. According to the CB all of this data will be migrated to the new Factfinder by the end of 2011 and the legacy version will disappear. At that point I’ll have to update my PUMA map so that it points to the profiles in the new Factfinder.

You can take a look at the ZCTA map and profiles below (I’m hosting it on the NYC data resource guide I’ve created for my college). As I’ve written before, ZCTAs are odd Census geographies since they are approximations of residential USPS ZIP Codes created by aggregating census blocks based on addresses; you can see in many instances where boundaries have a blocky teeth-like appearance instead of straight lines. Since they’re created directly by aggregating blocks, ZCTAs don’t correspond or mesh with other census boundaries like tracts or PUMAs, or even legal boundaries like counties. In some cases my assignment of county-based colors doesn’t ring true. For example, ZCTA 11370 includes part of the East Elmhurst neighborhood in Queens and Rikers Island, which is in the Bronx. ZCTA 10463 includes the Bronx neighborhoods of Kingsbridge and Spuyten Duyvil and the Manhattan neighborhood of Marble Hill (a geographic anomaly; it’s not on the Island of Manhattan but it’s part of Manhattan borough).

The most salient issue with ZCTAs is that they are only tabulated for the decennial census and not the American Community Survey; the currency of data and spectrum of census variables will be limited compared to other types of geography. ***NOTE*** This is no longer the case – ZCTA-level data is now available as part of the 5-year ACS, beginning with the 2007-2011 series.


[Embedded Google Map: NYC ZCTA finding aid]

Giving GRASS GIS a Try

Saturday, July 30th, 2011

I’ve taken the plunge into learning GRASS GIS this summer, as I’ve been working at home (and thus don’t have access to ArcGIS) on a larger and more long-term project (and thus don’t want to mess around with shapefiles and CSV tables). I liked the idea of working with GRASS vectors, as everything is contained in one folder and all my attributes are stored rather neatly in a SQLite database.

I started out using QGIS to create my mapset and to connect it to my SQLite db, which I had created and loaded with some census data. Then I thought, why not give the GRASS interface a try? I started using the newer wxPython GUI, and as I try different things I bounce back and forth between using the GUI for launching commands and the command line for typing them in – all the while I have Open Source GIS: A GRASS GIS Approach at my side and the online manual a click away. So far, so good.

I loaded and cleaned a shapefile with the GRASS GUI (the GUI launches v.in.ogr, v.build, and v.clean) and its attributes were loaded into the SQLite database I had set up (using db.connect – you need to do this, otherwise a DBF is created by default for storing attributes). Then I had an age-old task to perform – the US Census FIPS / ANSI codes were stored in separate fields, and in order to join them to my attribute tables I had to concatenate them. I also needed to append some zeros to census tract IDs that lacked them – FIPS codes for states are two digits long, counties are three digits, and tracts are between four and six digits, but to standardize them, four-digit tracts should have two zeros appended.

I added the new JOIN_ID column using v.db.addcol, then ran the following using db.execute:

UPDATE tracts_us99_nad83
SET JOIN_ID = STATE || COUNTY || TRACT

Then:

UPDATE tracts_us99_nad83
SET JOIN_ID = JOIN_ID || '00'
WHERE length(JOIN_ID)=9

So this:

STATE COUNTY TRACT
01 077 0113
01 077 0114
01 077 011502
01 077 011602

Becomes this:

JOIN_ID
01077011300
01077011400
01077011502
01077011602


I could have done this a few different ways from within GRASS: instead of the separate v.db.addcol command I could have written a SQL statement in db.execute to alter the table and add a column. Or, instead of db.execute I could have used the v.db.update command.
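A sketch of those two alternatives, assuming the GRASS 6-era parameter names (e.g. v.db.update’s qcol option):

# add the column with plain SQL instead of v.db.addcol
echo "ALTER TABLE tracts_us99_nad83 ADD COLUMN JOIN_ID varchar(11)" | db.execute
# populate it with v.db.update instead of a db.execute UPDATE statement
v.db.update map=tracts_us99_nad83 column=JOIN_ID qcol="STATE || COUNTY || TRACT"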

My plan is to use GRASS for geoprocessing and analysis (I will be doing some buffering, geographic selection, and basic spatial stats), and QGIS for displaying and creating final maps. I just used v.in.db to transform an attribute table with coordinates in my db to a point vector. But now I’m realizing that in order to draw buffers, I’ll need a projected coordinate system that uses meters or feet, as inputting degrees for a buffer distance (for points throughout the US) isn’t going to work too well. I’m also having trouble figuring out how to link my attribute data to my vectors – I can easily use v.db.join to fuse the two together, and there is supposed to be a way to link them more loosely using the CAT ID number for each vector, but I’m getting stuck. We’ll see how it goes.
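For what it’s worth, here’s my rough sketch of what those two steps might look like; the location, database path, and map names are hypothetical, and I haven’t gotten the loose-linking part to work yet:

# from within a location that uses a projected CRS, pull the points in from the lat-long location
v.proj input=tract_points location=latlon_location mapset=user1 output=tract_points_proj
# buffer distances are now in the location's units (e.g. 5000 meters)
v.buffer input=tract_points_proj output=tract_buffers distance=5000
# link (rather than fuse) an attribute table on a second layer, keyed on cat -
# the table needs a column whose values match the vector's cat values
v.db.connect map=tract_points_proj driver=sqlite database="$HOME/grassdata/proj_loc/user1/sqlite.db" table=acs_data key=cat layer=2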

Some final notes – since I’m working with large datasets (every census tract in the US) and GRASS uses a topological data model where common nodes and boundaries between polygons are shared, geoprocessing can take a while. I’ve gotten in the habit of testing things out on a small subset of my vectors, and once I get it right I run the process on the total set.
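v.extract with a where clause makes pulling out a test subset painless; for example (using state FIPS 01 as a stand-in):

# copy one state's worth of tracts into a small vector for testing
v.extract input=tracts_us99_nad83 output=tracts_test where="STATE = '01'"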

Lastly, there are times when I read about commands in the manual and for the life of me I can’t find them in the GUI – for example, finding a menu to delete (i.e. permanently remove) layers. But if you type the command without any of its parameters in the command line (in this case, g.remove) it will launch the right window in the GUI.


Updates for QGIS 1.7 Wroclaw

Tuesday, July 5th, 2011

The latest version of QGIS, 1.7 “Wroclaw”, was released a few weeks ago. Some of the recent updates render parts of my GIS Practicum out of date (unless you’re sticking with versions 1.5 or 1.6), so I’ll be making updates later this summer for the upcoming workshops this academic year. In the meantime, I wanted to summarize the most salient changes here for the participants in this past spring’s workshops, and for anyone else who may be interested. Here are the big two changes that affect the tutorial / manual:

Transforming Projections – In previous versions you would go under Vector > Data Management Tools > Export to New Projection. In 1.7 this has been dropped from the ftools vector menu. To transform the projection of a file, you select it in the Map Legend (ML), right-click, hit Save As, give it a new name, and specify the new CRS there. The QGIS developers have provided some info on how QGIS handles projections that’s worth checking out. You can go into the settings and have QGIS transform projections on the fly, which is fine depending on what you’re going to do. My preference is to play it safe – do the transformations and make sure all your files and the window share the same CRS. It can save you headaches later on.

Table Joins – In previous versions you would also accomplish this under Vector > Data Management Tools > Join Attributes, where you’d join a DBF or CSV to a shapefile to create a new file with both the geometry and the data. Now that’s out, and QGIS supports dynamic joins, similar to ArcGIS, where you couple the attribute table to the shapefile without permanently fusing the two. To do this you must add your DBF or CSV directly to your project; do this the same way you’d add a vector layer. Hit the Add Vector button, and change the drop down at the bottom so you can see all files (not just shapefiles) in your directory. Add your table. Then select your layer in the ML and double-click to open the properties menu. You’ll see a new tab for Joins. Hit the tab and hit the plus button to add a new join. You’ll select your table, the table’s join field, and the layer’s join field. Hit OK, and it’s done. Open the attribute table for the layer and you’ll see all the columns for the layer and the joined table.

This sounds great, but I had some trouble when I went to symbolize my data. Using the old symbology tab, I couldn’t classify any of the columns from my attribute table using Equal Intervals; it populated each class with zeros. Quantiles worked fine. If I switched to the new symbology, I still couldn’t use Equal Intervals, and Quantiles and Natural Breaks only worked partially – my dataset contained negative values which were simply dropped instead of classified. Grrrrr. I got around this by selecting my layer in the ML (after doing the join), right-clicking, and saving it as a new layer. This permanently fused the shapefile with the attributes, and I had no problem classifying and mapping data in the new file. I went to the forum and asked for help to see if this is a bug – will report back with the results.

Here are some other updates worth noting:

  • Feature Count – if you right click on a layer in the ML, you can select the Feature Count option and the number of features will be listed under your layer. If you’ve classified your features, it will do a separate count for each classification.
  • Measuring Tool – it looks like it’s now able to convert units on the fly, so even if you’re using a CRS with units in degrees, it can convert it to meters and kilometers (you can go into the options menu if you want to switch to feet and miles).
  • Labels – it looks like the experimental labeling features have been made the default in this version, and they do work better, especially with polygons.
  • Map Composer Undo – undo and redo buttons have been added to the map composer, which makes it much easier to work with.
  • Map Composer Symbols – if you go to insert an image in the map composer, you’ll have a number of default SVG symbols you can load, including north arrows.
  • Export to PDF – from the map composer, this was pretty buggy in the past but I was able to export a map today without any problems at all.

GIS Practicum and QGIS Tutorial

Thursday, May 19th, 2011

I recently finished running my day-long practicum this semester, Introduction to GIS Using Open Source Software, which used QGIS to introduce new users to GIS. Each participant received a printed tutorial booklet for the class, which I’m now providing online under a Creative Commons license here:

http://www.baruch.cuny.edu/geoportal/practicum/

I plan on revising the tutorial once a year, and the online version will be one version behind what I use and distribute in class. I’ll be busy this summer tweaking the guide and the class and will offer the in-class version to members of my university again in the 2011-12 academic year.

I held the workshop three times and had 45 participants out of a possible 60. Half were graduate students and the other half were faculty or staff. The top three disciplines that were represented were public health, library and information science, and public administration; I also had a smattering of folks from across the social sciences and a few from the natural sciences. Despite my intention to introduce GIS to novices, about half of the participants had some previous experience using GIS, primarily ArcGIS. All of these folks were pleasantly surprised with how well QGIS performed.

If you have any questions or comments about the tutorial feel free to contact me:

francis DOT donnelly AT baruch DOT cuny DOT edu

There’s also the gothos email address, but to be honest I don’t check it as often as I should – frank

