ZIP Code KML Map for NYC Census Data

September 10th, 2011

With the release of both the 2010 Census profiles for ZCTAs (ZIP Code Tabulation Areas) and the TIGER line files for 2010 Census geographies, I created another Google Map finding aid for NYC neighborhood data by ZIP code (I previously created one for PUMAs with American Community Survey data). Once again I used the Export to KML plugin that was created for ArcGIS. This allowed me to use the TIGER shapefile in ArcGIS to create the map I wanted and then export it as a KML, while using fields in the attribute table of each feature to insert the ZCTA number into stable links for the census profiles, automatically generating unique urls for each feature. Click on the ZCTA in the map, and then click on a link to open a profile directly from the new American Factfinder.

There were two new obstacles I had to contend with this time. The first was that my department has finally migrated to Windows 7 from Windows XP, and I upgraded from ArcGIS 9.3 to 10. I had to reinstall the Export to KML plugin (version 2.5.5) and ran into trouble; fortunately all the work-arounds were included in the plugin’s documentation. I don’t have administrator rights on my machine, so I had to have someone install the plugin as an administrator; this included running the initial setup file AND running Arc as an administrator as you add and turn the plugin on. That was straightforward, but when I ran it the first time I got an error message – there’s a particular Windows dll or ocx file that the plugin needs and it was missing (presumably something that was included in XP but not in 7). I downloaded the necessary file, and with administrator rights moved it into the system32 folder and registered the file via the command line. After that I was good to go.

The second issue was with the Census Bureau’s new American Factfinder. With the old Factfinder the urls that were generated as you built and accessed tables were static and you could simply save and bookmark them. Not the case in the new Factfinder; you can bookmark some basic tables but most of them are “too complex to bookmark”; you can save and download queries from the online ap but that’s it. After some digging I found a CB document that tells you how you can create deep links to any query you run and table you create. The url consists of a fixed series of codes that identify the dataset, year, table, and geography. So this link:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US10010

Tells us that were getting a table from version 1.0 of the American Factfinder in English. It’s from the Decennial Census, 2010 Demographic Profiles, Demographic Profile Table 1, for ZCTA 10010 (860 is the summary level code that indicates we’re looking at ZCTAs). So for the plugin to create the links, I just included this URL but for the last five digits I specified the attribute from the ZCTA shapefile that held the ZCTA code. So when the plugin creates the KML, each KML feature has a link generated that is specific to it:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US[ZCTA5CE10]

You can see this previous post for details on how the Export to KML plugin works.

For now, the 2010 and 2000 Census are in the new American Factfinder. The American Community Survey, the Economic Census, population estimates, and a few other datasets are still in the older, legacy Factfinder. According to the CB all of this data will be migrated to the new Factfinder by the end of 2011 and the legacy version will disappear. At that point I’ll have to update my PUMA map so that it points to the profiles in the new Factfinder.

You can take a look at the ZCTA map and profiles below (I’m hosting it on the NYC data resource guide I’ve created for my college). As I’ve written before, ZCTAs are odd Census geographies since they are approximations of residential USPS ZIP Codes created by aggregating census blocks based on addresses; you can see in many instances where boundaries have a blocky teeth-like appearance instead of straight lines. Since they’re created directly by aggregating blocks, ZCTAs don’t correspond or mesh with other census boundaries like tracts or PUMAs, or even legal boundaries like counties. In some cases my assignment of county-based colors doesn’t ring true. For example, ZCTA 11370 includes part of the East Elmhurst neighborhood in Queens and Rikers Island, which is in the Bronx. ZCTA 10463 includes the Bronx neighborhoods of Kingsbridge and Spuyten Duyvil and the Manhattan neighborhood of Marble Hill (a geographic anomaly; it’s not on the Island of Manhattan but it’s part of Manhattan borough).

The most salient issue with ZCTAs is that they are only tabulated for the decennial census and not the American Community Survey; the currency of data and spectrum of census variables will be limited compared to other types of geography.


View Larger Map

Giving GRASS GIS a Try

July 30th, 2011

I’ve been taking the plunge in learning GRASS GIS this summer, as I’ve been working at home (and thus don’t have access to ArcGIS) on a larger and more long-term project (and thus don’t want to mess around with shapefiles and csv tables). I liked the idea of working with GRASS vectors, as everything is contained in one folder and all my attributes are stored rather neatly in a SQLite database.

I started out using QGIS to create my mapset and to connect it to my SQLite db which I had created and loaded with some census data. Then I thought, why not give the GRASS interface a try? I started using the newer Python-wx GUI and as I’m trying different things, I bounce back and forth between using the GUI for launching commands and the command line for typing them in – all the while I have Open Source GIS A GRASS GIS Approach at my side and the online manual a click away . So far, so good.

I loaded and cleaned a shapefile with the GRASS GUI (the GUI launches v.in.ogr, v.build, abd v.clean) and it’s attributes were loaded into the SQLite database I had set (using db.connect – need to do this otherwise a DBF is created by default for storing attributes). Then I had an age-old task to perform – the US Census FIPS / ANSI codes where stored in separate fields, and in order to join them to my attribute tables I had to concatenate them. I also needed to append some zeros to census tract IDs that lacked them – FIPS codes for states are two digits long, counties are three digits, and tracts are between four and six digits, but to standardize them four digit tracts should have two zeros appended.

Added the new JOIN_ID column using v.db.addcol, then did the following using db.execute:

UPDATE tracts_us99_nad83
SET JOIN_ID = STATE || COUNTY || TRACT

Then:

UPDATE tracts_us99_nad83
SET JOIN_ID = JOIN_ID || ’00′
WHERE length(JOIN_ID)=9

So this:

STATE COUNTY TRACT
01 077 0113
01 077 0114
01 077 011502
01 077 011602

Becomes this:

JOIN_ID
01077011300
01077011400
01077011502
01077011602

db.execute GRASS GUI

I could have done this a few different ways from within GRASS: instead of the separate v.db.addcol command I could have written a SQL statement in db.execute to alter the table and add a column. Or, instead of db.execute I could have used the v.db.update command.

My plan is to use GRASS for geoprocessing and analysis (will be doing some buffering, geographic selection, and basic spatial stats), and QGIS for displaying and creating final maps. I just used v.in.db to transform an attribute table with coordinates in my db to a point vector. But now I’m realizing that in order to draw buffers, I’ll need a projected coordinate system that uses meters or feet, as inputting degrees for a buffer distance (for points throughout the US) isn’t going to work too well. I’m also having trouble figuring out how to link my attribute data to my vectors – I can easily use v.db.join to fuse the two together, but there is a way to link them more loosely using the CAT ID number for each vector, but I’m getting stuck. We’ll see how it goes.

Some final notes – since I’m working with large datasets (every census tract in the US) and GRASS uses a topological data model where common node and boundaries between polygons are shared, geoprocessing can take awhile. I’ve gotten in the habit of testing things out on a small subset of my vectors, and once I get it right I run the process on the total set.

Lastly, there are times where I read about commands in the manual and for the life of me I can’t find them in the GUI – for example, finding a menu to delete (i.e. permanently remove) layers. But if you type the command without any of its parameters in the command line (in this case, g.remove) it will launch the right window in the GUI.

GRASS GIS Interface

Updates for QGIS 1.7 Wroclaw

July 5th, 2011

The latest version of QGIS, 1.7 “Wroclaw” was released a few weeks ago. Some of the recent updates render parts of my GIS Practiucm out of date (unless you’re sticking with versions 1.5 or 1.6), so I’ll be making updates later this summer for the upcoming workshops this academic year. In the meantime, I wanted to summarize the most salient changes here for the participants in this past spring’s workshops, and for anyone else who may be interested. Here are the big two changes that affect the tutorial / manual:

Transforming Projections – In previous versions you would go under Vector – > Data Management Tools > Export to New Projection. In 1.7 this has been dropped from the ftools vector menu. To transform the projection of a file, you select it in the Map Legend (ML), right-click, hit Save As, give it a new name and specify the new CRS there. The QGIS developers have provided some info on how QGIS handles projections that’s worth checking out. You can go in the settings and have QGIS transform projections on the fly, which is fine depending on what you’re going to do. My preference is to play it safe – do the transformations and make sure all your files and the window share the same CRS. It can save you headaches later on.

Table Joins – In previous versions you would also accomplish this under Vector – > Data Management Tools > Join Attributes, where you’d join a DBF or CSV to a shapefile to create a new file with both the geometry and the data. Now that’s out, and QGIS can support dynamic joins, similar to ArcGIS where you couple the attribute table to the shapefile without permanently fusing the two. To do this you must add your DBF or CSV directly to your project; do this the same way you’d add a vector layer. Hit the Add Vector button, and change the drop down at the bottom so you can see all files (not just shapefiles) in your directory. Add your table. Then select your layer in the ML and double click to open the properties menu. You’ll see a new tab for Joins. Hit the tab and hit the plus button to add a new join. You’ll select your table, the table’s join field, and the layer’s join field. Hit OK to join em, and it’s done. Open the attribute table for the layer and you’ll see all columns for the layer and the joined field.
New Join Menu QGIS 1.7

This sounds great, but I had some trouble when I went to symbolize my data. Using the old symbology tab, I couldn’t classify any of my columns from my attribute table using Equal Intervals; it populated each class with zeros. Quantiles worked fine. If I switched to the new symbology, I still couldn’t use Equal Intervals, and Quantiles and Natural Breaks only worked partially – my dataset contained negative values which were simply dropped instead of classified. Grrrrr. I got around this by selecting my layer in the ML (after doing the join), right clicked, and saved it as a new layer. This permanently fused the shapefile with the attributes, and I had no problem classifying and mapping data in the new file. I went to the forum and asked for help to see if this is a bug – will report back with the results.

Here are some other updates worth noting:

  • Feature Count – if you right click on a layer in the ML, you can select the Feature Count option and the number of features will be listed under your layer. If you’ve classified your features, it will do a separate count for each classification.
  • Feature Count QGIS 1.7

  • Measuring Tool – it looks like it’s now able to convert units on the fly, so even if you’re using a CRS with units in degrees, it can convert it to meters and kilometers (you can go into the options menu if you want to switch to feet and miles).
  • Labels – it looks like the experimental labelling features have been made default in this verson, and they do work better, especially with polygons.
  • Map Composer Undo – undo and redo buttons have been added to the map composer, which makes it much easier to work with.
  • Undo Redo Button Map Composer

  • Map Composer Symbols – if you go to insert an image in the map composer, you’ll have a number of default SVG symbols you can load, incuding north arrows
  • Export to PDF – from the map composer, this was pretty buggy in the past but I was able to export a map today without any problems at all.

FOSS4G In Denver This Sept

June 20th, 2011

I’m all set to go to FOSS4G 2011, the global conference on Free and Open Source Software for Geospatial, organized by OSGeo. The conference takes place in Denver, CO from Mon Sept 12 to Fri the 16th. The first two days (12th-13th) consist of morning and and afternoon workshops while the main conference takes place from the 14th to the 16th and features talks, presentations, tutorials, exhibits, and some fun social events.

The full program is available here, and it looks like it’s chock full of interesting presentations and lots of great learning opportunities via the workshops and tutorials. I’ll be presenting on Weds afternoon, for those interested in my adventures in introducing QGIS on a college campus.

If you’re on the fence about attending, consider this: this is the sixth year for the conference and it’s only the second time that it’s been held in North America (Canada hosted the 2nd conference in 2007) and the first time it’s being hosted in the US. So if you’re in North America and getting funding from your organization for travel is an issue, now’s your best chance to go. This is truly an international conference (was also hosted in Switzerland, South Africa, Australia, and Spain) so it probably won’t be back on these shores for awhile.

Here’s some more motivation – early registration at the discounted rate ends on June 30th!

2010 Census Data Being Released

June 16th, 2011

The US Census Bureau has begin releasing data for Summary File 1, which is the primary summary data set that the Bureau tabulates. They will release data for groups of states on a weekly basis from June through September. Alabama and Hawaii were the first states released today. California, Delaware, Kansas, Pennsylvania and Wyoming are out next week.

This data is based on the 100% count of the population and is being released for geographies that nest within states: states, counties, county subdivisions, places, census tracts, ZCTAs, and congressional districts, and in some cases block groups and blocks. You can download the data table by table by building queries via the new American Factfinder, or power users can download entire datasets via the FTP site.

You’ll see how small the 2010 Census is compared to the past: we’re only going to get basic demographic variables. The extensive number of socio-economic indicators – education, income, language, employment status, etc – are no longer collected as part of the decennial census; you have to turn to the American Community Survey for this data, which is released on an annual basis.

Here’s what’s in the 2010 Census:

  • Total Population
  • Urban and Rural Population
  • Gender and Age
  • Race
  • Hispanic or Latino Origin
  • Households (Including Type and Size)
  • Group Quarters
  • Families
  • Family Relationships
  • Housing Units
  • Occupancy Status (Occupied or Vacant)
  • Tenure (Owner or Renter Occupied)

Many of these variables are cross-tabulated by age, gender, race, Hispanic or Latino Origin, Household Type, and Household Size. Once we get to the fall of 2011 we’ll start to see national level data for divsions, regions, and metropolitan areas.

Define Projection for a Batch of Shapefiles

June 12th, 2011

I was working on a project where I had downloaded 51 shapefiles (state-based census tract files) from the Census Generalized Cartographic Boundary Files. Each file lacked a projection .prj file, so I had to define each one as NAD83. Not wanting to do this one at a time, I used the GDAL / OGR tools and a bash script to process them all in a batch. I wrote a little script in a text file and then pasted it in the command line:

#!/bin/bash
for i in $(ls *.shp); do
ogr2ogr -f “ESRI Shapefile” -a_srs “EPSG:4269″ ./nad83 $i
done

It iterates through a list of all the shapefiles in a directory, uses OGR to define them as NAD83, then writes them to a new subdirectory called NAD83.

After searching through the web for some guidance on this, I later realized that there was a nice, succinct example of this in a book that I had (yeah – remember books? They’re still great!)

#!/bin/bash
# from Sherman (2008) Desktop GIS Mapping the Planet With Open Source Tools pp 243-44

for shp in *.shp
do
echo “Processing $shp”
ogr2ogr -f “ESRI Shapefile” -t_srs EPSG:4326 geo/$shp $shp
done

This does the same thing, difference here is that it prints a message to the command line for each file that’s processed and uses the -t_srs switch (transform projection) rather than the -a_srs (assign an output projection), which in this case seems to do the same thing. Of course you could tweak this a little to transform projections from one system to another as well.

This is fine and good if you’re using Linux and can use bash (go here for more info about bash). If you’re using Windows, you can do this if you’re using a Linux / UNIX terminal emulator like MSYS; otherwise you can use the DOS Command Prompt and write a batch (.bat) file to do this instead – the post on this forum is the first thing I found in my quest to figure all of this out.

Libraries Help Create Video Game Geography

June 9th, 2011

Just for fun – I stumbled on a blog post from a TV Station in California that discusses how the developers for the new L.A. Noire video game made extensive use of libraries and archives to recreate Los Angeles of 1947. They used property, city planning, and USGS maps to recreate the street grid and landscape and aerial and street-level photos to faithfully replicate everything from specific buildings to street lights and garbage cans. They also dove into newspaper archives and Raymond Chandler’s works and personal papers for dialogue and story lines. It’s a good thing that we keep libraries and archives around to organize and collect all this stuff…

How Archivists Helped Video Game Designers Recreate the City’s Dark Side for ‘L.A. Noire’

Don’t know what I’m talking about? Check out this trailer.

Copyright RockStar Games 2011

GIS Practicum and QGIS Tutorial

May 19th, 2011

I recently finished running my day-long practicum this semester, Introduction to GIS Using Open Source Software, which used QGIS to introduce new users to GIS. Each participant received a printed tutorial booklet for the class, which I’m now providing online under a Creative Commons license here:

http://www.baruch.cuny.edu/geoportal/practicum/

I plan on revising the tutorial once a year, and the online version will be one version behind what I use and distribute in class. I’ll be busy this summer tweaking the guide and the class and will offer the in-class version to members of my university again in the 2011-12 academic year.

I held the workshop three times and had 45 participants out of a possible 60. Half were graduate students and the other half were faculty or staff. The top three disciplines that were represented were public health, library and information science, and public administration; I also had a smattering of folks from across the social sciences and a few from the natural sciences. Despite my intention to introduce GIS to novices, about half of the participants had some previous experience using GIS, primarily ArcGIS. All of these folks were pleasantly surprised with how well QGIS performed.

If you have any questions or comments about the tutorial feel free to contact me:

francis DOT donnelly AT baruch DOT cuny DOT edu

There’s also the gothos email address, but to be honest I don’t check it as often as I should – frank

2010 Census Redistricting Data

April 17th, 2011

The Redistricting Summary Data [P.L. 94-171] from the 2010 Census has all been published for the nation, states, counties, and places, and is available via the new American Factfinder. The redistricting data includes basic demographic data: total population, race, Hispanic or Latino origin, and number of housing units occupied and vacant. Data is available down to census blocks and is available for most (but not all – no ZCTAs or PUMAs) geographies.

If you don’t want all the data for a state, don’t want to slog through the Factfinder, and are comfortable working with large text files, you can FTP the summary data from the Redistricting Data homepage. If you want basic summary data for states, counties, and places and don’t want to fuss with the Factfinder or text files, you can download Excel spreadsheets from the Redistricting Data Press Kit. They also have some pdf / jpg maps showing county level population and population change, plus interactive map widgets like the one below for the country and for each state. 2010 Redistricting TIGER Shapefiles have also been released for geographies included in the redistricting dataset.

The full 2010 Census for all geographies will be released throughout this summer and into the fall in Summary File 1 [SF1]. Stay tuned.

Joining CSV Files in QGIS

April 4th, 2011

DBF files are the other thorny issue that comes up when I’m teaching the Intro to GIS workshops with QGIS – specifically how do you create and edit them? Doing that has been pretty inconvenient in the Windows world since they were deprecated in Excel 2007. You can download plugins or basic stand-alone software to do the job. Since I’m a Linux user I have the Open Office suite by default, and I can easily work with DBFs in Calc. That’s an option for Windows and Mac users too, but it’s kind of a drag to download an entire office suite just for working with DBFs (assuming most folks are use MS Office).

Another possibility is to dump DBF all together; even though the join table option in the fTools menu in QGIS only presents you the option of joining shapfiles or DBF, it turns out if you choose the DBF radio button option you can actually point to DBFs OR CSV files when you’re browsing to point to your data table. The join proceeds the same way, and you get a new shapefile with the data from the CSV joined to it. CSVs can be easily created from any spreadsheet, text editor, or database program in any operating system, so you can prep your attribute data in your program of choice before exporting to CSV and importing to GIS.

The problem with this approach is that all of the fields from the CSV file are automatically saved as text or strings when they’re appended to the shapefile. This means that any numeric data you have can’t be treated numerically; you can’t perform calculations or classify data as value ranges for mapping. You can go into the attribute table, enter the edit mode, and use the field calculator to create new fields where you convert the values to integers, but this adds a bunch of duplicate fields and is rather messy. You can’t delete columns from shapefiles within QGIS; you have to edit the attributes outside the software to remove extra columns.

Here’s an easy work around: you can create a CSVT file to go along with your CSV file. Open a text editor like Notepad or gedit and create a one line file where you specify the data type for each of the fields in your CSV file. Save the CSVT with the SAME NAME as your CSV file. Now when you go to join your CSV to your shapefile in QGIS, it will read the CSVT and save all of your fields properly in the new shapefile.

So for a CSV file like this with six fields:

CODE, ST, ID_NAICS51, EMP_51, ID_TOTAL, TOTAL_EMP
01, AL, ENU0100010551, 27013, ENU0100010510, 1570188
02, AK, ENU0200010551, 6988, ENU0200010510, 237708
04, AZ, ENU 0400010551, 41833, ENU0400010510, 2174919 …

You’d create a CSVT file like this:

“String”, “String”, “String”, “Integer”, “String”, “Integer”

So the 4th and 6th fields that have numeric values are saved as integers. There are a few other field types you can use, like Real for decimal numbers and some Date / Time options, and you can specify length and precision – see this post for details.

Using shapefiles, CSVs, and CSVTs are fine for small to medium projects and for the introductory workshops I’m teaching; geodatabases are another option and are certainly better for medium to large projects, but introducing them in my intro workshop is a little too much.

(NOTE – in QGIS 1.7 the join tool has been dropped from the ftools menu. To join a csv or dbf in 1.7+, you have to add your data table as a vector to your map, and then use the join tab within the properties menu of the feature you want to join it to. See this post for details.)


Copyright © 2012 Gothos. All Rights Reserved.
No computers were harmed in the 0.390 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.