Article on Working With the American Community Survey

June 17th, 2013

I’ve got another article that’s just hit the presses. In this one I discuss the American Community Survey: how it differs from the Decennial Census, when you should use it versus other summary data sets, how to work with the different period estimates, and how to create derived estimates and calculate their margins of error. For that last piece I’ve essentially done an extended version of this old post on Excel formulas, with several different and updated examples.

The article is available via Emerald’s journal database. If you don’t have access to it from your library feel free to contact me and I’ll send you a copy (can’t share this one freely online).

Title: The American Community Survey: practical considerations for researchers
Author(s): Francis P. Donnelly
Citation: Francis P. Donnelly, (2013) “The American Community Survey: practical considerations for researchers”, Reference Services Review, Vol. 41 Iss: 2, pp.280 – 297
Keywords: American Community Survey, Census, Census geography, Data handling, Decennial census, Demographic data, Government data processing, Government information, Margins of error, Sample-based data, United States of America, US Census Bureau
Article type: Technical paper
DOI: 10.1108/00907321311326228 (Permanent URL)
Publisher: Emerald Group Publishing Limited

Downloading Data for Small Census Geographies in Bulk

May 7th, 2013

I needed to download block group level census data for a project I’m working on; there was one particular 2010 Census table that I needed for every block group in the US. I knew that the American Factfinder was out – you can only download block group data county by county (which would mean over 3,000 downloads if you want them all). I thought I’d share the alternatives I looked at; as I searched around the web I found many others who were looking for the same thing (i.e. data for the smallest census geographies covering a large area).

The Census FTP site at http://www2.census.gov/

This would be the first logical step, but in the end it wasn’t optimal based on my need. When you drill down through Census 2010, Summary File 1, you see a file for every state and a national file. Initially I thought – great! I’ll just grab the national file. But the national file does NOT contain the small census statistical areas – no tracts, block groups, or blocks. If you want those small areas you have to download the files for each of the states – 51 downloads. When you download the data you can also download an MS Access database, which is an empty shell with the geography and field headers, and you can import each of the text file data tables (there a lot of them for 2010 SF1) into the db and match them to the headers during import (the instructions that were included for doing this were pretty good). This is great if you need every variable in every table for every geography, but I was only interested in one table for one geography. I could just import the one text file with my table, but then I’d have to do this import process 51 times. The alternative is to use some Python to get that one text file for every state into one big file and then do the import once, but I opted for a different route.

The NHGIS at https://www.nhgis.org/

I always recommend this resource to anyone who’s looking for historical census data or boundary files, but it’s also good if you want current data for these small areas. I was able to use their query window to widdle down the selection by dataset (2010 SF1), geography (block groups), and topic (Hispanic origin and race in my case), then I was able to choose the table I needed. On the last screen before download I was able to check a box to include all 50 states plus DC and PR in one file. I had to wait a couple minutes for the request to process, then downloaded the file as a CSV and loaded it into my database. This was the best solution for my circumstances by far – one table for all block groups in the country. If you had to download a lot (or all) of the tables or variables for every block group or block it may take quite awhile, and plugging through all of those menus to select everything would be tedious – if that’s your situation it may be easier to grab everything using the Census FTP.

nhgis

UExplore / Dexter at http://mcdc.missouri.edu/applications/uexplore.shtml

The Missouri Census Data Center’s UExplore / Dexter tool lets you choose a dataset and takes you to a window that resembles a file system, with a ton of files in it. The MCDC takes their extracts directly from the Census, so they’re structured in a similar way to the FTP site as state-based files. They begin with the state prefix and have a name that indicates geography – there are files for block groups, blocks, and one for everything else. There are national files (which don’t contain small census areas) that begin with ‘us’. The difference here is – when you click on a file, it launches a query window that let’s you customize the extract. The interface may look daunting at first, but it’s worth exploring (and there’s a tutorial to help guide you). You can choose from several output formats, specific variables or tables (if you don’t want them all), and there are a bunch of handy options that you can specify like aggregation or percent totals. In addition to the complete datasets, they’ve also created ‘Standard Extracts’ that have the most common variables, if you want just a core subset. While the NHGIS was the best choice for my specific need, the customization abilities in Dexter may fit your needs – and the state-level block group and block data is conveniently broken out from the other files.

Lastly…

There are a few others tools – I’ll give an honorable mention to the Summary File Retrieval tool, which is an Excel plugin that lets you tap directly into the American Community Survey from a spreadsheet. So if you wanted tracts or block groups for a wide area for but a small number of variables (I think 20 is the limit) that could be a winner, provided you’re using Excel 2007 or later and are just looking at the ACS. No dice in my case, as I needed Decennial Census data and use OpenOffice at home.

The Geography of US Public Libraries

March 18th, 2013

Last month my article on the geographic disribution of US public libraries was pre-published online in JOLIS, with a print date pending. I can’t share this one freely on-line, so if you don’t have access via a library database (Sage Journals, ERIC, various LIS dbs) you can contact me if you’re interested in getting a copy.

Title: The geographic distribution of United States public libraries : An analysis of locations and service areas
Author: Francis P. Donnelly
Journal: Journal of Librarianship and Information Science (JOLIS)
Year: 2013 Volume, Issue, and Pages pending print release
ISSN: 0961-0006
DOI: 10000612470276.1177/0961
Publisher: Sage

Abstract

This article explores the geography of public libraries in the United States. The distribution of libraries is examined using maps and spatial statistics to analyze spatial patterns. Methods for delineating and studying library service areas used in previous LIS research are modified and applied using geographic information systems to study variations in library accessibility by state and by socio-economic group at the national level. A history of library development is situated within the broader economic and demographic history of the US to provide insight to contemporary patterns, and Louis Round Wilson’s Geography of Reading is used as a focal point for studying historical trends. Findings show that while national library coverage is extensive, the percentage of the population that lives in a library’s geographic service area varies considerably by region and state, with Southern and Western states having lower values than Northeastern and Midwestern states.

Keywords

Geographic information systems, geography, public libraries, service areas, spatial equity, United States

This OCLC flier (How Public Libraries Stack Up) piqued my interest in public libraries as community resources, public goods, and placemaking institutions. If the presence of a public library brings so much value to a community, then by extension the lack of a public library could leave a community at a disadvantage. This led to the next set of logical questions: how are libraries distributed across the country, and which people and communties are being served and which aren’t?

I took a few different approaches to answer these questions. The first approach was to use (and learn) spatial statistics so the overall distribution could be characterized, and the second was to use spatial extraction methods to select census areas and populations that were within the service areas of each library, to see differences in how populations were served and to study these differences across different states. The LIS literature is rich with research that uses GIS to study library use, so I provide a thorough summary of what’s come before. Then after I had the results I spent a good deal of time researching how the contemporary pattern came to be, and coupled the research on the history of public libraries with the broader history of urban and economic development in the United States.

I had a few unstated motives – one of them was to learn spatial statistics, with the help of: OpenGeoda and its documentation, this excellent book on Spatial Data Analysis (for theory), these great examples from Spatial Justice, and invaluable advice from Deborah Balk, a Professor of Spatial Demography with the CUNY Institute for Demographic Research.

One of my other goals was to use only open source software – QGIS, GRASS, and OpenGeoda, which was also a success. Although in my next study I’ll probably rely on QGIS and Spatialite; I found I was doing a lot of attribute data crunching using the SQLite Manager, since the attributes of GRASS vectors can be stored in SQLite, and I could probably save time (and frustration) by using Spatialite’s features instead. I did get to learn a lot about GRASS, but for my purposes it was overkill and I would have been just fine with a spatial database. I was definetely able to sharpen my Python skills, as processing the American Community Survey data for every census tract in the US manually would have been crazy.

In a project this size there are always some pieces that end up on the cutting room floor, so I thought I’d share one here – a dot map that shows where all 16,700 public libraries are. In the article I went with a county choropleth map to show the distribution, because I was doing other county-level stuff and because the dimension restrictions on the graphic made it a more viable option. The dot map reveals that libraries are typically where people are, except that the south looks emptier and the midwest fuller than it should be, if libraries were in fact evenly distributed by population. As my research shows – they’re not.

US Public Libraries

Creating Reports with SQLite, Python, and prettytable

February 8th, 2013

In addition to providing the NYC Geodatabase as a resource, I also wanted to use it to generate reports and build applications. None of the open source SQLite GUIs that I’m familiar with have built in report generating capabilities, so I thought I could use Python to connect to the database and generate them. I have some grand ambitions here, but decided to start out small.

Python has a built-in module, sqlite3, that you can use to work with SQLite databases. This is pretty well documented – do a search and you’ll find a ton of brief tutorials. Take a look at this great post for a comprehensive intro.

For generating reports I gave prettytable a shot: it lets you create nice looking ASCII text tables that you can copy and paste from the prompt or export out to a file. The tutorial for the module was pretty clear and covers the basics quite nicely. In the examples he directly embeds the data in the script and generates the table from it, which makes the tutorial readily understandable. For my purposes I wanted to pull data out of a SQLite database and into a formatted table, so that’s what I’ll demonstrate here.

Initially I had some trouble getting the module to load, primarily (I think) because I’m using Python 3.x and the setup file for the module was written for Python 2.x; the utility you use for importing 3rd party modules has changed between versions. I’m certainly no Python expert, so instead of figuring it out I just downloaded the module, dumped it into the site-packages folder (as suggested in the prettytable installation instructions under “The Harder Way” – but it wasn’t hard at all) and unzipped it. In my script I couldn’t get the simple “import prettytable” to work without throwing an error, but when I added the name of the specific function “import PrettyTable from prettytable” it worked. Your mileage may vary.

So here was my first go at it. I created a test database and loaded a table of population estimates from the US Census Bureau into it (you can download it if you want to experiment):

from prettytable import PrettyTable
import sqlite3

conn = sqlite3.connect('pop_test.sqlite')
curs = conn.cursor()
curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name')

col_names = [cn[0] for cn in curs.description]
rows = curs.fetchall()

x = PrettyTable(col_names)
x.align[col_names[1]] = "l" 
x.align[col_names[2]] = "r" 
x.padding_width = 1    
for row in rows:
    x.add_row(row)

print (x)
tabstring = x.get_string()

output=open("export.txt","w")
output.write("Population Data"+"\n")
output.write(tabstring)
output.close()

conn.close()

The first piece is the standard SQLite piece – connect, activate a cursor, and execute a SQL statement. Here I’m grabbing three columns from the table for records that represent Northeastern states (Region 1). I read in the names of the columns from the first row into the col_names list, and I grab everything else and dump them into rows, a list that contains a tuple for each record:

>>> col_names
['State', 'Name', 'Est2010']
>>> rows
[('09', 'Connecticut', 3574097), ('23', 'Maine', 1328361), ('25', 'Massachusetts', 6547629),
 ('33', 'New Hampshire', 1316469), ('34', 'New Jersey', 8791898), ('36', 'New York', 19378104),
 ('42', 'Pennsylvania', 12702379), ('44', 'Rhode Island', 1052567), ('50', 'Vermont', 625741)]
>>> 

The second piece will make sense after you have a quick look at the prettytable tutorial. Here I grab the list of columns names and specify how cells for the columns should be aligned (default is center) and padded (default is one space). Then I add each row from the nested list of tuples to the table, row by row. There are two outputs: print directly to the screen, and dump the whole table into a string. That string can then be dumped into a text file, along with a title. Here’s the screen output:

+-------+---------------+----------+
| State | Name          |  Est2010 |
+-------+---------------+----------+
|   09  | Connecticut   |  3574097 |
|   23  | Maine         |  1328361 |
|   25  | Massachusetts |  6547629 |
|   33  | New Hampshire |  1316469 |
|   34  | New Jersey    |  8791898 |
|   36  | New York      | 19378104 |
|   42  | Pennsylvania  | 12702379 |
|   44  | Rhode Island  |  1052567 |
|   50  | Vermont       |   625741 |
+-------+---------------+----------+

The one hangup I had was the formatting for the numbers: I really want some commas in there since the values are so large. I couldn’t figure out how to do this using the approach above – I’m writing all the rows in one swoop, and couldn’t step in and and format the last value for each row.

Unless – instead of constructing the table by rows, I construct it by columns. Here’s my second go at it:

from prettytable import PrettyTable
import sqlite3

conn = sqlite3.connect('pop_test.sqlite')
curs = conn.cursor()
curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name')

col_names = [cn[0] for cn in curs.description]
rows = curs.fetchall()

y=PrettyTable()
y.padding_width = 1
y.add_column(col_names[0],[row[0] for row in rows])
y.add_column(col_names[1],[row[1] for row in rows])
y.add_column(col_names[2],[format(row[2],',d') for row in rows])
y.align[col_names[1]]="l"
y.align[col_names[2]]="r"

print(y)
tabstring = y.get_string()

output=open("export.txt","w")
output.write("Population Data"+"\n")
output.write(tabstring)
output.close()

conn.close()

To add by column, you don’t provide any arguments to the PrettyTable function. You just add the columns one by one: here I call the appropriate values using the index, first for the column name and then for all of the values from the rows that are in the same position. For the last value (the population estimate) I use format to display the value like a decimal number (this works in Python 3.1+ – for earlier versions there’s a similar command – see this post for details). I tried this in my first example but I couldn’t get the format to stick, or got an error. Since I’m specifically calling these row values and then writing them I was able to get it to work in this second example. In this version the alignment specifications have to come last. Here’s the result:

+-------+---------------+------------+
| State | Name          |    Est2010 |
+-------+---------------+------------+
|   09  | Connecticut   |  3,574,097 |
|   23  | Maine         |  1,328,361 |
|   25  | Massachusetts |  6,547,629 |
|   33  | New Hampshire |  1,316,469 |
|   34  | New Jersey    |  8,791,898 |
|   36  | New York      | 19,378,104 |
|   42  | Pennsylvania  | 12,702,379 |
|   44  | Rhode Island  |  1,052,567 |
|   50  | Vermont       |    625,741 |
+-------+---------------+------------+

prettytable gives you a few other options, like the ability to sort records by a certain column or to return only the first “n” records from a table. In this example, since we’re pulling the data from a database we could (and did) specify sorting and other constraints in the SQL statement instead. prettytable also gives you the option of exporting the table as HTML, which can certainly come in handy.

Altering Tables in SQLite / Spatialite

February 7th, 2013

In building the Spatialite geodatabase (see previous post), one of the fundamental things I learned was how to manage table creation and alteration in SQLite, which was quite different from my previous experience working with MS Access and ArcGIS.

In SQLite, the ALTER TABLE statement is limited to changing the name of a table or adding new columns. If you want to make any other changes, the process is: create a new, blank table that’s structured the way you want, and then write an INSERT statement to copy the data you want from the existing table into the new table. So if you have a table that has a bunch of columns you want to drop or change, create a new table that has your desired structure, then insert what you want into that new table. The same thing goes for primary keys or other constraints. If your existing table doesn’t have a specified key you can’t alter it by specifying one: create a new table that does, then copy your data over.

Let’s say we have this table (de_data) with basic population data for Delaware’s three counties:

USPS	GEOID	NAME			POP10	HU10	ALAND_SQMI	AWATER_SQMI
DE	10001	Kent County		162310	65338	586.179		212.152
DE	10003	New Castle County	538479	217511	426.286		67.717
DE	10005	Sussex County		197145	123036	936.079		260.312

And let’s say that we want to change some of the column names, drop the field for housing units, and specify our data types and GEOID as our key. First, create the table:

CREATE TABLE de_pop (STATE TEXT, GEOID TEXT NOT NULL PRIMARY KEY,
COUNTY TEXT, POP10 INTEGER, ALAND REAL, AWATER REAL)

GEOID is the FIPS/ANSI code that uniquely identifies each state. Since these codes may have leading zeros (the codes for all states from AL through CT do), we designate it as text.

Second, insert the data we want from the existing table:

INSERT INTO de_pop (STATE, GEOID, COUNTY, POP10, ALAND, AWATER)
SELECT USPS, GEOID, NAME, POP10, ALAND_SQMI, AWATER_SQMI
FROM de_data

Order here matters – it’s going to insert columns from the original table into the new one in sequence: USPS into STATE, GEOID into GEOID, etc. Sometimes it’s possible to use an * as a shortcut to insert and copy everything, instead of listing every field, if both tables contain the same number of columns and they’re in the right order. But this is always a bit risky.

Lastly, if we don’t need that original table we could delete it:

DROP TABLE de_data

SELECT * FROM de_pop

STATE	GEOID	COUNTY			POP10	ALAND	AWATER
DE	10001	Kent County		162310	586.179	212.152
DE	10003	New Castle County	538479	426.286	67.717
DE	10005	Sussex County		197145	936.079	260.312

The process is similar if we want to take a query or view and turn it into a permanent table. SQLite does support CREATE TABLE AS, followed by a select query, so you can create a table out of a query. However – this is usually NOT the best course of action. If you do this, you won’t be able to specify a primary key for the new table (you really never want a table that lacks a key). Furthermore, if you create a new, calculated field you won’t be able to specify a type for it. This is particularly a problem if you’re working in Spatialite and want to create a spatial view or table that you want to join to some features and map in QGIS. If no type is specified, QGIS won’t know how to handle that new field, and it will be unviewable.

So, we take the same approach as before. Let’s say we want to create a table with population density as a calculated field. First, create the table:

CREATE TABLE de_popdens (STATE TEXT, GEOID TEXT NOT NULL PRIMARY KEY,
COUNTY TEXT, POP10 INTEGER, ALAND REAL, AWATER REAL, POPDENS REAL)

Then we write our insert statement, and in the insert we create the calculated field:

INSERT INTO de_popdens (STATE, GEOID, COUNTY, POP10, ALAND, AWATER, POPDENS)
SELECT STATE, GEOID, COUNTY, POP10, ALAND, AWATER, (ROUND(POP10/ALAND),1)
FROM de_pop

SELECT * FROM de_density:

STATE	GEOID	COUNTY			POP10	ALAND	AWATER	POPDENS
DE	10001	Kent County		162310	586.179	212.152	276.9
DE	10003	New Castle County	538479	426.286	67.717	1263.2
DE	10005	Sussex County		197145	936.079	260.312	210.6

This gives us a new table with density properly typed. Once again, this is only necessary if you want the calculated field to be permanent and function with other objects in the database, and particularly if you want to join this table to a geodatabase feature or shapefile to map the attributes. If you simply want the new field to answer a specific question or to export the data to output, you can just do a SELECT query and save it as a view.

I also had to do this procedure for every table and shapefile that I imported to the geodatabase. In the Spatialite GUI you don’t have the option to specify a primary key or data types for columns when you do an import; for the latter it takes its best guess. So to get it well-structured, I imported (or created a virtual link), created a new table, inserted the data over, then deleted the original table or severed the link. If it was a shapefile I went through the extra step of activating the geometry (check out the Spatialite Cookbook or the tutorial I wrote for the NYC geodatabase for details).

Since I was dealing with some enormous tables with hundreds of columns, I used some trickery to avoid typing all the statements by hand. If the data was small enough and came from a spreadsheet, I used a series of concatenate formulas to build the CREATE TABLE and INSERT statements by copying the field names and stringing them together with type names and necessary syntax, so I could just copy and paste a statement into the SQL dialog box. For larger datasets I used Python to do the processing, and had Python grab field names and write statements that I could copy and paste.

concatentaing

The import issue was particular to the Spatialite GUI, and not SQLite in general. If you’re dealing with just data tables and use the SQLite Manager (Firefox plugin), it asks you to specify column names, keys and constraints, and types for the columns you’re importing. It does the latter by having you select from a dropdown box for each column – this works fine if you have 10 or 20 columns, but it’s rather tedious if you have hundreds. The Manager also gives you the ability to alter additional elements of the table, like column names, by essentially performing the same operations (create table, insert records, drop table) behind the scenes while holding things temporarily in memory. It prefaces this with a warning message that this operation is not part of the standard SQLite commands, and there’s a chance something could go awry.

NYC Geodatabase in Spatialite

February 6th, 2013

I spent much of the fall semester and winter interim compiling and creating the NYC geodatabase (nyc_gdb), a desktop geodatabase resource for doing basic mapping and analysis at a neighborhood level – PUMAs, ZIP Codes / ZCTAs, and census tracts. There were several motivations for doing this. First and foremost, as someone who is constantly introducing new people to GIS it’s a pain sending people to a half dozen different websites to download shapefiles and process basic features and data before actually doing a project. By creating this resource I hoped to lower the hurdles a bit for newcomers; eventually they still need to learn about the original sources and data processing, but this gives them a chance to experiment and see the possibilities of GIS before getting into nitty gritty details.

Second, for people who are already familiar with GIS and who have various projects to work on (like me) this saves a lot of duplicated effort, as the db provides a foundation to build on and saves the trouble of starting from scratch each time.

Third, it gave me something new to learn and will allow me to build a second part to my open source GIS workshops. I finally sat down and hammered away with Spatialite (went through the Spatialite Cookbook from start to finish) and learned spatial SQL, so I could offer a resource that’s open source and will compliment my QGIS workshop. I was familiar with the Access personal geodatabases in ArcGIS, but for the most part these serve as simple containers. With the ability to run all the spatial SQL operations, Spatialite expands QGIS functionality, which was something I was really looking for.

My original hope was to create a server-based PostGIS database, but at this point I’m not set up to do that on my campus. I figured Spatialite was a good alternative – the basic operations and spatial SQL commands are relatively the same, and I figured I could eventually scale up to PostGIS when the time comes.

I also created an identical, MS Access version of the database for ArcGIS users. Once I got my features in Spatialite I exported them all out as shapefiles and imported them all via ArcCatalog – not too arduous as I don’t have a ton of features. I used the SQLite ODBC driver to import all of my data tables from SQLite into Access – that went flawlessly and was a real time saver; it just took a little bit of time to figure out how to set up (but this blog post helped).

The databases are focused on NYC features and resources, since that’s what my user base is primarily interested in. I purposefully used the Census TIGER files as the base, so that if people wanted to expand the features to the broader region they easily could. I spent a good deal of time creating generalized layers, so that users would have the primary water / coastline and large parks and wildlife areas as reference features for thematic maps, without having every single pond and patch of grass to clutter things up. I took several features (schools, subway stations, etc) from the City and the MTA that were stored in tables and converted them to point features so they’re readily useable.

Given that focus, it’s primarily of interest to NYC folks, but I figured it may be useful for others who wish to experiment with Spatialite. I assumed that most people who would be interested in the database would not be familiar with this format, so I wrote a tutorial that covers the database and it’s features, how to add and map data in QGIS, how to work with the data and do SQL / spatial SQL in the Spatialite GUI, and how to map data in ArcGIS using the Access Geodb. It’s Creative Commons, Attribution, Non-Commercial, Share-alike, so feel free to give it a try.

I spent a good amount of time building a process rather than just a product, so I’ll be able to update the db twice a year, as city features (schools, libraries, hospitals, transit) change and new census data (American Community Survey, ZIP Business Patterns) is released. Many of the Census features, as well as the 2010 Census data, will be static until 2020.

New Version of Introductory GIS Tutorial Now Available

October 7th, 2012

The latest version of my Introduction to GIS tutorial using QGIS is now available. I’ve completely revised how it’s organized and presented; I wrote the first two manuals in HTML, since I wanted something that gave me flexibility with inserting many images in a large document (word processors are notoriously poor at this). Over the summer I learned how to use LaTeX, and the result for this 3rd edition is an infintely better document, for classroom use or self study.

I also updated the manual for use with QGIS 1.8. I’m thinking that the addition of the Data Browser and the ability to simply select the CRS of the current layer or project when you’re doing a Save As (rather than having to select the CRS from the master list) will save a lot of valuable time in class. With every operation that we perform we’re constantly creating new files as the result of selections and geoprocessing, and I always lose a few people each time we’re forced to crawl through the file system to add new layers we’ve created. These simple changes should speed things up. I’ve updated the manual throughout to reflect these changes, and have also updated the datasets to reflect what’s currently available. I provide a summary of the most salient changes in the introduction.

Adding a Scale Bar in the QGIS Map Composer

August 31st, 2012

I recently finished revisions for the next edition of the manual for my GIS workshop, and incorporated a new section on adding a scale bar in the QGIS map composer. I thought I’d share that piece here. I’m using the latest version of QGIS, 1.8 Lisboa.

The scale bar in the map composer automatically takes the units used in the coordinate reference system (CRS) that your layers are in. In order to get a scale bar that makes sense, your layers will need to use a CRS that’s in meters or feet. If you’re using a geographic coordinate system that’s in degrees, like WGS 84 or NAD 83, the result isn’t going to make sense. It’s possible at large scales (covering small areas) to convert degrees to meters or feet but it’s a pain. Most projected coordinate systems for countries, continents, and the globe use meters. Regional systems like UTM also use meters, and in the US you can use a state plane system and choose between meters or feet. Transform your layers to something that’s appropriate for your map.

Once you have layers that are in a CRS that uses meters or feet, you’ll need to convert the units to kilometers or miles using the scale bar’s menu. Let’s say I have a map of the 50 states that’s in the US National Atlas Equal Area Projection, which is a continental projected coordinate system of the US, number 2163 in the EPSG library (this system is also known as the Lambert Azimuthal Equal-Area projection). It’s in meters. In the QGIS Map Composer I select the Add New Scalebar button, and click on the map to add it. The resulting scale bar is rather small and the units are clumped together.

Scalebar button on the toolbar

With the scale bar active, in the menus on the right I click on the Item Properties. First, we have to decide how large we want the individual segments on the scale bar to be. I’m making a thematic map of the US, so for my purposes 200 miles would be OK. The conversion factor is 1,609 meters in a mile. Since we want each segment on the scale bar to represent 200 miles, we multiply 1,609 by 200 to get 321,800 meters. In the Item tab, this is the number that I type in the segment size box: 321,800, replacing the default 100,000. Then, in the map units by bar, I change this from 1 to 1,609. Now the units of the scale bar start to make sense. Increase the number of right segments from 2 to 3, so I have a bar that goes from 0 to 600 miles, in 200 mile segments. I like to decrease the height of the bar a bit, from 5mm to 3mm. In the Unit label box I type Miles. Lastly, I switch to the General options tab just below, and turn off the outline for the bar. Now I have a scale bar that’s appropriate for this map!

Most people in the rest of the world will be using kilometers. That conversion is even simpler. If we wanted the scale bar segments to represent 200 km, we would multiply 200 by 1,000 (1,000 meters in a kilometer) to get 200,000 meters. 200,000 goes in the segment size box and 1,000 goes in the map units per bar. Back in the US, if you’re working in a local state plane projection that uses feet, the conversion will be the familiar 5,280 feet to a mile. So, if we wanted a scale bar that has segments of 20 miles, multiply 5,280 by 20 to get 105,600 feet. 105,600 goes in the segment size box, and 5,280 goes in the map units per bar box.

A reality check is always a good idea. Take your scale bar and compare it to some known distance between features on your map. In our first example, I would drag the scale bar over the map of the US and measure the width of Illinois at its widest point, which is just over 200 miles. In the Item Properties tab for the scale bar I turn the opacity down to zero, so the bar doesn’t hide the map features underneath. If the bar matches up I know I’m in good shape. Remember that not all map projections will preserve distance as a property, and distortion becomes an issue for small scale maps that cover large areas (i.e. continents and the globe). On small scale maps distance will be true along standard parallels, and will become less accurate the further away you go.

American Factfinder Tutorial & Census Geography Updates

July 23rd, 2012

I’ve been en-meshed in the census lately as I’ve been writing a paper about the American Community Survey. Here are a few a things to share:

  • Since I frequently receive questions about how to use the American Factfinder, I’ve created a brief tutorial with screenshots demonstrating a few ways to navigate it. I illustrate how to download a profile for a single census tract from the American Community Survey, and how to download a table for all ZIP Code Tabulation Areas (ZCTAs) in a county using the 2010 Census.
  • New boundaries for PUMAs based on 2010 census geography have been released; they’re not available from the TIGER web-based interface yet but you get can state-based files from the FTP site. I’ve downloaded the boundaries for New York and there are small changes here and there from the 2000 Census boundaries; not surprising as PUMAs are built from tracts and tract boundaries have changed. One big bonus is that PUMAs now have names associated with them, based on local government suggestions. In NY State they either take the name of counties with some directional element (east, central, south, etc), or the name of MCDs that are contained within them. In NYC they’ve been given the names of community districts.
  • I’ve done some digging through the FAQs at https://askacs.census.gov/ and discovered that the census is going to stick with the old 2000 PUMA boundaries for the next release of the American Community Survey – the 2011 ACS will be released at the end of this year. 2010 PUMAs won’t be used until the 2012 ACS, to be released at the end of 2013.
  • Urban Areas are the other holdovers in the ACS that use 2000 vintage boundaries. The ACS will also transition to the 2010 boundaries for urban areas in the 2012 ACS.
  • In the course of my digging I discovered that the census will begin including ZCTA-level data as part of the 5-year ACS estimates, beginning with the 2011 release this year. 2010 ZCTA boundaries are already available, and 2010 Census data has already been released for ZCTAs. The ACS will use the 2010 vintage ZCTAs for each release until they’re redrawn for 2020.

Plan Your Trip through the Roman Empire with ORBIS

June 27th, 2012

If you wanted to know the fastest route from Roma to Londinium in June of 300 AD or how much it would cost to ride the shortest distance at ox cart speed to Constantinopolis, check out ORBIS. Researchers at Stanford have created a model of Ancient Roman transport networks over land and sea composed of 751 points (cities, landmarks, mountain passes) and thousands of linkages.

The model simulates the average distance of a large group of travelers taking a given route in a given month. The frictions of distance, terrain, climate, and monetary expense are all accounted for in the model and you have the ability to set many of the options. The technical aspects of the project as well as its historical bases are thoroughly documented. The output consists of route maps (which you can download as KML or as CSV) and interactive cartograms. The platform is an open source stack – PostgreSQL with PostGIS, Open Layers, Geoserver, and some JavaScript libraries.

Check it out at http://orbis.stanford.edu.

The fastest route from Roma to Londinium in June? A boat ride across the Mediterranean to Narbo, foot/army/pack animal across southern Gaul, and a coast-hugging boat ride from Burdigala will get you there in 26.6 days and 2,974 kilometers. That carriage to Constantinopolis would cost you about 2,087 denarii and would take 128 days at ox cart speed – perhaps you should consider a fast military march instead?


Copyright © 2013 Gothos. All Rights Reserved.
No computers were harmed in the 0.470 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.