Posts Tagged ‘Tutorial’

Creating Reports with SQLite, Python, and prettytable

Friday, February 8th, 2013

In addition to providing the NYC Geodatabase as a resource, I also wanted to use it to generate reports and build applications. None of the open source SQLite GUIs that I’m familiar with have built in report generating capabilities, so I thought I could use Python to connect to the database and generate them. I have some grand ambitions here, but decided to start out small.

Python has a built-in module, sqlite3, that you can use to work with SQLite databases. This is pretty well documented – do a search and you’ll find a ton of brief tutorials. Take a look at this great post for a comprehensive intro.

For generating reports I gave prettytable a shot: it lets you create nice looking ASCII text tables that you can copy and paste from the prompt or export out to a file. The tutorial for the module was pretty clear and covers the basics quite nicely. In the examples he directly embeds the data in the script and generates the table from it, which makes the tutorial readily understandable. For my purposes I wanted to pull data out of a SQLite database and into a formatted table, so that’s what I’ll demonstrate here.

Initially I had some trouble getting the module to load, primarily (I think) because I’m using Python 3.x and the setup file for the module was written for Python 2.x; the utility you use for importing 3rd party modules has changed between versions. I’m certainly no Python expert, so instead of figuring it out I just downloaded the module, dumped it into the site-packages folder (as suggested in the prettytable installation instructions under “The Harder Way” – but it wasn’t hard at all) and unzipped it. In my script I couldn’t get the simple “import prettytable” to work without throwing an error, but when I added the name of the specific function “import PrettyTable from prettytable” it worked. Your mileage may vary.

So here was my first go at it. I created a test database and loaded a table of population estimates from the US Census Bureau into it (you can download it if you want to experiment):

from prettytable import PrettyTable
import sqlite3

conn = sqlite3.connect('pop_test.sqlite')
curs = conn.cursor()
curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name')

col_names = [cn[0] for cn in curs.description]
rows = curs.fetchall()

x = PrettyTable(col_names)
x.align[col_names[1]] = "l" 
x.align[col_names[2]] = "r" 
x.padding_width = 1    
for row in rows:
    x.add_row(row)

print (x)
tabstring = x.get_string()

output=open("export.txt","w")
output.write("Population Data"+"\n")
output.write(tabstring)
output.close()

conn.close()

The first piece is the standard SQLite piece – connect, activate a cursor, and execute a SQL statement. Here I’m grabbing three columns from the table for records that represent Northeastern states (Region 1). I read in the names of the columns from the first row into the col_names list, and I grab everything else and dump them into rows, a list that contains a tuple for each record:

>>> col_names
['State', 'Name', 'Est2010']
>>> rows
[('09', 'Connecticut', 3574097), ('23', 'Maine', 1328361), ('25', 'Massachusetts', 6547629),
 ('33', 'New Hampshire', 1316469), ('34', 'New Jersey', 8791898), ('36', 'New York', 19378104),
 ('42', 'Pennsylvania', 12702379), ('44', 'Rhode Island', 1052567), ('50', 'Vermont', 625741)]
>>> 

The second piece will make sense after you have a quick look at the prettytable tutorial. Here I grab the list of columns names and specify how cells for the columns should be aligned (default is center) and padded (default is one space). Then I add each row from the nested list of tuples to the table, row by row. There are two outputs: print directly to the screen, and dump the whole table into a string. That string can then be dumped into a text file, along with a title. Here’s the screen output:

+-------+---------------+----------+
| State | Name          |  Est2010 |
+-------+---------------+----------+
|   09  | Connecticut   |  3574097 |
|   23  | Maine         |  1328361 |
|   25  | Massachusetts |  6547629 |
|   33  | New Hampshire |  1316469 |
|   34  | New Jersey    |  8791898 |
|   36  | New York      | 19378104 |
|   42  | Pennsylvania  | 12702379 |
|   44  | Rhode Island  |  1052567 |
|   50  | Vermont       |   625741 |
+-------+---------------+----------+

The one hangup I had was the formatting for the numbers: I really want some commas in there since the values are so large. I couldn’t figure out how to do this using the approach above – I’m writing all the rows in one swoop, and couldn’t step in and and format the last value for each row.

Unless – instead of constructing the table by rows, I construct it by columns. Here’s my second go at it:

from prettytable import PrettyTable
import sqlite3

conn = sqlite3.connect('pop_test.sqlite')
curs = conn.cursor()
curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name')

col_names = [cn[0] for cn in curs.description]
rows = curs.fetchall()

y=PrettyTable()
y.padding_width = 1
y.add_column(col_names[0],[row[0] for row in rows])
y.add_column(col_names[1],[row[1] for row in rows])
y.add_column(col_names[2],[format(row[2],',d') for row in rows])
y.align[col_names[1]]="l"
y.align[col_names[2]]="r"

print(y)
tabstring = y.get_string()

output=open("export.txt","w")
output.write("Population Data"+"\n")
output.write(tabstring)
output.close()

conn.close()

To add by column, you don’t provide any arguments to the PrettyTable function. You just add the columns one by one: here I call the appropriate values using the index, first for the column name and then for all of the values from the rows that are in the same position. For the last value (the population estimate) I use format to display the value like a decimal number (this works in Python 3.1+ – for earlier versions there’s a similar command – see this post for details). I tried this in my first example but I couldn’t get the format to stick, or got an error. Since I’m specifically calling these row values and then writing them I was able to get it to work in this second example. In this version the alignment specifications have to come last. Here’s the result:

+-------+---------------+------------+
| State | Name          |    Est2010 |
+-------+---------------+------------+
|   09  | Connecticut   |  3,574,097 |
|   23  | Maine         |  1,328,361 |
|   25  | Massachusetts |  6,547,629 |
|   33  | New Hampshire |  1,316,469 |
|   34  | New Jersey    |  8,791,898 |
|   36  | New York      | 19,378,104 |
|   42  | Pennsylvania  | 12,702,379 |
|   44  | Rhode Island  |  1,052,567 |
|   50  | Vermont       |    625,741 |
+-------+---------------+------------+

prettytable gives you a few other options, like the ability to sort records by a certain column or to return only the first “n” records from a table. In this example, since we’re pulling the data from a database we could (and did) specify sorting and other constraints in the SQL statement instead. prettytable also gives you the option of exporting the table as HTML, which can certainly come in handy.

Altering Tables in SQLite / Spatialite

Thursday, February 7th, 2013

In building the Spatialite geodatabase (see previous post), one of the fundamental things I learned was how to manage table creation and alteration in SQLite, which was quite different from my previous experience working with MS Access and ArcGIS.

In SQLite, the ALTER TABLE statement is limited to changing the name of a table or adding new columns. If you want to make any other changes, the process is: create a new, blank table that’s structured the way you want, and then write an INSERT statement to copy the data you want from the existing table into the new table. So if you have a table that has a bunch of columns you want to drop or change, create a new table that has your desired structure, then insert what you want into that new table. The same thing goes for primary keys or other constraints. If your existing table doesn’t have a specified key you can’t alter it by specifying one: create a new table that does, then copy your data over.

Let’s say we have this table (de_data) with basic population data for Delaware’s three counties:

USPS	GEOID	NAME			POP10	HU10	ALAND_SQMI	AWATER_SQMI
DE	10001	Kent County		162310	65338	586.179		212.152
DE	10003	New Castle County	538479	217511	426.286		67.717
DE	10005	Sussex County		197145	123036	936.079		260.312

And let’s say that we want to change some of the column names, drop the field for housing units, and specify our data types and GEOID as our key. First, create the table:

CREATE TABLE de_pop (STATE TEXT, GEOID TEXT NOT NULL PRIMARY KEY,
COUNTY TEXT, POP10 INTEGER, ALAND REAL, AWATER REAL)

GEOID is the FIPS/ANSI code that uniquely identifies each state. Since these codes may have leading zeros (the codes for all states from AL through CT do), we designate it as text.

Second, insert the data we want from the existing table:

INSERT INTO de_pop (STATE, GEOID, COUNTY, POP10, ALAND, AWATER)
SELECT USPS, GEOID, NAME, POP10, ALAND_SQMI, AWATER_SQMI
FROM de_data

Order here matters – it’s going to insert columns from the original table into the new one in sequence: USPS into STATE, GEOID into GEOID, etc. Sometimes it’s possible to use an * as a shortcut to insert and copy everything, instead of listing every field, if both tables contain the same number of columns and they’re in the right order. But this is always a bit risky.

Lastly, if we don’t need that original table we could delete it:

DROP TABLE de_data

SELECT * FROM de_pop

STATE	GEOID	COUNTY			POP10	ALAND	AWATER
DE	10001	Kent County		162310	586.179	212.152
DE	10003	New Castle County	538479	426.286	67.717
DE	10005	Sussex County		197145	936.079	260.312

The process is similar if we want to take a query or view and turn it into a permanent table. SQLite does support CREATE TABLE AS, followed by a select query, so you can create a table out of a query. However – this is usually NOT the best course of action. If you do this, you won’t be able to specify a primary key for the new table (you really never want a table that lacks a key). Furthermore, if you create a new, calculated field you won’t be able to specify a type for it. This is particularly a problem if you’re working in Spatialite and want to create a spatial view or table that you want to join to some features and map in QGIS. If no type is specified, QGIS won’t know how to handle that new field, and it will be unviewable.

So, we take the same approach as before. Let’s say we want to create a table with population density as a calculated field. First, create the table:

CREATE TABLE de_popdens (STATE TEXT, GEOID TEXT NOT NULL PRIMARY KEY,
COUNTY TEXT, POP10 INTEGER, ALAND REAL, AWATER REAL, POPDENS REAL)

Then we write our insert statement, and in the insert we create the calculated field:

INSERT INTO de_popdens (STATE, GEOID, COUNTY, POP10, ALAND, AWATER, POPDENS)
SELECT STATE, GEOID, COUNTY, POP10, ALAND, AWATER, (ROUND(POP10/ALAND),1)
FROM de_pop

SELECT * FROM de_density:

STATE	GEOID	COUNTY			POP10	ALAND	AWATER	POPDENS
DE	10001	Kent County		162310	586.179	212.152	276.9
DE	10003	New Castle County	538479	426.286	67.717	1263.2
DE	10005	Sussex County		197145	936.079	260.312	210.6

This gives us a new table with density properly typed. Once again, this is only necessary if you want the calculated field to be permanent and function with other objects in the database, and particularly if you want to join this table to a geodatabase feature or shapefile to map the attributes. If you simply want the new field to answer a specific question or to export the data to output, you can just do a SELECT query and save it as a view.

I also had to do this procedure for every table and shapefile that I imported to the geodatabase. In the Spatialite GUI you don’t have the option to specify a primary key or data types for columns when you do an import; for the latter it takes its best guess. So to get it well-structured, I imported (or created a virtual link), created a new table, inserted the data over, then deleted the original table or severed the link. If it was a shapefile I went through the extra step of activating the geometry (check out the Spatialite Cookbook or the tutorial I wrote for the NYC geodatabase for details).

Since I was dealing with some enormous tables with hundreds of columns, I used some trickery to avoid typing all the statements by hand. If the data was small enough and came from a spreadsheet, I used a series of concatenate formulas to build the CREATE TABLE and INSERT statements by copying the field names and stringing them together with type names and necessary syntax, so I could just copy and paste a statement into the SQL dialog box. For larger datasets I used Python to do the processing, and had Python grab field names and write statements that I could copy and paste.

concatentaing

The import issue was particular to the Spatialite GUI, and not SQLite in general. If you’re dealing with just data tables and use the SQLite Manager (Firefox plugin), it asks you to specify column names, keys and constraints, and types for the columns you’re importing. It does the latter by having you select from a dropdown box for each column – this works fine if you have 10 or 20 columns, but it’s rather tedious if you have hundreds. The Manager also gives you the ability to alter additional elements of the table, like column names, by essentially performing the same operations (create table, insert records, drop table) behind the scenes while holding things temporarily in memory. It prefaces this with a warning message that this operation is not part of the standard SQLite commands, and there’s a chance something could go awry.

NYC Geodatabase in Spatialite

Wednesday, February 6th, 2013

I spent much of the fall semester and winter interim compiling and creating the NYC geodatabase (nyc_gdb), a desktop geodatabase resource for doing basic mapping and analysis at a neighborhood level – PUMAs, ZIP Codes / ZCTAs, and census tracts. There were several motivations for doing this. First and foremost, as someone who is constantly introducing new people to GIS it’s a pain sending people to a half dozen different websites to download shapefiles and process basic features and data before actually doing a project. By creating this resource I hoped to lower the hurdles a bit for newcomers; eventually they still need to learn about the original sources and data processing, but this gives them a chance to experiment and see the possibilities of GIS before getting into nitty gritty details.

Second, for people who are already familiar with GIS and who have various projects to work on (like me) this saves a lot of duplicated effort, as the db provides a foundation to build on and saves the trouble of starting from scratch each time.

Third, it gave me something new to learn and will allow me to build a second part to my open source GIS workshops. I finally sat down and hammered away with Spatialite (went through the Spatialite Cookbook from start to finish) and learned spatial SQL, so I could offer a resource that’s open source and will compliment my QGIS workshop. I was familiar with the Access personal geodatabases in ArcGIS, but for the most part these serve as simple containers. With the ability to run all the spatial SQL operations, Spatialite expands QGIS functionality, which was something I was really looking for.

My original hope was to create a server-based PostGIS database, but at this point I’m not set up to do that on my campus. I figured Spatialite was a good alternative – the basic operations and spatial SQL commands are relatively the same, and I figured I could eventually scale up to PostGIS when the time comes.

I also created an identical, MS Access version of the database for ArcGIS users. Once I got my features in Spatialite I exported them all out as shapefiles and imported them all via ArcCatalog – not too arduous as I don’t have a ton of features. I used the SQLite ODBC driver to import all of my data tables from SQLite into Access – that went flawlessly and was a real time saver; it just took a little bit of time to figure out how to set up (but this blog post helped).

The databases are focused on NYC features and resources, since that’s what my user base is primarily interested in. I purposefully used the Census TIGER files as the base, so that if people wanted to expand the features to the broader region they easily could. I spent a good deal of time creating generalized layers, so that users would have the primary water / coastline and large parks and wildlife areas as reference features for thematic maps, without having every single pond and patch of grass to clutter things up. I took several features (schools, subway stations, etc) from the City and the MTA that were stored in tables and converted them to point features so they’re readily useable.

Given that focus, it’s primarily of interest to NYC folks, but I figured it may be useful for others who wish to experiment with Spatialite. I assumed that most people who would be interested in the database would not be familiar with this format, so I wrote a tutorial that covers the database and it’s features, how to add and map data in QGIS, how to work with the data and do SQL / spatial SQL in the Spatialite GUI, and how to map data in ArcGIS using the Access Geodb. It’s Creative Commons, Attribution, Non-Commercial, Share-alike, so feel free to give it a try.

I spent a good amount of time building a process rather than just a product, so I’ll be able to update the db twice a year, as city features (schools, libraries, hospitals, transit) change and new census data (American Community Survey, ZIP Business Patterns) is released. Many of the Census features, as well as the 2010 Census data, will be static until 2020.

New Version of Introductory GIS Tutorial Now Available

Sunday, October 7th, 2012

The latest version of my Introduction to GIS tutorial using QGIS is now available. I’ve completely revised how it’s organized and presented; I wrote the first two manuals in HTML, since I wanted something that gave me flexibility with inserting many images in a large document (word processors are notoriously poor at this). Over the summer I learned how to use LaTeX, and the result for this 3rd edition is an infintely better document, for classroom use or self study.

I also updated the manual for use with QGIS 1.8. I’m thinking that the addition of the Data Browser and the ability to simply select the CRS of the current layer or project when you’re doing a Save As (rather than having to select the CRS from the master list) will save a lot of valuable time in class. With every operation that we perform we’re constantly creating new files as the result of selections and geoprocessing, and I always lose a few people each time we’re forced to crawl through the file system to add new layers we’ve created. These simple changes should speed things up. I’ve updated the manual throughout to reflect these changes, and have also updated the datasets to reflect what’s currently available. I provide a summary of the most salient changes in the introduction.

Adding a Scale Bar in the QGIS Map Composer

Friday, August 31st, 2012

I recently finished revisions for the next edition of the manual for my GIS workshop, and incorporated a new section on adding a scale bar in the QGIS map composer. I thought I’d share that piece here. I’m using the latest version of QGIS, 1.8 Lisboa.

The scale bar in the map composer automatically takes the units used in the coordinate reference system (CRS) that your layers are in. In order to get a scale bar that makes sense, your layers will need to use a CRS that’s in meters or feet. If you’re using a geographic coordinate system that’s in degrees, like WGS 84 or NAD 83, the result isn’t going to make sense. It’s possible at large scales (covering small areas) to convert degrees to meters or feet but it’s a pain. Most projected coordinate systems for countries, continents, and the globe use meters. Regional systems like UTM also use meters, and in the US you can use a state plane system and choose between meters or feet. Transform your layers to something that’s appropriate for your map.

Once you have layers that are in a CRS that uses meters or feet, you’ll need to convert the units to kilometers or miles using the scale bar’s menu. Let’s say I have a map of the 50 states that’s in the US National Atlas Equal Area Projection, which is a continental projected coordinate system of the US, number 2163 in the EPSG library (this system is also known as the Lambert Azimuthal Equal-Area projection). It’s in meters. In the QGIS Map Composer I select the Add New Scalebar button, and click on the map to add it. The resulting scale bar is rather small and the units are clumped together.

Scalebar button on the toolbar

With the scale bar active, in the menus on the right I click on the Item Properties. First, we have to decide how large we want the individual segments on the scale bar to be. I’m making a thematic map of the US, so for my purposes 200 miles would be OK. The conversion factor is 1,609 meters in a mile. Since we want each segment on the scale bar to represent 200 miles, we multiply 1,609 by 200 to get 321,800 meters. In the Item tab, this is the number that I type in the segment size box: 321,800, replacing the default 100,000. Then, in the map units by bar, I change this from 1 to 1,609. Now the units of the scale bar start to make sense. Increase the number of right segments from 2 to 3, so I have a bar that goes from 0 to 600 miles, in 200 mile segments. I like to decrease the height of the bar a bit, from 5mm to 3mm. In the Unit label box I type Miles. Lastly, I switch to the General options tab just below, and turn off the outline for the bar. Now I have a scale bar that’s appropriate for this map!

Most people in the rest of the world will be using kilometers. That conversion is even simpler. If we wanted the scale bar segments to represent 200 km, we would multiply 200 by 1,000 (1,000 meters in a kilometer) to get 200,000 meters. 200,000 goes in the segment size box and 1,000 goes in the map units per bar. Back in the US, if you’re working in a local state plane projection that uses feet, the conversion will be the familiar 5,280 feet to a mile. So, if we wanted a scale bar that has segments of 20 miles, multiply 5,280 by 20 to get 105,600 feet. 105,600 goes in the segment size box, and 5,280 goes in the map units per bar box.

A reality check is always a good idea. Take your scale bar and compare it to some known distance between features on your map. In our first example, I would drag the scale bar over the map of the US and measure the width of Illinois at its widest point, which is just over 200 miles. In the Item Properties tab for the scale bar I turn the opacity down to zero, so the bar doesn’t hide the map features underneath. If the bar matches up I know I’m in good shape. Remember that not all map projections will preserve distance as a property, and distortion becomes an issue for small scale maps that cover large areas (i.e. continents and the globe). On small scale maps distance will be true along standard parallels, and will become less accurate the further away you go.

American Factfinder Tutorial & Census Geography Updates

Monday, July 23rd, 2012

I’ve been en-meshed in the census lately as I’ve been writing a paper about the American Community Survey. Here are a few a things to share:

  • Since I frequently receive questions about how to use the American Factfinder, I’ve created a brief tutorial with screenshots demonstrating a few ways to navigate it. I illustrate how to download a profile for a single census tract from the American Community Survey, and how to download a table for all ZIP Code Tabulation Areas (ZCTAs) in a county using the 2010 Census.
  • New boundaries for PUMAs based on 2010 census geography have been released; they’re not available from the TIGER web-based interface yet but you get can state-based files from the FTP site. I’ve downloaded the boundaries for New York and there are small changes here and there from the 2000 Census boundaries; not surprising as PUMAs are built from tracts and tract boundaries have changed. One big bonus is that PUMAs now have names associated with them, based on local government suggestions. In NY State they either take the name of counties with some directional element (east, central, south, etc), or the name of MCDs that are contained within them. In NYC they’ve been given the names of community districts.
  • I’ve done some digging through the FAQs at https://askacs.census.gov/ and discovered that the census is going to stick with the old 2000 PUMA boundaries for the next release of the American Community Survey – the 2011 ACS will be released at the end of this year. 2010 PUMAs won’t be used until the 2012 ACS, to be released at the end of 2013.
  • Urban Areas are the other holdovers in the ACS that use 2000 vintage boundaries. The ACS will also transition to the 2010 boundaries for urban areas in the 2012 ACS.
  • In the course of my digging I discovered that the census will begin including ZCTA-level data as part of the 5-year ACS estimates, beginning with the 2011 release this year. 2010 ZCTA boundaries are already available, and 2010 Census data has already been released for ZCTAs. The ACS will use the 2010 vintage ZCTAs for each release until they’re redrawn for 2020.

GIS Workshops This Apr & May

Sunday, March 25th, 2012

This semester I’ll be teaching three workshops with Prof. Deborah Balk in spatial tools and analysis. Sponsored by the CUNY Institute of Demographic Research (CIDR), the workshops will be held on Baruch College’s campus in midtown NYC on Friday afternoons. The course is primarily intended for data and policy analysts who want to gain familiarity with the basics of map making and spatial analysis; registration is open to anyone. The workshops progress from basic to intermediate skills that cover making a map (Apr 27th), geospatial calculations (May 4th), and geospatial analysis (May 11th). We’ll be using QGIS and participants will work off of their own laptops; we’ll also be demonstrating some of the processes in ArcGIS and participants will receive an evaluation copy of that software. Each workshop is $300 or you can register for all three for $750.

For full details check out this flier. You can register via the College’s CAPS website; do a search for DEM and register for each session (DEM0003, DEM0004, and DEM0005).

Screen Scraping Data with Python

Friday, March 9th, 2012

I had a request recently for population centers (aka population centroids) for all the counties in the US. The Census provides the 2010 centroids in state level files and in one national file for download, but the 2000 centroids were provided in HTML tables on individual web pages for each state. Rather than doing the tedious work of copying and pasting 51 web pages into a spreadsheet, I figured this was my chance to learn how to do some screen scraping with Python. I’m certainly no programmer, but based on what I’ve learned (I took a three day workshop a couple years ago) and by consulting books and crawling the web for answers when I get stuck, I’ve been able to write some decent scripts for processing data.

For screen scraping there’s a must-have module called Beautiful Soup which easily let’s you parse web pages, well or ill-formed. After reading the Beautiful Soup Quickstart and some nice advice I found on a post on Stack Overflow, I was able to build a script that looped through each of the state web pages, scraped the data from the tables, and dumped it into a delimited text file. Here’s the code:

## Frank Donnelly Feb 29, 2012
## Scrapes 2000 centers of population for counties from individual state web pages
## and saves in one national-level text file.

from urllib.request import urlopen
from bs4 import BeautifulSoup

output_file=open('CenPop2000_Mean_CO.txt','a')
header=['STATEFP','COUNTYFP','COUNAME','STNAME','POPULATION','LATITUDE','LONGITUDE']
output_file.writelines(",".join(header)+"\n")

url='http://www.census.gov/geo/www/cenpop/county/coucntr%s.html'

fips=['01','02','04','05','06','08','09','10',
'11','12','13','15','16','17','18','19','20',
'21','22','23','24','25','26','27','28','29','30',
'31','32','33','34','35','36','37','38','39','40',
'41','42','44','45','46','47','48','49','50',
'51','53','54','55','56']

for i in fips:
  soup = BeautifulSoup(urlopen(url %i).read())
  titleTag = soup.html.head.title
  list=titleTag.string.split()
  name=(list[4:])
  state=' '.join(name)  
 
  for row in soup('table')[1].tbody('tr'):
    tds = row('td')
    line=tds[0].string, tds[1].string, tds[2].string, state, 
    tds[3].string.replace(',',''), tds[4].string, tds[5].string

    output_file.writelines(",".join(line)+"\n")     

output_file.close()

After installing the modules step 1 is to import them into the script. I initially got a little stuck here, because there are also some standard modules for working with urls (urllib and urlib2) that I’ve seen in books and other examples that weren’t working for me. I discovered that since I’m using Python 3.x and not the 2.x series, something had changed recently and I had to change how I was referencing urllib.

With that out of the way I created a a text file, a list with the column headings I want, and then wrote those column headings to my file.

Next I read in the url. Since the Census uses a static URL that varies for each state by FIPS code, I was able to assign the URL to a variable and inserted the % symbol to substitute where the FIPS code goes. I created a list of all the FIPS codes, and then I run through a loop – for every FIPS code in the list I pass that code into the url where the % place holder is, and process that page.

The first bit of info I need to grab is the name of the state, which doesn’t appear in the table. I grab the title tag from the page and save it as a list, and then grab everything from the fourth element (fifth word) to the end of the list to capture the state name, and then collapse those list elements back into one string (have to do this for states that have multiple words – New, North, South, etc.).

So we go from the HTML Title tag:

County Population Centroids for New York

To a list with elements 0 to 5:

list=["County", "Population", "Centroids", "for", "New", "York"]

To a shorter list with elements 4 to end:

name=["New","York"]

To a string:

state=”New York”

But the primary goal here is to grab everything in the table. So we identify the table in the HTML that we want – the first table in those pages [0] is just an empty frame and the second one [1] is the one with the data. For every row (tr) in the table we can reference and grab each cell (td), and string those cells together as a line by referencing them in the list. As I string these together I also insert the state name so that it appears on every line, and for the third list element (total population in 2000) I strip out any commas (numbers in the HTML table included commas, a major no-no that leads to headaches in a csv file). After we grab that line we dump it into the output file, with each value separated by a comma and each record on it’s own line (using the new line character). Once we’ve looped through each table on each page for each state, we close the file.

There are a few variations I could have tried; I could have read the FIPS codes in from a table rather than inserting them into the script, but I preferred to keep everything together. I could have read the state names in as a list, or coupled them with the codes in a dictionary. This would have been less risky then relying on the state name in the title tag, but since the pages were well-formed and I wanted to experiment a little I went the title tag route. Instead of typing the codes in by hand I used Excel trickery to concatenate commas to the end of each code, and then concatenated all the values together in one cell so I could copy and paste the list into the script.

You can go here to see an individual state page and source, and here to see what the final output looks like. Or if you’re just looking for a national level file of 2000 population centroids for counties that you can download, look no further!

Updates for QGIS 1.7 Wroclaw

Tuesday, July 5th, 2011

The latest version of QGIS, 1.7 “Wroclaw” was released a few weeks ago. Some of the recent updates render parts of my GIS Practiucm out of date (unless you’re sticking with versions 1.5 or 1.6), so I’ll be making updates later this summer for the upcoming workshops this academic year. In the meantime, I wanted to summarize the most salient changes here for the participants in this past spring’s workshops, and for anyone else who may be interested. Here are the big two changes that affect the tutorial / manual:

Transforming Projections – In previous versions you would go under Vector – > Data Management Tools > Export to New Projection. In 1.7 this has been dropped from the ftools vector menu. To transform the projection of a file, you select it in the Map Legend (ML), right-click, hit Save As, give it a new name and specify the new CRS there. The QGIS developers have provided some info on how QGIS handles projections that’s worth checking out. You can go in the settings and have QGIS transform projections on the fly, which is fine depending on what you’re going to do. My preference is to play it safe – do the transformations and make sure all your files and the window share the same CRS. It can save you headaches later on.

Table Joins – In previous versions you would also accomplish this under Vector – > Data Management Tools > Join Attributes, where you’d join a DBF or CSV to a shapefile to create a new file with both the geometry and the data. Now that’s out, and QGIS can support dynamic joins, similar to ArcGIS where you couple the attribute table to the shapefile without permanently fusing the two. To do this you must add your DBF or CSV directly to your project; do this the same way you’d add a vector layer. Hit the Add Vector button, and change the drop down at the bottom so you can see all files (not just shapefiles) in your directory. Add your table. Then select your layer in the ML and double click to open the properties menu. You’ll see a new tab for Joins. Hit the tab and hit the plus button to add a new join. You’ll select your table, the table’s join field, and the layer’s join field. Hit OK to join em, and it’s done. Open the attribute table for the layer and you’ll see all columns for the layer and the joined field.
New Join Menu QGIS 1.7

This sounds great, but I had some trouble when I went to symbolize my data. Using the old symbology tab, I couldn’t classify any of my columns from my attribute table using Equal Intervals; it populated each class with zeros. Quantiles worked fine. If I switched to the new symbology, I still couldn’t use Equal Intervals, and Quantiles and Natural Breaks only worked partially – my dataset contained negative values which were simply dropped instead of classified. Grrrrr. I got around this by selecting my layer in the ML (after doing the join), right clicked, and saved it as a new layer. This permanently fused the shapefile with the attributes, and I had no problem classifying and mapping data in the new file. I went to the forum and asked for help to see if this is a bug – will report back with the results.

Here are some other updates worth noting:

  • Feature Count – if you right click on a layer in the ML, you can select the Feature Count option and the number of features will be listed under your layer. If you’ve classified your features, it will do a separate count for each classification.
  • Feature Count QGIS 1.7

  • Measuring Tool – it looks like it’s now able to convert units on the fly, so even if you’re using a CRS with units in degrees, it can convert it to meters and kilometers (you can go into the options menu if you want to switch to feet and miles).
  • Labels – it looks like the experimental labelling features have been made default in this verson, and they do work better, especially with polygons.
  • Map Composer Undo – undo and redo buttons have been added to the map composer, which makes it much easier to work with.
  • Undo Redo Button Map Composer

  • Map Composer Symbols – if you go to insert an image in the map composer, you’ll have a number of default SVG symbols you can load, incuding north arrows
  • Export to PDF – from the map composer, this was pretty buggy in the past but I was able to export a map today without any problems at all.

Define Projection for a Batch of Shapefiles

Sunday, June 12th, 2011

I was working on a project where I had downloaded 51 shapefiles (state-based census tract files) from the Census Generalized Cartographic Boundary Files. Each file lacked a projection .prj file, so I had to define each one as NAD83. Not wanting to do this one at a time, I used the GDAL / OGR tools and a bash script to process them all in a batch. I wrote a little script in a text file and then pasted it in the command line:

#!/bin/bash
for i in $(ls *.shp); do
ogr2ogr -f “ESRI Shapefile” -a_srs “EPSG:4269″ ./nad83 $i
done

It iterates through a list of all the shapefiles in a directory, uses OGR to define them as NAD83, then writes them to a new subdirectory called NAD83.

After searching through the web for some guidance on this, I later realized that there was a nice, succinct example of this in a book that I had (yeah – remember books? They’re still great!)

#!/bin/bash
# from Sherman (2008) Desktop GIS Mapping the Planet With Open Source Tools pp 243-44

for shp in *.shp
do
echo “Processing $shp”
ogr2ogr -f “ESRI Shapefile” -t_srs EPSG:4326 geo/$shp $shp
done

This does the same thing, difference here is that it prints a message to the command line for each file that’s processed and uses the -t_srs switch (transform projection) rather than the -a_srs (assign an output projection), which in this case seems to do the same thing. Of course you could tweak this a little to transform projections from one system to another as well.

This is fine and good if you’re using Linux and can use bash (go here for more info about bash). If you’re using Windows, you can do this if you’re using a Linux / UNIX terminal emulator like MSYS; otherwise you can use the DOS Command Prompt and write a batch (.bat) file to do this instead – the post on this forum is the first thing I found in my quest to figure all of this out.


Copyright © 2013 Gothos. All Rights Reserved.
No computers were harmed in the 0.729 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.