Posts Tagged ‘sql’

Giving GRASS GIS a Try

Saturday, July 30th, 2011

I’ve been taking the plunge in learning GRASS GIS this summer, as I’ve been working at home (and thus don’t have access to ArcGIS) on a larger and more long-term project (and thus don’t want to mess around with shapefiles and csv tables). I liked the idea of working with GRASS vectors, as everything is contained in one folder and all my attributes are stored rather neatly in a SQLite database.

I started out using QGIS to create my mapset and to connect it to my SQLite db which I had created and loaded with some census data. Then I thought, why not give the GRASS interface a try? I started using the newer Python-wx GUI and as I’m trying different things, I bounce back and forth between using the GUI for launching commands and the command line for typing them in – all the while I have Open Source GIS A GRASS GIS Approach at my side and the online manual a click away . So far, so good.

I loaded and cleaned a shapefile with the GRASS GUI (the GUI launches v.in.ogr, v.build, abd v.clean) and it’s attributes were loaded into the SQLite database I had set (using db.connect – need to do this otherwise a DBF is created by default for storing attributes). Then I had an age-old task to perform – the US Census FIPS / ANSI codes where stored in separate fields, and in order to join them to my attribute tables I had to concatenate them. I also needed to append some zeros to census tract IDs that lacked them – FIPS codes for states are two digits long, counties are three digits, and tracts are between four and six digits, but to standardize them four digit tracts should have two zeros appended.

Added the new JOIN_ID column using v.db.addcol, then did the following using db.execute:

UPDATE tracts_us99_nad83
SET JOIN_ID = STATE || COUNTY || TRACT

Then:

UPDATE tracts_us99_nad83
SET JOIN_ID = JOIN_ID || ’00′
WHERE length(JOIN_ID)=9

So this:

STATE COUNTY TRACT
01 077 0113
01 077 0114
01 077 011502
01 077 011602

Becomes this:

JOIN_ID
01077011300
01077011400
01077011502
01077011602

db.execute GRASS GUI

I could have done this a few different ways from within GRASS: instead of the separate v.db.addcol command I could have written a SQL statement in db.execute to alter the table and add a column. Or, instead of db.execute I could have used the v.db.update command.

My plan is to use GRASS for geoprocessing and analysis (will be doing some buffering, geographic selection, and basic spatial stats), and QGIS for displaying and creating final maps. I just used v.in.db to transform an attribute table with coordinates in my db to a point vector. But now I’m realizing that in order to draw buffers, I’ll need a projected coordinate system that uses meters or feet, as inputting degrees for a buffer distance (for points throughout the US) isn’t going to work too well. I’m also having trouble figuring out how to link my attribute data to my vectors – I can easily use v.db.join to fuse the two together, but there is a way to link them more loosely using the CAT ID number for each vector, but I’m getting stuck. We’ll see how it goes.

Some final notes – since I’m working with large datasets (every census tract in the US) and GRASS uses a topological data model where common node and boundaries between polygons are shared, geoprocessing can take awhile. I’ve gotten in the habit of testing things out on a small subset of my vectors, and once I get it right I run the process on the total set.

Lastly, there are times where I read about commands in the manual and for the life of me I can’t find them in the GUI – for example, finding a menu to delete (i.e. permanently remove) layers. But if you type the command without any of its parameters in the command line (in this case, g.remove) it will launch the right window in the GUI.

GRASS GIS Interface

Calculated Fields in SpatiaLite / SQLite

Wednesday, February 3rd, 2010

After downloading data, it’s pretty common that you’ll want to create calculated fields, such as percent totals or change, to use for analysis and mapping. The next step in my QGIS / SpatiaLite experiment was to create a calculated field (aka derived field). I’ll run through three ways of accomplishing this, using my subway commuter data to calculate the percentage of workers in each NYC PUMA who commute to work. Just to keep everything straight:

  • sub_commuters is a census data table for all PUMAs in NY State
    • [SUBWAY] field that has the labor force that commutes by subway
    • [WORKERS_16] field with the total labor force
    • [SUB_PER] a calculated field with the % of labor force that commutes by subway
    • [GEO_ID2] the primary key field, FIPS code that is the unqiue identifier
  • nyc_pumas is a feature class with all PUMAs in NYC
    • [PUMA5ID00] is the primary key field, FIPS code that is the unqiue identifier
  • pumas_nyc_subcom is the data table that results from joining sub_commuters and nyc_pumas; it can be converted to a feature class for mapping

Spreadsheet

The first method would be to add the calculated field to the data after downloading it from the census in a spreadsheet, as part of the cleaning / preparation stage. You could then save it as a delimited text file for import to SpatiaLite. No magic there, so I’ll skip to the second method.

SpatiaLite

The second method would be to create the calculated field in the SpatiaLite database. I’ll go through the steps I used to figure this out. The basic SQL select query:

SELECT *, (SUBWAY / WORKERS_16) AS SUB_PER FROM sub_commuters

This gives us the proper result, but there are two problems. First, the data in my SUBWAY and WORKERS_16 field are stored as integers, and when you divide the result is rounded to the nearest whole number. Not very helpful here, as my percentage results get rounded to 0 or 1. There are many ways to work around this: set the numeric fields as double, real, or float in the spreadsheet before import (didn’t work for me), specify the field types when importing (didn’t get that option with the SpatiaLite GUI, but maybe you can with the command line), add * 100 to the expression to multiply the percentage to a whole number (ok unless you need decimals in your result) or use the CAST operator. CAST converts the current data type of a field to a specified data type in the result of the expression. So:

SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER FROM sub_commuters

This gave me the percentages with several decimal places (since we’re casting the fields as real instead of integer), which is what I needed. The second problem is that this query just produces a temporary view; in order to map this data, we need to create a new table to make the calculated field permanent and join it to a feature class. Here’s how we do that:

CREATE TABLE pumas_nyc_subcom AS
SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER
FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2

The CREATE TABLE AS statement let’s us create a new table from the existing two tables – the data table of subway commuters and the feature class table for NYC PUMAs. We select all the fields in both while throwing in the new calculated field, and we join the data table to the feature class all in one step, and via the join we end up with just data from NYC (the data for the rest of the state gets dropped). After that, it’s just a matter of taking our new table and enabling the geometry to make it a feature class (as explained in the previous post).

This seems like it should work – but I discovered another problem. The resulting calculated field that has the percentage of subway commuters per PUMA, SUB_PER, has no data type associated with it. Looking at the schema for the table in SpatiaLite shows that the data type is blank. If I bring this into QGIS, I’m not able to map this field as a numeric value, because QGIS doesn’t know what it is. I have to define the data type for this field. SpatiaLite (SQLite really) doesn’t allow you to re-define an existing field – we have to create and define a new blank field, and the set the value of our calculated field equal to it. Here are the SQL statements to make it all happen:

ALTER TABLE sub_commuters ADD SUB_PER REAL

UPDATE sub_commuters SET SUB_PER=(CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL))

CREATE TABLE pumas_nyc_subcom AS
SELECT * FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2

So, we add a new blank field to our data table and define it as real. Then we update our data table by seting that blank field equal to our expression, thus filling the field with the result of our expression. Once we have the defined calculated field, we can create a new table from the data plus the features based on the ID they share in common. Once the table is created, then we can activate the geometry (right click on geometry field in the feature class and activate – see previous post for details) so we can map it in QGIS. Phew!

QGIS

The third method is to create the calculated field within QGIS, using the new field calculator. It’s pretty easy to do – you select the layer in the table of contents and go into an edit mode. Open the attribute table for the features and click the last button in the row of buttons underneath the table – this is the field calculator button. Once we’re in the field calculator window, we can choose to update an existing field or create a new field. We give the output field a name and a data type, enter our expression SUBWAY / WORKERS_16, hit OK, and we have our new field. Save the edits and we should be good to go. HOWEVER – I wasn’t able to add a calculated fields to features in a SpatiaLite geodatabase without getting errors. I posted to the QGIS forum – initially it was thought that the SpatiaLite driver was read only, but it turns out that’s not the case and so and the developers are investigating a possible bug. The investigation continues – stay tuned. I have tried the field calculator with shapefiles and it works perfectly (incidentally, you can export SpatiaLite features out of the database as shapefiles).

I’m providing the database I created here for download, if anyone wants to experiment.


Copyright © 2013 Gothos. All Rights Reserved.
No computers were harmed in the 0.810 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.