After downloading data, it’s pretty common that you’ll want to create calculated fields, such as percent totals or change, to use for analysis and mapping. The next step in my QGIS / SpatiaLite experiment was to create a calculated field (aka derived field). I’ll run through three ways of accomplishing this, using my subway commuter data to calculate the percentage of workers in each NYC PUMA who commute to work. Just to keep everything straight:
- sub_commuters is a census data table for all PUMAs in NY State
- [SUBWAY] field that has the labor force that commutes by subway
- [WORKERS_16] field with the total labor force
- [SUB_PER] a calculated field with the % of labor force that commutes by subway
- [GEO_ID2] the primary key field, FIPS code that is the unqiue identifier
- nyc_pumas is a feature class with all PUMAs in NYC
- [PUMA5ID00] is the primary key field, FIPS code that is the unqiue identifier
- pumas_nyc_subcom is the data table that results from joining sub_commuters and nyc_pumas; it can be converted to a feature class for mapping
The first method would be to add the calculated field to the data after downloading it from the census in a spreadsheet, as part of the cleaning / preparation stage. You could then save it as a delimited text file for import to SpatiaLite. No magic there, so I’ll skip to the second method.
The second method would be to create the calculated field in the SpatiaLite database. I’ll go through the steps I used to figure this out. The basic SQL select query:
SELECT *, (SUBWAY / WORKERS_16) AS SUB_PER FROM sub_commuters
This gives us the proper result, but there are two problems. First, the data in my SUBWAY and WORKERS_16 field are stored as integers, and when you divide the result is rounded to the nearest whole number. Not very helpful here, as my percentage results get rounded to 0 or 1. There are many ways to work around this: set the numeric fields as double, real, or float in the spreadsheet before import (didn’t work for me), specify the field types when importing (didn’t get that option with the SpatiaLite GUI, but maybe you can with the command line), add * 100 to the expression to multiply the percentage to a whole number (ok unless you need decimals in your result) or use the CAST operator. CAST converts the current data type of a field to a specified data type in the result of the expression. So:
SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER FROM sub_commuters
This gave me the percentages with several decimal places (since we’re casting the fields as real instead of integer), which is what I needed. The second problem is that this query just produces a temporary view; in order to map this data, we need to create a new table to make the calculated field permanent and join it to a feature class. Here’s how we do that:
CREATE TABLE pumas_nyc_subcom AS
SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER
FROM sub_commuters, nyc_pumas
The CREATE TABLE AS statement let’s us create a new table from the existing two tables – the data table of subway commuters and the feature class table for NYC PUMAs. We select all the fields in both while throwing in the new calculated field, and we join the data table to the feature class all in one step, and via the join we end up with just data from NYC (the data for the rest of the state gets dropped). After that, it’s just a matter of taking our new table and enabling the geometry to make it a feature class (as explained in the previous post).
This seems like it should work – but I discovered another problem. The resulting calculated field that has the percentage of subway commuters per PUMA, SUB_PER, has no data type associated with it. Looking at the schema for the table in SpatiaLite shows that the data type is blank. If I bring this into QGIS, I’m not able to map this field as a numeric value, because QGIS doesn’t know what it is. I have to define the data type for this field. SpatiaLite (SQLite really) doesn’t allow you to re-define an existing field – we have to create and define a new blank field, and the set the value of our calculated field equal to it. Here are the SQL statements to make it all happen:
ALTER TABLE sub_commuters ADD SUB_PER REAL
UPDATE sub_commuters SET SUB_PER=(CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL))
CREATE TABLE pumas_nyc_subcom AS
SELECT * FROM sub_commuters, nyc_pumas
So, we add a new blank field to our data table and define it as real. Then we update our data table by seting that blank field equal to our expression, thus filling the field with the result of our expression. Once we have the defined calculated field, we can create a new table from the data plus the features based on the ID they share in common. Once the table is created, then we can activate the geometry (right click on geometry field in the feature class and activate – see previous post for details) so we can map it in QGIS. Phew!
The third method is to create the calculated field within QGIS, using the new field calculator. It’s pretty easy to do – you select the layer in the table of contents and go into an edit mode. Open the attribute table for the features and click the last button in the row of buttons underneath the table – this is the field calculator button. Once we’re in the field calculator window, we can choose to update an existing field or create a new field. We give the output field a name and a data type, enter our expression SUBWAY / WORKERS_16, hit OK, and we have our new field. Save the edits and we should be good to go. HOWEVER – I wasn’t able to add a calculated fields to features in a SpatiaLite geodatabase without getting errors. I posted to the QGIS forum – initially it was thought that the SpatiaLite driver was read only, but it turns out that’s not the case and so and the developers are investigating a possible bug. The investigation continues – stay tuned. I have tried the field calculator with shapefiles and it works perfectly (incidentally, you can export SpatiaLite features out of the database as shapefiles).
I’m providing the database I created here for download, if anyone wants to experiment.