Posts Tagged ‘new york city’

NYC Geodatabase in Spatialite

Wednesday, February 6th, 2013

I spent much of the fall semester and winter interim compiling and creating the NYC geodatabase (nyc_gdb), a desktop geodatabase resource for doing basic mapping and analysis at a neighborhood level – PUMAs, ZIP Codes / ZCTAs, and census tracts. There were several motivations for doing this. First and foremost, as someone who is constantly introducing new people to GIS it’s a pain sending people to a half dozen different websites to download shapefiles and process basic features and data before actually doing a project. By creating this resource I hoped to lower the hurdles a bit for newcomers; eventually they still need to learn about the original sources and data processing, but this gives them a chance to experiment and see the possibilities of GIS before getting into nitty gritty details.

Second, for people who are already familiar with GIS and who have various projects to work on (like me) this saves a lot of duplicated effort, as the db provides a foundation to build on and saves the trouble of starting from scratch each time.

Third, it gave me something new to learn and will allow me to build a second part to my open source GIS workshops. I finally sat down and hammered away with Spatialite (went through the Spatialite Cookbook from start to finish) and learned spatial SQL, so I could offer a resource that’s open source and will complement my QGIS workshop. I was familiar with the Access personal geodatabases in ArcGIS, but for the most part these serve as simple containers. With the ability to run all the spatial SQL operations, Spatialite expands QGIS functionality, which was something I was really looking for.

My original hope was to create a server-based PostGIS database, but at this point I’m not set up to do that on my campus. I figured Spatialite was a good alternative – the basic operations and spatial SQL commands are largely the same, and I figured I could eventually scale up to PostGIS when the time comes.

I also created an identical, MS Access version of the database for ArcGIS users. Once I got my features in Spatialite I exported them all out as shapefiles and imported them all via ArcCatalog – not too arduous as I don’t have a ton of features. I used the SQLite ODBC driver to import all of my data tables from SQLite into Access – that went flawlessly and was a real time saver; it just took a little bit of time to figure out how to set up (but this blog post helped).

The databases are focused on NYC features and resources, since that’s what my user base is primarily interested in. I purposefully used the Census TIGER files as the base, so that if people wanted to expand the features to the broader region they easily could. I spent a good deal of time creating generalized layers, so that users would have the primary water / coastline and large parks and wildlife areas as reference features for thematic maps, without having every single pond and patch of grass to clutter things up. I took several features (schools, subway stations, etc) from the City and the MTA that were stored in tables and converted them to point features so they’re readily useable.

Given that focus, it’s primarily of interest to NYC folks, but I figured it may be useful for others who wish to experiment with Spatialite. I assumed that most people who would be interested in the database would not be familiar with this format, so I wrote a tutorial that covers the database and its features, how to add and map data in QGIS, how to work with the data and do SQL / spatial SQL in the Spatialite GUI, and how to map data in ArcGIS using the Access Geodb. It’s Creative Commons, Attribution, Non-Commercial, Share-alike, so feel free to give it a try.

I spent a good amount of time building a process rather than just a product, so I’ll be able to update the db twice a year, as city features (schools, libraries, hospitals, transit) change and new census data (American Community Survey, ZIP Business Patterns) is released. Many of the Census features, as well as the 2010 Census data, will be static until 2020.

GIS Workshops This Apr & May

Sunday, March 25th, 2012

This semester I’ll be teaching three workshops with Prof. Deborah Balk in spatial tools and analysis. Sponsored by the CUNY Institute of Demographic Research (CIDR), the workshops will be held on Baruch College’s campus in midtown NYC on Friday afternoons. The course is primarily intended for data and policy analysts who want to gain familiarity with the basics of map making and spatial analysis; registration is open to anyone. The workshops progress from basic to intermediate skills that cover making a map (Apr 27th), geospatial calculations (May 4th), and geospatial analysis (May 11th). We’ll be using QGIS and participants will work off of their own laptops; we’ll also be demonstrating some of the processes in ArcGIS and participants will receive an evaluation copy of that software. Each workshop is $300 or you can register for all three for $750.

For full details check out this flier. You can register via the College’s CAPS website; do a search for DEM and register for each session (DEM0003, DEM0004, and DEM0005).

ZIP Code KML Map for NYC Census Data

Saturday, September 10th, 2011

With the release of both the 2010 Census profiles for ZCTAs (ZIP Code Tabulation Areas) and the TIGER line files for 2010 Census geographies, I created another Google Map finding aid for NYC neighborhood data by ZIP code (I previously created one for PUMAs with American Community Survey data). Once again I used the Export to KML plugin that was created for ArcGIS. This allowed me to use the TIGER shapefile in ArcGIS to create the map I wanted and then export it as a KML, while using fields in the attribute table of each feature to insert the ZCTA number into stable links for the census profiles, automatically generating unique urls for each feature. Click on the ZCTA in the map, and then click on a link to open a profile directly from the new American Factfinder.

There were two new obstacles I had to contend with this time. The first was that my department has finally migrated to Windows 7 from Windows XP, and I upgraded from ArcGIS 9.3 to 10. I had to reinstall the Export to KML plugin (version 2.5.5) and ran into trouble; fortunately all the work-arounds were included in the plugin’s documentation. I don’t have administrator rights on my machine, so I had to have someone install the plugin as an administrator; this included running the initial setup file AND running Arc as an administrator as you add and turn the plugin on. That was straightforward, but when I ran it the first time I got an error message – there’s a particular Windows dll or ocx file that the plugin needs and it was missing (presumably something that was included in XP but not in 7). I downloaded the necessary file, and with administrator rights moved it into the system32 folder and registered the file via the command line. After that I was good to go.

The second issue was with the Census Bureau’s new American Factfinder. With the old Factfinder the urls that were generated as you built and accessed tables were static and you could simply save and bookmark them. Not the case in the new Factfinder; you can bookmark some basic tables but most of them are “too complex to bookmark”; you can save and download queries from the online app but that’s it. After some digging I found a CB document that tells you how you can create deep links to any query you run and table you create. The url consists of a fixed series of codes that identify the dataset, year, table, and geography. So this link:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US10010

tells us that we’re getting a table from version 1.0 of the American Factfinder in English. It’s from the Decennial Census, 2010 Demographic Profiles, Demographic Profile Table 1, for ZCTA 10010 (860 is the summary level code that indicates we’re looking at ZCTAs). So for the plugin to create the links, I just included this URL but in place of the last five digits I specified the attribute from the ZCTA shapefile that held the ZCTA code. So when the plugin creates the KML, each KML feature has a link generated that is specific to it:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US[ZCTA5CE10]
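For anyone who wants to build the same links outside of the plugin, here is a minimal sketch in Python. It only assumes that you have a list of five-digit ZCTA codes; the function name profile_url and the constant names are my own labels for illustration, not anything defined by the Census Bureau or the Export to KML plugin.

```python
# Sketch: build one deep link per ZCTA from the fixed URL pattern above.
# BASE and GEO_PREFIX are copied from the example link; the names are hypothetical.
BASE = "http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1"
GEO_PREFIX = "8600000US"  # 860 = summary level for ZCTAs


def profile_url(zcta: str) -> str:
    """Return the demographic profile link for a five-digit ZCTA code."""
    if len(zcta) != 5 or not zcta.isdigit():
        raise ValueError(f"expected a five-digit ZCTA code, got {zcta!r}")
    return f"{BASE}/{GEO_PREFIX}{zcta}"


# ZCTA codes mentioned in this post
zctas = ["10010", "11370", "10463"]
links = {z: profile_url(z) for z in zctas}

for zcta, url in links.items():
    print(zcta, url)
```

Keeping the codes as text rather than numbers matters, since a ZCTA with a leading zero would lose it as an integer.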

You can see this previous post for details on how the Export to KML plugin works.

For now, the 2010 and 2000 Census are in the new American Factfinder. The American Community Survey, the Economic Census, population estimates, and a few other datasets are still in the older, legacy Factfinder. According to the CB all of this data will be migrated to the new Factfinder by the end of 2011 and the legacy version will disappear. At that point I’ll have to update my PUMA map so that it points to the profiles in the new Factfinder.

You can take a look at the ZCTA map and profiles below (I’m hosting it on the NYC data resource guide I’ve created for my college). As I’ve written before, ZCTAs are odd Census geographies since they are approximations of residential USPS ZIP Codes created by aggregating census blocks based on addresses; you can see in many instances where boundaries have a blocky teeth-like appearance instead of straight lines. Since they’re created directly by aggregating blocks, ZCTAs don’t correspond or mesh with other census boundaries like tracts or PUMAs, or even legal boundaries like counties. In some cases my assignment of county-based colors doesn’t ring true. For example, ZCTA 11370 includes part of the East Elmhurst neighborhood in Queens and Rikers Island, which is in the Bronx. ZCTA 10463 includes the Bronx neighborhoods of Kingsbridge and Spuyten Duyvil and the Manhattan neighborhood of Marble Hill (a geographic anomaly; it’s not on the Island of Manhattan but it’s part of Manhattan borough).

The most salient issue with ZCTAs is that they are only tabulated for the decennial census and not the American Community Survey; the currency of data and spectrum of census variables will be limited compared to other types of geography.


View Larger Map

Relating ZIP Codes / ZCTAs to PUMAs

Saturday, March 19th, 2011

Ever since I created the Google Maps finding aid for census data for NYC PUMAs and the associated PUMA – NYC neighborhood names maps, I’ve received several requests for tables or maps that relate PUMAs to ZIP Codes. These are usually from non-profits in NYC who have lists of donors, members, or constituents with addresses, and they want to relate the addresses (using the ZIP) to recent demographic data from the American Community Survey (ACS) for the broader neighborhood where the ZIP is located.

The problem is that ZIP Codes are an all around pain. They actually don’t exist as areas with distinct boundaries; ZIP Codes are all address based, with ZIPs tied to addresses along street segments. The USPS doesn’t publish these tables or create maps; they contract this out for private companies to do, who turn around and sell these products for hefty fees.

Fortunately the Census Bureau has used these address tables to create approximations of ZIP Codes that they call ZCTAs or ZIP Code Tabulation Areas. ZCTAs are aggregates of census blocks that attempt to mimic ZIP Codes that exist as areas; codes associated with specific single-point firms or organizations are dropped. Since ZIPs were created by the USPS, ZCTAs do not nest or mesh with any census geography; they cross PUMA, county, and in some cases even state boundaries. They are also less stable than census geography, with frequent changes, and as statistical areas they vary widely in area and population. For this reason ZCTA data is only published every ten years in the decennial census; it’s not included in the ACS (so far).

With these caveats in mind, I used the Missouri Census Data Center’s MABLE/GEOCORR engine to correlate ZCTAs with PUMAs. While the interface looks a little retro and daunting, it’s actually pretty simple. You choose the state, the two geographies you want to relate, the weighting method for allocating one to the other, and an output format that includes CSV or HTML. I also used an option that lets you type in FIPS codes for the counties you want, so I didn’t end up with the entire state.

This method was the way to go, as they give you the option to allocate geographies based on population and not simply land area; each ZCTA was allocated to PUMAs based on where the majority of the ZCTA’s population lived using 2000 census block data. The final output contains one row for each ZCTA to PUMA combination. So you had multiple rows for ZCTAs that weren’t contained within a single PUMA, and for each of those ZCTAs you had fields that showed the percentage of the ZCTA’s population that lived in each PUMA (along with the actual population number) as well as the percentage of the PUMA’s population that lived in that ZCTA.

I took that table and cleaned it up in a spreadsheet, so that I was left with one row for each ZCTA, where the ZCTA was allocated to one PUMA based on where the majority of its population lives. I used some ZCTA and PUMA boundaries that I had originally downloaded and subsequently cleaned up from the 2009 TIGER shapefiles page, added them to QGIS, joined the ZCTA allocation table to the ZCTA geography, and mapped the result. I color-coded ZCTAs so that clusters of ZCTAs within a particular PUMA had the same color. Then I overlaid the PUMA boundaries on top to see how well they corresponded.
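The spreadsheet clean-up boils down to keeping, for each ZCTA, the row with the largest population share. Here is a rough sketch of that step in Python. The rows are made-up placeholders (the Z and P codes and the shares are not real MABLE / GEOCORR output), and the function name is hypothetical.

```python
# Sketch: reduce a ZCTA-to-PUMA correlation list to one PUMA per ZCTA,
# keeping the PUMA that holds the largest share of the ZCTA's population.
# All values below are invented placeholders for illustration only.
rows = [
    # (zcta, puma, share of the ZCTA's population living in that PUMA)
    ("Z1", "P1", 0.80),
    ("Z1", "P2", 0.20),
    ("Z2", "P2", 1.00),
    ("Z3", "P1", 0.45),
    ("Z3", "P3", 0.55),
]


def allocate_to_largest_share(rows):
    """Return {zcta: (puma, share)} using the largest share for each ZCTA."""
    best = {}
    for zcta, puma, share in rows:
        if zcta not in best or share > best[zcta][1]:
            best[zcta] = (puma, share)
    return best


allocation = allocate_to_largest_share(rows)
for zcta, (puma, share) in sorted(allocation.items()):
    print(zcta, puma, share)
```

Keeping the share next to the assigned PUMA is handy, since a ZCTA assigned on a 55% share is a much weaker match than one assigned on 100%.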

In the end, they didn’t correspond all that well. There was a fairly good relationship in Manhattan, an OK relationship in Queens and Staten Island, and a rather lousy relationship in the Bronx and Brooklyn. I overlaid greenspace and facilities (airports, shipyards, etc) boundaries I had, and that made some difference; you could see in some areas where ZCTAs overlapped two PUMAs that the overlap coincided with parks, cemeteries, or other areas with low or no residential population in one of the PUMAs.

I’ve posted both sets of tables, maps, and some instructions on the NYC neighborhoods resource page. You can use the original MABLE / GEOCORR table to judge where allocations were good and where they were not so good based on population. For now, the engine is still based on 2000 Census geography and data. Even though the Census has started releasing 2010 TIGER files based on 2010 Census geography, ZCTAs and PUMAs are often some of the last geographies to be updated; current releases of the ACS are still based on the 2000 geographies. Stay tuned to the Census Bureau and MCDC websites for news on updates, and keep the MABLE / GEOCORR in mind if you want to create lists to relate census geographies by population or land area.

NYC Subway and Transit GIS Layers

Saturday, July 24th, 2010

I’ve started outlining a one-day, introductory GIS practicum / workshop that I hope to offer in the coming academic year. One of the primary examples I want to use in the workshop is site selection for a retail store, and I thought it would be great to use a subway layer as part of the exercise. But alas, I searched high and low for a layer late last year (for a site selection project) and couldn’t find a publicly available one. I had purchased some proprietary layers, but really don’t want to use them for this workshop because I want to be able to freely distribute all of the materials to anyone; the layer I purchased is also outdated now because the MTA cut many services (including two subway lines) last month.

But thanks to Steve Romalewski at the CUNY Mapping Service, there’s now an alternative! Steve’s work is a HUGE contribution to the GIS community in New York and fills a glaring hole in the city’s collection of freely available GIS data. The MTA does host a data feed service (based on the General Transit Feed Specification created by Google) where it provides the geography of all its transit services, among other things. Steve downloaded and processed this raw data and turned it into shapefiles. He quickly discovered that it required a fair amount of scrubbing to be usable, and he’s cleaned it up and documented the entire process in great detail in several posts on his blog (Spatiality). Links to download individual shapefiles are available at the bottom of each post, following his discussion of issues and methodology for each set of layers. The CUNY Center for Urban Research has created an index page with each post, which you can access here.

In addition, he’s created a lyr file for the subway lines in order to symbolize them correctly by color and a separate mxd file for labels. While the shapefiles represent where the lines are, there are some problems representing them as they appear cartographically on the MTA’s subway maps. Many lines, including some with different colors, share the same trunk line. For example the A and C trains (blue lines) share the same trunk with the B and D trains (orange lines) along 8th Ave from 59th St to 145th St. Depending on how you sort your symbol categories, you’ll only see one color (and line): whichever one you have on top. Steve points out two ways of solving this issue. You can edit the geography and offset one of the lines, which is tedious and creates problems as you change scale (he has some great screen shots that depict this). Or, if you’re using ArcGIS, he shows off some cartographic tools that you can use to offset lines by prioritizing values in the attribute table. This is preferable, as it gives the illusion that the lines are side by side cartographically while keeping the geometry of the shapefile intact.

So if you’re using ArcGIS you’ll be good to go. I’ve downloaded the files to play around with, but as I’m at home and using QGIS I had some more work to do, since lyr and mxd files are proprietary ESRI formats that the open source packages can’t handle. I’ve assigned the appropriate colors to each subway line and saved them as a QGIS style file (.qml), which you can import in the symbology window to quickly and easily get the right colors (which I plucked from the MTA’s website). I’ve also saved the RGB and hex values for each line in a text file, if you’re using some other GIS software and need to input them manually. As far as I know there isn’t an easy way to circumvent the shared-line subway problem if you’re using QGIS (see screenshot below), so you’d have your work cut out for you if you want to faithfully represent the lines the way they appear on the MTA maps. But if you’re using the layers for analysis (which is what I’ll be doing) or you don’t need to emulate “the” subway map in exact detail, it shouldn’t matter.

NYC subway layers from CUNY Mapping Service in QGIS

Footnote – for anyone who is interested, the proprietary data that I purchase for the college is from a company called Halcrow. The entire NYC transportation package costs $465. It includes NYC subways and buses (lines and stations for each, along with ridership statistics from 2008 and a historical bus stops layer from 1998), LIRR and Metro North (lines and stations), but also includes the PATH train, freight lines, and truck routes.

Google Maps to Create a Census Finding Aid

Thursday, May 13th, 2010

Yikes! It’s been quite a while since my last post (the past couple months have been a little tough for me), but I just finished an interesting project that I can share.

I constantly get questions from students who are interested in getting recent demographic and socio-economic profiles for neighborhoods in New York City. The problem is that neighborhoods are not officially defined, so we have to look for a surrogate. The City has created neighborhood-like areas out of census tracts called community districts and they publish profiles for them, but this data is from the decennial census and not current enough for their needs. ZIP code data is also only available from the decennial census.

We can use PUMAs (Public Use Microdata Areas) to approximate neighborhoods in large cities, and they are published as part of the 3 year estimates of the American Community Survey. The problem is, in order to look up the data from the census you need to search by PUMA number – there are no qualitative place names. The city and the census have worked together to assign names to neighborhoods as part of the NYC Housing and Vacancy Survey, but this is the only place (I’ve found) that uses these names. You need to look in several places to figure out what the PUMA number and boundaries for an area are and then navigate through the census site to find it. Too much for the average student who visits me at the reference desk or emails me looking for data.

My solution was to create a finding aid in Google maps that tied everything together:

View Larger Map

I downloaded PUMA boundaries from the Census TIGER file site in a shapefile format. I opened them up in ArcGIS and used an excellent script that I downloaded called Export to KML. ArcGIS 9.3 does support KML exports via the toolbox, and there are a number of other scripts and stand-alone programs that can do this (I tried several) but Export to KML was best (assuming you have access to ArcGIS) in terms of the level of customization and the thoroughness of the user documentation. I symbolized the PUMAs in ArcGIS using the colors and line thickness that I wanted and fired up the tool. It allows you to automatically group and color features based on the layer’s symbology. I was able to add a “snippet” to each feature to help identify it (I used the PUMA number as the attribute name and the neighborhood name as my snippet, so both appear in the legend) and added a description that would appear in the pop up window when that feature is clicked. In that description, I added the URL from the ACS census profile page for a particular PUMA – the cool part here is that the URL is consistent and contains the PUMA number. So, I replaced the specific number and inserted the [field] name from the PUMAs attribute table that contained the number. When I did the export, the URLs for each individual feature were created with their PUMA number inserted into the link.

There were a few quirks – I discovered that you can’t automatically display labels on a Google Map without subterfuge, like creating the labels as images and not text. Google Earth (but not Maps) supports labels if you create multi-geometry where you have a point for a label and a polygon for the feature. If you select a labeling attribute on the initial options screen of the Export to KML tool, you create an icon in the middle of each polygon that has a different description pop-up (which I didn’t want so I left it to none and lived without labels). I made my features 75% transparent (a handy feature of Export to KML) so that you could see the underlying Google Map features through the PUMA, but this made the fill AND the lines transparent, making the features too difficult to see. After the export I opened the KML in a text editor and changed the color values for the lines / boundaries by hand, which was easy since the styles are saved by feature group (boroughs) and not by individual feature (pumas). I also manually changed the value of the folder open element (from 0 to 1) so that the feature and feature groups (pumas and boroughs) are expanded by default when someone opens the map.
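I made those two edits by hand, but they could also be scripted. Below is a minimal sketch using Python’s standard xml.etree module. The sample KML, the namespace URI, and the function name are all assumptions for illustration; the file that Export to KML actually produces may use a different KML namespace and a different structure. The one thing to keep in mind is that KML colors are written as aabbggrr, so the first two hex digits are the opacity.

```python
import xml.etree.ElementTree as ET

# Assumption: the KML 2.2 namespace. An exported file may declare another one.
KML_NS = "http://www.opengis.net/kml/2.2"

# A tiny made-up document standing in for the exported file.
SAMPLE = f"""<kml xmlns="{KML_NS}">
  <Document>
    <Style id="group1">
      <LineStyle><color>40ff0000</color><width>2</width></LineStyle>
      <PolyStyle><color>40ff0000</color></PolyStyle>
    </Style>
    <Folder>
      <name>group1</name>
      <open>0</open>
    </Folder>
  </Document>
</kml>"""


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def fix_kml(text):
    """Make line colors opaque and expand folders by default."""
    ET.register_namespace("", KML_NS)  # write tags back without a prefix
    root = ET.fromstring(text)
    # Colors are aabbggrr: swap the alpha byte for ff, leave fills alone.
    for line_style in root.iter(q("LineStyle")):
        color = line_style.find(q("color"))
        if color is not None and color.text and len(color.text.strip()) == 8:
            color.text = "ff" + color.text.strip()[2:]
    # Flip the folder open element from 0 to 1.
    for folder in root.iter(q("Folder")):
        open_el = folder.find(q("open"))
        if open_el is not None:
            open_el.text = "1"
    return ET.tostring(root, encoding="unicode")


fixed = fix_kml(SAMPLE)
print(fixed)
```

Only the LineStyle colors are touched, so the polygon fills stay transparent and the Google Map features still show through.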

After making the manual edits, I uploaded the KML to my webserver and pasted the url for it into the Google Maps search box, which overlaid my KML on the map. Then I was able to get a persistent link to the map and code for embedding it into websites via the Google Map Interface. No need to add it to Google My Maps, as I have my own space. One big quirk – it’s difficult to make changes to an existing KML once you’ve uploaded and displayed it. After I uploaded what I thought would be my final version I noticed a typo. So I fixed it locally, uploaded the KML and overwrote the old one. But – the changes I made didn’t appear. I tried reloading and clearing the cache in my browser, but no good – once the KML is uploaded and Google caches it, you won’t see any of your changes until Google re-caches. The conventional wisdom is to change the name of the file every single time – which is pretty dumb as you’ll never be able to have a persistent link to anything. There are ways to circumvent the problem, or you can just wait it out. I waited one day and by the next the file was updated; good enough for me, as I’ll only need to update it once a year.

I’m hosting the map, along with some static PDF maps and a spreadsheet of PUMA numbers and neighborhood names, from the NYC Data LibGuide I created (part of my college’s collection of research guides). If you’re looking for neighborhood names to associate with PUMA numbers for your city, you’ll have to hunt around and see if a local planning agency or non-profit has created them for a project or research study (as the Census Bureau does not create them). For example, the County of Los Angeles Department of Mental Health used PUMAs in a large study where they associated local place names with each PUMA.

If you’re interested in dabbling in some KML, there’s Google’s KML tutorial. I’d also recommend The KML Handbook by Josie Wernecke. The catch for any guide to KML is that while all KML elements are supported by Google Earth, there’s only partial support for Google Maps.

Mapping Hard to Count Areas for Census 2010

Tuesday, February 23rd, 2010

There was an interesting article in the New York Times today about neighborhoods in New York that typically get under-counted in the Census. These include areas with high immigrant populations as well as places that have had new construction since the last census, as the buildings haven’t been added to the Census Bureau’s master address file.

What the article didn’t mention is that CUNY’s Center for Urban Research has created a great online app called the Census 2010 Hard to Count mapping site. The site is built on the Census Bureau’s Tract Level Planning Database, which identified twelve population and housing variables, such as language isolation, recent movers, poverty, and crowded housing, that were associated with low mail response in the 2000 Census. This tool was designed to help Census reps, local government officials, and community activists identify traditionally under-counted areas to ensure a more complete count this time around.

The database is national in scope, and you can easily map tracts for a particular state, county, city, metro area, or tribal area, and you can search for an area using an individual address. The map is built on a Google Maps interface, and zooming in will change the units mapped from larger units (states, counties, etc) to tracts. You can easily select one of the twelve variables color-coded in the menu to the left of the map, or a Hard to Count index of all the variables.

Calculated Fields in SpatiaLite / SQLite

Wednesday, February 3rd, 2010

After downloading data, it’s pretty common that you’ll want to create calculated fields, such as percent totals or change, to use for analysis and mapping. The next step in my QGIS / SpatiaLite experiment was to create a calculated field (aka derived field). I’ll run through three ways of accomplishing this, using my subway commuter data to calculate the percentage of workers in each NYC PUMA who commute to work by subway. Just to keep everything straight:

  • sub_commuters is a census data table for all PUMAs in NY State
    • [SUBWAY] field that has the labor force that commutes by subway
    • [WORKERS_16] field with the total labor force
    • [SUB_PER] a calculated field with the % of labor force that commutes by subway
    • [GEO_ID2] the primary key field, FIPS code that is the unique identifier
  • nyc_pumas is a feature class with all PUMAs in NYC
    • [PUMA5ID00] is the primary key field, FIPS code that is the unique identifier
  • pumas_nyc_subcom is the data table that results from joining sub_commuters and nyc_pumas; it can be converted to a feature class for mapping

Spreadsheet

The first method would be to add the calculated field to the data after downloading it from the census in a spreadsheet, as part of the cleaning / preparation stage. You could then save it as a delimited text file for import to SpatiaLite. No magic there, so I’ll skip to the second method.

SpatiaLite

The second method would be to create the calculated field in the SpatiaLite database. I’ll go through the steps I used to figure this out. The basic SQL select query:

SELECT *, (SUBWAY / WORKERS_16) AS SUB_PER FROM sub_commuters

This query has the right shape, but there are two problems. First, the data in my SUBWAY and WORKERS_16 fields are stored as integers, and when you divide one integer by another SQLite returns an integer and drops the fractional part. Not very helpful here, as my percentage results come out as 0 or 1. There are many ways to work around this: set the numeric fields as double, real, or float in the spreadsheet before import (didn’t work for me), specify the field types when importing (didn’t get that option with the SpatiaLite GUI, but maybe you can with the command line), multiply by 100 to scale the percentage up to a whole number (ok unless you need decimals in your result, and the multiplication has to happen before the division, as in SUBWAY * 100 / WORKERS_16) or use the CAST operator. CAST converts the current data type of a field to a specified data type in the result of the expression. So:

SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER FROM sub_commuters

This gave me the percentages with several decimal places (since we’re casting the fields as real instead of integer), which is what I needed. The second problem is that this query just produces a temporary view; in order to map this data, we need to create a new table to make the calculated field permanent and join it to a feature class. Here’s how we do that:

CREATE TABLE pumas_nyc_subcom AS
SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER
FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2

The CREATE TABLE AS statement lets us create a new table from the existing two tables – the data table of subway commuters and the feature class table for NYC PUMAs. We select all the fields in both while throwing in the new calculated field, and we join the data table to the feature class all in one step, and via the join we end up with just data from NYC (the data for the rest of the state gets dropped). After that, it’s just a matter of taking our new table and enabling the geometry to make it a feature class (as explained in the previous post).
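If you want to see the integer division behavior and the CAST fix for yourself without touching a real database, here is a small sketch using Python’s built-in sqlite3 module and an in-memory database. The table and field names match mine, but the single row of numbers (and the ID value) is made up purely for illustration.

```python
import sqlite3

# In-memory database with one made-up row; the numbers are not real census data.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sub_commuters (GEO_ID2 TEXT, SUBWAY INTEGER, WORKERS_16 INTEGER)"
)
con.execute("INSERT INTO sub_commuters VALUES ('X1', 250, 1000)")

# Integer / integer: the fractional part is dropped.
int_result = con.execute(
    "SELECT SUBWAY / WORKERS_16 FROM sub_commuters"
).fetchone()[0]

# Multiplying by 100 before dividing gives a whole-number percentage.
scaled_result = con.execute(
    "SELECT SUBWAY * 100 / WORKERS_16 FROM sub_commuters"
).fetchone()[0]

# CAST to REAL keeps the decimals.
cast_result = con.execute(
    "SELECT CAST(SUBWAY AS REAL) / CAST(WORKERS_16 AS REAL) FROM sub_commuters"
).fetchone()[0]

print(int_result, scaled_result, cast_result)
con.close()
```

Casting just one side of the division would be enough to get a decimal result; I cast both to keep the expression the same as the one above.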

This seems like it should work – but I discovered another problem. The resulting calculated field that has the percentage of subway commuters per PUMA, SUB_PER, has no data type associated with it. Looking at the schema for the table in SpatiaLite shows that the data type is blank. If I bring this into QGIS, I’m not able to map this field as a numeric value, because QGIS doesn’t know what it is. I have to define the data type for this field. SpatiaLite (SQLite really) doesn’t allow you to re-define an existing field – we have to create and define a new blank field, and then set that field equal to our calculated expression. Here are the SQL statements to make it all happen:

ALTER TABLE sub_commuters ADD SUB_PER REAL;

UPDATE sub_commuters SET SUB_PER=(CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL));

CREATE TABLE pumas_nyc_subcom AS
SELECT * FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2;

So, we add a new blank field to our data table and define it as real. Then we update our data table by setting that blank field equal to our expression, thus filling the field with the result of our expression. Once we have the defined calculated field, we can create a new table from the data plus the features based on the ID they share in common. Once the table is created, then we can activate the geometry (right click on geometry field in the feature class and activate – see previous post for details) so we can map it in QGIS. Phew!
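To check the data type issue outside of the SpatiaLite GUI, here is a sketch that runs both approaches against an in-memory SQLite database from Python and then reads the declared column types back with PRAGMA table_info. The rows, the NAME field, and the untyped_subcom table name are placeholders I made up for the comparison; there’s no geometry involved, just plain tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sub_commuters (GEO_ID2 TEXT, SUBWAY INTEGER, WORKERS_16 INTEGER);
CREATE TABLE nyc_pumas (PUMA5ID00 TEXT, NAME TEXT);
-- made-up rows: 'A' stands in for an NYC PUMA, 'B' for one elsewhere in the state
INSERT INTO sub_commuters VALUES ('A', 250, 1000);
INSERT INTO sub_commuters VALUES ('B', 10, 400);
INSERT INTO nyc_pumas VALUES ('A', 'placeholder name');
""")


def column_types(con, table):
    """Return {column name: declared type} for a table."""
    return {row[1]: row[2] for row in con.execute(f"PRAGMA table_info({table})")}


# First attempt: the calculated field is an expression inside CREATE TABLE AS.
con.execute("""
CREATE TABLE untyped_subcom AS
SELECT *, (CAST(SUBWAY AS REAL) / CAST(WORKERS_16 AS REAL)) AS SUB_PER
FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00 = sub_commuters.GEO_ID2
""")
untyped = column_types(con, "untyped_subcom")

# Second attempt: define the field first, fill it, then create the joined table.
con.executescript("""
ALTER TABLE sub_commuters ADD SUB_PER REAL;
UPDATE sub_commuters SET SUB_PER = (CAST(SUBWAY AS REAL) / CAST(WORKERS_16 AS REAL));
CREATE TABLE pumas_nyc_subcom AS
SELECT * FROM sub_commuters, nyc_pumas
WHERE nyc_pumas.PUMA5ID00 = sub_commuters.GEO_ID2;
""")
typed = column_types(con, "pumas_nyc_subcom")

rows = con.execute("SELECT GEO_ID2, SUB_PER FROM pumas_nyc_subcom").fetchall()
print(repr(untyped["SUB_PER"]), repr(typed["SUB_PER"]), rows)
con.close()
```

The first table should report a blank type for SUB_PER and the second should report REAL, and the join should leave only the row that has a match in nyc_pumas.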

QGIS

The third method is to create the calculated field within QGIS, using the new field calculator. It’s pretty easy to do – you select the layer in the table of contents and go into edit mode. Open the attribute table for the features and click the last button in the row of buttons underneath the table – this is the field calculator button. Once we’re in the field calculator window, we can choose to update an existing field or create a new field. We give the output field a name and a data type, enter our expression SUBWAY / WORKERS_16, hit OK, and we have our new field. Save the edits and we should be good to go. HOWEVER – I wasn’t able to add calculated fields to features in a SpatiaLite geodatabase without getting errors. I posted to the QGIS forum – initially it was thought that the SpatiaLite driver was read only, but it turns out that’s not the case, and so the developers are investigating a possible bug. The investigation continues – stay tuned. I have tried the field calculator with shapefiles and it works perfectly (incidentally, you can export SpatiaLite features out of the database as shapefiles).

I’m providing the database I created here for download, if anyone wants to experiment.

Creating a New Shapefile in ArcGIS: Part II

Friday, May 15th, 2009

In my previous post I gave an overview of how to create a shapefile from scratch, where we created a point layer to identify places and neighborhoods in NYC. In this post, I’ll pick up where we left off.

Whenever you create new features in a shapefile, ArcGIS automatically adds a couple of fields, including an auto-number ID field that uniquely identifies each feature. This was sufficient for our example as the 291 place names we were working with do not have a standard ID number that represents them. If we were creating features that did have a recognized ID number or code, we certainly would want to add an additional field to hold that number. This would allow us to share and relate our data to other datasets that use that conventional ID. For example, if we had a layer with the 50 states, we would want to have the two-digit FIPS code or the two-letter postal abbreviation for each state in the attribute table, so we could relate our states feature to the zillions of other state-based data tables out there that also use these codes.

It’s also helpful to add other identifiers to relate our place names to some larger geographic area. Why? Let’s say we want to filter our neighborhoods by borough – perhaps we just want to label neighborhoods in Manhattan or calculate distances only between places that appear in the Bronx. It would be useful to have a borough code or some other code associated with each of our place names for running queries.

As it turns out, the City of New York does use a standardized system of three digit codes to identify all boroughs and community districts in the city. In our example, the code for Manhattan Community District 12, which contains Inwood and Washington Heights, is 112. The first digit identifies the borough and the last two digits identify the district. It would be a good idea to assign each of our neighborhoods this district code, so we could filter our features by either borough or district.
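To make the coding scheme concrete, here is a small Python sketch that splits a three digit code into its borough and district parts and filters a list of places by borough. The function names and the sample list are hypothetical; the borough numbering (1 through 5) is the city’s usual convention as I understand it, so double check it against the NYC Planning documentation before relying on it.

```python
# Borough digit -> borough name (assumed to follow the city's standard numbering).
BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


def split_district_code(code):
    """Split a three digit code such as 112 into (borough digit, district number)."""
    borough, district = divmod(int(code), 100)
    return borough, district


def places_in_borough(places, borough_name):
    """Keep only the (name, code) pairs whose code starts with the borough's digit."""
    return [
        name
        for name, code in places
        if BOROUGHS.get(split_district_code(code)[0]) == borough_name
    ]


# Inwood and Washington Heights are in district 112 (from the example above);
# the third entry uses a made-up code purely to show the filter working.
places = [("Inwood", 112), ("Washington Heights", 112), ("Example Place", 201)]

borough, district = split_district_code(112)
manhattan_places = places_in_borough(places, "Manhattan")

print(BOROUGHS[borough], district)
print(manhattan_places)
```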

When we create each feature, we could manually type in the code in its own field just like we added the neighborhood names, but that would be rather tedious – and unnecessary. A better choice would be to do a spatial join. Whereas a “regular” join allows us to join attribute tables based on a common ID field, a spatial join allows us to assign attributes to one layer based on their geographical relationship to another layer.

In the Table of Contents, right click on the neighborhoods layer and choose Joins and Relates – Joins. We’ll get the familiar Join dialog box. However, if you hit the first drop down box that says Join Attributes From a Table and choose Join Data Based on Spatial Location, we’ll get the options for doing a spatial join. Choose the community districts as the layer to join to the neighborhoods, and since we’re joining points to polygons we’ll choose parameters that are relevant for relating these two features. In this case, give each point (neighborhood / place) the attributes of the polygons (districts) that it falls inside. ArcGIS will create a new point layer with the joined fields when you hit OK. Open the attribute table of the new point layer, and you’ll see the additional fields, including the community district numbers. You’ll also get some rather useless fields from the district layer, like the length and area of each district, which you can safely delete.

So instead of tediously entering these numbers by hand for each neighborhood, we simply run the spatial join process once (after we’ve finished adding the points for all 291 neighborhoods) and the IDs are automatically added.
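If you’re curious what a point-in-polygon spatial join is doing conceptually, here is a rough Python sketch. It is not how ArcGIS implements the join; it swaps in a basic ray casting test, uses invented square “districts” and made-up coordinates, and all of the function names are hypothetical. The only real-world facts in it are the district codes and which neighborhoods I placed in them.

```python
def point_in_polygon(x, y, ring):
    """Ray casting test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Give each named point the code of the first polygon it falls inside (or None)."""
    joined = []
    for name, (x, y) in points:
        code = next(
            (code for code, ring in polygons if point_in_polygon(x, y, ring)),
            None,
        )
        joined.append((name, code))
    return joined


# Toy geometry: two square "districts" stacked on top of each other.
districts = [
    (112, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    (109, [(0, -10), (10, -10), (10, 0), (0, 0)]),
]
neighborhoods = [
    ("Inwood", (5, 8)),
    ("Washington Heights", (5, 3)),
    ("Hamilton Heights", (5, -4)),
    ("Outside the districts", (50, 50)),
]

joined = spatial_join(neighborhoods, districts)
for name, code in joined:
    print(name, code)
```

Each point picks up the code of the polygon that contains it, and a point that falls outside every polygon gets no code at all, which is worth checking for after running a real spatial join.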

Creating a New Shapefile in ArcGIS: Part I

Thursday, May 14th, 2009

I’m working with a grad student who needs to create a new shapefile from scratch, and thought I’d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods in New York City. Since many of these neighborhoods are place names without finite boundaries, we’ll have to use various sources (NYC Planning map and Rand McNally street maps) to pinpoint the relative center of each neighborhood.

These points will be used for labeling each neighborhood. In this case, creating a new, georeferenced layer is preferable to creating 291 text labels on a map that are not tied to geography in any way.

  • The first step is to download some layers from the NYC Department of Planning to use for reference, such as a layer for boroughs and community districts. Community districts are used by the city to approximate neighborhoods. Many of the neighborhoods that we are trying to plot are smaller areas or places within these boundaries.
  • Next, open ArcCatalog and create a folder to store the data. Then, right click on the folder in the table of contents and select New – Shapefile. In the Create New Shapefile window, we give the shapefile a name, select Point as the feature type, and hit Edit to change the coordinate system. In the Spatial Reference Properties menu, we’ll import a coordinate system from one of the files we downloaded from NYC Planning, which uses New York State Plane for Long Island. Click OK and OK again, and we’ll have a new shapefile.
  • Right now, our new shapefile isn’t very exciting because it’s empty – you can preview it in the catalog to see for yourself. If you preview the table, you’ll see that Arc created three fields – FID, Shape, and ID, which it will automatically fill in when we start creating features. Before we do that, we’ll have to add an additional column to store the name of the neighborhood. To do that, open ArcMap and add the neighborhood layer to the map. Then, right click on the layer in the Table of Contents and open the attribute table. Hit the Options button and choose Add Field. In the Add Field menu, name the new field, choose Text as the type, and change the length to 80 (in case we have some neighborhoods with long names). Hit OK, and you’ll have a new field.
  • Let’s add our reference layers next. Hit the Add Data button (or File – Add Data), and add the borough boundaries and community districts (if you don’t see anything after you add them, right click on one of these layers and choose Zoom to Layer). Go into the symbology tab for each layer and change their display to make the areas appear more distinctive. Make sure your neighborhood layer is on top of your other layers.
  • Now it’s time to start plotting neighborhoods. Go to the Selection menu – Set Selectable Layers, and turn off all the layers except the neighborhood layer. Then, use the dropdown on the Editor Toolbar and select Start Editing (if you don’t see the Editor Toolbar, make sure it’s activated by going to View – Toolbars and selecting it). On the Editor Toolbar, make sure the Create New Feature task is activated and that the target layer is the neighborhood layer, and not any of the reference layers. Zoom in to the top of Manhattan. With the Pencil tool selected in the toolbar, and using your sources (NYC planning map, Rand McNally street map, whatever), click on the map to approximate where the center of the Inwood neighborhood would be. A blue dot should appear on the map. Then right-click on the neighborhoods layer in the Table of Contents and open the attribute table. You’ll see a brand new record for your new dot. Click in the empty field for Name, type in the name of the neighborhood, and press enter.
  • That’s the process! Next, locate the area for Washington Heights and click on the map to create the point for that neighborhood. The new dot will appear highlighted, while the previous dot for Inwood will now appear as a regular point symbol. Now it’s just a matter of plugging away. Make sure to occasionally save your edits by clicking Editor and choosing Save Edits. If you make a mistake, you can delete a feature by choosing the Select Feature tool in the regular tool bar (white arrow with a blue and white feature box next to it), selecting the particular point, and hitting the delete key. If you’re having trouble pinpointing the right location for the neighborhood, try downloading additional reference layers to guide you. The NYC DOITT also has a page with GIS layers for the city with features like parks and streets that may be helpful. When you’re finished editing, choose Stop Editing under the Editor Toolbar.

  • The ultimate goal of this exercise was to get neighborhood labels to appear without the actual point. To accomplish this, change the point symbol for the neighborhood to nothing by going into the Symbology tab for the layer and reducing the fill to no color, the outline to nothing, and the size to zero. Then open the Labels tab under the Properties menu, turn labels on using the name field as the label field, select Placement Properties and choose the setting to place the labels on top of the point, hit OK, and voila! Perfectly centered neighborhood names that are part of a georeferenced layer.

This covers the basics. In the next post, I’ll go a little further and discuss adding additional fields to the new file, without having to type them in manually.


Copyright © 2013 Gothos. All Rights Reserved.