This semester I’ll be teaching three workshops with Prof. Deborah Balk in spatial tools and analysis. Sponsored by the CUNY Institute of Demographic Research (CIDR), the workshops will be held on Baruch College’s campus in midtown NYC on Friday afternoons. The course is primarily intended for data and policy analysts who want to gain familiarity with the basics of map making and spatial analysis; registration is open to anyone. The workshops progress from basic to intermediate skills that cover making a map (Apr 27th), geospatial calculations (May 4th), and geospatial analysis (May 11th). We’ll be using QGIS and participants will work off of their own laptops; we’ll also be demonstrating some of the processes in ArcGIS and participants will receive an evaluation copy of that software. Each workshop is $300 or you can register for all three for $750.
Posts Tagged ‘ArcGIS’
I was helping someone with a project recently that I thought would be straightforward but turned out to be rather complex. We had a list of about 10,000 addresses that had to be plotted as coordinates, and then we needed to create Thiessen (Voronoi) polygons for each point to create market areas. Lastly we needed to generate an adjacency table, or list of neighbors: for every polygon, a list of all the neighboring polygons.
For step one I turned to the USC Geocoding service to geocode the addresses; I became a partner a ways back so I could batch geocode datasets for students and faculty on my campus. Once I had coordinates I plotted them in ArcGIS 10 (and learned that the Add XY data feature had been moved to File > Add Data > Add XY Data). Step 2 seemed easy enough; in Arc you go to ArcToolbox > Analysis Tools > Proximity > Create Thiessen Polygons. This creates a polygon for each point and assigns the attributes of each point to the polygon.
I hit a snag with Step 3 – Arc didn’t have a tool for generating the adjacency table. After a thorough search of the ESRI and Stack Exchange forums, I stumbled on the Find Adjacent Features Script by Ken Buja which did exactly what I wanted in ArcGIS 9.2 and 9.3, but not in 10. I had used this script before on a previous project, but I’ve since upgraded and can’t go back. So I searched some more until I found the Find Adjacent & Neighboring Polygons Tool by cmaene. I was able to add this custom toolbox directly to ArcToolbox, and it did exactly what I wanted in ArcGIS 10. I get to select the unique identifying field, and for every ID I get a list of the IDs of the neighboring polygons in a text file (just like Ken’s tool). This tool also had the option of saving the list of neighbors for each feature directly in the attribute table of a shapefile (which is only OK for small files with few neighbors; fields longer than 254 characters get truncated), and it gave you the option of listing neighbors to the next degree (a list of all the neighbor’s neighbors).
Everything seemed to run fine, so I re-ran the tool on a second set of Thiessen polygons that I had clipped with an outline of the US to create something more geographically realistic (so polygons that share a boundary only in the ocean or across the Great Lakes are not considered neighbors).
THEN – TROUBLE. I took some samples of the output table and checked the neighbors of a few features visually in Arc. I discovered two problems. First, I was missing about a thousand records or so in the output. When I geocoded them I couldn’t get a street-level address match for every record; the worst-case scenario was a plot to the ZCTA / ZIP code centroid for the address, which was an acceptable level of accuracy for this project. The problem is that if many point features are plotted to the same coordinate (because they share the same ZIP), a polygon is created for only one feature and the overlapping ones fall away (you can’t have overlapping Thiessen polygons). Fortunately this wasn’t an issue for the person I was helping; we just needed to join the output table back to the master one to track which ones fell out and live with the result.
The bigger problem was that the output was wrong. I discovered that most of the features I checked, especially polygons that had borders on the outer edge of the space, had incomplete neighbor lists; each feature had several (and in some cases, all) neighbors missing. Instead of using a shapefile of Thiessen polygons I tried running the tool on polygons that I generated as feature classes within an Arc geodatabase, and got the same output. For the heck of it I tried dissolving all the Thiessen polygons into one big polygon, and when I did that I noticed that I had orphaned lines and small gaps in what should have been one big, solid rectangle. I tried checking the geometry of the polygons and there were tons of problems. This led me to conclude that Arc did a lousy job when constructing the topology of the polygons, and the neighbor tool was giving me bad output as a result.
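To make the dropped-records problem concrete, here is a minimal sketch in plain Python of how you could flag points that share a coordinate before building the polygons. The record IDs and coordinates are invented, and the rule that the first record at each location keeps its polygon is an assumption for illustration; I don't know which record Arc actually keeps.

```python
from collections import defaultdict

# Hypothetical geocoded records: (record_id, x, y). Invented for illustration.
records = [
    ("A1", -73.9857, 40.7484),
    ("A2", -73.9857, 40.7484),  # matched to the same ZIP centroid as A1
    ("A3", -73.9712, 40.7831),
    ("A4", -73.9857, 40.7484),  # matched to the same ZIP centroid as A1
]


def group_by_coordinate(rows, precision=6):
    """Map each rounded (x, y) pair to the list of record IDs plotted there."""
    groups = defaultdict(list)
    for record_id, x, y in rows:
        groups[(round(x, precision), round(y, precision))].append(record_id)
    return groups


def split_kept_and_dropped(rows):
    """Assumption: the first record at a location keeps its polygon, the rest drop out."""
    kept, dropped = [], []
    for ids in group_by_coordinate(rows).values():
        kept.append(ids[0])
        dropped.extend(ids[1:])
    return kept, dropped


kept, dropped = split_kept_and_dropped(records)
print("kept:", kept)
print("dropped:", dropped)
```

Running something like this on the master table first gives you the list of records to expect to be missing, instead of discovering them after the fact.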
Since I’ve been working more with GRASS, I remembered that GRASS vectors have strict topology rules, where features have shared boundaries (instead of redundant overlapping ones). So I imported my points layer from a shapefile into GRASS and then used the v.voronoi tool to create the polygons. The geometry looked sound, the attributes of each point were assigned to a polygon, and for overlapping points one polygon was created and attributes of the shared points were dumped. I exported the polygons out as a shapefile and brought them back into Arc, ran the Find Adjacent & Neighboring Polygons tool, spot checked the neighbors of some features, and voila! The output was good. I clipped these polygons with my US outline, ran the tool again, and everything checked out.
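Why does clean topology matter so much for a neighbor list? Here is a rough sketch in plain Python of one simple way to build an adjacency table: two polygons count as neighbors if they share an identical boundary segment. This is not how the Find Adjacent & Neighboring Polygons tool works internally (I haven't looked), and the polygon IDs and coordinates are made up. It only works when shared boundaries use exactly the same vertices, which is why gaps and slivers lead to missing neighbors.

```python
from collections import defaultdict

# Hypothetical polygons keyed by ID, each a ring of (x, y) vertices:
# three unit squares in a row, plus one stacked on the middle square.
polygons = {
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "P3": [(2, 0), (3, 0), (3, 1), (2, 1)],
    "P4": [(1, 1), (2, 1), (2, 2), (1, 2)],
}


def edges(ring):
    """Yield each boundary segment as an order-independent pair of vertices."""
    for i, start in enumerate(ring):
        end = ring[(i + 1) % len(ring)]
        yield frozenset((start, end))


def adjacency(polys):
    """Return {polygon_id: sorted list of IDs sharing at least one segment}."""
    owners = defaultdict(set)
    for pid, ring in polys.items():
        for edge in edges(ring):
            owners[edge].add(pid)
    neighbors = {pid: set() for pid in polys}
    for ids in owners.values():
        for pid in ids:
            neighbors[pid] |= ids - {pid}
    return {pid: sorted(found) for pid, found in neighbors.items()}


table = adjacency(polygons)
for pid, found in table.items():
    print(pid, found)
```

Note that P1 and P4 touch only at a corner, so this edge-based rule does not list them as neighbors; whether corner contact should count is a choice you'd have to make for your own project.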
Morals of this story? When geocoding addresses consider how the accuracy of the results will impact your project. If a tool or feature doesn’t exist assume that someone else has encountered the same problem and search for solutions. Never blindly accept output; take a sample and do manual checks. If one tool or piece of software doesn’t work, try exporting your data out to something else that will. Open source software and Creative Commons tools can save the day!
Footnote – apparently it’s possible to create lists of adjacent polygons in GRASS using the sides option in v.to.db, although it isn’t clear to me how this is accomplished; the documentation talks about categories of areas on the right and left of a boundary, but not on all sides of an area. Since I already had a working solution I didn’t investigate further.
I’ve started outlining a one-day, introductory GIS practicum / workshop that I hope to offer in the coming academic year. One of the primary examples I want to use in the workshop is site selection for a retail store, and I thought it would be great to use a subway layer as part of the exercise. But alas, I searched high and low for a layer late last year (for a site selection project) and couldn’t find a publicly available one. I had purchased some proprietary layers, but really don’t want to use them for this workshop because I want to be able to freely distribute all of the materials to anyone; the layer I purchased is also outdated now because the MTA cut many services (including two subway lines) last month.
But thanks to Steve Romalewski at the CUNY Mapping Service, there’s now an alternative! Steve’s work is a HUGE contribution to the GIS community in New York and fills a glaring hole in the city’s collection of freely available GIS data. The MTA does host a data feed service (based on the General Transit Feed Specification created by Google) where it provides the geography of all its transit services, among other things. Steve downloaded and processed this raw data and turned it into shapefiles. He quickly discovered that it required a fair amount of scrubbing to be usable, and he’s cleaned it up and documented the entire process in great detail in several posts on his blog (Spatiality). Links to download individual shapefiles are available at the bottom of each post, following his discussion of issues and methodology for each set of layers. The CUNY Center for Urban Research has created an index page with each post, which you can access here.
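To give a sense of what the raw feed looks like, here is a small sketch in plain Python that turns rows in the layout of a GTFS shapes.txt file into ordered lists of coordinates, one per shape. This is not Steve's process (his posts document that); it assumes the feed includes the optional shapes.txt file, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# A tiny invented sample using the standard GTFS shapes.txt column names.
SAMPLE = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
X1,40.70,-74.01,2
X1,40.69,-74.02,1
X1,40.71,-74.00,3
Y1,40.80,-73.95,1
Y1,40.81,-73.94,2
"""


def read_shapes(handle):
    """Return {shape_id: [(lon, lat), ...]} with points ordered by sequence."""
    points = defaultdict(list)
    for row in csv.DictReader(handle):
        points[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),
                float(row["shape_pt_lon"]),
                float(row["shape_pt_lat"]),
            )
        )
    # Sorting on the sequence number restores the drawing order of each line.
    return {
        shape_id: [(lon, lat) for _, lon, lat in sorted(pts)]
        for shape_id, pts in points.items()
    }


lines = read_shapes(io.StringIO(SAMPLE))
for shape_id, coords in lines.items():
    print(shape_id, coords)
```

From here each coordinate list could be written out as a line feature in whatever GIS format you like; the scrubbing Steve describes is the hard part and isn't covered by this sketch.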
In addition, he’s created a lyr file for the subway lines in order to symbolize them correctly by color and a separate mxd file for labels. While the shapefiles represent where the lines are, there are some problems representing them as they appear cartographically on the MTA’s subway maps. Many lines, including some with different colors, share the same trunk line. For example the A and C trains (blue lines) share the same trunk with the B and D trains (orange lines) along 8th Ave from 59th St to 145th St. Depending on how you sort your symbol categories, you’ll only see one color (and line): whichever one you have on top. Steve points out two ways of solving this issue. You can edit the geography and offset one of the lines, which is tedious and creates problems as you change scale (he has some great screen shots that depict this). Or, if you’re using ArcGIS, he shows off some cartographic tools that you can use to offset lines by prioritizing values in the attribute table. The second approach is preferable, as it gives the illusion that the lines are side by side cartographically while keeping the geometry of the shapefile intact.
So if you’re using ArcGIS you’ll be good to go. I’ve downloaded the files to play around with, but as I’m at home and using QGIS I had some more work to do, since lyr and mxd files are proprietary ESRI formats that the open source packages can’t handle. I’ve assigned the appropriate colors to each subway line and saved them as a QGIS style file (.qml), which you can import in the symbology window to quickly and easily get the right colors (which I plucked from the MTA’s website). I’ve also saved the RGB and hex values for each line in a text file, if you’re using some other GIS software and need to input them manually. As far as I know there isn’t an easy way to circumvent the shared-line subway problem if you’re using QGIS (see screenshot below), so you’d have your work cut out for you if you want to faithfully represent the lines the way they appear on the MTA maps. But if you’re using the layers for analysis (which is what I’ll be doing) or you don’t need to emulate “the” subway map in exact detail, it shouldn’t matter.
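If your software wants RGB triplets and you only have the hex values (or the other way around), the conversion is simple enough to sketch in plain Python. The color below is a placeholder chosen for illustration, not one of the actual MTA line colors from the text file.

```python
def hex_to_rgb(hex_color):
    """Convert a '#RRGGBB' string into an (r, g, b) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a six-digit hex color, got %r" % hex_color)
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers into a '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Placeholder value for illustration only; substitute the colors from the text file.
placeholder = "#FF8000"
print(hex_to_rgb(placeholder))
print(rgb_to_hex(hex_to_rgb(placeholder)))
```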
Footnote – for anyone who is interested, the proprietary data that I purchase for the college is from a company called Halcrow. The entire NYC transportation package costs $465. It includes NYC subways and buses (lines and stations for each, along with ridership statistics from 2008 and a historical bus stops layer from 1998), LIRR and Metro North (lines and stations), but also includes the PATH train, freight lines, and truck routes.
In my previous post I gave an overview of how to create a shapefile from scratch, where we created a point layer to identify places and neighborhoods in NYC. In this post, I’ll pick up where we left off.
Whenever you create new features in a shapefile, ArcGIS automatically adds a couple of fields, including an auto-number ID field that uniquely identifies each feature. This was sufficient for our example as the 291 place names we were working with do not have a standard ID number that represents them. If we were creating features that did have a recognized ID number or code, we certainly would want to add an additional field to hold that number. This would allow us to share and relate our data to other datasets that use that conventional ID. For example, if we had a layer with the 50 states, we would want to have a FIPS number or the two digit postal code for each state in the attribute table, so we could relate our states feature to the zillions of other state-based data tables out there that also use these codes.
It’s also helpful to add other identifiers to relate our place names to some larger geographic area. Why? Let’s say we want to filter our neighborhoods by borough – perhaps we just want to label neighborhoods in Manhattan or calculate distances only between places that appear in the Bronx. It would be useful to have a borough code or some other code associated with each of our place names for running queries.
As it turns out, the City of New York does use a standardized system of three digit codes to identify all boroughs and community districts in the city. In our example, the code for Manhattan Community District 12, which contains Inwood and Washington Heights, is 112. The first digit identifies the borough and the last two digits identify the district. It would be a good idea to assign each of our neighborhoods this district code, so we could filter our features by either borough or district.
When we create each feature, we could manually type in the code in its own field just like we added the neighborhood names, but that would be rather tedious – and unnecessary. A better choice would be to do a spatial join. Whereas a “regular” join allows us to join attribute tables based on a common ID field, a spatial join allows us to assign attributes to one layer based on their geographical relationship to another layer.
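Here is a quick sketch in plain Python of how a three digit code splits into its borough and district parts, and how that makes filtering easy. The only code taken from this post is 112 for Inwood and Washington Heights; the borough number lookup is the city's usual numbering as I understand it, and the "Example Place" record is invented.

```python
# Borough numbers as commonly used by the city; only 1 = Manhattan comes from the post.
BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


def split_district_code(code):
    """Split a three digit code such as 112 into (borough number, district number)."""
    return divmod(int(code), 100)


# Hypothetical attribute rows: (place name, community district code).
places = [
    ("Inwood", 112),
    ("Washington Heights", 112),
    ("Example Place", 201),  # invented record for illustration
]


def places_in_borough(rows, borough_number):
    """Keep only the place names whose code starts with the given borough digit."""
    return [name for name, code in rows if split_district_code(code)[0] == borough_number]


borough, district = split_district_code(112)
print(BOROUGHS[borough], district)
print(places_in_borough(places, 1))
```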
In the Table of Contents, right click on the neighborhoods layer and choose Joins and Relates – Joins. We’ll get the familiar Join dialog box. However, if you hit the first drop down box that says Join Attributes From a Table and choose Join Data Based on Spatial Location, we’ll get the options for doing a spatial join. Choose the community districts as the layer to join to the neighborhoods, and since we’re joining points to polygons we’ll choose parameters that are relevant for relating these two features. In this case, give each point (neighborhood / place) the attributes of the polygons (districts) that it falls inside. ArcGIS will create a new point layer with the joined fields when you hit OK. Open the attribute table of the new point layer, and you’ll see the additional fields, including the community district numbers. You’ll also get some rather useless fields from the district layer, like the length and area of each district, which you can safely delete.
So instead of tediously entering these numbers by hand for each neighborhood, we simply run the spatial join process once (after we’ve finished adding the points for all 291 neighborhoods) and the IDs are automatically added.
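For anyone curious what a point-in-polygon spatial join is doing under the hood, here is a bare-bones sketch in plain Python using the classic ray casting test. It is not what ArcGIS runs internally, and the district shapes and point coordinates are invented squares, not real geography; only the pairing of Inwood and Washington Heights with code 112 comes from this post.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented district polygons keyed by community district code.
districts = {
    112: [(0, 10), (10, 10), (10, 20), (0, 20)],
    101: [(0, 0), (10, 0), (10, 10), (0, 10)],
}

# Invented point locations for the place names.
points = [
    ("Inwood", 5, 18),
    ("Washington Heights", 5, 13),
    ("Example Place", 5, 5),
    ("Outside Everything", 50, 50),
]


def spatial_join(pts, polys):
    """Give each point the code of the first polygon that contains it, or None."""
    joined = []
    for name, x, y in pts:
        code = next((c for c, ring in polys.items() if point_in_polygon(x, y, ring)), None)
        joined.append({"name": name, "district": code})
    return joined


result = spatial_join(points, districts)
for row in result:
    print(row)
```

Points that fall outside every polygon come back with no district, which is worth checking for after any spatial join.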
I’m working with a grad student who needs to create a new shapefile from scratch, and thought I’d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods in New York City. Since many of these neighborhoods are place names without finite boundaries, we’ll have to use various sources (NYC Planning map and Rand McNally street maps) to pinpoint the relative center of each neighborhood.
These points will be used for labeling each neighborhood. In this case, creating a new, georeferenced layer is preferable to creating 291 text labels on a map that are not tied to geography in any way.
- The first step is to download some layers from the NYC Department of Planning to use for reference, such as a layer for boroughs and community districts. Community districts are used by the city to approximate neighborhoods. Many of the neighborhoods that we are trying to plot are, in many cases, smaller areas or places within these boundaries.
- Next, open ArcCatalog and create a folder to store the data. Then, right click on the folder in the table of contents and select New – Shapefile. In the Create New Shapefile window, we give the shapefile a name, select Point as the feature type, and hit Edit to change the coordinate system. In the Spatial Reference Properties menu, we’ll import a coordinate system from one of the files we downloaded from NYC Planning, which uses New York State Plane for Long Island. Click OK and OK again, and we’ll have a new shapefile.
- Right now, our new shapefile isn’t very exciting because it’s empty – you can preview it in the catalog to see for yourself. If you preview the table, you’ll see that Arc created three fields – FID, Shape, and ID, which it will automatically fill in when we start creating features. Before we do that, we’ll have to add an additional column to store the name of the neighborhood. To do that, open ArcMap and add the neighborhood layer to the map. Then, right click on the layer in the Table of Contents and open the attribute table. Hit the Options button and choose Add Field. In the Add Field menu, name the new field, choose Text as the type, and change the length to 80 (in case we have some neighborhoods with long names). Hit OK, and you’ll have a new field.
- Let’s add our reference layers next. Hit the Add Data button (or File – Add Data), and add the borough boundaries and community districts (if you don’t see anything after you add them, right click on one of these layers and choose Zoom to Layer). Go into the symbology tab for each layer and change their display to make the areas appear more distinctive. Make sure your neighborhood layer is on top of your other layers.
- Now it’s time to start plotting neighborhoods. Go to the Selection menu – Set selectable Layers, and turn off all the layers except the neighborhood layer. Then, use the dropdown on the Editor Toolbar and Select Start Editing (if you don’t see the Editor Toolbar, make sure it’s activated by going to View – Toolbars and select it). On the Editor Toolbar, make sure the Create New Feature task is activated and that the target layer is the neighborhood layer, and not any of the reference layers. Zoom in to the top of Manhattan. With the Pencil tool selected in the toolbar, and using your sources (NYC planning map, Rand McNally street map, whatever), click on the map to approximate where the center of the Inwood neighborhood would be. A blue dot should appear on the map. Then right-click on the neighborhoods layer in the Table of Contents and open the attributes table. You’ll see a brand new record for your new dot. Click in the empty field for Name, type in the name of the neighborhood, and press enter.
- That’s the process! Next, locate the area for Washington Heights and click on the map to create the point for that neighborhood. The new dot will appear highlighted, while the previous dot for Inwood will now appear as a regular point symbol. Now it’s just a matter of plugging away. Make sure to occasionally save your edits by clicking Editor and choosing Save Edits. If you make a mistake, you can delete a feature by choosing the Select Feature tool in the regular tool bar (white arrow with a blue and white feature box next to it), selecting the particular point, and hitting the delete key. If you’re having trouble pinpointing the right location for the neighborhood, try downloading additional reference layers to guide you. The NYC DOITT also has a page with GIS layers for the city with features like parks and streets that may be helpful. When you’re finished editing, choose Stop Editing under the Editor Toolbar.
- The ultimate goal of this exercise was to get neighborhood labels to appear without the actual point. To accomplish this, change the point symbol for the neighborhood to nothing by going into the Symbology tab for the layer and reducing the fill to no color, the outline to nothing, and the size to zero. Then open the Labels tab under the Properties menu, turn labels on using the name field as the label field, select Placement Properties and choose the setting to place the labels on top of the point, hit ok, and voila! Perfectly centered neighborhood names that are part of a georeferenced layer.
This covers the basics. In the next post, I’ll go a little further and discuss adding additional fields to the new file, without having to type them in manually.
Last week I shared my adventures evaluating open source software. Why bother looking at alternatives to ArcGIS? There are significant barriers to entry with ArcGIS. Whenever I give an introductory GIS presentation to anyone, I inevitably have to answer the question of “How can I get access to this software?” The answer is that you have to spend a lot of money, or if your institution already has a subscription, you need to go through a lengthy process to get access.
- Price. A single, stand-alone copy of ArcView costs $1500. Not only is that prohibitively expensive for me, it’s impossible for students. Which means that students who are taking a GIS class have to use the software in a computer lab on campus to complete assignments. This is not always convenient for many students, and is particularly problematic where I work since we are primarily a commuter campus.
- License limitations. If you’re running Arc through a central license server, PCs have to be connected to the server through a hardwired connection – no wireless. Our library has a laptop checkout program for students which would give students an alternative to using a computer lab. But not being able to install the software on a laptop eliminates this possibility. It also makes it a pain for me to give presentations, as I always have to make sure that the room I’ll be presenting in has the software. My short term solution is to use an eval copy on a laptop. You can purchase USB keys that have the license info on them, but if you work in a large, complex academic or government setting, getting one can be a challenge. And every year we have to go through the process of getting the license renewed.
- Installation and Bugs. As Arc users know, installation can be time consuming, particularly since you can’t have two versions of Arc installed concurrently – you have to uninstall one before installing the new one. And how many service packs have been issued for version 9.2? Six. IT people love it when they have to install fixes in a dozen labs / classrooms in the middle of a semester, particularly when they have to do it 5 or 6 times a year. In reality, we skip several service packs and live with the bugs.
- Forced Obsolescence. This is particularly aggravating. Every year or two, we all have to go through the ritual of making an upgrade, which involves time consuming un-installation and installation. And you need to make sure that different branches of your organization that use GIS are on the same page, otherwise you’ll run into incompatibility issues (like when mxd files created in version 9.2 don’t work in 9.1).
- Cross platform. I run a linux box at home and occasionally would like to take my work with me. There are a number of students and faculty members at my school who are ardent Mac users. But ArcGIS runs only on Windows.
The open source alternatives are free, easy to install (usually), can be installed anywhere without restrictions, the software doesn’t expire, and upgrades are a rather simple affair. The obvious downside is that none of them have the power, scope, or usability that ArcGIS has. At least, not yet.
I’ve posted the tutorials from the workshop I gave the other day for the NYCRDC. I’ve created a Resources page to hold resources hosted on this site – you can find them there, along with the datasets.
Overall I think it went rather well, but it was way too much material for a three hour workshop! We covered the intro slides, and Part I (Intro to GIS and ArcMap). I did an abridged version of Parts II (Intro to Layout View) and III (finding and downloading data, ArcCatalog, preprocessing in Excel) rather than doing all of II and none of III. The third part covers a lot that the standard ArcGIS texts gloss over (or leave out altogether), so I really wanted to cover some of that material. But I couldn’t omit any of the basics in the first two parts, because you really need to know them before you can delve further (and understand why you’re delving). Ahhh, the steep learning curve of GIS!