Posts Tagged ‘gis data’

Update Your Links to the New Baruch Geoportal

Thursday, August 13th, 2015

A few weeks ago I launched a new version of our college’s GIS data repository, the Baruch Geoportal. At the back end I have a simplified process for getting data onto our server, and on the front end we did away with manually updating HTML and CSS webpages in favor of using a Confluence wiki. My college has a subscription to Confluence, and I’ve been using an internal wiki for documenting and administering all aspects of our projects. A public, external wiki for providing our data seemed like a nice way to go – we can focus more on the content and it’s easier for my team and I to collaborate.

Since it’s a new site with a new address, many of the links to projects I’ve referred to throughout the years on this blog are no longer valid. Redirects are in place, but they won’t last forever. Some notable links to update:

The new site has a dedicated blog that you can follow (via RSS) for the latest updates to the portal. The portal also has a number of relatively new and publicly accessible datasets that we’ve posted over the last year (but that I haven’t had time to post about). These include the NYC Mass Transit Spatial Layers series and population centroids for US census geographies. We’ve been creating ISO spatial metadata for all of our new layers, but we still need to create XML stylesheets to make them more human-readable. That will be one of many projects to do for this academic year.


NYC Geodatabase Updates: Spatialite Upgrade & ZIPs to ZCTAs

Wednesday, July 30th, 2014

I released the latest version of the NYC geodatabase (nyc_gdb) a few weeks ago. In addition to simply updating the data (for subway stations and ridership, city point features, and ZIP Code Business Patterns data) I had to make a couple of serious upgrades.

Spatialite Updates

The first was that is was time for me to update the version of Spatialite I was using, from 2.4 to 4.1, and to update my documentation and tutorial from the Spatialite GUI 1.4 to 1.7. I used the spatialite_convert tool (see the bottom of this page for info)to upgrade and had no problem. There were some major benefits to making the switch. For one, writing statements that utilize spatial indexes is much simpler – this was version 2.4, generating a neighbor list of census tracts:

SELECT tract1.tractid AS tract, tract2.tractid AS neighbor
FROM a_tracts AS tract1, a_tracts AS tract2
WHERE ST_Touches(tract1.geometry, tract2.geometry) AND tract2.ROWID IN (
SELECT pkid FROM idx_a_tracts_geometry
WHERE pkid MATCH RTreeIntersects (MbrMinX(tract1.geometry), MbrMinY(tract1.geometry),
MbrMaxX(tract1.geometry), MbrMaxY(tract1.geometry)))

And here’s the same statement in 4.1 (for zctas instead of tracts):

SELECT zcta1.zcta AS zcta, zcta2.zcta AS neighbor
FROM a_zctas AS zcta1, a_zctas AS zcta2
WHERE ST_Touches(zcta1.geometry, zcta2.geometry)
AND zcta1.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name=’a_zctas’ AND search_frame=zcta2.geometry)
ORDER BY zcta, neighbor

There are also a number of improvements in the GUI. Tables generated by the user are now grouped under one heading for user data, and the internal tables are grouped under subsequent headings, so that users don’t have to sift through all the objects in the database to see what they need. The import options have improved – with shapefiles and dbfs you can now designate your own primary keys on import. You also have the option of importing Excel spreadsheets of the 97-2003 variety (.xls). In practice, if you want the import to go smoothly you have to designate data types (format-cells) in the Excel sheet (including number of decimal places) prior to importing.


I was hesitant to make the leap, because version 2.4 was the last version where they made pre-compiled binaries for all operating systems; after that, the only binaries are for MS Windows and for Mac and Linux you have to compile from source – which is daunting for many Mac users that I am ill-equipped to help. But now that Spatialite seems to be more fully integrated with QGIS (you can create databases with Q and using the DB Manager you can export layers to an existing database) I can always steer folks there as an alternative. As for Linux, more distros are including updated version of the GUI in their repositories which makes installation simple.

One of the latest features in Spatialite 4.1.1 is the ability to import XML ISO metadata into the database, where it’s stored as an XML-type blob in a dedicated table. Now that I’m doing more work with metadata this is something I’ll explore for the future.


The other big change was how the ZIP Code Business Patterns data is represented in the database. The ZBP data is reported for actual ZIP Codes that are taken from the addresses of the business establishments, while the boundaries in the nyc_gdb database are for ZIP Code Tabulation Areas (ZCTAs) from the Census. Previously, the ZBP data in the database only represented records for ZIP Codes that had a matching ZCTA number. As a result, ZIP Codes that lacked a corollary because they didn’t have any meaningful geographic area – the ZIP represented clusters of PO Boxes or large organizations that process a lot of mail – were omitted entirely.

In order to get a more accurate representation of business establishments in the City, I instituted a process to aggregate the ZIP Code data in the ZBP to the ZCTA level. I used the crosswalk table provided by the MCDC which assigns ZIPs to ZCTAs, so those PO Boxes and large institutions are assigned to the ZCTA where they are physically located. I used SQLite to import that crosswalk, imported the ZBP data, joined the two on the ZIP Code and did a group by on the ZCTA to sum employment, establishments, etc. For ZIPs that lacked data due to disclosure regulations, I added some note or flag columns that indicate how many businesses in a ZCTA are missing data. So now the data tables represent records for ZCTAs instead of ZIPs, and they can be joined to the ZCTA features and mapped.

The latest ZBP data in the new database is for 2012. I also went back and performed the same operation on the 2011 and 2010 ZBP data that was included in earlier databases, and have provided that data in CSV format for download in the archives page (in case anyone is using the old data and wants to go back and update it).

Notes from the Open Geoportal National Summit

Wednesday, October 30th, 2013

This past weekend I had the privilege of attending the Open Geoportal (OGP) National Summit in Boston, hosted by Tufts University and funded by the Sloan Foundation. The Open Geoportal (OGP) is a map-based search engine that allows users to discover and retrieve geospatial data from many repositories. The OGP serves as the front-end of a three-tiered system that includes a spatial database (like PostGIS) at the back and some middleware (Like OpenLayers) to communicate between the two.

Users navigate via a web map (Google by default but you can choose other options), and as they change the extent by panning or zooming a list of available spatial layers appears in a table of contents beside the map. Hovering over a layer in the contents reveals a bounding box that indicates its spatial extent. Several algorithms determine the ranking order of the results based on the spatial intersection of bounding boxes with the current map view. For instance, layers that are completely contained in the map view have priority over those that aren’t, and layers that have their geographic center in the view are also pushed higher in the results. Non-spatial search filters for date, data type, institution, and keywords help narrow down a search. Of course, the quality of the results is completely dependent on the underlying metadata for the layers, which is stored in the various repositories.


The project was pioneered by Tufts, Harvard, and MIT , and now about a dozen other large research universities are actively working with it, and others are starting to experiment. The purpose of the summit was to begin creating a cohesive community to manage and govern the project, and to increase and outline the possibilities for collaborating across institutions. At the back end, librarians and metadata experts are loading layers and metadata into their repositories; metadata creation is an exacting and time-consuming process, but the OGP will allow institutions to share their metadata records in the hope of avoiding duplicated effort. The OGP also allows for the export of detailed spatial metadata from FGDC and ISO to MODS and MARC, so that records for the spatial layers can be exported to other content management systems and library catalogs.

The summit gave metadata experts the opportunity to discuss best practices for metadata creation and maintenance, in the hopes of providing a consistent pool of records that can be shared; it also gave software developers the chance to lay out their road map for how they’ll function as an open source project (the OGP community could look towards the GeoNetwork opensource project, a forerunner in spatial metadata and search that’s used in Europe and by many international organizations). Series of five-minute talks called Ignite sessions gave librarians and developers the ability to share the work they were doing at their institutions, either with OGP in particular or with metadata and spatial search in general, which sparked further discussion.

The outcome of all the governance, resource sharing, and best practices discussions are available on a series of pages dedicated to the summit, on the project website. You can also experiment with the OGP via, Tuft’s gateway to their repository. As you search for data you can identify which repository the data is coming from (Tufts, Harvard, or MIT) based on the little icon that appears beside each layer name. Public datasets (like US census layers) can be downloaded by anyone, while copyrighted sets that the schools’ purchased for their users require authentication.

OGP is a simple yet elegant open source project that operates under OGC standards and is awesome for spatial search, but the real gem here is the community of people that are forming around it. I was blown away by the level of expertise, dedication, and over all professionalism that each of the librarians, information specialists, and software developers exuded, via the discussions and particularly by the examples of the work they were doing at their institutions. Beyond just creating software, this project is poised to enhance the quality and compatibility of spatial metadata to keep our growing pile of geospatial stuff find-able.

NYC Subway and Transit GIS Layers

Saturday, July 24th, 2010

I’ve started outlining a one-day, introductory GIS practicum / workshop that I hope to offer in the coming academic year. One of the primary examples I want to use in the workshop is site selection for a retail store, and I thought it would be great to use a subway layer as part of the exercise. But alas, I searched high and low for a layer late last year (for a site selection project) and couldn’t find a publicly available one. I had purchased some proprietary layers, but really don’t want to use them for this workshop because I want to be able to freely distribute all of the materials to anyone; the layer I purchased is also outdated now because the MTA cut many services (including two subway lines) last month.

But thanks to Steve Romalewski at the CUNY Mapping Service, there’s now an alternative! Steve’s work is a HUGE contribution to the GIS community in New York and fills a glaring hole in the city’s collection of freely available GIS data. The MTA does host a data feed service (based on the General Transit Feed Specification created by Google) where it provides the geography of all its transit services, among other things. Steve downloaded and processed this raw data and turned it into shapefiles. He quickly discovered that it required a fair amount of scrubbing to be usable, and he’s cleaned it up and documented the entire process in great detail in several posts on his blog (Spatiality). Links to download individual shapefiles are available at the bottom of each post, following his discussion of issues and methodology for each set of layers. The CUNY Center for Urban Research has created an index page with each post, which you can access here.

In addition, he’s created a lyr file for the subway lines in order to symbolize them correctly by color and a separate mxd file for labels. While the shapefiles represent where the lines are, there are some problems representing them as they appear cartographically on the MTA’s subway maps. Many lines, including some with different colors, share the same trunk line. For example the A and C trains (blue lines) share the same trunk with the B and D trains (orange lines) along 8th Ave from 59th St to 145th St. Depending on how you sort your symbol categories, you’ll only see one color (and line) depending on which one you have on top. Steve points out two ways for solving this issue – you can edit the geography and offset one of the lines, which is tedious and creates problems as you change scale (he has some great screen shots that depict this). If you’re using ArcGIS, he shows off some cartographic tools that you can use to offest lines by prioritizing values in the attribute table. This is more ideal, as it gives the illusion that the lines are side by side cartographically while keeping the geometry of the shapefile intact.

So if you’re using ArcGIS you’ll be good to go. I’ve downloaded the files to play around with, but as I’m at home and using QGIS I had some more work to do, since lyr and mxd files are proprietary ESRI formats that the open source packages can’t handle. I’ve assigned the appropriate colors to each subway line and saved them a QGIS style file (.qml), which you can import in the symbology window to quickly and easily get the right colors (which I plucked from the MTA’s website). I’ve also saved the RGB and hex values for each line in a text file, if you’re using some other GIS software and need to input them manually. As far as I know there isn’t an easy way to circumvent the shared-line subway problem if you’re using QGIS (see screenshot below), so you’d have your work cut out for you if you want to faithfully represent the lines the way they appear on the MTA maps. But if you’re using the layers for analysis (which is what I’ll be doing) or you don’t need to emulate “the” subway map in exact detail, it shouldn’t matter.

NYC subway layers from CUNY Mapping Service in QGIS

NYC subway layers from CUNY Mapping Service in QGIS

Footnote – for anyone who is interested, the proprietary data that I purchase for the college is from a company called Halcrow. The entire NYC transportation package costs $465. It includes NYC subways and buses (lines and stations for each, along with ridership statistics from 2008 and a historical bus stops layer from 1998), LIRR and Metro North (lines and stations), but also includes the PATH train, freight lines, and truck routes.

Natural Earth Vector and Raster Data

Tuesday, December 15th, 2009

I haven’t been posting regularly as I’ve been swamped this semester – but now that it’s coming to an end I should be able to crank out a post or two each month.

I recently saw a message on Maps-L about a new GIS data source, Natural Earth, and just got around to taking a look at it. It’s run by a volunteer organization dedicated to providing free, integrated, public domain map layers for producing high-quality maps at small scales. They have a pretty comprehensive website that includes a blog, feature list, contributor information, and details on how to volunteer.

Natural Earth provides smooth, generalized vector and raster layers at three scales: 1:10m, 1:50m, and 1:110m. See my screen shot of the Delmarva peninsula to see the distinctions (beige area is 110m, red line is 50m, and blue line is 10m).


Having a choice of scales with vector and raster data layers from the same source is a huge plus (many other country-level boundary files available on the web are detailed and suitable for large scale maps, but look messy when you zoom out to a smaller scale). Natural Earth also provide outlines for land and water (including legal water boundaries for all the Pacific islands), hydrographic features generalized to the different scales, ice shelves, urban areas, and several lat/long grid line layers.

For country boundaries they’ve gotten around the tangled issue of country definitions by providing different layers for different definitions, so you can choose the one that’s most appropriate – sovereign states (so, Greenland would be part of the Denmark polygon, Alaska and Puerto Rico part of the US, and French Guiana part of France), countries (Greenland separate from the Denmark polygon, Puerto Rico separate from the US, Alaska part of the US, and French Guiana part of France), and subunits (each place its own polygon). As you move down this hierarchy, places are linked back to their whole (so there are fields in the subunit file that list which country and sovereign state it’s part of).

At this point subdivisions (states / provinces) are only provided for the US and Canada. They do provide some descriptive metadata for each layer on the website, but the metadata doesn’t follow any standardized format for geographic data. The biggest missing link is unique identifiers – none of the countries have ISO or FIPS codes, so there aren’t any fields to join attribute data to for thematic mapping (except country name, which never works smoothly given the amount of variation with names).

Overall this looks like a great resource. Vector data is in shapefile format, raster data is in tiff, and everything is defined as simple WGS 84, so these files should work with almost any GIS package, ready to go.

Update on Some Data Sources

Saturday, October 31st, 2009

Here’s my last chance to squeeze in a post before the month is over. There have been a lot of changes and updates with some key data sites lately. Here’s a summary:

  • The homepage for gdata, which provides global GIS data that was created as part of UC Berkeley’s Biogeomancer project, has moved to the DIVA-GIS website. DIVA-GIS is a free GIS software project designed specifically for biology and ecology applications, with support from UC Berkeley as well as several other research institutions and independent contributors. It looks like the old download interface has been incorporated into the DIVA-GIS page.
  • The US Census Bureau has recently released its latest iteration of the TIGER shapefiles, the 2009 TIGER/Line Shapefiles. Since they seem to be making annual updates, which has involved changing the URLs around, it may be better to link to their main TIGER shapefile page where you can get to the latest and previous versions of the files.
  • The bureau has released its latest American Community Survey (ACS) data: 2008 annual estimates for geographic areas with 65,000 plus people, and three year 2006-2008 estimates for geographic areas with 20,000 plus people. Available through the American Factfinder.
  • Over the summer, UM Information Studies student Clint Newsom and I created a 2005-2007 PUMA-level New York Metropolitan ACS Geodatabase (NYMAG). It’s available for download on the new Baruch Geoportal, which was re-launched as a public website this past September. It’s a personal geodatabase in Microsoft Access format, so it can only be directly used with ArcGIS. I plan on creating the 2006-2008 version sometime between January and March 2010, and hope to release an Access and SQLite version, as the latest development versions of QGIS now offer direct support for SQlite geodatabases in the Spatialite format (which is awesome!).
  • While it’s not a source for GIS data or attribute tables, it’s still worth mentioning that the CIA World Factbook completely revised their website this past summer. The previous web versions of the factbook took their design cues from the old paper copies of the report. The CIA revamped the entire site and apparently will be using a model of continuous rather than annual updates. It’s a great site for getting country profiles – another good option is the UN World Statistics Pocketbook, which is part of the UNdata page.

Updated Links for Data and Resources

Saturday, April 25th, 2009

I recently went through my pages of suggested links for data and resources to update and clean them up. I’ve included many of the cool resources I’ve discovered since I started writing this blog, which ended up in individual posts but not in these pages. I went over the resources page in particular, to try and classify the reference materials, tools, and software into useful categories rather than just having one large blob of stuff.


Monday, August 4th, 2008

I’ve stumbled across a few good sites for GIS data lately. Check these out:

UNSDI-NCO: The United Nations Spatial Data Infrastructure site, maintained by the Netherlands Coordination Office. They have many global datasets as well as country-specific ones, often for developing countries where data is hard to come by. Includes boundaries, roads, infrastructure, and natural features. Click on the Datasets link under the Categories menu to see the list, then click on the feature of you choice. You’ll have to scroll through the metadata to the Distribution Info element to get to a download link. Not all of the datasets are available for public download.

gData: This site is housed at Berkeley as part of the Biogeomancer Project, whose goal is to share data on biodiversity. You can download boundaries, hydrography, infrastructure, topography, and climate data in vector and raster formats for any country in the world. The data is aggregated, and in some cases improved, from many public sources. Administrative boundaries include 1st, 2nd, and often 3rd level divisions. A great, comprehensive source.

CEGRP: China Earthquake Geosptial Research Portal, housed at Harvard. The goal of the site is to gather and distribute geospatial data in response to the earthquake that hit Sichuan China in May 2008. Vector and raster layers for all of China and for this particular region where the earthquake hit.

AIMS: Afghanistan Information Management Services. A non-profit group located in Afghanistan that has created and maintains a geospatial infrastructure to support the government. Vector datasets for the entire country and the city of Kabul are available for download. They also offer a number of static pdf maps.

Copyright © 2017 Gothos. All Rights Reserved.
No computers were harmed in the 0.380 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.