Relating ZIP Codes / ZCTAs to PUMAs
Ever since I created the Google Maps finding aid for census data for NYC PUMAs and the associated PUMA – NYC neighborhood names maps, I’ve received several requests for tables or maps that relate PUMAs to ZIP Codes. These are usually from non-profits in NYC who have lists of donors, members, or constituents with addresses, and they want to relate the addresses (using the ZIP) to recent demographic data from American Community Survey (ACS) for the broader neighborhood where the ZIP is located.
The problem is that ZIP Codes are an all around pain. They actually don’t exist as areas with distinct boundaries; ZIP Codes are all address based, with ZIPs tied to addresses along street segments. The USPS doesn’t publish these tables or create maps; they contract this out for private companies to do, who turn around and sell these products for hefty fees.
Fortunately the Census Bureau has used these address tables to create approximations of ZIP Codes that they call ZCTAs or ZIP Code Tabulation Areas. ZCTAs are aggregates of census blocks that attempt to mimic ZIP Codes that exist as areas; codes associated with specific single-point firms or organization are dropped. Since ZIPS were created by the USPS, ZCTAs do not nest or mesh with any census geography; they cross PUMA, county, and in some cases even state boundaries. They are also less stable than census geography, with frequent changes, and as statistical areas they vary widely in area and population. For this reason ZCTA data is only published every ten years in the decennial census; it’s not included in the ACS (so far).
With these caveats in mind, I used the Missouri Census Data Center’s MABLE/GEOCORR engine to correlate ZCTAs with PUMAs. While the interface looks a little retro and daunting, it’s actually pretty simple. You choose the state, the two geographies you want to relate, the weighting method for allocating one to the other, and an output format that includes CSV or HTML. I also used an option that lets you type in FIPS codes for the counties you want, so I didn’t end up with the entire state.
This method was the way to go, as they give you the option to allocate geographies based on population and not simply land area; each ZCTA was allocated to PUMAs based on where the majority of the ZCTA’s population lived using 2000 census block data. The final output contains one row for each ZCTA to PUMA combination. So you had multiple rows for ZCTAs that weren’t contained within a single PUMA, and for each of those ZCTAs you had fields that showed the percentage of the ZCTA’s population that lived in each PUMA (along with the actual population number) as well as the percentage of the PUMA’s population that lived in that ZCTA.
I took that table and cleaned it up in a spreadsheet, so that I was left with one row for each ZCTA, where the ZCTA was allocated to one PUMA based on where the majority of it’s population lives. I used some ZCTA and PUMA boundaries that I had originally downloaded and subsequently cleaned up from the 2009 TIGER shapefiles page, added them to QGIS, joined the ZCTA allocation table to the ZCTA geography, and mapped the result. I color-coded ZCTAs so that clusters of ZCTAs within a particular PUMA had the same color. Then I overlaid the PUMA boundaries on top to see how well they corresponded.
In the end, they didn’t correspond all that well. There was a fairly good relationship in Manhattan, ok relationship in Queens and Staten Island, and a rather lousy relationship in the Bronx and Brooklyn. I overlaid greenspace and facilities (airports, shipyards, etc) boundaries I had, and that made some difference; you could see in some areas where ZCTAs overlapped two PUMAs that the overlap coincided with parks, cemeteries, or other areas with low or no residential population in one of the PUMAs.
I’ve posted both sets of tables, maps, and some instructions on the NYC neighborhoods resource page. You can use the original MABLE / GEOCORR table to judge where allocations were good and were they were not so good based on population. For now, the engine is still based on 2000 Census geography and data. Even though the Census has started releasing 2010 TIGER files based on 2010 Census geography, ZCTAs and PUMAs are often some of the last geographies to be updated; current releases of the ACS are still based on the 2000 geographies. Stay tuned to the Census Bureau and MCDC websites for news on updates, and keep the MABLE / GEOCORR in mind if you want to create lists to relate census geographies by population or land area.