Posts Tagged ‘census data’

2010 American Community Survey Releases

Friday, September 23rd, 2011

The US Census Bureau released the new annual data for the 2010 American Community Survey; this dataset includes an extensive number of demographic, socio-economic, and housing estimates (with margins of error) for all geographic areas in the US that have a population of at least 65,000 people. This is the first ACS survey that is weighted based on the 2010 Census, and that is tabulated entirely on the new 2010 Census geography; exceptions include PUMAs and urban areas, which typically aren’t redrawn until a couple of years after a decennial census is taken. Data for these areas will be reported based on the 2000 Census geography. This will also be the first ACS that is distributed via the new American Factfinder. Previous ACS datasets should be moved to the new Factfinder by the end of this year.

According to the release schedule data for the three year ACS (2008-2010) for areas with at least 20,000 residents will be published in October and the five year ACS (2006-2010) for geography down to census tracts will be released in December. The three year dataset hits a milestone this year, as for the first time we’ll have datasets with mutually exclusive years that can be feasibly compared for historical change (the 2005-2007 dataset versus 2008-2010). It should prove interesting as the earlier dataset represents the end of the brief boom years while the current one depicts the depth of the great recession. There will be some challenges in making comparisons, as the base for weighting the estimates and the geography used to tabulate them is different for each dataset (2000 Census in the earlier dataset versus 2010 Census in the latest one).

ZIP Code KML Map for NYC Census Data

Saturday, September 10th, 2011

With the release of both the 2010 Census profiles for ZCTAs (ZIP Code Tabulation Areas) and the TIGER line files for 2010 Census geographies, I created another Google Map finding aid for NYC neighborhood data by ZIP code (I previously created one for PUMAs with American Community Survey data). Once again I used the Export to KML plugin that was created for ArcGIS. This allowed me to use the TIGER shapefile in ArcGIS to create the map I wanted and then export it as a KML, while using fields in the attribute table of each feature to insert the ZCTA number into stable links for the census profiles, automatically generating unique urls for each feature. Click on the ZCTA in the map, and then click on a link to open a profile directly from the new American Factfinder.

There were two new obstacles I had to contend with this time. The first was that my department has finally migrated to Windows 7 from Windows XP, and I upgraded from ArcGIS 9.3 to 10. I had to reinstall the Export to KML plugin (version 2.5.5) and ran into trouble; fortunately all the work-arounds were included in the plugin’s documentation. I don’t have administrator rights on my machine, so I had to have someone install the plugin as an administrator; this included running the initial setup file AND running Arc as an administrator as you add and turn the plugin on. That was straightforward, but when I ran it the first time I got an error message – there’s a particular Windows dll or ocx file that the plugin needs and it was missing (presumably something that was included in XP but not in 7). I downloaded the necessary file, and with administrator rights moved it into the system32 folder and registered the file via the command line. After that I was good to go.

The second issue was with the Census Bureau’s new American Factfinder. With the old Factfinder the urls that were generated as you built and accessed tables were static and you could simply save and bookmark them. Not the case in the new Factfinder; you can bookmark some basic tables but most of them are “too complex to bookmark”; you can save and download queries from the online ap but that’s it. After some digging I found a CB document that tells you how you can create deep links to any query you run and table you create. The url consists of a fixed series of codes that identify the dataset, year, table, and geography. So this link:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US10010

Tells us that were getting a table from version 1.0 of the American Factfinder in English. It’s from the Decennial Census, 2010 Demographic Profiles, Demographic Profile Table 1, for ZCTA 10010 (860 is the summary level code that indicates we’re looking at ZCTAs). So for the plugin to create the links, I just included this URL but for the last five digits I specified the attribute from the ZCTA shapefile that held the ZCTA code. So when the plugin creates the KML, each KML feature has a link generated that is specific to it:

http://factfinder2.census.gov/bkmk/table/1.0/en/DEC/10_DP/DPDP1/8600000US[ZCTA5CE10]

You can see this previous post for details on how the Export to KML plugin works.

For now, the 2010 and 2000 Census are in the new American Factfinder. The American Community Survey, the Economic Census, population estimates, and a few other datasets are still in the older, legacy Factfinder. According to the CB all of this data will be migrated to the new Factfinder by the end of 2011 and the legacy version will disappear. At that point I’ll have to update my PUMA map so that it points to the profiles in the new Factfinder.

You can take a look at the ZCTA map and profiles below (I’m hosting it on the NYC data resource guide I’ve created for my college). As I’ve written before, ZCTAs are odd Census geographies since they are approximations of residential USPS ZIP Codes created by aggregating census blocks based on addresses; you can see in many instances where boundaries have a blocky teeth-like appearance instead of straight lines. Since they’re created directly by aggregating blocks, ZCTAs don’t correspond or mesh with other census boundaries like tracts or PUMAs, or even legal boundaries like counties. In some cases my assignment of county-based colors doesn’t ring true. For example, ZCTA 11370 includes part of the East Elmhurst neighborhood in Queens and Rikers Island, which is in the Bronx. ZCTA 10463 includes the Bronx neighborhoods of Kingsbridge and Spuyten Duyvil and the Manhattan neighborhood of Marble Hill (a geographic anomaly; it’s not on the Island of Manhattan but it’s part of Manhattan borough).

The most salient issue with ZCTAs is that they are only tabulated for the decennial census and not the American Community Survey; the currency of data and spectrum of census variables will be limited compared to other types of geography. ***NOTE*** This is no longer the case – ZCTA-level data is now available as part of the 5-year ACS, beginning with the 2007-2011 series.


View Larger Map

2010 Census Data Being Released

Thursday, June 16th, 2011

The US Census Bureau has begin releasing data for Summary File 1, which is the primary summary data set that the Bureau tabulates. They will release data for groups of states on a weekly basis from June through September. Alabama and Hawaii were the first states released today. California, Delaware, Kansas, Pennsylvania and Wyoming are out next week.

This data is based on the 100% count of the population and is being released for geographies that nest within states: states, counties, county subdivisions, places, census tracts, ZCTAs, and congressional districts, and in some cases block groups and blocks. You can download the data table by table by building queries via the new American Factfinder, or power users can download entire datasets via the FTP site.

You’ll see how small the 2010 Census is compared to the past: we’re only going to get basic demographic variables. The extensive number of socio-economic indicators – education, income, language, employment status, etc – are no longer collected as part of the decennial census; you have to turn to the American Community Survey for this data, which is released on an annual basis.

Here’s what’s in the 2010 Census:

  • Total Population
  • Urban and Rural Population
  • Gender and Age
  • Race
  • Hispanic or Latino Origin
  • Households (Including Type and Size)
  • Group Quarters
  • Families
  • Family Relationships
  • Housing Units
  • Occupancy Status (Occupied or Vacant)
  • Tenure (Owner or Renter Occupied)

Many of these variables are cross-tabulated by age, gender, race, Hispanic or Latino Origin, Household Type, and Household Size. Once we get to the fall of 2011 we’ll start to see national level data for divsions, regions, and metropolitan areas.

2010 Census Redistricting Data

Sunday, April 17th, 2011

The Redistricting Summary Data [P.L. 94-171] from the 2010 Census has all been published for the nation, states, counties, and places, and is available via the new American Factfinder. The redistricting data includes basic demographic data: total population, race, Hispanic or Latino origin, and number of housing units occupied and vacant. Data is available down to census blocks and is available for most (but not all – no ZCTAs or PUMAs) geographies.

If you don’t want all the data for a state, don’t want to slog through the Factfinder, and are comfortable working with large text files, you can FTP the summary data from the Redistricting Data homepage. If you want basic summary data for states, counties, and places and don’t want to fuss with the Factfinder or text files, you can download Excel spreadsheets from the Redistricting Data Press Kit. They also have some pdf / jpg maps showing county level population and population change, plus interactive map widgets like the one below for the country and for each state. 2010 Redistricting TIGER Shapefiles have also been released for geographies included in the redistricting dataset.

The full 2010 Census for all geographies will be released throughout this summer and into the fall in Summary File 1 [SF1]. Stay tuned.

Some 2010 Census Updates

Monday, February 7th, 2011

Some geography updates to pass along regarding new US Census data:

  • The Census has released a few 2010 map widgets that you can embed in web pages. One shows population change, density, and apportionment for the whole country at the state level, while the other shows population, race, and Hispanic change for states at the county level. As of this post only four states are ready (LA, MS, NJ, and VA) but they’ll be adding the rest once they’re available.
  • The 2010 TIGER Line Files are starting to be released; they’ve changed the download interface a little bit based on user feedback. Most summary levels / geographic areas are available; some (like ZIP Codes and PUMAs) will be released later this year.
  • They’re also rolling out the new interface for the American Factfinder; currently you can get 2000 Census data, some population estimates, and the 2010 Census data as it becomes available. Other datasets like the American Community Survey and Economic Census will be added over time. Some maps and gov docs librarians have expressed concerned about the change – apparently when you download the data from the new interface the FIPS codes are not “ready to go” for joining to shapefiles; there’s one long geo id that has to be parsed. The other concern is that the 1990 Census won’t be carried over into the new interface at all. The original American Factfinder is slated to come down towards the end of this year.

Google Maps to Create a Census Finding Aid

Thursday, May 13th, 2010

Yikes! It’s been quite awhile since my last post (the past couple months have been a little tough for me), but I just finished an interesting project that I can share.

I constantly get questions from students who are interested in getting recent demographic and socio-economic profiles for neighborhoods in New York City. The problem is that neighborhoods are not officially defined, so we have to look for a surrogate. The City has created neighborhood-like areas out of census tracts called community districts and they publish profiles for them, but this data is from the decennial census  and not current enough for their needs.  ZIP code data is also only available from the decennial census.

We can use PUMAs (Public Use Microdata Areas) to approximate neighborhoods in large cities, and they are published as part of the 3 year estimates of the American Community Survey. The problem is, in order to look up the data from the census you need to search by PUMA number – there are no qualitative place names. The city and the census have worked together to assign names to neighborhoods as part of the NYC Housing and Vacancy Survey, but this is the only place (I’ve found) that uses these names. You need to look in several places to figure out what the PUMA number and boundaries for an area are and then navigate through the census site to find it. Too much for the average student who visits me at the reference desk or emails me looking for data.

My solution was to create a finding aid in Google maps that tied everything together:

View Larger Map

I downloaded PUMA boundaries from the Census TIGER file site in a shapefile format. I opened them up in ArcGIS and used an excellent script that I downloaded called Export to KML. ArcGIS 9.3 does support KML exports via the toolbox, and there are a number of other scripts and stand-alone programs that can do this (I tried several) but Export to KML was best (assuming you have access to ArcGIS) in terms of the level of customization and the thoroughness of the user documentation. I symbolized the PUMAs in ArcGIS using the colors and line thickness that I wanted and fired up the tool. It allows you to automatically group and color features based on the layer’s symbology. I was able to add a “snippet” to each feature to help identify it (I used the PUMA number as the attribute name and the neighborhood name as my snippet, so both appear in the legend) and added a description that would appear in the pop up window when that feature is clicked. In that description, I added the URL from the ACS census profile page for a particular PUMA – the cool part here is that the URL is consistent and contains the PUMA number. So, I replaced the specific number and inserted the [field] name from the PUMAs attribute table that contained the number. When I did the export, the URLs for each individual feature were created with their PUMA number inserted into the link.

There were a few quirks – I discovered that you can’t automatically display labels on a Google Map without subterfuge, like creating the labels as images and not text. Google Earth (but not Maps) supports labels if you create multi-geometry where you have a point for a label and a polygon for the feature. If you select a labeling attribute on the initial options screen of the Export to KML tool, you create an icon in the middle of each polygon that has a different description pop-up (which I didn’t want so I left it to none and lived without labels). I made my features 75% transparent (a handy feature of Export to KML) so that you could see the underlying Google Map features through the PUMA, but this made the fill AND the lines transparent, making the features too difficult to see. After the export I opened the KML in a text editor and changed the color values for the lines / boundaries by hand, which was easy since the styles are saved by feature group (boroughs) and not by individual feature (pumas). I also manually changed the value of the folder open element (from 0 to 1) so that the feature and feature groups (pumas and boroughs) are expanded by default when someone opens the map.

After making the manual edits, I uploaded the KML to my webserver and pasted the url for it into the Google Maps search box, which overlayed my KML on the map. Then I was able to get a persistent link to the map and code for embedding it into websites via the Google Map Interface. No need to add it to Google My Maps, as I have my own space. One big quirk – it’s difficult to make changes to an existing KML once you’ve uploaded and displayed it. After I uploaded what I thought would be my final version I noticed a typo. So I fixed it locally, uploaded the KML and overwrote the old one. But – the changes I made didn’t appear. I tried reloading and clearing the cache in my browser, but no good – once the KML is uploaded and Google caches it, you won’t see any of your changes until Google re-caches. The conventional wisdom is to change the name of the file every single time – which is pretty dumb as you’ll never be able to have a persistent link to anything. There are ways to circumvent the problem, or you can just wait it out. I waited one day and by the next the file was updated; good enough for me, as I’ll only need to update it once a year.

I’m hosting the map, along with some static PDF maps and a spreadsheet of PUMA names and neighborhood numbers, from the NYC Data LibGuide I created (part of my college’s collection of research guides). If you’re looking for neighborhood names to associate with PUMA numbers for your city, you’ll have to hunt around and see if a local planning agency or non-profit has created them for a project or research study (as the Census Bureau does not create them). For example, the County of Los Angeles Department of Mental Health uses pumas in a large study they did where they associated local place names with each puma.

If you’re interested in dabbling in some KML, there’s Google’s KML tutorial. I’d also recommend The KML Handbook by Josie Wernecke. The catch for any guide to KML is that while all KML elements are supported by Google Earth, there’s only partial support for Google Maps.

Update on Some Data Sources

Saturday, October 31st, 2009

Here’s my last chance to squeeze in a post before the month is over. There have been a lot of changes and updates with some key data sites lately. Here’s a summary:

  • The homepage for gdata, which provides global GIS data that was created as part of UC Berkeley’s Biogeomancer project, has moved to the DIVA-GIS website. DIVA-GIS is a free GIS software project designed specifically for biology and ecology applications, with support from UC Berkeley as well as several other research institutions and independent contributors. It looks like the old download interface has been incorporated into the DIVA-GIS page.
  • The US Census Bureau has recently released its latest iteration of the TIGER shapefiles, the 2009 TIGER/Line Shapefiles. Since they seem to be making annual updates, which has involved changing the URLs around, it may be better to link to their main TIGER shapefile page where you can get to the latest and previous versions of the files.
  • The bureau has released its latest American Community Survey (ACS) data: 2008 annual estimates for geographic areas with 65,000 plus people, and three year 2006-2008 estimates for geographic areas with 20,000 plus people. Available through the American Factfinder.
  • Over the summer, UM Information Studies student Clint Newsom and I created a 2005-2007 PUMA-level New York Metropolitan ACS Geodatabase (NYMAG). It’s available for download on the new Baruch Geoportal, which was re-launched as a public website this past September. It’s a personal geodatabase in Microsoft Access format, so it can only be directly used with ArcGIS. I plan on creating the 2006-2008 version sometime between January and March 2010, and hope to release an Access and SQLite version, as the latest development versions of QGIS now offer direct support for SQlite geodatabases in the Spatialite format (which is awesome!).
  • While it’s not a source for GIS data or attribute tables, it’s still worth mentioning that the CIA World Factbook completely revised their website this past summer. The previous web versions of the factbook took their design cues from the old paper copies of the report. The CIA revamped the entire site and apparently will be using a model of continuous rather than annual updates. It’s a great site for getting country profiles – another good option is the UN World Statistics Pocketbook, which is part of the UNdata page.

Formulas for Working With Census ACS Data in Excel / Calc

Friday, June 26th, 2009

After downloading US census data, you often need to reformat it before using it. It’s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty straightforward if you’re working with decennial census data, but data from the American Community Survey (ACS) is a little trickier to deal with since you’re working with estimates that have a margin of error. When creating new data, you also have to calculate what the margin of error is for your derived numbers. I’ll walk through some examples of how you would do this in a spreadsheet (the formulas below will work in either Excel or Calc).

Creating an Aggregate

We’ll use the following data in our example:

screenshot1

We have the total population of people three years and older who are enrolled in school, and a breakdown of this population enrolled in grades 1 through 4 and grades 5 through 8 in a few counties in New York, with margins of error for each data point. Our data is from the 3 year averaged 2005-2007 American Community Survey.

Let’s say we want to create a total for students who are enrolled in grades 1 through 8 for each county. We create a new column and sum the estimates for each county with the formula e3+g3, or sum(e3:g3).

To calculate a margin or error (MOE) for our grade 1 to 8 data, first we have to use the find and replace command to get rid of the “+/-” signs in the MOE column, so our spreadsheet will treat our values as numbers and not text (this is an issue if you downloaded the data as an Excel file – if you download a txt file the +/- is not included). Depending on the dataset you’re working with, you may also need to replace dashes, which represent data that was null or not estimated.

Once the data is cleaned up, we can insert a new column with this formula:

=SQRT((F3^2)+(H3^2))

This calculates our new margin of error by squaring the moes for each of our data points, summing the results together, and taking the square root of that sum. In other words,

=SQRT((MOE1^2)+(MOE2^2))

Once that’s done, you may want to round the new MOE to a whole number.

Creating a Percent Total

Let’s calculate the percentage of the population 3 years and older enrolled in school that are in grades 1 through 8. Based on what we have thus far (I hid the columns E,F,G, and H for grades 1-4 and 5-8 in this screenshot, as we don’t need them):

screenshot-2

We insert a new column where we divide our subgroup by the total, as you would expect – I3/C3. In the next column we insert the following formula to create a MOE for our new percent total:

=(SQRT((J3^2)-((K3^2)*(D3^2))))/C3

This one’s a little weightier than our last formula. We’re taking the square of our percent total (K3) and the square of the MOE of the total population (D3), multiplying them together, then subtracting that number from the square of the MOE of our subgroup (J3). Then we take the square root of the whole thing, then divide it by our total population (C3). If you’re saying – HUH? Maybe this is clearer:

=(SQRT((MOEsubset^2)-((PercentTotal^2)*(MOEtotalpop^2))))/TotalPop

Finally, we have something like this:

screenshot-3

Based on our data, we can say things like “There were approximately 30,556 students enrolled in 1st through 8th grade per year in Dutchess County, NY between 2005 and 2007, plus or minus 1,184 students. An estimated 37% of the population enrolled in school in the county was in the 1st through 8th grade, plus or minus 1%.” The ACS estimates have a 90% confidence interval.

Wrap Up

In this example we worked with aggregating and calculating percentages based on characteristics. We could also use these same formulas to aggregate data by geography, if we wanted to add the characteristics for all the counties together.

For the full documentation on working with ACS data, take a look at the appendix in the Census’ ACS Compass Guide, What General Data Users Need to Know. It provides you with the formulas in their proper statistical notation (for those of you more mathematically inclined than I) and includes formulas for calculating other kinds of numbers, such as ratios and percent change. It does provide you with worked-through examples, but not with spreadsheet formulas. I used their examples when I created formulas the first time around, so I could compare my formula results to their examples to insure that I was getting it right. I’d strongly recommend doing that before you start plugging away with your own data – one misplaced parentheses and you could end up with a different (and incorrect) result.

2007 Economic Census

Monday, June 8th, 2009

The US Census Bureau has begun releasing data for the 2007 Economic Census. The bureau conducts the survey of businesses every five years – all medium to large size businesses and multi-part businesses are counted, samples are taken for smaller businesses, and various administrative records are used to calculate businesses with no employees (i.e. freelancers). All businesses are categorized hierarchically by North American Industrial Classification System (NAICS) codes and the data is reported by industry and geography. The number of establishments, employees, payroll, and sales are counted for the nation, states, counties, metro areas, places, and zip codes.

At this point national industry totals for the broadest categories of NAICS are available, as are preliminary numbers for the most specific NAICS categories (six digit) at the national level. Data for smaller geographic areas will be released between October 2009 and August 2010.

The biggest change from the 2002 Economic Census is the delivery method for the data. There will be no more 90 page pdf files or HTML tables that drill down six levels. All of the data will be released via the American Factfinder only. Other changes include the addition of some new geography (CDPs with at least 5000 people), new metro area definitions, and the revised 2007 definitions for NAICS which include small changes to the Finance, Insurance, Real Estate, Professional Services, and Administrative Services categories.

Additional changes for 2007, the data release schedule, NAICS codes, and methodology docs are all available at the 2007 Economic Census homepage within the Census Bureau’s website.

All of the data is aggregated by industry and geography – you cannot get lists of businesses with names and addresses as this information is kept confidential. Furthermore, to maintain confidentiality, if one company controls a large share of the market for a specific sector within a specific geographic area, or if there few businesses within a sector in a specific geographic area, much of the data (with the exception of the number of businesses) remains classified (marked with a D for disclosure). Oftentimes this means that data for industries within small areas (big box retail in a small town) and data for industries with few establishments in an area (mining establishments in New York City) are hidden. The smaller the geography, the more likely it is that the data will not be disclosed. This becomes a technical issue if you want / need to move this data into a database, as these pesky disclosure notes are stored in the same columns as the data and prevent you for designating the fields as numeric.

Given the delay between the time the data is collected and the time it is released, it isn’t particularly helpful for analyzing our current economic climate, but it does provide a snapshot of the way the US economy looked at that moment, and is useful in understanding how the economy is evolving. Be aware that when making comparisons to past data, you have to correct for changes in geography and NAICS definitions. The differences between 2002 and 2007 are not too great, but more adjustments are necessary as you go further back in time. The Bureau provides data back to 1997 through the American Factfinder and some data from 1992 on an older page. If you need to go back further, you’ll be entering the realm of (gasp!) CD-ROMs or the paper reports.

Social Explorer and New ACS Census Data

Thursday, January 22nd, 2009

This is kind of a follow-up to my last post – the Social Explorer, a great interactive mapping site that allows you to map US Census data, has added the 2005-2007 American Community Survey data to their site at the PUMA level. This is the smallest geographic area that is available for recent data, until we get to the 2010 Census and 2010 ACS. At this point you can look at total population, race, and Hispanic ethnicity. It looks like you can make maps, but you can’t export the data unless you subscribe to the full version.

The Social Explorer allows you to map a wide selection of decennial census data all the way back to the 1790 census (they have a partnership with NHGIS, which provides historical data and boundary files for free download with registration). Tract-level data is available back to 1940. While you can map the data, and you can generate slideshows and download static maps as image files, you can only generate reports for the 2000 census. In order to get full access for report generation and other features, you’ll have to subscribe (or find access to a library that does).

Social Explorer also works with ARDA (Association of Religious Data Archives) to create maps of county-level religious affiliation (since the US Census does not collect this data by law). Of all the interactive mapping sites I’ve seen, the Social Explorer is one of the slickest and easiest to use.


Copyright © 2017 Gothos. All Rights Reserved.
No computers were harmed in the 0.453 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.