Posts Tagged ‘census’

Some 2010 Census Updates

Monday, February 7th, 2011

Some geography updates to pass along regarding new US Census data:

  • The Census has released a few 2010 map widgets that you can embed in web pages. One shows population change, density, and apportionment for the whole country at the state level, while the other shows population, race, and Hispanic change for states at the county level. As of this post only four states are ready (LA, MS, NJ, and VA) but they’ll be adding the rest once they’re available.
  • The 2010 TIGER Line Files are starting to be released; they’ve changed the download interface a little bit based on user feedback. Most summary levels / geographic areas are available; some (like ZIP Codes and PUMAs) will be released later this year.
  • They’re also rolling out the new interface for the American Factfinder; currently you can get 2000 Census data, some population estimates, and the 2010 Census data as it becomes available. Other datasets like the American Community Survey and Economic Census will be added over time. Some maps and gov docs librarians have expressed concern about the change – apparently when you download the data from the new interface the FIPS codes are not “ready to go” for joining to shapefiles; there’s one long geo id that has to be parsed. The other concern is that the 1990 Census won’t be carried over into the new interface at all. The original American Factfinder is slated to come down towards the end of this year.
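Parsing that long geo id is straightforward if you’re comfortable scripting. Here’s a minimal Python sketch – it assumes the ids follow the “summary level prefix + ‘US’ + FIPS” pattern (e.g. 0500000US36027 for a county), so verify the layout against your own download before relying on it:

```python
# Sketch: pulling a FIPS code out of an American Factfinder-style geo id.
# Assumes the id is "summary level prefix" + "US" + FIPS code;
# check your download's documentation to confirm.

def geoid_to_fips(geoid):
    return geoid.partition("US")[2]     # everything after the 'US' separator

print(geoid_to_fips("0500000US36027"))  # -> 36027
```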

Track 2010 Census Participation Rates

Tuesday, March 30th, 2010

The 2010 Census is in full swing – the target date of April 1st is coming up soon. I mailed my form back last week. If you’re curious as to how many others have mailed theirs back, check out the bureau’s interactive Take 10 Map. Built on top of a Google Map interface, it allows you to track participation rates by state, county, place, reservation, and census tract. You can zoom in to change the scale and select different geography, or enter a zip code, city, or state to zoom to an area of choice. Clicking on an area will display its participation rate to date, compared to the state and national rates.

Data is updated daily, Monday through Friday. Once you click on a particular area, if you click the Track Participation Rate link it will create a widget that you can embed in a website to provide the updated rate. Unlike a lot of the other interactive web maps floating around these days, the bureau does give you the ability to download the actual data behind the map, if you want to do some analysis of your own.

Mapping Hard to Count Areas for Census 2010

Tuesday, February 23rd, 2010

There was an interesting article in the New York Times today about neighborhoods in New York that typically get under-counted in the Census. These include areas with high immigrant populations as well as places that have had new construction since the last census, as the buildings haven’t been added to the Census Bureau’s master address file.

What the article didn’t mention is that CUNY’s Center for Urban Research has created a great online app called the Census 2010 Hard to Count mapping site. The site is built on the Census Bureau’s Tract Level Planning Database, which identified twelve population and housing variables, such as language isolation, recent movers, poverty, and crowded housing, that were associated with low mail response in the 2000 Census. This tool was designed to help Census reps, local government officials, and community activists identify traditionally under-counted areas to ensure a more complete count this time around.

The database is national in scope, and you can easily map tracts for a particular state, county, city, metro area, or tribal area, and you can search for an area using an individual address. The map is built on a Google Maps interface, and zooming in will change the units mapped from larger units (states, counties, etc) to tracts. You can easily select one of the twelve variables color-coded in the menu to the left of the map, or a Hard to Count index of all the variables.

Formulas for Working With Census ACS Data in Excel / Calc

Friday, June 26th, 2009

After downloading US census data, you often need to reformat it before using it. It’s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty straightforward if you’re working with decennial census data, but data from the American Community Survey (ACS) is a little trickier to deal with since you’re working with estimates that have a margin of error. When creating new data, you also have to calculate what the margin of error is for your derived numbers. I’ll walk through some examples of how you would do this in a spreadsheet (the formulas below will work in either Excel or Calc).

Creating an Aggregate

We’ll use the following data in our example:


We have the total population of people three years and older who are enrolled in school, and a breakdown of this population enrolled in grades 1 through 4 and grades 5 through 8 in a few counties in New York, with margins of error for each data point. Our data is from the 3 year averaged 2005-2007 American Community Survey.

Let’s say we want to create a total for students who are enrolled in grades 1 through 8 for each county. We create a new column and sum the two estimate columns for each county with the formula =E3+G3. (Avoid =SUM(E3:G3) here – that range would also sweep in the margin of error in column F.)

To calculate a margin of error (MOE) for our grade 1 to 8 data, first we have to use the find and replace command to get rid of the “+/-” signs in the MOE column, so our spreadsheet will treat our values as numbers and not text (this is an issue if you downloaded the data as an Excel file – if you download a txt file the +/- is not included). Depending on the dataset you’re working with, you may also need to replace dashes, which represent data that was null or not estimated.
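If you’d rather do that clean-up outside the spreadsheet, here’s a small Python sketch of the same step – the “+/-” prefix and lone-dash conventions are the ones described above; adjust for your own download:

```python
# Scripted version of the clean-up step above: strip the "+/-" prefix
# and thousands commas so MOE values behave as numbers, and treat a
# lone dash (null / not estimated) as missing.

def clean_moe(value):
    text = value.strip()
    if text in ("-", ""):
        return None                      # null or not estimated
    return float(text.replace("+/-", "").replace(",", ""))

print(clean_moe("+/-1,184"))  # -> 1184.0
print(clean_moe("-"))         # -> None
```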

Once the data is cleaned up, we can insert a new column with this formula:

=SQRT(F3^2+H3^2)
This calculates our new margin of error by squaring the moes for each of our data points, summing the results together, and taking the square root of that sum. In other words,

MOE(grades 1–8) = √( MOE(grades 1–4)² + MOE(grades 5–8)² )
Once that’s done, you may want to round the new MOE to a whole number.
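For anyone scripting rather than spreadsheeting, the aggregation steps can be sketched in Python – the grade 1–4 and 5–8 values below are invented for illustration:

```python
import math

# Scripted version of the aggregation above: sum the estimates, then
# square each MOE, sum the squares, and take the square root.
# The input numbers are invented for illustration.

def aggregate(estimates, moes):
    total = sum(estimates)
    total_moe = math.sqrt(sum(m ** 2 for m in moes))
    return total, round(total_moe)

total, moe = aggregate([15000, 16000], [800, 900])
print(total, moe)  # -> 31000 1204
```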

Creating a Percent Total

Let’s calculate the percentage of the population 3 years and older enrolled in school that are in grades 1 through 8. Based on what we have thus far (I hid the columns E,F,G, and H for grades 1-4 and 5-8 in this screenshot, as we don’t need them):


We insert a new column where we divide our subgroup by the total, as you would expect – =I3/C3. In the next column we insert the following formula to create a MOE for our new percent total:

=SQRT(J3^2-(K3^2*D3^2))/C3
This one’s a little weightier than our last formula. We’re taking the square of our percent total (K3) and the square of the MOE of the total population (D3), multiplying them together, then subtracting that number from the square of the MOE of our subgroup (J3). Then we take the square root of the whole thing, then divide it by our total population (C3). If you’re saying – HUH? Maybe this is clearer:

MOE(percent) = √( MOE(subgroup)² − percent² × MOE(total)² ) ÷ total
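The same calculation as a Python sketch. One wrinkle worth noting from the Census documentation: if the value under the square root comes out negative, the guidance is to fall back to the ratio formula, which adds rather than subtracts under the radical. The example values below are invented:

```python
import math

# Scripted version of the percent MOE formula above. If the value
# under the square root is negative, Census guidance says to use the
# ratio formula instead (add rather than subtract under the radical).

def percent_moe(num, num_moe, denom, denom_moe):
    p = num / denom
    under = num_moe ** 2 - (p ** 2) * (denom_moe ** 2)
    if under < 0:
        under = num_moe ** 2 + (p ** 2) * (denom_moe ** 2)
    return p, math.sqrt(under) / denom

# invented subgroup / total values for illustration
p, moe = percent_moe(30000, 1200, 80000, 1500)
```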
Finally, we have something like this:


Based on our data, we can say things like “There were approximately 30,556 students enrolled in 1st through 8th grade per year in Dutchess County, NY between 2005 and 2007, plus or minus 1,184 students. An estimated 37% of the population enrolled in school in the county was in the 1st through 8th grade, plus or minus 1%.” The ACS estimates have a 90% confidence interval.

Wrap Up

In this example we worked with aggregating and calculating percentages based on characteristics. We could also use these same formulas to aggregate data by geography, if we wanted to add the characteristics for all the counties together.

For the full documentation on working with ACS data, take a look at the appendix in the Census’ ACS Compass Guide, What General Data Users Need to Know. It provides you with the formulas in their proper statistical notation (for those of you more mathematically inclined than I) and includes formulas for calculating other kinds of numbers, such as ratios and percent change. It does provide you with worked-through examples, but not with spreadsheet formulas. I used their examples when I created formulas the first time around, so I could compare my formula results to their examples to ensure that I was getting it right. I’d strongly recommend doing that before you start plugging away with your own data – one misplaced parenthesis and you could end up with a different (and incorrect) result.

Mapping ACS Census Data for Urban Areas With PUMAs

Tuesday, December 16th, 2008

The NY Times wrote a story recently based on the new 3 year ACS data that the Census Bureau released a couple weeks ago (see my previous post for details). They created some maps for this story using geography that I would never have thought to use.

Outside of Decennial Census years, it is difficult to map demographic patterns and trends within large cities as you’ll typically get one figure for the entire city and you can’t get a breakdown for areas within. Data for areas like census tracts and zip codes is not available outside the ten-year census (yet), and large cities exist as single municipal divisions that aren’t subdivided. New York City is an exception, as it is the only city composed of several counties (boroughs) and thus can be subdivided. But the borough data still doesn’t reveal much about patterns within the city.

The NY Times used PUMAs – Public Use Microdata Areas – to subdivide the city into smaller areas and mapped rents and income. PUMAs are aggregations of census tracts and were designed for aggregating and mapping public microdata. Microdata consists of a selection of actual individual responses from the census or survey with the personal identifying information (name, address, etc) stripped away. Researchers can build their own indicators from scratch, aggregate them to PUMAs, and then figure out the degree to which the sample represents the entire population.

Since PUMAs have a large population, the new three-year ACS data is available at the PUMA level. The PUMAs essentially become surrogates for neighborhoods or clusters of neighborhoods, and in fact several NYC agencies have created districts or neighborhoods based on these boundaries for statistical or planning purposes. This wasn’t the original intent for creating or using PUMAs, but it’s certainly a useful application of them.

You can check out the NY Times article and maps here – Census Shows Growing Diversity in New York City (12/9/08). I tested ACS / PUMA mapping out myself by downloading some PUMA shapefiles from the Census Bureau’s Generalized Cartographic Boundaries page, grabbing some of the new annual ACS data from the American Factfinder, and creating a map of Philly. In the map below, you’re looking at 2005-2007 averaged data that shows the percentage of residents who lived in their current home last year. If you know Philly, you can see that the PUMAs do a reasonable job of approximating regions in the city – South Philly, Center City, West Philly, etc.

The problem I ran into here was that data did not exist for all of the PUMAs – in this case, South Philly and half of North Philly had values of zero. According to the footnotes on the ACS site, there were no values for these areas because “no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution”. So even though the PUMA geography is generally available, there still may be cases where data for particular variables for individual geographies is missing.

Just for the heck of it, I tried looking at the annual ACS data, which is limited to more populated areas (they must have at least 65k people, whereas the 3 year estimates cover areas with at least 20k), and even more data was missing (in this instance, all the areas in the northeast). Even though PUMAs have a minimum population of 100k people, the ACS sampling is county based. So even if the sample size for a county is ideal, there may not be a sufficient sample for individual places within a county to compute an estimate. At least, that’s my guess. Regardless, it’s still worth looking at for the city and data you’re interested in.

ACS Data for Philly PUMAs

Census Bureau Releases New ACS Data

Wednesday, December 10th, 2008

The Census Bureau released its new American Community Survey data the other day. Three year averages for a variety of socio-economic variables are now available for all geographic areas that have at least 20,000 people. The ACS has been releasing annual data for most of this decade for areas with at least 65,000 people and will continue to do so. They didn’t provide data for smaller areas because the numbers were not as statistically robust. Now that they have three years of data, they can average the numbers for three years and get sound data for areas with a population of at least 20k.

Data for 2005 to 2007 is available now, and like the annual numbers, you’ll get a range of values and a confidence interval. For example, we can say with 90% confidence that the estimated population of Atlantic City, NJ between 2005 and 2007 was 35,770, plus or minus 1,749 people. The Bureau created this estimate based on a sample of 1,379 people in AC.

Next year, the census will release new annual numbers for areas with a population of at least 65k, and will update the three year averages for areas with 20k by adding the newest year of data and dropping the oldest one to calculate a new average.

All of the data is available through the American Factfinder.

If you are looking for population figures for basic indicators (population, race, gender, age, and housing units) for basic geographic areas (states, counties, places, and metro areas), you’ll probably want to consider using estimates from the Bureau’s Population Estimates program instead. Their annual estimates are based on a demographic calculation that factors in births, deaths, and migration, and are not based on a survey (according to that program, Atlantic City had 39,684 residents in 2007 – 3,914 more people than the ACS midrange estimate). If you’re looking for any other kind of data (ethnicity, immigration status, income, poverty, rent, home value, etc) the ACS is your best bet.

By 2010 the Bureau will begin releasing ACS 5 year averages for all geographic areas. Of course, we’ll also have our next decennial census in 2010. The big change here is that, since we’ll have the ACS churning out data for all areas for every year from that point forward, the Bureau is doing away with the long form (which was sent to one in six households) that was issued in past censuses, and will only collect data using the basic short form, which gets distributed to everyone. For more info on this change, see the Bureau’s Census 2010 info page.

Searching for Foreign Census Data

Saturday, July 5th, 2008

I’ve been looking for census data for various countries, and have visited the usual suspects that aggregate this data – the CIA World Factbook and the United Nations Population Information Network. Other supra-national orgs like the IMF and World Bank also create and compile this info. These are fine sources, particularly if your goal is to look at basic data for several (or all) countries. But if you are studying or writing about one country in particular, it may seem odd to cite the UN, and even odder to cite the CIA. It would be better to go right to the source – the chief statistical agency in that particular country. In all likelihood, this agency would also have more in-depth stats than the aggregators.

But – where is the source? Rather than be left to the mercy of Google, where you’ll uncover the obvious suspects and lots of commercial sites and joe-schmoes who republished some data from last decade, visit the US Census Bureau’s list of foreign statistical agencies, which will lead you right to the source.

Assuming you can find some pages with some data (census data isn’t public domain in every country and isn’t necessarily online for free, or at all, in which case you may need to go with some of the aggregate sources), the next obstacle will be overcoming the language barrier. Many countries will publish pages in several languages, including English. Some may publish only limited info in English, or no info in English at all. If you don’t read the lingua franca, you can try a translating tool like Babel Fish or the Google Language Tool to translate the page for you. The translation may not be perfect, but it should be good enough that you can figure out what you need (although if the language you are translating doesn’t use the Roman alphabet and Arabic numerals – i.e. 1,2,3 etc, you may have some trouble).

The toughest obstacle to overcome may be the organizational barrier. If you are familiar with the US Census Bureau, you’ll know that it’s a large and complex organization with many subdivisions and datasets (decennial census, ACS, population estimates, etc). And despite its size, it doesn’t collect all socio-economic data (religious affiliation, for example) and may not be the best source for all data (such as current labor force stats). Well – other countries are just as complicated, so be wary!

Another strategy would be to visit Wikipedia – not to cite as a source, but to find what sources they use. You’ll find many country specific articles that cite the CIA Factbook or the UN, but some of the more detailed and well written ones do cite reports written by the statistical agencies for the country in question, often with a link to the page or report. If you have access to some library databases, like Gale Virtual Reference, they will (usually) cite sound references as well. Happy hunting!

Census Cartographic Boundary Files

Tuesday, May 13th, 2008

I’ve worked with these files a number of times and just used them again recently, and thought I would share the process you need to go through to prepare them for use in ArcGIS, as they are not “ready to go”. If you are not using ArcGIS, you can still follow these general steps using the specific tools that your software provides.

I would opt for the Cartographic Boundary Files (CBF) over the TIGER shapefiles (that the census just released) when making a national-level thematic map, as the generalization of the CBF makes the boundaries look cleaner at this scale. Also, the generalized files show land boundaries along coasts, while the TIGER files show the legal boundaries that extend into the water. The latter are not great for thematic maps, particularly as the Great Lakes states look distorted (as their boundaries extend into the lakes).

I’ll use the state and equivalent areas as an example, as those are the files I’ve just worked with. After downloading and unzipping the national-level shapefiles, you’ll need to take the following steps in the ArcCatalog:

  • Define the projection, as the files are undefined. According to metadata on the website, the files are in simple NAD83. In the ArcToolbox, the tool is under Data Management Tools, Projections and Transformations, Define Projection. Once you launch the tool, you will need to select the North American Datum 1983 as the coordinate system, which is stored under Geographic Coordinate Systems for North America.

  • After you define the projection, the next step is to reproject the layer to another projection that is more suitable for displaying the US. If you are making a map for basic presentation, a projected coordinate system like Albers Equal Area Conic would be a good choice (most atlases and maps of the continental US use this projection). Alaska, Hawaii, and Puerto Rico will be distorted, but we will be able to give them a separate data frame in ArcMap with their own projection later on. The tool is in the ArcToolbox under Data Management Tools, Projections and Transformations, Features, Project. Note that this is a DIFFERENT tool than the one we used in the last step. Define Projection is used to tell ArcGIS what projection a file is in if it is undefined, while Feature, Project is used to reproject a vector file from one projection to another. A file MUST have a defined projection BEFORE you can reproject it.

  • The CBFs are stored as single part features, which means that each distinct polygon will have its own record in the attribute table. For example, each of the Hawaiian Islands will have its own record in the table. This is a problem if you plan to join state-level data to your shapefile, as the data from the join will be repeated for each record. So if you have a table with population data for each of the states and you join it to the shapefile, each individual Hawaiian island will be assigned the total population of Hawaii. If you run statistics on your data, you’ll get inflated counts. To avoid this, we need to convert the CBF to a multi-part feature, where each state will have only one record in the attribute table. To do this, we use the Dissolve tool under Data Management Tools, Generalization, Dissolve. The Dissolve fields will be the basis for dissolving the individual parts of the states into one state feature. In this case, we would choose the STATE field (FIPS code) and NAME field as the dissolve fields, which will give us one feature for each state (if we chose DIVISION or REGION as the field, we would aggregate the polygons to create those larger geographic areas).

  • The next step is to decide whether you want to keep your shapefile as an independent file, or bring it into a geodatabase. The geodatabase is handy if you have lots of other tables and shapefiles that you are using in your project. Right-click in the catalog tree to create a new personal or file geodatabase. Then select your shapefile and right click to export it to your new geodatabase.

  • Whether you stick with a shapefile or go with a geodb, the next step is to open ArcMap and add your file to it. Now, you’ll have to make a decision about Puerto Rico. If you have a dataset where you want to map data for it, then you need not do anything. Since I am making presidential election maps and Puerto Rico doesn’t vote in the electoral college, I needed to delete it. To do so, go into an Edit mode under the Editor toolbar, select PR in the attribute table or map, delete it, then save. You’ll be left with a file for the 50 states and DC.

  • At this point, if you are going to join table data to your features, do so. Your features have a FIPS code, so you can use that to do the join (NEVER use names for joining – stick with codes). I often will add a new column to my features and plug in the two letter postal abbreviations, since they are commonly used for identifying states.

  • Once you’ve joined your data and are ready to make a finished map, the last step will be adding two new data frames for Alaska and Hawaii. Since AK and HI are distant from the continental US, it is better to create separate frames for all three rather than trying to display them in one. Copy your current data frame (not the features – the frame is indicated by the yellow rectangles layered on top of each other) in the table of contents, and paste it below. Activate that frame and name it Alaska. Then right click to open the data frame properties and go to the coordinate system tab. Modify the coordinate system by choosing Alaska Albers Equal Area Conic. This will reproject the data on the fly and will display Alaska in a more appropriate projection (as the continental projection distorts it). Then, in the Layout View, you can resize the Alaska data frame and zoom in to focus just on AK. Repeat these steps for Hawaii (and Puerto Rico if you’re mapping it), and you’ll have a good looking US map!
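The join-inflation problem from the dissolve step is easy to demonstrate outside of ArcGIS. Here’s a toy sketch in Python with pandas – the numbers are invented, and dropping duplicate records stands in for the GIS dissolve:

```python
import pandas as pd

# Toy demonstration of the join-inflation problem described above
# (invented values). Hawaii has one polygon per island; joining state
# population before dissolving repeats the value for every island.
polygons = pd.DataFrame({
    "STATE": ["15", "15", "15", "34"],   # three Hawaiian islands + New Jersey
    "NAME":  ["Hawaii", "Hawaii", "Hawaii", "New Jersey"],
})
pop = pd.DataFrame({"STATE": ["15", "34"], "POP": [1_200_000, 8_700_000]})

naive = polygons.merge(pop, on="STATE")                  # value repeated per polygon
dissolved = polygons.drop_duplicates("STATE").merge(pop, on="STATE")

print(naive["POP"].sum())      # 12300000 - Hawaii counted three times
print(dissolved["POP"].sum())  # 9900000 - correct total
```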

Hands-on GIS Census Workshop

Thursday, April 10th, 2008

I’ve posted the tutorials from the workshop I gave the other day for the NYCRDC. I’ve created a Resources page to hold resources hosted on this site – you can find them there, along with the datasets.

Overall I think it went rather well, but it was way too much material for a three hour workshop! We covered the intro slides, and Part I (Intro to GIS and ArcMap). I did an abridged version of Parts II (Intro to Layout View) and III (finding and downloading data, ArcCatalog, preprocessing in Excel) rather than doing all of II and none of III. The third part covers a lot that the standard ArcGIS texts gloss over (or leave out altogether), so I really wanted to cover some of that material. But I couldn’t omit any of the basics in the first two parts, because you really need to know them before you can delve further (and understand why you’re delving). Ahhh, the steep learning curve of GIS!

Excel COUNTIF Function to Clean ACS Data

Wednesday, March 19th, 2008

I’ve been preparing a GIS workshop for the New York Census Research Data Center’s 2nd Annual Workshop series, and have dug up some useful tips as I’ve assembled my materials. Here’s one of them:

I have a data table from the Census Bureau’s 2006 American Community Survey (ACS) in Excel which contains some data for Metropolitan and Micropolitan Areas. Now, I have a shapefile of Metropolitan Areas that I would like to join this data table to, but I would like to get rid of the records for the Micropolitan Areas in the data table. Unfortunately, the data table does not have a field that indicates whether an area is a Metro or Micro area. Instead, this information is embedded in the name field, like “Akron, OH Metro Area”, which means there is no way to sort the table to weed out the Micro Areas.

COUNTIF function to the rescue! I inserted a new column and typed in the formula:

=COUNTIF(D3, "*Metro*")

If the formula sees the word Metro anywhere in the GEO_NAME field, it returns a 1 in the new column; otherwise it returns 0 (the zeros will be the Micro areas). I copied and pasted the formula all the way down, then pasted the formula column over top of itself using Paste Special (to replace the formulas with their actual values), and voila! I sorted by this column and deleted all the records with a zero in the field (the Micro areas).
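The same Metro/Micro filter is a one-liner if you ever find yourself doing this outside a spreadsheet – a quick Python sketch with made-up area names:

```python
# Filtering Metro areas out of a mixed Metro/Micro name list,
# equivalent to the COUNTIF trick above (sample names invented).
names = [
    "Akron, OH Metro Area",
    "Adrian, MI Micro Area",
    "Albany, NY Metro Area",
]
metros = [n for n in names if "Metro" in n]
print(len(metros))  # -> 2
```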


I’ve done something like this before in Microsoft Access using LIKE, but Excel doesn’t include this function. I knew about COUNTIF but didn’t connect the dots. I discovered I could apply it after stumbling across this useful post at Daily Dose of Excel.

Lastly, before you can bring this table into GIS, you have to delete that second header row (you can only have one column heading – the rest of the rows are assumed to be data). While the codes in the first row are cryptic, they are concise. The headings in the second row are too long and contain spaces, which will cause problems when you import the table into GIS.

NOTE – If you’re using Open Office’s Calc instead of Excel, and you have enabled regular expressions under the Tools – Options – Calc – Calculate menu, the same function would look like this:

=COUNTIF(D3;".*Metro.*")
Copyright © 2017 Gothos. All Rights Reserved.