Posts Tagged ‘census’

Searching for Foreign Census Data

Saturday, July 5th, 2008

I’ve been looking for census data for various countries, and have visited the usual suspects that aggregate this data – the CIA World Factbook and the United Nations Population Information Network. Other supra-national orgs like the IMF and World Bank also create and compile this info. These are fine sources, particularly if your goal is to look at basic data for several (or all) countries. But if you are studying or writing about one country in particular, it may seem odd to cite the UN, and even odder to cite the CIA. It would be better to go right to the source – the chief statistical agency in that particular country. In all likelihood, this agency would also have more in-depth stats than the aggregators.

But – where is the source? Rather than be left to the mercy of google, where you’ll uncover the obvious suspects and lots of commercial sites and joe-schmoes who republished some data from last decade, visit the US Census Bureau’s list of foreign statistical agencies, which will lead you right to the source.

Assuming you can find some pages with some data (census data isn’t public domain in every country and isn’t necessarily online for free, or at all, in which case you may need to go with some of the aggregate sources), the next obstacle will be overcoming the language barrier. Many countries will publish pages in several languages, including English. Some may publish only limited info in English, or no info in English at all. If you don’t read the lingua franca, you can try a translating tool like Babblefish or the Google Language Tool to translate the page for you. The translation may not be perfect, but it should be good enough where you can figure out what you need (although if the language you are translating doesn’t use the Roman alphabet and Arabic numerals – i.e. 1,2,3 etc, you may have some trouble).

The toughest obstacle to overcome may be the organizational barrier. If you are familiar with the US Census Bureau, you’ll know that it’s a large and complex organization with many subdivisions and datasets (decennial census, acs, population estimates, etc). And despite it’s enormity, it doesn’t collect all socio-economic data (religious affiliation) and may not be the best source for all data (current labor force stats). Well – other countries are just as complicated, so be wary!

Another strategy would be to visit Wikipedia – not to cite as a source, but to find what sources they use. You’ll find many country specific articles that cite the CIA Factbook or the UN, but some of the more detailed and well written ones do cite reports written by the statistical agencies for the country in question, often with a link to the page or report. If you have access to some library databases, like Gale Virtual Reference, they will (usually) cite sound references as well. Happy hunting!

Census Cartographic Boundary Files

Tuesday, May 13th, 2008

I’ve worked with these files a number of times and just used them again recently, and thought I would share the process you need to go through to prepare them for use in ArcGIS, as they are not “ready to go”. If you are not using ArcGIS, you can still follow these general steps using the specific tools that your software provides.

I would opt for the Cartographic Boundary Files (CBF) over the TIGER shapefiles (that the census just released) when making a national-level thematic map, as the generalization of the CBF makes the boundaries look cleaner at this scale. Also, the generalized files show land boundaries along coasts, while the TIGER files show the legal boundaries that extend into the water. The latter are not great for thematic maps, particularly as the Great Lakes states look distorted (as their boundaries extend into the lakes).

I’ll use the state and equivalent areas as an example, as those are the files I’ve just worked with. After downloading and unzipping the national-level shapefiles, you’ll need to take the following steps in the ArcCatalog:

  • Define the projection, as the files are undefined. According to metadata on the website, the files are in simple NAD83. In the ArcToolbox, the tool is under Data Management Tools, Projections and Transformations, Define Projection. Once you launch the tool, you will need to select the North American Datum 1983 as the coordinate system, which is stored under Geographic Coordinate Systems for North America.
  •  

  • After you define the projection, the next step is to reproject the layer to another projection that is more suitable for displaying the US. If you are making a map for basic presentation, a projected coordinate system like Albers Equal Area Conic would be a good choice (most atlases and maps of the continental US use this projection). Alaska, Hawaii, and Puerto Rico will be distorted, but we will be able to give them a separate data frame in ArcMap with their own projection later on. The tool is in the ArcToolbox under Data Management Tools, Projections and Transformations, Features, Project. Note that this is a DIFFERENT tool than the one we used in the last step. Define Projection is used to tell ArcGIS what projection a file is in if it is undefined, while Feature, Project is used to reproject a vector file from one projection to another. A file MUST have a defined projection BEFORE you can reproject it.
  •  

  • The CBF’s are stored as single part features, which means that each distinct polygon will have its own record in the attribute table. For example, each of the Hawaiian Islands will have its own record in the table. This is a problem if you plan to join state-level data to your shapefile, as the data from the join will be repeated for each record. So if you have a table with population data for each of the states and you join it to the shapefile, each individual Hawaiian island will be assigned the total population of Hawaii. If you run statistics on your data, you’ll get inflated counts. To avoid this, we need to convert the CBF to a multi-part feature, where each state will have only one record in the attribute table. To do this, we use the Dissolve tool under Data Management Tools, Generalization, Dissolve. The Dissolve fields will be the basis for dissolving the individual parts of the states into one state feature. In this case, we would choose the STATE field (FIPS code) and NAME field as the dissolve field, which will give us one feature for each state (if we chose DIVISION or REGION as the field, we would aggregate the polygons to create those larger geographic areas).
  •  

  • The next step is to decide whether you want to keep your shapefile as an independent file, or bring it into a geodatabase. The geodatabase is handy if you have lots of other tables and shapefiles that you are using in your project. Right-click in the catalog tree to create a new personal or file geodatabase. Then select your shapefile and right click to export it to your new geodatabase.
  •  

  • Whether you stick with a shapefile or go with a geodb, the next step is to open ArcMap and add your file to it. Now, you’ll have to make a decision about Puerto Rico. If you have a dataset where you want to map data for it, then you need not do anything. Since I am making presidential election maps and Puerto Rico doesn’t vote in the electoral college, I needed to delete it. To do so, go into an Edit mode under the Editor toolbar, select PR in the attribute table or map, delete it, then save. You’ll be left with a file for the 50 states and DC.
  •  

  • At this point, if you are going to join table data to your features, do so. Your features have a FIPS code, so you can use that to do the join (NEVER use names for joining – stick with codes). I often will add a new column to my features and plug in the two letter postal abbreviations, since they are commonly used for identifying states.
  •  

  • National Map With Multiple Data LayersOnce you’ve joined your data and are ready to make a finished map, the last step will be adding two new data frames for Alaska and Hawaii. Since AK and HI are distant from the continental US, it is better to create separate frames for all three rather than trying to display them in one. Copy your current data layer (not the features – the layer which is indicated by the yellow rectangles layered on top of each other) in the table of contents, and paste it below. Activate that layer, and name the layer Alaska. Then right click on the properties for the data layer and go to the coordinates tab. Modify the coordinate system of the data layer by choosing Alaska Albers Equal Area Conic. This will reproject the data on the fly and will display Alaska in a more appropriate projection (as the continental projection distorts it). Then, in the Layout View, you can resize the Alaska data frame and zoom in to focus just on AK. Repeat these steps for Hawaii (and Puerto Rico if you’re mapping it), and you’ll have a good looking US map!
  •  

Hands-on GIS Census Workshop

Thursday, April 10th, 2008

I’ve posted the tutorials from the workshop I gave the other day for the NYCRDC. I’ve created a Resources page to hold resources hosted on this site – you can find them there, along with the datasets.

Overall I think it went rather well, but it was way too much material for a three hour workshop! We covered the intro slides, and Part I (Intro to GIS and ArcMap). I did an abridged version of Parts II (Intro to Layout View) and III (finding and downloading data, ArcCatalog, preprocessing in Excel) rather than doing all of II and none of III. The third part covers a lot that the standard ArcGIS texts gloss over (or leave out all together), so I really wanted to cover some of that material. But I couldn’t omit any of the basics in the first two parts, because you really need to know them before you can delve further (and understand why you’re delving). Ahhh, the steep learning curve of GIS!

Excel COUNTIF Function to Clean ACS Data

Wednesday, March 19th, 2008

I’ve been preparing a GIS workshop for the New York Census Research Data Center’s 2nd Annual Workshop series, and have dug up some useful tips as I’ve assembled my materials. Here’s one of them:

I have a data table from the Census Bureau’s 2006 Annual Community Survey (ACS) in Excel which contains some data for Metropolitan and Micropolitan Areas. Now, I have a shapefile of Metropolitan Areas that I would like to join this data table to, but I would like to get rid of the records for the Micropolitan Areas in the data table. Unfortunately, the data table does not have a field that indicates whether an area is a Metro or Micro. Instead, this information is embedded in the name field, like “Akron, OH Metro Area” which means there is no way to sort the table to weed out the Micro Areas.

COUNTIF function to the rescue! I inserted a new column and typed in the formula:

=COUNTIF(D3, “*Metro*”)

If the formula sees the word Metro anywhere in the GEO_NAME, it counts it as a one in the new column, otherwise it counts it as zero (by default, the zeros will be the Micro areas). Copied and pasted the formula all the way down, then copied and pasted the formula column over top of itself using Paste Special (to replace the formulas with the actual values), and voila! Sorted by this column, and deleted all the records with a zero in the field (the Micro areas).

Excel_COUNTIF_ACS

I’ve done something like this before in Microsoft Access using LIKE, but Excel doesn’t include this function. I knew about COUNTIF but didn’t connect the dots. I discovered I could apply it after stumbling across this useful post at Daily Dose of Excel.

Lastly, before you can bring this table into GIS, you have to delete that second header row (you can only have one column heading – the rest of the rows are assumed to be data). While the codes in the first row are cryptic, they are concise. The headings in the second row are too long and contain spaces, which will cause problems when you import the table into GIS.

NOTE – If you’re using Open Office’s Calc instead of Excel, and you have enabled regular expressions under the Tools – Options – OpenOffice.org Calc – Calculate menu, the same function would look like this:

=COUNTIF(D3;”.*Metro.*”)


Copyright © 2012 Gothos. All Rights Reserved.
No computers were harmed in the 0.327 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.