Archive for the ‘Data Sources’ Category.

GIS Data: UNSDI, gData, CEGRP, AIMS

I’ve stumbled across a few good sites for GIS data lately. Check these out:

UNSDI-NCO: The United Nations Spatial Data Infrastructure site, maintained by the Netherlands Coordination Office. They have many global datasets as well as country-specific ones, often for developing countries where data is hard to come by. Layers include boundaries, roads, infrastructure, and natural features. Click on the Datasets link under the Categories menu to see the list, then click on the feature of your choice. You’ll have to scroll through the metadata to the Distribution Info element to find a download link. Not all of the datasets are available for public download.

gData: This site is housed at Berkeley as part of the Biogeomancer Project, whose goal is to share data on biodiversity. You can download boundaries, hydrography, infrastructure, topography, and climate data in vector and raster formats for any country in the world. The data is aggregated, and in some cases improved, from many public sources. Administrative boundaries include 1st, 2nd, and often 3rd level divisions. A great, comprehensive source.

CEGRP: China Earthquake Geospatial Research Portal, housed at Harvard. The goal of the site is to gather and distribute geospatial data in response to the earthquake that hit Sichuan, China in May 2008. It offers vector and raster layers for all of China and for the region where the earthquake hit.

AIMS: Afghanistan Information Management Services. A non-profit group based in Afghanistan, it has created and maintains a geospatial infrastructure to support the government. Vector datasets for the entire country and the city of Kabul are available for download. They also offer a number of static PDF maps.

Adding Long / Lat XY Data to ArcMap

Here’s a tutorial I’ve been meaning to write: adding a table of longitude and latitude coordinates to ArcMap and turning them into features. For this example, I’ll be using place names from the GEOnet Names Server (GNS) country files. The US National Geospatial-Intelligence Agency has a pretty extensive list of geographic features for each country, with coordinates in many formats, including longitude and latitude in decimal degrees. I’ll use Botswana in southern Africa as an example, as it has a small record set and because I have some admin boundaries handy that I’ve downloaded from SAHIMS.

  • Download the file from the GNS and unzip it. It is a tab-delimited text file. If you like, you can open it in Excel or another spreadsheet to see what it looks like. This works fine for this example, but won’t work for larger or more populated countries, because their files will exceed the maximum number of rows a spreadsheet can handle (65k). You’ll need to import the file into a database (Access, for example) if you want to take a look in those cases. Either way, you’ll be able to add the text file directly to ArcMap, so no worries.
  • Open ArcMap and under the Tools menu, select Add XY Data. In the dialog box, select the file that contains your XY coordinates: the text file you’ve downloaded. ArcMap will search through the fields for appropriate ones to use as the X and Y fields. In this case, it should correctly choose LONG for X and LAT for Y. If Arc can’t figure it out, you’ll have to specify which columns hold the coordinates. Longitude is ALWAYS the X coordinate, and latitude is ALWAYS the Y. Finally, you’ll select a coordinate system. Choose the standard geographic coordinate system WGS 1984, which is usually a safe bet when adding long/lat data from most sources.
  • Hit OK, and Arc will plot the coordinates (after you click through the warning message). In this example, there appears to be one wayward point, way to the north. When you see something like this, it often means that one of the coordinates is missing a minus sign: latitudes south of the equator are negative, as are longitudes east of the international date line and west of the prime meridian. If you use the Identify tool, you’ll see that the latitude of this wayward point is missing its minus sign. The easiest fix is to go back into the text file, correct it, and add it to ArcMap again.
  • Even though Arc has plotted the points, they don’t yet exist as features (remember the warning message? That’s essentially what it was saying). Select the plotted points layer in the Table of Contents, right-click, select Data, and select Export. Export the points as a new shapefile or as a feature class in a geodatabase. Then add the new features to the map.
  • At this point, it may be helpful to have a frame of reference for all of these points. Get your hands on some administrative layers, like country boundaries. I downloaded the outline of Botswana from SAHIMS. This step usually requires projecting and reprojecting, as you’ll need to get your points layer to match the projection of the other files you’re working with. I always use the ArcToolbox within ArcCatalog to fiddle with projections and then add the finished files to a new, blank map in ArcMap. In my case, the Botswana boundary had no defined projection - I had to consult the metadata on the SAHIMS website to figure out its coordinate system (NAD 1927) and then define it using the ArcToolbox (Data Management Tools, Projections and Transformations, Define Projection). Then I had to convert the Botswana points layer from WGS 1984 to match the boundary’s NAD 1927 coordinate system (using Data Management Tools, Projections and Transformations, Feature, Project).
  • Add the projected boundary and reprojected points to your map. Many of these points are point features (villages, towns, farms, mountain peaks), while others represent the geographic centers of lines (roads, rivers) or areas (administrative areas, parks, reserves). You’ll probably want to extract certain kinds of features. At this point, take a look at the attribute table for the points file and consult the GNS description for the names files. The description will tell you what each of the data columns represents and what all of the codes mean. The FC field will come in quite handy here, as it designates a category for each feature. So if we wanted to extract populated places, we could go to the Selection menu in ArcMap and do a Select By Attributes where the field FC is equal to P, the code for populated place features. Once they are selected, you can do a Data, Export to create a new shapefile with just those features.
  • Alternatives do abound here. If you prefer, you could do a lot of the work of editing and creating feature subsets within a geodatabase. You can also follow these same, general procedures using open source tools (I believe that QGIS has a tool for adding XY data). And while we’re discussing a specific example here, the same basic steps would apply for any XY dataset.
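
If you’d like to vet a GNS file before it goes into ArcMap, the checks above can be sketched in plain Python. This is a minimal sketch, not an ArcGIS workflow: it assumes the file has a header row with LAT, LONG, and FC columns (as described above; confirm against the GNS description for your download), and the Botswana bounding box in the usage lines is a rough, hand-picked assumption. It reads the tab-delimited text, flags points that would land inside the box if one sign were flipped (the missing-minus-sign problem), and pulls out the populated places (FC = P).

```python
import csv
import io


def read_gns(lines):
    """Parse a tab-delimited GNS names file (header row first) into dicts,
    converting LAT and LONG to floats."""
    points = []
    for row in csv.DictReader(lines, delimiter="\t"):
        row["LAT"] = float(row["LAT"])
        row["LONG"] = float(row["LONG"])
        points.append(row)
    return points


def flag_sign_errors(points, min_lon, max_lon, min_lat, max_lat):
    """Return points outside the bounding box that would fall inside it if
    the sign of LAT or LONG were flipped, i.e. a likely missing minus sign."""
    def inside(lon, lat):
        return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

    suspects = []
    for p in points:
        lon, lat = p["LONG"], p["LAT"]  # longitude is X, latitude is Y
        if not inside(lon, lat) and (inside(lon, -lat) or inside(-lon, lat)):
            suspects.append(p)
    return suspects


def select_by_fc(points, code):
    """Keep points whose FC field equals code ('P' = populated places)."""
    return [p for p in points if p["FC"] == code]


# Tiny made-up sample; the second row is missing the minus on its latitude.
sample = "LAT\tLONG\tFC\n-24.65\t25.91\tP\n24.65\t25.91\tP\n-21.0\t24.0\tH\n"
points = read_gns(io.StringIO(sample))
# Rough, hand-picked box around Botswana (an assumption, not from GNS).
suspects = flag_sign_errors(points, 19.9, 29.4, -27.0, -17.7)
populated = select_by_fc(points, "P")
```

For a real download you would pass an open file object instead of the sample, e.g. `open(path, encoding="utf-8", newline="")`; the UTF-8 encoding is an assumption, so check it if place names come out garbled.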

Searching for Foreign Census Data

I’ve been looking for census data for various countries, and have visited the usual suspects that aggregate this data - the CIA World Factbook and the United Nations Population Information Network. Other supra-national orgs like the IMF and World Bank also create and compile this info. These are fine sources, particularly if your goal is to look at basic data for several (or all) countries. But if you are studying or writing about one country in particular, it may seem odd to cite the UN, and even odder to cite the CIA. It would be better to go right to the source - the chief statistical agency in that particular country. In all likelihood, this agency would also have more in-depth stats than the aggregators.

But - where is the source? Rather than be left to the mercy of Google, where you’ll uncover the obvious suspects plus lots of commercial sites and joe-schmoes who republished some data from last decade, visit the US Census Bureau’s list of foreign statistical agencies, which will lead you right to the source.

Assuming you can find some pages with some data (census data isn’t public domain in every country and isn’t necessarily online for free, or at all, in which case you may need to fall back on the aggregate sources), the next obstacle will be the language barrier. Many countries publish pages in several languages, including English. Some may publish only limited info in English, or none at all. If you don’t read the local language, you can try a translation tool like Babel Fish or the Google Language Tool to translate the page for you. The translation may not be perfect, but it should be good enough that you can figure out what you need (although if the language you are translating doesn’t use the Roman alphabet and Arabic numerals, e.g. 1, 2, 3, you may have some trouble).

The toughest obstacle to overcome may be the organizational barrier. If you are familiar with the US Census Bureau, you’ll know that it’s a large and complex organization with many subdivisions and datasets (decennial census, ACS, population estimates, etc.). And despite its size, it doesn’t collect all socio-economic data (religious affiliation, for example) and may not be the best source for all data (current labor force stats). Well - other countries’ agencies are just as complicated, so be wary!

Another strategy would be to visit Wikipedia - not to cite as a source, but to find what sources they use. You’ll find many country specific articles that cite the CIA Factbook or the UN, but some of the more detailed and well written ones do cite reports written by the statistical agencies for the country in question, often with a link to the page or report. If you have access to some library databases, like Gale Virtual Reference, they will (usually) cite sound references as well. Happy hunting!

Census Cartographic Boundary Files

I’ve worked with these files a number of times and just used them again recently, and thought I would share the process you need to go through to prepare them for use in ArcGIS, as they are not “ready to go”. If you are not using ArcGIS, you can still follow these general steps using the specific tools that your software provides.

I would opt for the Cartographic Boundary Files (CBF) over the TIGER shapefiles (which the Census Bureau just released) when making a national-level thematic map, as the generalized CBF boundaries look cleaner at this scale. Also, the generalized files show land boundaries along coasts, while the TIGER files show the legal boundaries that extend into the water. The latter are not great for thematic maps, particularly as the Great Lakes states look distorted (their boundaries extend into the lakes).

I’ll use the state and equivalent areas as an example, as those are the files I’ve just worked with. After downloading and unzipping the national-level shapefiles, you’ll need to take the following steps in ArcCatalog:

  • Define the projection, as the files are undefined. According to metadata on the website, the files are in unprojected (geographic) NAD83. In the ArcToolbox, the tool is under Data Management Tools, Projections and Transformations, Define Projection. Once you launch the tool, select North American Datum 1983 as the coordinate system; it is stored under Geographic Coordinate Systems, North America.

  • After you define the projection, the next step is to reproject the layer to a projection that is more suitable for displaying the US. If you are making a map for basic presentation, a projected coordinate system like Albers Equal Area Conic would be a good choice (most atlases and maps of the continental US use this projection). Alaska, Hawaii, and Puerto Rico will be distorted, but we will be able to give them separate data frames in ArcMap with their own projections later on. The tool is in the ArcToolbox under Data Management Tools, Projections and Transformations, Feature, Project. Note that this is a DIFFERENT tool from the one we used in the last step. Define Projection tells ArcGIS what projection a file is in when it is undefined, while Feature, Project reprojects a vector file from one projection to another. A file MUST have a defined projection BEFORE you can reproject it.

  • The CBFs are stored as single-part features, which means that each distinct polygon has its own record in the attribute table. For example, each of the Hawaiian Islands has its own record. This is a problem if you plan to join state-level data to your shapefile, as the data from the join will be repeated for each record. So if you have a table with population data for each state and you join it to the shapefile, each individual Hawaiian island will be assigned the total population of Hawaii. If you run statistics on your data, you’ll get inflated counts. To avoid this, we need to convert the CBF to multi-part features, where each state has only one record in the attribute table. To do this, we use the Dissolve tool under Data Management Tools, Generalization, Dissolve. The dissolve fields are the basis for merging the individual parts of each state into one state feature. In this case, we would choose the STATE field (FIPS code) and the NAME field as the dissolve fields, which will give us one feature for each state (if we chose DIVISION or REGION instead, we would aggregate the polygons into those larger geographic areas).

  • The next step is to decide whether you want to keep your shapefile as an independent file or bring it into a geodatabase. A geodatabase is handy if you have lots of other tables and shapefiles that you are using in your project. Right-click in the catalog tree to create a new personal or file geodatabase. Then right-click your shapefile and export it to your new geodatabase.

  • Whether you stick with a shapefile or go with a geodb, the next step is to open ArcMap and add your file to it. Now you’ll have to make a decision about Puerto Rico. If you want to map data for it, you don’t need to do anything. Since I am making presidential election maps and Puerto Rico doesn’t vote in the Electoral College, I needed to delete it. To do so, start an edit session from the Editor toolbar, select PR in the attribute table or on the map, delete it, then save your edits. You’ll be left with a file for the 50 states and DC.

  • At this point, if you are going to join table data to your features, do so. Your features have a FIPS code, so use that to do the join (NEVER use names for joining - stick with codes). I often add a new column to my features and fill in the two-letter postal abbreviations, since they are commonly used to identify states.

  • Once you’ve joined your data and are ready to make a finished map, the last step is adding two new data frames for Alaska and Hawaii. Since AK and HI are distant from the continental US, it is better to create separate frames for all three areas rather than trying to display them in one. In the table of contents, copy your current data frame (not the features, but the data frame itself, indicated by the yellow rectangles layered on top of each other) and paste it below. Activate the new data frame and name it Alaska. Then right-click it, open its Properties, and go to the Coordinate System tab. Change the data frame’s coordinate system to Alaska Albers Equal Area Conic. This reprojects the data on the fly and displays Alaska in a more appropriate projection (the continental projection distorts it). Then, in Layout View, you can resize the Alaska data frame and zoom in to focus just on AK. Repeat these steps for Hawaii (and Puerto Rico if you’re mapping it), and you’ll have a good-looking US map!
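
Why the Dissolve step matters for joins can be shown with a handful of records in plain Python. This is a sketch, not ArcGIS code: the Hawaii and Alaska rows, their part labels, and the population figures are made up, and the POSTAL field name is hypothetical. Joining by FIPS code to single-part records counts Hawaii once per island; dissolving on STATE and NAME first gives one record per state and the correct total.

```python
from collections import defaultdict


def dissolve(parts, fields=("STATE", "NAME")):
    """Merge single-part records into one multi-part record per unique
    combination of the dissolve fields, keeping every part's geometry."""
    groups = defaultdict(list)
    for part in parts:
        key = tuple(part[f] for f in fields)
        groups[key].append(part["geometry"])
    return [dict(zip(fields, key), geometry=geoms) for key, geoms in groups.items()]


def join_by_code(features, table, key="STATE", field="POP"):
    """Attach a table value to each feature by FIPS code (never by name)."""
    return [dict(f, **{field: table.get(f[key])}) for f in features]


# Made-up single-part records: Hawaii has one row per island.
parts = [
    {"STATE": "15", "NAME": "Hawaii", "geometry": "island A"},
    {"STATE": "15", "NAME": "Hawaii", "geometry": "island B"},
    {"STATE": "15", "NAME": "Hawaii", "geometry": "island C"},
    {"STATE": "02", "NAME": "Alaska", "geometry": "mainland"},
]
population = {"15": 1000, "02": 500}  # hypothetical values keyed by FIPS code

# Joining before dissolving repeats Hawaii's total for every island.
naive_total = sum(r["POP"] for r in join_by_code(parts, population))  # 3500

# Dissolve first, then join: one record per state, correct total.
states = join_by_code(dissolve(parts), population)
total = sum(s["POP"] for s in states)  # 1500

# Hypothetical POSTAL column holding the two-letter abbreviations.
postal = {"02": "AK", "15": "HI"}
for s in states:
    s["POSTAL"] = postal[s["STATE"]]
```

The join key is the FIPS code string rather than the name, which mirrors the advice above: codes are unambiguous, while names can differ in spelling or punctuation between sources.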

Census Update: Shapefiles, ACS, Estimates

I’m in Boston at the Association of American Geographers (AAG) annual conference this week, and attended a great series that explored what the Census Bureau is currently up to. Here are some highlights:

The Bureau is now providing the TIGER/Line files in shapefile format! Before, it was only possible to get generalized cartographic boundary files directly from the Bureau in shapefile format. Now you can get the boundaries in their original detail from a public domain source. The release includes 2000 census geography plus some 2007 updates for states, counties, metros, places, ZIPs, districts, PUMAs, and more. It does not currently include tracts, block groups, or blocks.


The 2008 release of the American Community Survey (ACS) will include two datasets: the annual numbers for geographies with more than 65,000 people and, for the first time, three-year averages for geographies with more than 20,000 people. In each succeeding year, this average will be recalculated by adding the most recent year and dropping the oldest one. Data for geographies with fewer than 20,000 people will become available in 2011 and will be based on five-year averages. The good news is that, from that point forward, data will be available for all areas every single year. The bad news is that the long form (the one-in-six sample of households taken in the decennial census) is being discontinued and will not be used in 2010. Census 2010 will consist solely of the short-form questions (the 100% count that covers the basic demographic variables). The ACS will serve as the replacement for the long form, but in most cases its data will not be suitable for historical comparisons (e.g. comparing 2010 to 2000).
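
The release schedule described above can be expressed as a couple of small functions. This is a sketch under assumptions: it treats the population thresholds as "at least" cutoffs, and it assumes five-year estimates cover every area once they begin in 2011. It also only tracks which years a multiyear period covers; as I understand it, the Bureau pools survey responses across the period rather than literally averaging the published annual numbers.

```python
def acs_products(population, release_year):
    """ACS estimates an area would receive under the schedule described above.
    Cutoffs are treated as 'at least' (assumption); five-year estimates are
    assumed to cover every area once they begin in 2011."""
    products = []
    if population >= 65_000:
        products.append("1-year")
    if population >= 20_000 and release_year >= 2008:
        products.append("3-year")
    if release_year >= 2011:
        products.append("5-year")
    return products


def rolling_periods(years, width=3):
    """Each new multiyear period adds the newest year and drops the oldest."""
    return [tuple(years[i:i + width]) for i in range(len(years) - width + 1)]


print(acs_products(100_000, 2008))  # ['1-year', '3-year']
print(acs_products(30_000, 2008))   # ['3-year']
print(acs_products(5_000, 2011))    # ['5-year']
print(rolling_periods([2005, 2006, 2007, 2008]))  # [(2005, 2006, 2007), (2006, 2007, 2008)]
```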

Bureau reps gave an overview of their Population Estimates program. Unlike the ACS, which is survey-based, estimates are calculated using the cohort-component method, which accounts for births, deaths, and migration each year. Estimates are calculated nationally and at the county level. The county numbers are used to create estimates for each state, which are then adjusted to fit the national numbers. Data is available for total population, race, age (broken down by gender for each single year of age at the national level and for five-year age groups below that), and housing units. Some data is also available for metropolitan areas (which are county-based) and county subdivisions (total population only).
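
At its simplest, the cohort-component method rests on the demographic balancing equation applied each year: next year’s population equals this year’s plus births, minus deaths, plus net migration. The sketch below applies it to total population only (the real program tracks age, sex, and race cohorts), sums made-up county results to states, and then scales the states proportionally to match a hypothetical national total. The proportional scaling is my own stand-in for the Bureau’s adjustment step, not its documented procedure.

```python
def cohort_component(base, births, deaths, net_migration):
    """One year of the balancing equation: P(t+1) = P(t) + B - D + M."""
    return base + births - deaths + net_migration


def control_to_total(estimates, target):
    """Scale sub-area estimates proportionally so they sum to target
    (a simplified stand-in for the Bureau's adjustment step)."""
    factor = target / sum(estimates.values())
    return {area: value * factor for area, value in estimates.items()}


# Made-up county inputs: (state, county) -> one-year estimate.
county_est = {
    ("State X", "County 1"): cohort_component(1000, 20, 10, 5),   # 1015
    ("State X", "County 2"): cohort_component(500, 10, 5, -5),    # 500
    ("State Y", "County 3"): cohort_component(2000, 30, 20, 10),  # 2020
}

# Sum counties to states.
state_est = {}
for (state, _county), pop in county_est.items():
    state_est[state] = state_est.get(state, 0) + pop

national_total = 3600  # hypothetical independent national estimate
controlled = control_to_total(state_est, national_total)
```

Here the states sum to 3,535 before adjustment, so each is scaled up by the same factor (3600 / 3535) until they add up to the national figure.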

The Bureau also gave an overview of DataFerret, a tool for data power users. It is available in two versions, as downloadable software or as a browser-based Java applet, and it allows users to gather and process data from several different government sources (unlike American FactFinder, which focuses solely on downloading census data).

Finally, things are ramping up for the 2010 Decennial Census. The Bureau is updating its Master Address File and has almost finished recalibrating the TIGER files for each county, so that boundaries are accurate to within 70 meters.

GIS Data for Africa

I received an email floating around the Maps-L listserv the other day announcing a website with African GIS data. Since free, global GIS data can be hard to find, particularly for Africa, I thought I’d re-announce it here.

The Southern African Human-development Information Management Network (SAHIMS) is a UN-affiliated organization that provides humanitarian and disaster relief for countries in southern Africa. They have a pretty comprehensive collection of GIS data on their site: http://www.sahims.net. The default GIS page provides layers that cover the entire region, plus some layers for Tanzania (which is technically outside the region). You can use the menu on the left of the page to navigate to country-specific data for Angola, Botswana, Lesotho, Madagascar, Malawi, Mozambique, Namibia, Swaziland, Zambia, and Zimbabwe. There doesn’t appear to be any data for South Africa.

The layers are in shapefile format and include administrative boundaries (including sub-national boundaries), transportation, hydrological features, climate, food, and disaster related layers. The availability and details of each layer vary by country.


I downloaded a few of the layers, and many were missing spatial reference information. However, they do provide decent metadata on their site, and you can use it to define the coordinate system and projection for each layer.
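
For shapefiles, defining a projection essentially means writing a .prj file alongside the .shp that records the coordinate system in well-known text (WKT). It describes the coordinates as they already are and doesn’t move anything. The sketch below writes one in plain Python for an undefined layer. Assumptions: the layer’s metadata says it is in geographic WGS 1984 or NAD 1983, the ESRI-style WKT strings below are the ones you want (compare them against a .prj that ArcGIS writes before relying on them), and the layer name is hypothetical. It refuses to overwrite an existing .prj, since a layer with a defined system should be reprojected, not redefined.

```python
import tempfile
from pathlib import Path

# ESRI-style WKT for two common geographic coordinate systems (assumed strings;
# compare with a .prj written by ArcGIS before relying on them).
GCS_WKT = {
    "WGS 1984": (
        'GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",'
        'SPHEROID["WGS_1984",6378137.0,298.257223563]],'
        'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
    ),
    "NAD 1983": (
        'GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",'
        'SPHEROID["GRS_1980",6378137.0,298.257222101]],'
        'PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]'
    ),
}


def define_projection(shp_path, gcs_name):
    """Write a .prj next to an undefined shapefile; never overwrite one."""
    prj = Path(shp_path).with_suffix(".prj")
    if prj.exists():
        # A defined layer should be reprojected (Project), not redefined.
        raise FileExistsError(f"{prj} already defines a coordinate system")
    prj.write_text(GCS_WKT[gcs_name])
    return prj


# Usage in a throwaway folder; "admin_boundaries" is a hypothetical layer name.
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "admin_boundaries.shp"
    prj_path = define_projection(shp, "WGS 1984")
    written = prj_path.read_text()
    try:
        define_projection(shp, "NAD 1983")
        overwrote = True
    except FileExistsError:
        overwrote = False
```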