Introduction to GIS and Census Mapping: Part III
by Frank Donnelly, Geospatial Data Librarian, CUNY Baruch College. Version 1.12
Downloading, Organizing, Processing, and Querying Data
Downloading Census Data
In Parts I and II of this tutorial, you were introduced to GIS and to the ArcMap interface of ArcGIS 9.2. We dove right into the software and started making maps using a pre-assembled dataset from the American Community Survey (ACS). In reality, you will seldom be this fortunate. The dataset that we worked with required a significant amount of pre-processing before it reached its present state.
In this part of the tutorial, we will truly start from the beginning, and you will learn where you can download data and what you will need to do to prepare it for use in GIS. You will also be introduced to the ArcCatalog module of ArcGIS, which will help you organize and prepare your data. Once we have the data formatted and running in ArcGIS, we'll take a look at some additional techniques for querying your data. We will also revisit steps you took in Parts I and II for working with your data and making a finished map.
For any GIS project, you will need two components: actual GIS features and attribute data. We previously worked with a geodatabase which had both of these components in one package. However, in most cases you will need to gather these two components from separate sources and bring them together. Any project you work on will be defined (or limited) by the availability of the features you want to work with, the availability of the data, and the availability of the data at the specific level of geography that you want.
For example, as of 2008, the American Community Survey data that we worked with in the previous modules is available for states, metropolitan areas, and several counties. If you wanted to do a detailed analysis within a city or neighborhood, you would not be able to use the ACS, because the data is not available for geographies below the previously mentioned ones. The Decennial Census is available at more detailed levels, such as the tract, block group, and block levels. However, the latest data is from the year 2000, and is getting old. These are the kinds of issues you will have to weigh as you are thinking about your project.
Regardless of whether you use the ACS or the Decennial Census, you will download the data from the Census Bureau's homepage, and the process for doing so for either dataset will be similar. Let's download some data so you can see what this process looks like.
We are going to download data on poverty from the 2000 Census at the tract level for New York City. This will allow us to examine the distribution of poverty within the city. Let us begin by going to the Census Bureau's homepage at www.census.gov.
The Census Bureau offers a bewildering array of datasets and resources. To get to the download area for the datasets, click on the American Factfinder link in the left-hand column. This will bring you to the page depicted in Figure 2.
From this page, we'll have to choose a dataset that we want to access. Links to the Decennial Census, the American Community Survey, Population Estimates, and many other datasets are located on this page. Click the get data link for the Decennial Census to proceed to the page in Figure 3.
By default, we will be at the page for the 2000 Census, indicated by the blue tab in the upper left-hand corner. The 1990 Census is also available on another tab. Scroll down and you will see the options for the various datasets within the 2000 Census.
What are all of these, and which do you choose? Unlike the ACS, the Decennial Census is an actual 100% count of the population. Every person in the United States fills out a census form, known as the Short Form, and answers questions that cover basic demographic topics such as age, gender, race, household relationships, housing units, occupancy, and tenure. This data is tabulated and stored in the Summary File 1 (SF1) dataset, and the data is available down to the census block level. SF1 is the first option in the list.
In addition to the Short Form, the bureau sends out another form to 1 out of every 6 households. This form is known as the Long Form, and the data for this form is stored in Summary File 3 (SF3). SF3 includes data on ethnicity, employment status, place of birth, income, poverty, housing unit values and age, plus all of the variables stored in SF1. The data is available only down to the block group level. Even though the total values are based on a sample, the sample is so large that the bureau does not report the margin of error (unlike the ACS data).
In the course of your work, you will probably use data from SF1 and SF3 most frequently. Each dataset has a companion (SF2 and SF4) which breaks all of the data down into finer categories based on race. All of the SF datasets include the 50 states, the District of Columbia, and Puerto Rico. There are separate datasets that cover Native American areas and overseas territories.
! The 2010 Census will consist solely of the 100% count taken in the Short Form (SF1 and SF2). The Long Form (SF3 and SF4) will be discontinued, and the ACS will replace it as the data source for the Long Form variables.
Data on poverty is located within SF3. Scroll down, check the radio button for SF3, and click the Detailed Tables link on the right. This will bring you to the data selection screen. There is a two-step process here: first, you will select the level of geography that you are interested in, and second, you will select the data elements that you want.
On the geography screen, you will be on the List selection tab by default. Click on the Geo Within Geo tab on the far right. This will allow you to select all of the geographies within another geography - in this case, all census tracts within a specific place (New York City). If you use the default List tab, you would have to select all of the census tracts one by one - not a desirable option!
On the Geo Within Geo tab screen, you want to select all census tracts that are fully or partially contained within a place (for a summary of the different census geographies, view these maps for nested and un-nested features). You will choose the State of New York first, and then scroll through the list of places to select New York City. Select the option to add all census tracts, and hit the add button to add them all to your query. Your screen should look like Figure 5.
Click next, and you will proceed to the Table selection screen. Select the Total Population table that is near the top of the window and add it to your selection, and then scroll down until you see the Poverty Status in 1999 by Age table (P87) and add it to your selection. Each table will contain several data elements. Your screen should resemble Figure 6.
Determining which data tables you need for your particular project can be difficult - oftentimes you will have to select the tables that sound like they would work best, download them, and evaluate whether or not it is what you need. You can also look for tables based on subject or keyword by clicking on one of the tabs above the table list.
Click the Show Result button to save your table. The result that you see on the screen may not be what you expect (Figure 7), but don't panic yet. It will not look like this when we download it.
Click the Print / Download link on the blue menu, and you will be presented with the options in Figure 8.
You want to select one of the three Database Compatible options listed at the bottom of the screen. The Database Compatible options will give us a table where each row represents a unit of geography (census tracts in our case) and each column represents a data element (values for poverty in our case). Since we are working with Excel, select the Microsoft Excel radio button. The comma-delimited or pipe-delimited options would also be good choices, as they are flexible formats that all spreadsheets and databases can understand. We'll stick with the Excel format for now. Click OK, and the download will begin.
! Is the download not beginning? If you are using the Internet Explorer (IE) Browser version 6 or 7, it will block your download for "security" reasons. A yellow bar should appear near the top of your browser window. Click on it, and tell IE to download the file. You will have to specify your download options all over again, and you should be able to download it the second time. Of course, you could avoid all of this by using the Mozilla Firefox browser instead.
If you are using Firefox, the file will be downloaded to the desktop of your computer. The data will be compressed in a ZIP file. Copy the ZIP file and paste it into your workspace directory in the Part3 folder. Right-click on the file, and select the extract here option (this process may differ slightly, depending on how your computer is configured and on which ZIP utility you are using). You should see the files listed in Figure 9 once you unzip the file.
! Wait! I can't find the file I downloaded! Well, if you are using the IE browser, it usually downloads files to your My Documents folder by default instead of the desktop. You will have to navigate to that folder to find the file. Or you could just use the Firefox browser...
The Excel file that ends with the word "data" is the file we are interested in. We are going to make a few changes to this file, and we want to hold on to the original in case something goes wrong. So make a copy of the file, paste the copy in the directory, and give it a short name that makes sense, like poverty_census00.xls. Remember the file naming rules - no spaces! Then double-click on this file to open it in Excel.
In the file, we'll see pretty much what we were expecting. Each row is a census tract, each column is a data value, and we have some columns that contain names and ID numbers. Take a moment to look at the different variables.
Notice that the table has two header rows. The first one has a series of codes, and the second one has the full names of the variables. In the database world, this is a no-no. The first row is usually recognized as the header that contains the names of all the columns. All of the other rows are regarded as data. If we imported this table as-is into ArcGIS (or any database for that matter), it would consider that second row as data, and since it is text, it would store all of the columns as text. We definitely don't want this to happen! We need most of these columns to be stored as numbers. We also can't use that second row as our header column, because it violates our naming conventions. The names are too long and contain spaces. That second row will have to go.
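If you ever need to repeat this cleanup on many downloads, the same idea can be sketched in a few lines of Python. This is only an illustration of dropping the second header row, not part of the tutorial's workflow: the sample rows and the descriptive labels in the second row are made up, and a comma-delimited file is assumed rather than the Excel file.

```python
import csv
import io


def drop_second_header(rows):
    """Return the rows with the second (descriptive) header row removed."""
    rows = list(rows)
    return rows[:1] + rows[2:]


# Hypothetical comma-delimited extract. The descriptive labels and the
# counts are invented for illustration only.
sample = io.StringIO(
    "GEO_ID2,P087001,P087002\n"
    "Geography Identifier,Total,Income below poverty level\n"
    "36005000100,100,25\n"
)

cleaned = drop_second_header(csv.reader(sample))
print(cleaned)
```

After this step the first row holds the short column codes and every remaining row is data, which is the layout a database or ArcGIS expects.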
Highlight the second row by clicking on the row number 2 on the left, and delete it. Then, change the name of the worksheet by clicking on the worksheet tab in the lower left-hand corner, and name the worksheet poverty. Your screen should resemble Figure 11. Save your Excel file, and minimize it for now.
Downloading GIS Data
Once we have our attribute data, the next step is to obtain the GIS data. There is a lot of freely available GIS data on the web - unfortunately, it is not located in one, centralized place. The Census Bureau does provide some GIS files on their website. Individual state, county, and local governments also have online GIS repositories or data centers. Many colleges and universities maintain geospatial repositories online with data for the state and city they are located in, often in concert with state or local agencies. Some schools also provide datasets based on that university's areas of expertise. ESRI, the makers of ArcGIS, also provide online datasets and they bundle a data CD with their software. So there are many places to choose from.
When deciding which source to use, you will need to consider the following factors:
- Available geographies. Do they match the data that you wish to map?
- Currency. Are the data layers recent enough for your mapping needs, or are they outdated?
- Accuracy. Are the layers generalized? Can the source be trusted?
- Download unit. Can you download all counties in the US at once, or must you download one at a time?
- Projection. Is the projection known or defined? Will it match other layers you are using or will you have to make conversions?
- Formats. Will the available format work with my software?
- Identifiers. Do the data layers have ID fields that I can use for joining my data tables?
- Features. Are features stored as single part (every polygon has an attribute record) or multi-part (every feature has an attribute record)? The latter is desirable for table joins.
- Use. Is use of the data restricted to certain purposes?
You will be able to consider some of these factors by simply examining the information and data on the source's webpage. In other cases, you may have to download the file and examine the metadata and the actual data to see if it meets your requirements.
For this exercise, we are going to download several GIS layers from two City of New York agencies. The first site, Bytes of the Big Apple, is maintained by the Department of City Planning. Go to this site at: http://www.nyc.gov/html/dcp/html/bytes/applbyte.shtml
The Bytes page has several different data sources that you can download. Scroll down to the second option, Administrative and Political Districts, and click the download button.
The download page will provide us with thumbnail images and brief descriptions of the available datasets. Note that the datasets are available in two formats: MapInfo Tables and ArcView Shapefiles. MapInfo is another commercial GIS software product and is one of ESRI's chief competitors. MapInfo files generally will not work with ArcGIS. A shapefile is a standard GIS format that was created by ESRI, but can be used with a number of GIS systems. A shapefile is a lot like a feature class, except that it is not stored in a geodatabase; it exists as an independent file.
Scroll through the page and download the following shapefiles: boroughs (water areas included), health center districts, and 2000 census tracts (without water areas).
Many sites will offer boundaries with or without water included as part of the boundary. For example, if we were using a layer as a reference layer (like the borough boundary) we would want the boundaries to include water, to accurately depict where the boundaries lie. If we were using boundaries to make thematic maps (like the census tracts) we would want boundaries without water so we can accurately show where some phenomenon is occurring. If we mapped poverty by census tracts and included water, the resulting map would look strange, as many of the boundaries extend into the water.
The files will be in a compressed, ZIP format. Once you download the files, move them into the Part3 folder in your workspace. Don't unzip them just yet. Next, let's go to the City's Department of Information Technology and Telecommunications (DOITT) page, which is another resource for New York City GIS data: http://www.nyc.gov/html/doitt/html/eservices/eservices_gis_downloads.shtml
While the BYTES page has a number of boundaries for administrative and statistical areas, the DOITT page has many layers related to city infrastructure. From this page, download the Open Space shapefile. We will lay this layer over our census tracts, so we can blot out tracts that consist largely of parks. Save the ZIP file to your Part3 folder.
Once you have the four layers in your folder, unzip each one. Your folder will look similar to Figure 15.
Exploring the ArcCatalog
You may be wondering, what are all of these files? I thought I was downloading one shapefile for each feature? Although we refer to a shapefile like it is a singular file, it is actually composed of multiple files that contain the actual boundaries, projection information, the attribute table, and metadata. In order for a shapefile to function properly, you need to keep all of these files together. This can cause headaches when you need to move your data around, or worse, when you want to change the names of files!
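To make the "one shapefile, many files" idea concrete, here is a minimal Python sketch that lists the component files sharing a shapefile's base name. The function name is hypothetical, the folder is a temporary one created just for the demonstration, and the extensions shown (.shp, .shx, .dbf, .prj, .shp.xml) are the common ones; a real download may contain more or fewer.

```python
import tempfile
from pathlib import Path


def shapefile_parts(folder, stem):
    """List the files in `folder` whose base name (text before the first
    dot) equals `stem`, skipping ZIP archives."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.name.split(".")[0] == stem
        and p.suffix.lower() != ".zip"
    )


# Build a throwaway folder with empty stand-in files for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["nybbwi.shp", "nybbwi.shx", "nybbwi.dbf", "nybbwi.prj",
                 "nybbwi.shp.xml", "nybbwi.zip", "notes.txt"]:
        (Path(tmp) / name).touch()
    parts = shapefile_parts(tmp, "nybbwi")

print(parts)
```

Every file in that list has to be moved or renamed together, which is the bookkeeping that is so easy to get wrong by hand.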
After you unzip GIS files into the directory of your choice, you should never use the Windows Explorer interface to work with your files from that point forward. ArcGIS includes a module that complements ArcMap, called ArcCatalog. It was specifically designed for viewing and organizing your GIS files. Let's explore the ArcCatalog before we start to do any mapping.
The ArcCatalog is symbolized as a globe stuffed into a filing cabinet. If you don't see this symbol on your desktop, go to Start > All Programs > ArcGIS > ArcCatalog. You will then see the screen depicted in Figure 16.
The catalog is a split screen. The narrow screen on the left is the catalog tree, which is a drill-down list of all the directories on your computer. Clicking on a folder will reveal its contents in the larger, Content screen on the right. You can drill down through folders by hitting the plus sign beside a folder, and you can collapse them by hitting the minus sign. Use the catalog tree to drill-down to the Part3 folder in your workspace directory. Notice how the content view changes each time you click a folder.
Once you are at this folder, you will see your four shapefiles. They have the same symbolization as the feature classes from Parts I and II of this tutorial, except they are green instead of grey. Notice how we only see four files in this folder - for our convenience, the ArcCatalog hides all of the different file components of the shapefile, and just gives us one record for each. Better yet, ArcCatalog does all of the work of keeping these files together. So if we want to move or rename the shapefiles, we can do it once, and all of the components are taken care of.
Furthermore, ArcCatalog filters out any files that are not associated with GIS, and shows us just GIS related files. Remember the zip files that we originally downloaded? They are still there, in our workspace folder, but ArcCatalog is hiding them from our view. This allows us to concentrate on our GIS data without being distracted or inundated by all of the files on our system.
Single left-click on nybbwi.shp, our NYC borough boundaries file, and we will see the file symbol for this file in the contents tab. If we clicked on the file in the Catalog Tree, we would be able to rename it by writing over the existing name. Keep the same names for now.
There is nothing particularly exciting about this view, so click the Preview tab that is beside the Contents tab.
This gives us a preview of the geography of this file. This is pretty helpful, since the name of this file is a little cryptic. Viewing the geography clears up any misconception regarding what this file could be. Instead of adding all of these layers to a map to see what they are, we can take a quick look using the catalog instead. The preview tab actually has two options. Look at the dropdown box below the window that currently says Preview: Geography. Hit the dropdown and change the value to Table.
This allows us to see the attribute table of the shapefile. If we were looking at a data table instead of a shapefile or feature class, this would be the default preview option (as tables have no visual geographic component).
Now, click on the Metadata tab, which is to the right of the Preview tab.
Metadata is data about data. It describes several aspects of the dataset, such as its source, purpose, use restrictions, age, spatial properties, and table properties. There are a number of well-defined standards that are used to create geospatial metadata. Most agencies will fill out this information in great detail. However, it is possible to download datasets where an agency has not added any metadata. Certain descriptors, like the spatial extent (coordinates) of the feature and the map projection, are populated automatically by ArcGIS, which looks at the file and fills in the appropriate information.
We are currently on the Description tab of the metadata window. Click on the Spatial tab to see the spatial information.
The Spatial tab tells us the map projection and coordinate system this layer is in. In this instance, the file is mapped in the North American Datum (NAD) of 1983, New York State Plane System for Long Island and uses the North American Geographic Coordinate System (GCS) of 1983. The State Plane System is a commonly encountered system for mapping data for a particular state or for areas within a state. Each state has its own system, and in many larger states (like New York), there are subdivisions for different parts of the state. Long Island is a subdivision of the NY State Plane system. NY State Plane is automatically affiliated with GCS 1983 (which we have encountered before) and NAD 1983 (a datum attempts to account for the fact that the earth is not a perfect sphere when creating a coordinate system).
It is important to eventually understand what datums, projections, and coordinate systems are and how they work. But when you are first starting out in GIS, you simply need to know which system is best to use, and you need to make sure that all of the layers you are working with in a map share the same projection. We can use the ArcCatalog to see if they do.
All of the layers from BYTES will use the same projection, but we need to check the DOITT layer to see if its projection matches the BYTES layers. Click on the open space layer in the Catalog Tree, then click the metadata tab, then the spatial tab. Check and see what the projection and coordinate system is. In this case, we have a match. The DOITT layer uses the same system as the BYTES layer, so we can add them to ArcMap together without a problem.
Even though we do not have any projection issues in this case, let's walk through the process of reprojecting layers anyway, as you are likely to encounter these issues as you start using GIS on your own. You will frequently have layers from different sources that have different projections, and you will have to reproject some of the layers so that they all share the same projection. Otherwise, you will have problems when you go to add the layers to a map.
To access the Project tool, we will need to use the ArcToolbox. Click on the red toolbox in the toolbar, and the toolbox window will open in a center panel.
The ArcToolbox contains an array of tools for a variety of needs. Many of the tools are used for geoprocessing, which involves altering the actual geography of your files by adding, erasing, simplifying, or melding features or multiple files. There are also tools for converting file formats, performing spatial analysis, matching addresses, and projecting files. As a beginner, the projection tools are probably the first tools you will use in the toolbox. We won't be touching any of the other tools in this tutorial.
Hit the plus button next to the Data Management toolbox to expand the list of tools in this box. Then, click the button next to Projections and Transformations to expand that list. Finally, expand the Features box. Features refers to feature classes, shapefiles, and any file that is vector based. The Project tool under features is the tool we are looking for. Your window should look like Figure 23 at this point.
! At the bottom of the Projections and Transformations box is a tool called Define Projection. This is NOT the tool that we want to use, and using it instead of Features > Project is a common and confusing mistake. You would use Define Projection if the projection data for the feature you are working with is missing. You would use Features > Project when you know what your feature is projected in AND you wish to convert it to a different projection.
Click on the Features > Project tool to open the Project window.
In this window, you would select the feature that you wish to reproject, and specify the name of the new projected feature (instead of changing the projection of your feature, ArcGIS makes a copy of it and gives the copy a new projection). If you select the Output Coordinates button, you will be prompted to choose a projection for the new file, either by selecting a projection from a list or by selecting a file that has the projection that you want. In this case, we would have chosen the latter option, and specified that the DOITT file get the same projection as one of the BYTES files. In some cases, we may have to select a geographic transformation option from the bottom window, but this is not always the case. If you hit OK the transformation would begin.
! If the projection fails, the three most common reasons for the failure would be: the input file does not have a defined projection (so you'll have to define it first using the Define Projection tool), the path and file name for either the input or output file is too long or violates naming conventions (so you'll have to go to the ArcCatalog and rename or possibly move files), or you do not have permission to write to the output directory (use Windows explorer to determine if you have write permission to a folder - if not, change the permissions or try writing to a different folder).
! The ArcToolbox is available within the ArcMap module as well, and you could use the Project tool there, but the process is a lot more confusing. When you add layers to a map, ArcMap gives the Data View its own projection, based on the projection of the first layer that you add. If you add layers that have different projections, ArcMap attempts to reproject them on the fly so that they fit in the data frame - sometimes it succeeds, and sometimes it doesn't, with odd results. If you use the ArcToolbox within ArcMap to reproject layers, it adds these layers to the map, but then reprojects them on the fly again back to its previous projection, making it look like nothing has happened! Save yourself the confusion and use the ArcCatalog to reproject your data.
Now that we know that our layers are in the same projection, let's see if the poverty data that we downloaded shares a common field with our GIS census tract file. We will need to have at least one field in common in order to join the two together. Use the Catalog Tree to navigate to the census tract shapefile, and preview the attribute table in the preview tab.
The [CT2000] field contains the census tract ID. Sort the table by this column: highlight the column, right-click it, and sort.
You'll see that this number is not unique. It would be if we were looking at a single county (or borough in this case), but it is not in this case. Four of the five boroughs have a census tract with the ID 000100. The column next to this one, [BoroCT2000], is unique. In front of each tract number there is a digit that specifies which borough the tract is in. The fact that we do have a unique identifier is good news. The bad news is that it is unlikely that the census data table we downloaded uses this convention, which was invented by the city planning office. Let's take a look at the census table.
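The uniqueness check we just did by eye can also be sketched in Python. The rows below are made-up samples, not values copied from the shapefile, and the function name is hypothetical; the point is only that the tract number repeats across boroughs while the borough-prefixed ID does not.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the ID values that occur more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(value for value, n in counts.items() if n > 1)


# Illustrative (BoroCT2000, CT2000) pairs; invented for this example.
rows = [
    ("1000100", "000100"),
    ("2000100", "000100"),
    ("3000100", "000100"),
    ("3000200", "000200"),
]

repeated_tracts = duplicated_ids(ct for _, ct in rows)
repeated_boro_tracts = duplicated_ids(boroct for boroct, _ in rows)
print(repeated_tracts, repeated_boro_tracts)
```

A join field has to behave like the second list: no repeated values.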
Open the census data table in Excel and examine the ID fields.
The [GEO_ID2] column looks like the best unique identifier. It is the FIPS code and contains a code for the state (36 is New York), the county (next three digits - 005 is The Bronx), and the tract (last six digits - 000100). Unfortunately, this code is not compatible with the code in our tract layer, which means that we will have to create a new column in one of the files that contains a matching identifier that will allow us to join the tables. There are several different paths we could choose here. We'll take the easiest option for now, which is to create a new [BoroCT2000] field in the Excel file.
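The structure of the tract-level FIPS code described above (two digits for the state, three for the county, six for the tract) can be sketched as a small Python function. The function name is hypothetical, and the code assumes the ID is stored as an 11-character text value.

```python
def split_tract_fips(geo_id):
    """Split an 11-digit tract FIPS code into state, county, and tract."""
    geo_id = str(geo_id)
    if len(geo_id) != 11 or not geo_id.isdigit():
        raise ValueError("expected an 11-digit tract FIPS code: %r" % geo_id)
    return {
        "state": geo_id[:2],     # 36 is New York
        "county": geo_id[2:5],   # 005 is The Bronx
        "tract": geo_id[5:],     # the last six digits
    }


parts = split_tract_fips("36005000100")
print(parts)
```

Taking the last six characters is the piece we need for the join; the state and county portions tell us which borough the tract belongs to.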
Excel contains a number of great functions for manipulating text, which we can use to our advantage. Insert a new column to the right of the [GEO_ID2] column, and name it [TRACT]. In the first cell below the header name, type in the formula:
=RIGHT(B2,6)
This formula starts from the right of the value in column B, row 2 (the first FIPS code in the [GEO_ID2] column), and returns the six rightmost digits, which represent just the census tract portion of the FIPS code.
Now copy this formula, and paste it in all of the empty cells in the [TRACT] column. Excel automatically updates the formula when you paste it, so that it pulls values from the adjacent cell and not from B2 all the way down. Then, highlight the column by clicking on the C column header at the top. Go up to the Edit menu, and select Copy. Then go back to the Edit menu and select Paste-Special, and select the Values radio button as seen in Figure 28.
Hit OK to do the paste. This will replace all of the formulas in each of these cells with the actual value that the formula returns - in this case, the census tract ID number.
The next thing we need to do is add the borough code to the front of each of these tract numbers. First, we will need to create a column to hold the borough code. Insert a new column to the right of the [TRACT] field and name it [BORO]. The spreadsheet should already be sorted alphabetically by borough. The codes for each borough are:
Manhattan = 1, Bronx = 2, Brooklyn = 3, Queens = 4, Staten Island = 5
The Bronx is listed first. Type the number 2 in the first [BORO] cell, then copy and paste it all the way down until you come to the last record for the Bronx. Repeat this with the appropriate code for each of the boroughs. Remember that this census table is listing county and not borough names. So Brooklyn will be listed as Kings County, Manhattan as New York County, and Staten Island as Richmond County. The county names for The Bronx and Queens are the same as their borough names.
! Isn't there an easier way to do this? If you are familiar with Excel functions, you could use a nested IF statement, but the syntax looks ugly due to all of the parentheses you need to use. Since this tutorial is designed for beginners, and we don't have that many records, we'll just stick with copying and pasting.
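For readers who are comfortable with a little scripting, a lookup table is a tidier alternative to a nested IF statement. The sketch below is in Python rather than Excel, and it assumes the county names are spelled exactly as shown; check them against your own table before relying on it.

```python
# Borough codes used by the city planning tract layer, keyed by county name.
# The exact spelling of the county names is an assumption.
BORO_CODES = {
    "New York County": "1",   # Manhattan
    "Bronx County": "2",
    "Kings County": "3",      # Brooklyn
    "Queens County": "4",
    "Richmond County": "5",   # Staten Island
}


def boro_code(county_name):
    """Return the one-character borough code for a county name."""
    return BORO_CODES[county_name.strip()]


codes = [boro_code(name) for name in ["Bronx County", "Kings County", " Richmond County "]]
print(codes)
```

The codes are stored as text on purpose, since they are identifiers and not quantities.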
! Make sure that the numbers in the [BORO] field are saved as text. In Excel, cells that have text in them will have values that are aligned to the left. If the values are aligned to the right, they are saved as numbers. If you select the entire [BORO] field by clicking on the D column header, right click, and choose Format Cells, you will be able to change the column to Text values if they are currently not being saved as text.
Once we have completed populating the [BORO] column, we can add the borough code to the front of the tract code using the CONCATENATE function, which is used to string text from two fields together. Insert a new column to the right of the [BORO] column and name it [BOROCT2000]. In the first cell of the column, type the following formula:
=CONCATENATE(D2,C2)
This strings together the first value in the [BORO] field followed by the first value in the [TRACT] field, as illustrated in Figure 29.
Copy and paste the formula all the way down, then perform the Copy and Paste Special operation on the [BOROCT2000] field, replacing the formulas with the actual values. The end result should look like Figure 30.
At this point, our table is ready. With this new field, we can add this table to a map and join the data to our census tract boundaries. Save the Excel file and close it.
! The CONCATENATE function returns text, and it gives reliable results only when the fields it combines are stored as text. If the fields are stored as numbers, any leading zeros are lost before the values are joined (000100 becomes 100), and the resulting ID will not match the tract layer. The general rule of thumb is: if you have a numeric value that is an ID number, you should store it as text. If you have a numeric value that is actually a quantitative value that you can perform arithmetic on, then you should save it as a number.
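The rule of thumb about storing ID numbers as text can be illustrated outside of Excel. The Python sketch below is an analogy rather than a description of how Excel works internally: it joins a borough code and a tract code twice, once with both stored as text and once with both stored as numbers. The function name is hypothetical.

```python
def make_boroct(boro, tract):
    """Join a borough code and a tract code into one text identifier."""
    return str(boro) + str(tract)


# Stored as text, the tract keeps its leading zeros.
as_text = make_boroct("2", "000100")

# Stored as a number, 000100 is simply 100, so the zeros are gone.
as_number = make_boroct(2, 100)

print(as_text, as_number)
```

Only the first result has the seven characters that the tract layer's ID field uses, so only the first would join.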
Layering and Symbolizing Data
Now that our data is ready, let's map it! (At this point, we are going to be repeating many steps that we covered in Parts I and II of this tutorial, so these instructions are not going to be as detailed. Refer back to those tutorials if you get stuck.)
Open up ArcMap and add the shapefiles for the borough boundaries, the tracts, the health center districts, and the open space areas. As usual, these will be layered on top of each other in mono-color.
Let's make the layers more intelligible. Give the borough boundaries a hollow fill and a thicker, grey boundary, to resemble the image in Figure 32.
The other layers should now be more visible. Give the health district boundaries a hollow fill and a red outline, and give the open space a green fill and no outline. Then rearrange the drawing order of the layers so that the borough boundaries are on top, followed by the health district boundaries, then the open space, then the tracts. This should allow us to see all of the boundaries clearly with the exception of tracts that are covered by parks. Your view should resemble Figure 33 at this point.
Next, let's add our poverty data table to the map. Once you navigate to the folder where the Excel file is, click on it to reveal the worksheets that are in the file (as depicted in Figure 34). You cannot add an entire Excel workbook to a map - you can only add individual sheets. In this case we should only have one sheet - the poverty sheet. Add it to the map.
! Having trouble adding the Excel file? Make sure you are adding a worksheet and not the entire file. Can't even see the Excel file? ArcGIS 9.2 was the first version of ArcGIS to support Excel files. If you are using an older version of ArcGIS, you won't be able to add an Excel table. You either have to save the table as a dbf file or a delimited text file and add it to the map, or you can export it to an Access database and add the table from Access to the map.
Join the poverty table to the census tract boundaries. The specifications in your table join should look like Figure 35. You are joining the poverty table to the tract shapefile by matching the [BoroCT2000] field in the shapefile to the [BOROCT2000] field in the table.
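Conceptually, a table join looks up each feature's key in the data table and attaches the matching record's fields. The Python sketch below shows that idea under simplifying assumptions: features and table rows are plain dictionaries, the function name join_tables is hypothetical, and the sample values are made up. It is not how ArcMap implements joins internally.

```python
def join_tables(features, table, feature_key, table_key):
    """Attach matching table fields to each feature; unmatched features are kept."""
    lookup = {row[table_key]: row for row in table}
    joined = []
    for feature in features:
        merged = dict(feature)
        match = lookup.get(feature[feature_key], {})
        for name, value in match.items():
            if name != table_key:  # do not duplicate the key field
                merged[name] = value
        joined.append(merged)
    return joined


# Made-up sample data: the second tract has no record in the table.
tracts = [{"BoroCT2000": "2001900"}, {"BoroCT2000": "2002000"}]
poverty = [{"BOROCT2000": "2001900", "P087001": 2000, "P087002": 500}]

for row in join_tables(tracts, poverty, "BoroCT2000", "BOROCT2000"):
    print(row)
```

Note that the match only succeeds when the two key values are identical text, which is why the key field was built so carefully in Excel.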
If the join was successful, we should be able to map the data in the poverty table. Go to the Symbology tab in the Properties menu for the census tracts. Map the Value [P087002], which is the number of people in poverty, and normalize it by the value [P087001], which is the number of people for whom the poverty status is known. Create four classification categories based on the Natural Breaks method, and format the labels so that they appear as percentages with one decimal place. Choose a color scheme that is to your liking. Your window should look like Figure 36.
Click OK, and you will get a map that shows the percentage of residents who are in poverty in each census tract.
Working With Tables in ArcMap - OPTIONAL
Take a few moments to zoom and pan around in the Data View, to see what the pattern of poverty looks like. If you look closely, you may begin to see some empty census tracts. Zoom in to the southern area of Manhattan, and look across the river to Brooklyn. You will see an empty census tract. What's going on here? Use the Identify tool and click on this tract to see its attributes. Scroll through them, and notice that the value for [P087001], the number of people whose poverty status is known, is zero. If you look at some of the other tracts that appear blank, such as the islands in Jamaica Bay by the airport, they also have values of zero.
When we symbolized the tracts, we specified that the value for poverty should be normalized based on the total number of people for whom the poverty status was recorded. In other words, we are dividing the first value by the second value to get a percentage. The blank tracts that are appearing on our map are instances where the population is zero. If you are one of those mathematical types, you'll know that you can't divide numbers by zero. If you try to do this in a spreadsheet or database, you'll get error values. That is precisely what is happening here. ArcMap can't divide by zero, so these tracts are being left out of our classification scheme.
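The arithmetic behind normalization can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ArcMap's own code: the function name normalize and the two sample tracts are made up, and None stands in for "no value to classify".

```python
def normalize(value, base):
    """Return value / base, or None when the base is zero."""
    if base == 0:
        return None  # the ratio is undefined, so the tract gets no class
    return value / base


# Made-up sample tracts: B is unpopulated.
tracts = [
    {"id": "A", "P087001": 2000, "P087002": 500},
    {"id": "B", "P087001": 0, "P087002": 0},
]

for t in tracts:
    ratio = normalize(t["P087002"], t["P087001"])
    label = "no value" if ratio is None else "{:.1%}".format(ratio)
    print(t["id"], label)
```

Tract A prints as 25.0%, while tract B has no value at all, which corresponds to the blank tracts on the map.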
This is a problem that you will likely face time and time again, so let's look at how we would solve it so that these areas are included in our classification scheme. This will also give us the opportunity to work with ArcMap's interface for manipulating tables.
First, we will have to create a new field where we will calculate the percentage ourselves, rather than on the fly using the Normalization value in the Symbology tab. Open the attribute table for the tracts layer. Click on the Options button below the table, select the Add Field option, and you'll get the Add Field dialog box in Figure 39.
Name the field [povpercent]. Under type, choose Double. This is a type of number field that will include decimal places (integer fields, on the other hand, do not include decimals). Precision is the number of digits a number can have, and scale is the number of digits to the right of the decimal point. In this case, set the precision and scale to 5. That way, when we convert the number to a percentage in our legend (later on) we will be able to show values from 0% to 100% with two decimal places. Hit OK.
We will eventually populate our new field by calculating the percentage ourselves. But before we can do that, we must ensure that we do not perform the calculation on any features that have a value of zero in the total population field. Otherwise ArcMap will throw up a cryptic error message and will prevent us from performing the calculation at all. To do the calculation, we must select all of the records where the total population is NOT equal to zero, and then we can perform the calculation on just these features.
Minimize the attribute table, go to the Selection menu at the top of the screen, and choose Select By Attributes. Select the tracts layer in the first dropdown box, select the [P087001] field so that it appears in the bottom window, then type <> 0 after it (this is the notation for not equal to). Your window should look like Figure 40.
Click OK, and ArcMap will make the selection. All populated tracts should be selected.
Return to the tracts attribute table. Right-click on the grey box at the top of the [povpercent] field that contains its name to reveal the menu, and select the Field Calculator, as shown in Figure 42.
In the Field Calculator window, click on the [P087002] field on the list to add it to the bottom window. Then type the division symbol / after it in the box. Then select [P087001] from the list so that it appears after the division symbol. Make sure that the Calculate Selected Records Only box in the lower left-hand corner of the screen is checked. Then click OK. This will populate the [povpercent] field with the results of this calculation, which will be the percentage (in decimal form) of the population in each tract that is in poverty. ArcGIS leaves zeros for records where we would be dividing by zero.
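The select-then-calculate workflow can be summarized in a short Python sketch. It assumes tracts are plain dictionaries and that a newly added field starts out at zero, as described above; the function name calculate_povpercent and the sample values are hypothetical.

```python
def calculate_povpercent(tracts):
    """Fill povpercent for populated tracts; the rest keep a value of 0.0."""
    for t in tracts:
        t["povpercent"] = 0.0  # mimic a new field whose values start at zero
    # Equivalent of the "not equal to zero" attribute selection
    selected = [t for t in tracts if t["P087001"] != 0]
    # Equivalent of calculating on the selected records only
    for t in selected:
        t["povpercent"] = t["P087002"] / t["P087001"]
    return tracts


# Made-up sample tracts: B is unpopulated.
sample = [
    {"id": "A", "P087001": 2000, "P087002": 500},
    {"id": "B", "P087001": 0, "P087002": 0},
]
for t in calculate_povpercent(sample):
    print(t["id"], t["povpercent"])
```

Because the unpopulated tract now holds 0.0 instead of an undefined ratio, it can be placed in a class when the layer is symbolized.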
Close the attribute table. Go to the Selection menu and Clear the Selected features. Then return to the Symbology tab for the tracts layer, change the value to [povpercent] and leave the normalization value blank. Format the labels to show percentages for 4 data classes. Hit OK, and you will see your map which now includes all of the previously missing features.
! By adding this new column and populating it, we have altered the actual shapefile. The change will be saved automatically. Remember that this change has nothing to do with the particular map (mxd) file we are working with. Even if we didn't save our mxd at this point, our shapefile would still be modified and will look the same in each and every mxd file we use it in.
! This seems like a lot of work to deal with a common problem. Isn't there another way? We could have created the calculated field for poverty percentage in the Excel table instead, and replaced the errors with zeros using the Find and Replace command. To do this we would have to undo the table join in ArcMap, remove the Excel table from the map, and add it to the map again once the new field was added to the Excel file.
Select by Location and Export
At this point, we have a great dataset that we could work with. We could look at how poverty is distributed throughout the city and analyze the tract data based on what health district they are located in, so we could, for example, figure out how to allocate healthcare resources to distressed populations.
But that would be too easy! Let's throw an added complication into the mix and say that we are only interested in studying The Bronx. Is there a way to grab just the data and the portions of the shapefile that pertain to the Bronx? The answer is - absolutely. Up until now, we have selected features using attributes. We could also select features based on geography. This is a pretty common operation that you will perform time and time again.
So, let's say that we want to select all of the census tracts that are in The Bronx. This is actually a two-step operation. Before we can select all of the census tracts in The Bronx, we have to select The Bronx itself from the borough boundary layer. Use the Select Features tool (on the toolbar, white arrow in front of a blue and white box). Click on an area inside The Bronx that contains nothing but water. This way, we'll select The Bronx and not an individual tract or park. (We could get around this problem by going up to Selection and using the Set Selectable Layers option to make the borough boundaries the only selectable layer; then we would be able to click anywhere inside The Bronx and would only select it.) Once you select The Bronx, it should be outlined in blue.
Now we can select the tracts. Go up to Selection and choose Select By Location. You will get the window depicted in Figure 45.
In this window, you want to select features from the tract layer (checkmark it in the layer window) that Have Their Centroid In the features of the New York Borough Boundary layer, using only the selected features (in this case, The Bronx). As you can see, you have a lot of options to choose from here. "Have Their Centroid In" simply means that each feature of the tract layer must have its geographic center within the Bronx boundary. There are a number of other geographic selection options suited to a variety of different purposes: contained by, outside of, within a certain distance of, touched by, shares a boundary with, etc. Despite its rather technical name, Have Their Centroid In is a pretty common option that you will use to extract subsets of features that fall within larger features.
Click OK, and the tracts will be selected.
In this case, we have a pretty neat selection, since tract boundaries never cross county lines. If they did, then tracts that were mostly inside The Bronx would be included in our selection, and tracts that were mostly outside would be excluded.
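To make the centroid rule concrete, here is a minimal Python sketch. It assumes simple polygons given as lists of (x, y) vertices, uses the standard area-weighted centroid formula and a ray-casting point-in-polygon test, and all names and coordinates are hypothetical. ArcGIS may compute centroids and containment differently in detail.

```python
def polygon_centroid(poly):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross                # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(pt, poly):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):      # edge crosses the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def select_by_centroid(features, boundary):
    """Return the names of features whose centroid falls inside the boundary."""
    return [name for name, poly in features.items()
            if point_in_polygon(polygon_centroid(poly), boundary)]


# Made-up geometry: a square "borough" and three rectangular "tracts".
borough = [(0, 0), (10, 0), (10, 10), (0, 10)]
tracts = {
    "inside": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "mostly_in": [(6, 4), (11, 4), (11, 6), (6, 6)],
    "mostly_out": [(8, 4), (14, 4), (14, 6), (8, 6)],
}
print(select_by_centroid(tracts, borough))
```

The two straddling rectangles show the behavior described above: the one that is mostly inside the boundary is selected, and the one that is mostly outside is not.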
Now that we have our selection, we can turn it into a new layer. Right click on the tract layer in the Table of Contents and select Data > Export Data to bring up the Export window.
We want to export all of the selected features of the tract layer using the same coordinate system and save them as a new shapefile. Use the folder icon to navigate to your working directory and give your new file the name Bronx_Pov. Click OK and ArcGIS will start exporting the features into the new file. When prompted, add the new file to your map.
Clear the selected features, and repeat the same steps to create a subset of the health districts layer just for The Bronx called bronx_hd. To save you the trouble, I have already created an open space layer for The Bronx, which you can add to the map from your Part3 folder. Then spend some time tidying up your map. You can turn off the layers that cover the entire city and work just with the Bronx layers. Symbolize the census tracts in The Bronx based on the [povpercent] field using four data categories. Give your parks a proper color, and give The Bronx health districts some outlines and labels that stand out from the colors used for your poverty data. Then, add the bronx_waste layer from the Part3 folder to your map. When you are finished, your Data View should resemble Figure 49.
Brief Introduction to Data Analysis
When you are ready, use some of the selection tools to try some actual analysis. The bronx_waste layer represents businesses that are classified as waste storage or disposal companies. Let's say you were interested in knowing the poverty status of tracts that were within 500 feet of these businesses. Go up to the Selection menu and choose Select By Location.
Select features from the Bronx_Pov tract layer that are Within a Distance of the bronx_waste layer. Apply a buffer to the features of bronx_waste of 500 feet. This will select all of the tracts within 500 feet of the waste storage and disposal sites. Click OK, and the tracts will be selected.
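The distance selection can be sketched in Python as follows. This is an illustration only: it assumes tracts are simple polygons and waste sites are points, that all coordinates are in feet, and that a site located inside a tract counts as a distance of zero. The function names and the sample geometry are hypothetical, and ArcGIS may handle edge cases differently.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:                       # degenerate segment
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))               # clamp to the segment ends
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(pt, poly):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y) and x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
            inside = not inside
    return inside


def dist_point_polygon(pt, poly):
    """Zero if the point is inside the polygon, else distance to nearest edge."""
    if point_in_polygon(pt, poly):
        return 0.0
    n = len(poly)
    return min(dist_point_segment(pt, poly[i], poly[(i + 1) % n])
               for i in range(n))


def select_within_distance(tracts, sites, distance):
    """Names of tracts that lie within the distance of at least one site."""
    return [name for name, poly in tracts.items()
            if any(dist_point_polygon(s, poly) <= distance for s in sites)]


# Made-up geometry, coordinates in feet.
tracts = {
    "near": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    "far": [(3000, 0), (4000, 0), (4000, 1000), (3000, 1000)],
}
sites = [(1300, 500)]
print(select_within_distance(tracts, sites, 500))
```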
If you want to see a list of the selected tracts, open the attribute table for the Bronx tracts and at the bottom of the window, click the option to Show only the Selected tracts. This way, you'll be able to examine just the selected tracts. You can calculate statistics for these areas in ArcGIS, or you can export the results out as a table (under Options > Export) and work with them in Excel or another program. If you haven't noticed already, the Bronx_Pov layer contains all of the same data fields as the original tract layer for the city, AND all of the fields from the Excel poverty table. When we exported the Bronx tracts to the new shapefile, it took all of the fields from both and included them in the new attribute table for the tracts.
When you are finished, create a map of your choosing, and then click over to the Layout View and add the necessary elements (title, legend, scale bar, etc) to create a finished map. Don't forget to save your map as an mxd file, and export the finished map out as an image or pdf if you want to share it with someone else.
Hopefully, this tutorial served as a useful introduction to learning GIS by mapping census data. It covered the most important tools and steps that you need to take in order to visualize your data. Of course, we have merely scratched the surface of what GIS can do. When you are ready to learn more, here are the next topics that you should investigate on your own. It is likely that you will encounter these issues as your GIS experiences and needs evolve.
- Creating geodatabases. Rather than working with multiple shapefiles and data tables, you may want to keep all of these files together and organized in a geodatabase. You can easily create a geodatabase file in the ArcCatalog and export files to the geodatabase.
- Converting Interchange Files. As you explore sites for downloading data, you may encounter Interchange files that have the extension .e00. These are older files that you have to convert in the ArcCatalog before you can add them to a map. You can convert them to Coverages, also an older format but one that will work in ArcGIS.
- Spatial Joins. This is the second option in the Join menu. A spatial join will take the attributes of one geographic layer and assign them to another geographic layer. For example, we could give each of the waste storage and disposal sites in the bronx_waste layer a census tract number from the tracts layers, based on the site's location within each tract.
- Adding XY Coordinates. If you have a data table with records for particular places, and you have fields that contain the longitude (X) and latitude (Y) coordinates for each place, you can add this table to ArcGIS and plot the locations using Tools > Add XY Data. This is how the bronx_waste layer was created.
- Geoprocessing. There are many different geoprocessing tools in the ArcToolbox that you can use to modify the geometry of features in your shapefiles. Common tools that you would use include: Clip - cut features of a layer based on another layer (if we wanted to cut the open space layer by the borough boundaries); Erase - alter the features of a layer based on another layer (if we had a water layer and a boundary layer, we could create a boundary layer where water is removed); Dissolve - combine features in a layer based on a common attribute (if we had a layer of single part features, we could convert them to multipart features, i.e. one record for the State of New York rather than individual records for the mainland, Long Island, Staten Island, Manhattan, etc).
- Raster data. This tutorial focused on using vector data and tables. Raster imagery, like aerial photographs, satellite images, or scanned paper maps, can serve as reference layers for your vector features, or you can use them for land use and land cover analyses.
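As a small illustration of the Adding XY Coordinates item above, the Python sketch below reads a delimited text table and turns each record into a point with X (longitude) and Y (latitude) values. The column names, the function name read_points, and the sample records are all hypothetical; they are not taken from the bronx_waste data.

```python
import csv
import io

# Made-up sample table; in practice this would be read from a file.
SAMPLE = """name,longitude,latitude
Site A,-73.90,40.82
Site B,-73.88,40.81
"""


def read_points(text, x_field="longitude", y_field="latitude"):
    """Parse delimited text into a list of point records with numeric x and y."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "name": row["name"],
            "x": float(row[x_field]),  # longitude is the X coordinate
            "y": float(row[y_field]),  # latitude is the Y coordinate
        })
    return points


for point in read_points(SAMPLE):
    print(point["name"], point["x"], point["y"])
```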