Archive for the ‘Miscellaneous’ Category

Introducing – Data!

Wednesday, April 9th, 2014

Professors invite me to their classes each semester to give students a crash course in finding data for neighborhoods in New York City, with a particular emphasis on Census data. I typically visit courses in journalism and public affairs, but this semester I added classes in management and – theater – to the list. Before I dive into what the Census is and what sources they should use, I preface the presentation with a discussion of what neighborhoods are and how we define them. This is important because neighborhoods are locally and informally defined, and when searching for datasets we often have to use a proxy, like census tracts, ZIP codes, PUMAs, or local legal or administrative areas, to approximate them.

But before we get this far, I always begin the discussion with some basic questions to set the stage: what is data, and what can we use it for? For the journalism students, I explain that data can help support a story. If they’re covering a town hall or community board meeting where affordable housing is the topic of discussion, they’re going to want to provide some context and include some facts to support their story – what is the rent like in the neighborhood? How many people live there? Alternatively, data can provide the basis for a story. I point to one of numerous examples in NYC where journalists have taken a big lump of unrefined data – the NYPD’s stop and frisk data, traffic fatality incidents, 311 complaints – and have refined it to produce information that leads them to an interesting story that was hidden in these numbers. Lastly, data is a story – whenever the Census releases a new dataset, someone is writing to announce the release and tell us what’s in there.

This idea of refining leads us to our first basic definition – data can be considered as raw and unrefined information. It doesn’t tell us much in and of itself, but if we sift through and refine it we can turn it into information that we can use to tell or support a story or reveal some fact or truth that was previously unknown. Data can be quantitative or qualitative – journalists for example may interview someone for two or three hours, but they’re not going to turn around and publish that entire interview. They’re going to write an article that summarizes it and gives us the most important bits, or edit it for a radio broadcast that covers the highlights. With quantitative data the issue is similar – I use a basic example of population data for the 50 states and show them this image of a comma-delimited text file:


I explain that this is what data looks like in a raw state. It’s in a basic format suitable for preservation or transit between systems, but is not in a presentable state. There are a lot of codes that are meaningless to the average person, the data isn’t sorted in a meaningful way, the column headings seem ambiguous, and the numbers aren’t formatted for viewing. This isn’t something that they’d want to insert directly into their story or paper. But if they take this and do a little bit of work:


They can take that raw data and turn it into information. Here we’ve moved from raw data to a presentable table. The statistics are sorted in a logical order based on total population, columns are given comprehensible names, and unnecessary information (for presentation purposes) is dropped. We add commas to the numbers so they’re more legible, and we create some value by adding a percent total column. Now we have something we can use to communicate a message. But we can go further – we can take that same information and turn it into this:
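For readers who would rather script the refinement than do it by hand in a spreadsheet, here is a minimal sketch of the steps described above: sort by population, rename the columns, drop the codes, add commas, and compute a percent of total. The column codes (GEOID, STUSAB, P001001) and the three rounded population figures are placeholders invented for illustration, not the actual file from the slide, and the percentages are of these three rows only.

```python
import csv
import io

# Hypothetical raw extract. The column codes and the rounded population
# figures are illustrative placeholders, not official Census values.
RAW = """GEOID,STUSAB,NAME,P001001
06,CA,California,37000000
36,NY,New York,19000000
48,TX,Texas,25000000
"""


def refine(raw_text):
    """Turn raw comma-delimited rows into a presentable table."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    total = sum(int(r["P001001"]) for r in rows)
    # Sort in a logical order: largest population first.
    rows.sort(key=lambda r: int(r["P001001"]), reverse=True)
    table = []
    for r in rows:
        pop = int(r["P001001"])
        table.append({
            "State": r["NAME"],                # comprehensible column name
            "Population": f"{pop:,}",          # commas for legibility
            "Percent of Total": f"{100 * pop / total:.1f}%",  # added value
        })
    return table
```

Calling refine(RAW) returns a list of rows with only the three presentable columns; the GEOID and STUSAB codes are left behind in the raw file where they belong.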


Now we have a chart. At this point I turn to the students and ask them what the benefit of using the chart is, followed by a discussion of trade-offs; we’ve gained something but lost something too. On the plus side, we can appeal to people’s visual sensibilities, and we can see more clearly that California has twice as many people as New York. The chart is also more concise, as it’s taking up less real estate on the page or on the screen. But we’ve exchanged conciseness for preciseness; we can no longer tell what the exact population numbers are with the chart; we can only approximate. But we can also go further:


We can take that same dataset and turn it into a map. Once again, we discuss the pluses and minuses. Now we can key into people’s geographic knowledge as well as their visual senses; Ohio may be more meaningful now that we can see it on a map, rather than just seeing a number in a table. We can also see geographic patterns of clustering or diffusion, which the table or chart couldn’t show us. But with the map we’ve lost even more precision. Now we can only see that a state’s population number falls within a given range; we can’t see the precise number and can’t approximate it like we could with the chart.
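The loss of precision on a shaded map comes from classification: every value is replaced by the range it falls into. A small sketch of that step, assuming a four-class scheme with break points I made up for illustration (they are not the breaks used in the map above):

```python
from bisect import bisect_right

# Hypothetical class breaks for a four-class map; a value equal to a
# break falls into the higher class.
BREAKS = [5_000_000, 10_000_000, 20_000_000]
LABELS = [
    "under 5 million",
    "5 to 10 million",
    "10 to 20 million",
    "20 million or more",
]


def classify(population):
    """Return the label of the class that a population value falls into."""
    return LABELS[bisect_right(BREAKS, population)]
```

Under these assumed breaks, a state with 11 million people and a state with 19 million people get the same label and the same color, which is exactly the detail the table kept and the map gives up.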

At this point, one student will point out that if the chart or map is on the web, we can have the best of all worlds. If the graphic is interactive we can hover over it and see the exact population number. This leads to a discussion of the trade-offs between interactive web-based information and static information. The interactive chart or map lets us keep precision and conciseness, but the sacrifice is complexity, portability, and preservation. It’s more complex to create, and it can only exist in its native environment, within a specific bundle of technology that includes programming and scripting languages, software libraries, browsers, and operating systems. Such things go obsolete quickly and can easily break, so the shiny chart, map, or app you have today is non-functional in a year or two, and difficult to preserve. Contrast that with a static image or text, which is simple, easy to move around, depends on little else, and can make the jump from a screen to the printed page.

We sum up this little talk with the basis of what they’re trying to achieve – I use the DIK pyramid, which I was introduced to in library school (OK – this pic is the DIKW pyramid, with wisdom thrown on top – it’s public domain so I can safely use it):


As journalists or researchers, you’re taking data and refining it to turn it into information to support your work or to communicate a message. You take those pieces of new information and bring them together to tell a bigger story and paint a bigger picture, which we hope will lead to greater knowledge (which, unlike data and information, is something that can only be learned and not simply assembled and communicated). The weather is a good example – a giant log of temperature and precipitation data isn’t going to do me much good. But if you process that data to calculate the high, low, and mean, now you have information I can use. Take that information and combine it with a radar picture and a forecast and now I have a rich information object. I can take that object and piece it together with other information – another forecast I hear on the radio, what I see out the window, my previous experiences of getting wet, my wife’s advice – to formulate a decision that I can act on. By considering all of this information – my experiences, contextualized information, and know-how – and weighing it to reach a conclusion, I am using my knowledge. In this case I’ll use it to carry an umbrella.
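The first step of the weather example, going from a log of readings to a high, low, and mean, is small enough to sketch. The readings below are made-up hourly temperatures used only to show the calculation:

```python
from statistics import mean

# Hypothetical temperature log (degrees Fahrenheit), invented for illustration.
readings = [48, 47, 46, 46, 45, 47, 50, 54, 58, 61, 63, 64]


def summarize(temps):
    """Reduce a list of readings to the three figures a reader can use."""
    return {
        "high": max(temps),
        "low": min(temps),
        "mean": round(mean(temps), 1),
    }
```

The twelve raw numbers are data; the three-number summary that summarize(readings) returns is information. Deciding whether to carry the umbrella is still up to you.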

The final point is that, in their papers, the students must take the information objects that they’ve created or acquired and integrate them into their work. Many students will just copy and download a table and stick it at the back of the paper, and assume that it speaks for itself. I tell them – it doesn’t! You have to explain why it’s there; make reference to it in the paper and weave it into your research.

Overall this presentation / discussion takes all of about 10 minutes, AND THEN we move into the discussion of neighborhoods, the census, and specific datasets. I’ve contemplated skipping it altogether, but ultimately decided that it’s necessary. I think it’s essential to provide some context and theory coupled with the actual sources and the pragmatic nature of finding the data. There are some librarians who are completely averse to teaching “tools” and will speak completely in the abstract, while there are others who cut directly to listing the sources and leaving it at that. The first approach is useless because the students won’t learn what to actually do; the second approach makes assumptions about what they know and fails to prepare them for what they’ll face. There also seems to be a clear need for me to do this – I’ve heard from many faculty who have commented that students are simply tacking data tables they’ve copied off the web onto the back of papers without any explanation. When I first presented the slide that depicts the CSV file, I was shocked by the looks on many students’ faces – like they’d never seen or heard of this before and were worried that they’d have to wrestle with it. Here’s the data-driven world, step 1.

Libraries Help Create Video Game Geography

Thursday, June 9th, 2011

Just for fun – I stumbled on a blog post from a TV Station in California that discusses how the developers for the new L.A. Noire video game made extensive use of libraries and archives to recreate Los Angeles of 1947. They used property, city planning, and USGS maps to recreate the street grid and landscape and aerial and street-level photos to faithfully replicate everything from specific buildings to street lights and garbage cans. They also dove into newspaper archives and Raymond Chandler’s works and personal papers for dialogue and story lines. It’s a good thing that we keep libraries and archives around to organize and collect all this stuff…

How Archivists Helped Video Game Designers Recreate the City’s Dark Side for ‘L.A. Noire’

Don’t know what I’m talking about? Check out this trailer.

Copyright RockStar Games 2011

Geographic Information: Literacy and Systems

Wednesday, August 5th, 2009

I’ve been spending a good portion of my summer working on the course that I’m going to teach this fall. The library at my college offers credit courses in Information Studies which students can take as a minor – they can choose two 3000 level courses and then a 4000 level capstone course. My course is a 3000 level special topics course which I’ve called Geographic Information: Literacy and Systems.

My situation is rather peculiar. I can’t teach this course as a pure GIS course, since it’s an information studies class and not geography or earth sciences. Beyond that, my college does not have a geography department, and earth sciences are not an individual department but are combined with other natural and physical sciences. With the exception of a regional geography class offered by the anthropology department, my college doesn’t offer geography instruction. So even if I could teach a pure GIS class, it’s unlikely that any of the students would have any foundational geographic knowledge.

I also can’t teach the course as a “library” class where I’m training people to be map or GIS librarians, because that isn’t the point of the info studies minor. The minor is meant to introduce students to the foundational principles of information – what is information, how do we search for it, organize it, what is its context in society, etc. I also could not teach the course as a basic software class, as that isn’t really appropriate for a college course. In short, I couldn’t find a model that I could follow, as what I’m doing falls outside these traditional realms.

So I decided to build the course around the concept of geographic information where I’ll cover some foundational geography, cartography, and GIS from an information science perspective that encompasses organization, search and retrieval, data processing, and assessment and analysis of GI. I’ve divided the class into four units that cover geographic information and fundamental geography, maps as information objects, and two units of GIS. In the first GIS unit we’ll cover the theoretical aspects and the basics of using the software with datasets that I’ll provide. In the second unit we’ll deal with the nitty gritty of actually searching for and processing freely available GIS data. In the last couple of weeks I’ll spend some time on web mapping and on geographic analysis and research.

Many of the concepts that I’ll be teaching are things that I never formally learned in a college course, such as a discussion of the kinds of administrative and statistical divisions that exist in the world, why they exist, and how data is collected for them. The second GIS unit on data processing is something that I feel is never adequately covered in GIS classes, but is essential for doing just about anything in GIS. I think this is also pertinent in information studies, as it involves a discussion of the difference between data and information and how you can turn one into the other.

I’ve decided to use all open source software. Since these are undergraduate students who probably won’t be entering a geography-related field, and we are a commuter campus where students have to make special trips to get to computer labs, I didn’t see any logic in using ArcGIS. With the open source software they can use it anywhere and there will be a better chance that they’ll use it after the course is over (and after they graduate). I’ve opted to go with QGIS as it covers all the bases I need. I liked gvSIG but had too many problems with the map layout – I might be able to hack my way through them, but can sophomore business and English majors? QGIS is also more thoroughly documented (in English), which is important since this is an introductory class.

I’m using Krygier and Wood’s Making Maps as my textbook, along with a few chapters here and there from other texts. I have looked to the pages Krygier’s created for his courses for guidance, and like the stream of consciousness style he used for writing his notes. I’ll post an annotated reading list later.

Since I’m breaking molds, I’ve also decided not to use Blackboard to organize the whole course and am using a blog and various other bits and pieces of software for creating assignments, organizing the roster, etc. If you’re interested you can follow along on my course blog – (only students can register). Classes start on August 31st…

Creating a New Shapefile in ArcGIS: Part I

Thursday, May 14th, 2009

I’m working with a grad student who needs to create a new shapefile from scratch, and thought I’d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods in New York City. Since many of these neighborhoods are place names without finite boundaries, we’ll have to use various sources (NYC Planning map and Rand McNally street maps) to pinpoint the relative center of each neighborhood.

These points will be used for labeling each neighborhood. In this case, creating a new, georeferenced layer is preferable to creating 291 text labels on a map that are not tied to geography in any way.

  • The first step is to download some layers from the NYC Department of City Planning to use for reference, such as a layer for boroughs and community districts. Community districts are used by the city to approximate neighborhoods. Many of the neighborhoods that we are trying to plot are smaller areas or places within these boundaries.
  • Next, open ArcCatalog and create a folder to store the data. Then, right click on the folder in the table of contents and select New – Shapefile. In the Create New Shapefile window, we give the shapefile a name, select Point as the feature type, and hit Edit to change the coordinate system. In the Spatial Reference Properties menu, we’ll import a coordinate system from one of the files we downloaded from NYC Planning, which uses New York State Plane for Long Island. Click OK and OK again, and we’ll have a new shapefile.
  • Right now, our new shapefile isn’t very exciting because it’s empty – you can preview it in the catalog to see for yourself. If you preview the table, you’ll see that Arc created three fields – FID, Shape, and ID, which it will automatically fill in when we start creating features. Before we do that, we’ll have to add an additional column to store the name of the neighborhood. To do that, open ArcMap and add the neighborhood layer to the map. Then, right click on the layer in the Table of Contents and open the attribute table. Hit the Options button and choose Add Field. In the Add Field menu, name the new field, choose Text as the type, and change the length to 80 (in case we have some neighborhoods with long names). Hit OK, and you’ll have a new field.
  • Let’s add our reference layers next. Hit the Add Data button (or File – Add Data), and add the borough boundaries and community districts (if you don’t see anything after you add them, right click on one of these layers and choose Zoom to Layer). Go into the symbology tab for each layer and change their display to make the areas appear more distinctive. Make sure your neighborhood layer is on top of your other layers.
  • Now it’s time to start plotting neighborhoods. Go to the Selection menu – Set Selectable Layers, and turn off all the layers except the neighborhood layer. Then, use the dropdown on the Editor Toolbar and select Start Editing (if you don’t see the Editor Toolbar, make sure it’s activated by going to View – Toolbars and selecting it). On the Editor Toolbar, make sure the Create New Feature task is activated and that the target layer is the neighborhood layer, and not any of the reference layers. Zoom in to the top of Manhattan. With the Pencil tool selected in the toolbar, and using your sources (NYC planning map, Rand McNally street map, whatever), click on the map to approximate where the center of the Inwood neighborhood would be. A blue dot should appear on the map. Then right-click on the neighborhoods layer in the Table of Contents and open the attribute table. You’ll see a brand new record for your new dot. Click in the empty field for Name, type in the name of the neighborhood, and press enter.
  • That’s the process! Next, locate the area for Washington Heights and click on the map to create the point for that neighborhood. The new dot will appear highlighted, while the previous dot for Inwood will now appear as a regular point symbol. Now it’s just a matter of plugging away. Make sure to occasionally save your edits by clicking Editor and choosing Save Edits. If you make a mistake, you can delete a feature by choosing the Select Feature tool in the regular tool bar (white arrow with a blue and white feature box next to it), selecting the particular point, and hitting the delete key. If you’re having trouble pinpointing the right location for the neighborhood, try downloading additional reference layers to guide you. The NYC DoITT also has a page with GIS layers for the city with features like parks and streets that may be helpful. When you’re finished editing, choose Stop Editing under the Editor Toolbar.


  • The ultimate goal of this exercise was to get neighborhood labels to appear without the actual point. To accomplish this, change the point symbol for the neighborhood to nothing by going into the Symbology tab for the layer and reducing the fill to no color, the outline to nothing, and the size to zero. Then open the Labels tab under the Properties menu, turn labels on using the name field as the label field, select Placement Properties and choose the setting to place the labels on top of the point, hit ok, and voila! Perfectly centered neighborhood names that are part of a georeferenced layer.
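The same idea, a georeferenced point layer with one name field, can also be sketched outside of ArcGIS. The example below is not the ArcCatalog / ArcMap workflow described in the steps above: it swaps the shapefile for GeoJSON so that it runs with nothing but the Python standard library, and it uses longitude / latitude instead of State Plane coordinates. The function name is made up for illustration, and the two coordinate pairs are rough eyeballed placeholders rather than surveyed neighborhood centers.

```python
import json


def make_point_layer(places):
    """Build a GeoJSON FeatureCollection with one named point per place.

    `places` is a list of (name, longitude, latitude) tuples.
    """
    features = []
    for fid, (name, lon, lat) in enumerate(places):
        features.append({
            "type": "Feature",
            "id": fid,
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Mirror the 80-character text field from the tutorial.
            "properties": {"Name": name[:80]},
        })
    return {"type": "FeatureCollection", "features": features}


# Rough placeholder coordinates (WGS 84 longitude, latitude), not
# authoritative neighborhood centers.
places = [
    ("Inwood", -73.92, 40.87),
    ("Washington Heights", -73.94, 40.84),
]
layer = make_point_layer(places)
geojson_text = json.dumps(layer, indent=2)
```

Saving geojson_text to a file with a .geojson extension should give you a layer that desktop GIS packages can open and label by the Name property, much like the shapefile built above.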

This covers the basics. In the next post, I’ll go a little further and discuss adding additional fields to the new file, without having to type them in manually.

Heading Cross Country with Google Maps

Monday, June 16th, 2008

I’m flying out to Seattle tonight, and will eventually be driving back to New York, after side trips to Vancouver BC, Olympia, and San Fran. When I moved to Seattle for grad school back in 2005, I used a Rand McNally road atlas to plot time and distances between stops. This time around I used Google Maps, which made it much easier to fiddle around with spacing the stops out. You can check out my plan (for Seattle to New York via San Fran) here. Place names and Long / Lat coordinates get passed through the URL, making it easy to hack together your own map.

I’m still taking that road atlas with me though!
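As a rough illustration of passing place names through the URL, here is a sketch that builds a directions link. It assumes the query format described in Google's current Maps URLs documentation (api, origin, destination, and pipe-separated waypoints), which is not necessarily the format the link in this post used in 2008. The function name is my own.

```python
from urllib.parse import urlencode


def directions_url(origin, destination, waypoints=()):
    """Build a Google Maps directions link from place names.

    Assumes the documented Maps URLs parameters: api=1, origin,
    destination, and waypoints separated by the pipe character.
    """
    params = {"api": "1", "origin": origin, "destination": destination}
    if waypoints:
        params["waypoints"] = "|".join(waypoints)
    return "https://www.google.com/maps/dir/?" + urlencode(params)


url = directions_url("Seattle, WA", "New York, NY", ["San Francisco, CA"])
```

Swapping a place name for a "latitude,longitude" string should work the same way, since the parameters accept either.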

My Goings On

Monday, May 26th, 2008

It’s been a while since my last post – I’ve been locked away in my apartment on research leave for the past two weeks. I’m working on a database-backed web directory for finding GIS data, as I’m tired of dealing with bookmarks, HTML lists, and protracted web searches for keeping track of datasets. The goal is to keep it simple and standards-based. The basic architecture is in place; I’m just struggling to learn and apply PHP. Fingers crossed, I may have a prototype ready by the end of my next round of leave this summer.

I’m also still reading Georeferencing by Linda Hill, which extensively covers gazetteers and metadata standards. It’s definitely worth checking out, from a library near you.

Copyright © 2017 Gothos. All Rights Reserved.