Posts Tagged ‘teaching’

Introducing – Data!

Wednesday, April 9th, 2014

Professors invite me to their classes each semester to give students a crash course in finding data for neighborhoods in New York City, with a particular emphasis on Census data. I typically visit courses in journalism and public affairs, but this semester I added classes in management and – theater – to the list. Before I dive into what the Census is and what sources they should use, I preface the presentation with a discussion of what neighborhoods are and how we define them. This is important because neighborhoods are locally and informally defined, and when searching for datasets we often have to use a proxy, like census tracts, ZIP codes, PUMAs, or local legal or administrative areas, to approximate them.

But before we get this far, I always begin the discussion with some basic questions to set the stage: what is data, and what can we use it for? For the journalism students, I explain that data can help support a story. If they’re covering a town hall or community board meeting where affordable housing is the topic of discussion, they’re going to want to provide some context and include some facts to support their story – what is the rent like in the neighborhood? How many people live there? Alternatively, data can provide the basis for a story. I point to one of the many examples in NYC where journalists have taken a big lump of unrefined data – the NYPD’s stop and frisk data, traffic fatality incidents, 311 complaints – and have refined it to produce information that leads them to an interesting story that was hidden in these numbers. Lastly, data is a story – whenever the Census releases a new dataset, someone is writing to announce the release and tell us what’s in there.

This idea of refining leads us to our first basic definition – data can be considered as raw and unrefined information. It doesn’t tell us much in and of itself, but if we sift through and refine it we can turn it into information that we can use to tell or support a story or reveal some fact or truth that was previously unknown. Data can be quantitative or qualitative – journalists for example may interview someone for two or three hours, but they’re not going to turn around and publish that entire interview. They’re going to write an article that summarizes it and gives us the most important bits, or edit it for a radio broadcast that covers the highlights. With quantitative data the issue is similar – I use a basic example of population data for the 50 states and show them this image of a comma-delimited text file:

csv

I explain that this is what data looks like in a raw state. It’s in a basic format suitable for preservation or transit between systems, but is not in a presentable state. There are a lot of codes that are meaningless to the average person, the data isn’t sorted in a meaningful way, the column headings seem ambiguous, and the numbers aren’t formatted for viewing. This isn’t something that they’d want to insert directly into their story or paper. But if they take this and do a little bit of work:

table
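For the curious, that little bit of work can be sketched in a few lines of Python. This is only an illustration, not the process I actually used to build the slide: the column names (`GEO_ID`, `SUMLEV`, `P001001`) are invented stand-ins for the kind of codes that show up in a raw extract, and the population figures are round placeholders rather than real counts. The steps are to sort by population, keep only the useful columns, give them readable names, add commas, and compute each state's share of the total.

```python
import csv
import io

# Hypothetical raw extract. The headers and codes imitate a typical download,
# and the population values are placeholders, NOT real Census counts.
RAW_CSV = """GEO_ID,NAME,SUMLEV,P001001
0400000US39,Ohio,040,3000000
0400000US06,California,040,10000000
0400000US36,New York,040,5000000
0400000US48,Texas,040,7000000
"""


def build_table(raw_text):
    """Turn raw CSV text into sorted, labeled, formatted rows."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    total = sum(int(r["P001001"]) for r in rows)

    # Sort from largest to smallest population.
    rows.sort(key=lambda r: int(r["P001001"]), reverse=True)

    table = []
    for r in rows:
        pop = int(r["P001001"])
        table.append({
            # Readable column names; the GEO_ID and SUMLEV codes are dropped.
            "State": r["NAME"],
            "Population": f"{pop:,}",  # thousands separators
            "Percent of Total": f"{100 * pop / total:.1f}%",
        })
    return table


table = build_table(RAW_CSV)
for row in table:
    print(f'{row["State"]:<12}{row["Population"]:>12}{row["Percent of Total"]:>8}')
```

Note that the percentages here are shares of the four sample rows only; with the full file they would be shares of the national total.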

They can take that raw data and turn it into information. Here we’ve moved from raw data to a presentable table. The statistics are sorted in a logical order based on total population, columns are given comprehensible names, and unnecessary information (for presentation purposes) is dropped. We add commas to the numbers so they’re more legible, and we create some value by adding a percent of total column. Now we have something we can use to communicate a message. But we can go further – we can take that same information and turn it into this:

chart

Now we have a chart. At this point I turn to the students and ask them what the benefit of using the chart is, followed by a discussion of trade-offs; we’ve gained something but lost something too. On the plus side, we can appeal to people’s visual sensibilities, and we can see more clearly that California has twice as many people as New York. The chart is also more concise, as it’s taking up less real estate on the page or on the screen. But we’ve exchanged conciseness for preciseness; we can no longer tell what the exact population numbers are with the chart; we can only approximate. But we can also go further:

map

We can take that same dataset and turn it into a map. Once again, we discuss the pluses and minuses. Now we can key into people’s geographic knowledge as well as their visual senses; Ohio may be more meaningful now that we can see it on a map, rather than just seeing a number in a table. We can also see geographic patterns of clustering or diffusion, which the table or chart couldn’t show us. But with the map we’ve lost even more precision. Now we can only see that a state’s population number falls within a given range; we can’t see the precise number and can’t approximate it like we could with the chart.
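To make that loss of precision concrete, here is a minimal sketch of how a shaded map assigns each state to a class. The break points and labels are hypothetical, chosen only for illustration, and are not the classes from my slide. Once a value is binned, two quite different populations become indistinguishable to the reader.

```python
from bisect import bisect_right

# Hypothetical class breaks and legend labels, for illustration only.
BREAKS = [5_000_000, 10_000_000, 20_000_000]
LABELS = [
    "under 5 million",
    "5 to 10 million",
    "10 to 20 million",
    "20 million or more",
]


def classify(population):
    """Return the legend label for the class a population falls into."""
    # bisect_right counts how many breaks are <= population,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BREAKS, population)]


# Two states eight million people apart end up with the same shade.
print(classify(11_000_000))
print(classify(19_000_000))
```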

At this point, one student will point out that if the chart or map is on the web, we can have the best of all worlds. If the graphic is interactive we can hover over it and see the exact population number. This leads to a discussion of the trade-offs between interactive web-based information and static information. The interactive chart or map lets us keep precision and conciseness, but the sacrifice is complexity, portability, and preservation. It’s more complex to create, and it can only exist in its native environment, within a specific bundle of technology that includes programming and scripting languages, software libraries, browsers, and operating systems. Such things go obsolete quickly and can easily break, so the shiny chart, map, or app you have today is non-functional in a year or two, and difficult to preserve. Contrast that with a static image or text, which is simple, easy to move around, depends on little else, and can make the jump from a screen to the printed page.

We sum up this little talk with the basis of what they’re trying to achieve – I use the DIK pyramid, which I was introduced to in library school (OK – this pic is the DIKW pyramid, with wisdom thrown on top – it’s public domain so I can safely use it):

DIKW-diagram

As journalists or researchers, you’re taking data and refining it to turn it into information to support your work or to communicate a message. You take those pieces of new information and bring them together to tell a bigger story and paint a bigger picture, which we hope will lead to greater knowledge (which, unlike data and information, is something that can only be learned and not simply assembled and communicated). The weather is a good example – a giant log of temperature and precipitation data isn’t going to do me much good. But if you process that data to calculate the high, low, and mean, now you have information I can use. Take that information and combine it with a radar picture and a forecast and now I have a rich information object. I can take that object and piece it together with other information – another forecast I hear on the radio, what I see out the window, my previous experiences of getting wet, my wife’s advice – to formulate a decision that I can act on. By considering all of this information – my experiences, contextualized information, and know-how – and weighing it to reach a conclusion, I am using my knowledge. In this case I’ll use it to carry an umbrella.
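The first step in that weather example, going from a log of readings to a high, low, and mean, is simple enough to sketch. The readings below are made up for illustration; a real log would be far longer, which is exactly why nobody wants to read it raw.

```python
# Hypothetical hourly temperature readings in degrees Fahrenheit (made up).
readings = [52, 55, 61, 66, 64, 58]


def summarize(temps):
    """Reduce a list of readings to the three numbers a reader can use."""
    return {
        "high": max(temps),
        "low": min(temps),
        "mean": round(sum(temps) / len(temps), 1),
    }


summary = summarize(readings)
print(summary)
```

Everything after that step (the radar picture, the view out the window, the advice) is where the code stops and the judgment begins.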

The final point is that, in their papers, the students must take the information objects that they’ve created or acquired and integrate them into their work. Many students will just copy and download a table and stick it at the back of the paper, and assume that it speaks for itself. I tell them – it doesn’t! You have to explain why it’s there; make reference to it in the paper and weave it into your research.

Overall this presentation / discussion takes all of about 10 minutes, AND THEN we move into the discussion of neighborhoods, the census, and specific datasets. I’ve contemplated skipping it altogether, but ultimately decided that it’s necessary. I think it’s essential to provide some context and theory coupled with the actual sources and the pragmatic nature of finding the data. There are some librarians who are completely averse to teaching “tools” and will speak completely in the abstract, while there are others who cut directly to listing the sources and leaving it at that. The first approach is useless because the students won’t learn what to actually do; the second approach makes assumptions about what they know and fails to prepare them for what they’ll face. There also seems to be a clear need for me to do this – I’ve heard from many faculty who have commented that students are simply tacking data tables they’ve copied off the web onto the back of papers without any explanation. When I first presented the slide that depicts the csv file, I was initially surprised by the looks of shock on many students’ faces – like they’d never seen or heard of this before and were worried that they’d have to wrestle with it. Here’s the data-driven world, step 1.

Reading List for Geographic Information Course

Saturday, August 29th, 2009

The fall semester is here, and I’m about to start teaching the class I mentioned in my last post (an information studies course on geographic information). I thought I’d share my reading list and try out the Open Book plugin. I chose my readings based on: my particular audience (undergraduate students from many disciplines with little or no background in geography), relevance (materials appropriate in a hybrid information studies / geography course), cost (wanting to assign the students a single textbook that’s reasonably priced and covers all the bases, supplemented with other readings), and copyright (staying within the bounds of fair use by not assigning too much from a single work). Here goes:

[openbook]1593852002[/openbook] I decided to go with Krygier and Wood’s Making Maps as my assigned textbook. Since cartography is a visual and technical art, I thought it made sense to use a book that relies on visuals for explanations rather than text. It’s approachable, particularly for my students who won’t be coming from a geography background, affordable, wonderfully quirky, and covers all of the essentials of the geographic framework and map interpretation and design independent of specific GIS software.

[openbook]1405106727[/openbook] I’m using the first chapter of Cresswell’s book as a succinct introduction to how individuals define places, but would recommend the rest of the text for classes that cover geographic concepts and methods.

[openbook]026208354X[/openbook] I’m assigning the second and third chapters of Hill’s book. The second chapter, which discusses how people process, store, and use geographic information is the best summary of this topic that I’ve ever seen, and the third chapter is a good overview of the different types of geographic objects. As a librarian-geo nerd, I love the chapters that deal with coordinate metadata and gazetteers, but won’t be using them in this class.

[openbook]0262620014[/openbook] This is an urban planning / design classic, and I’ll have my students read the summary of Lynch’s city elements (based on his research, Lynch proposed that people mentally break the urban environment down into five types of elements in order to organize and navigate the city: paths, edges, districts, nodes, and landmarks).

[openbook]0470129050[/openbook] This is the only traditional textbook that I’ll be borrowing from (I actually used it when I was a freshman, way back when). While I’m using the previous three books to discuss egocentric places, or how we as individuals conceive of place, I’m using the first chapter of this book to give the students an overview of geocentric places – the formal, defined hierarchy of places that exist in the world – and to introduce them to the concept of regions.

[openbook]0226534146[/openbook] This has become a modern classic and I almost assigned it as a second textbook. I am assigning the chapter on maps for propaganda as a background to our discussion on map interpretation and communication, and will later use the chapter on census maps to talk about the effects of data classification and choice of enumeration units.

[openbook]1934356069[/openbook] This is the only software book that I’ll be using chapters from, so the students have some formal guide for using QGIS (in addition to the QGIS documentation). I’m using the chapters on vector and raster data.

[openbook]1412910161[/openbook] This concise, excellent book deals strictly with the concepts and principles behind GIS. I’m using the chapters on spatial search and geoprocessing, but would recommend the entire book for any GIS course, novice to advanced.

In addition to chapters from these books, I’ll also be using:

  • “Revolutions in Mapping” by John Noble Wilford, National Geographic Feb 1998 – a great overview of the history of cartography
  • USGS GIS poster – if there is such a thing, this is a “web classic” and an accessible intro to GIS
  • One article from a scholarly journal and one article from a mass market magazine to illustrate how geographic research is covered and used
  • And for shameless self-promotion, a summary I wrote about US Census data – In Three Parts

Finally, an honorable mention:

[openbook]1593855664[/openbook] If I were teaching an introductory GIS course in a geography or earth sciences department, this is certainly the book I would use, and for those of you in that boat I’d recommend checking it out. It does an excellent job of covering GIS principles without being software specific, contains exercises at the end of each chapter, and is well written and affordable. Since the scope of my course is broader than GIS and my audience more general and diverse, I opted to leave it out (but may still assign a chapter).


Copyright © 2014 Gothos. All Rights Reserved.