Archive for the ‘Announcements’ Category

I’m At These Coordinates Now

Thursday, September 14th, 2017

Adios, Gothos! After almost ten years of blogging here, it's time to start anew. Instead of just slapping on a fresh coat of paint, I've decided to create a new site.

Please join me At These Coordinates: https://atcoordinates.info.

Gothos.info will remain here until the end of January 2018. After that, the old URL will simply redirect you to the new site. I've migrated a few of the most recent posts over to my new pad, along with some greatest hits. But after January most of the content here will vanish, although the bits and bytes are preserved in the Internet Archive's Wayback Machine.

I’ve described my rationale for moving on in the inaugural post on the new site. The last point I’ll touch on here – what the heck is Gothos anyway? Well:


The Squire of Gothos is a Star Trek episode from the original series. It's about a mercurial child super-being who creates his own planet and then attempts to hold Captain Kirk and crew hostage there to keep him company. "Welcome to an island of peace on my stormy little planet of Gothos!" Since I was creating my own little world about geospatial data and wanted a catchy one-word title, it fit the bill.

I’ll see you At These Coordinates! Best – Frank

FOSS4G 2017 Boston is One Month Away

Thursday, July 13th, 2017

So this summer we're taking our show on the road! The Free and Open Source Software for Geospatial (FOSS4G) conference is "the" international conference for all things related to geospatial technology and open source software. FOSS4G 2017 is in Boston August 14-18, and the Baruch GIS team will be there! Anastasia, Janine, and I will be running our full-day introductory GIS Practicum workshop (re-dubbed "Introduction to GIS Using QGIS" for the conference) at the Harvard Center for Geographic Analysis in Cambridge on Tuesday, Aug 15th. There is a slew of great workshops being offered that Monday and Tuesday, covering all technologies and user levels. The main conference runs from Wednesday to Friday.

FOSS4G is an excellent event that brings together open source GIS and mapping developers, practitioners, and educators. It’s a good place to learn new skills, make connections, and keep up with the latest developments. The main conference only comes to North America once every 3 years, and this is the first time it’s been on the east coast. So if you have some time and money to spare, check it out (August 3 is the last day to register) and come by and say hello.

The last one I attended was FOSS4G 2011 in Denver. I gave a talk about a brand-new introductory workshop with QGIS that I had just launched… 29 workshop sessions and 358 participants later, I couldn't be happier to be coming full circle, running the workshop at the same conference where I had initially unveiled it as a little experiment six years earlier. That conference introduced me to so many tools and ideas that I've carried with me in my work over the past several years. I'm eager to learn some more and to connect with some mapping / GIS / librarian pals I haven't seen in quite a while.

In preparation for the conference and the upcoming academic year, I will be updating the GIS Practicum manual pretty soon. While QGIS 2.18 Las Palmas is currently the latest release, it is scheduled to become the new Long Term Release once version 3.0 comes out later this year. I’m going to make the switch from 2.14 to 2.18 in the next workbook, since this change is on the horizon.

Average Distance to Public Libraries in the US

Monday, February 22nd, 2016

A few months ago I had a new article published in LISR, but given the absurd restrictions of academic journal publishing I'm not allowed to publicly post the article, and have to wait 12 months before sharing my post-print copy. It is available via your local library if they have a subscription to the Science Direct database (you can also email me to request a copy). I am sharing some of the unpublished state-level data that was generated for the project here.

Citation and Abstract

Regional variations in average distance to public libraries in the United States
F. Donnelly
Library & Information Science Research
Volume 37, Issue 4, October 2015, Pages 280–289
http://dx.doi.org/10.1016/j.lisr.2015.11.008

Abstract

“There are substantive regional variations in public library accessibility in the United States, which is a concern considering the civic and educational roles that libraries play in communities. Average population-weighted distances and the total population living within one mile segments of the nearest public library were calculated at a regional level for metropolitan and non-metropolitan areas, and at a state level. The findings demonstrate significant regional variations in accessibility that have been persistent over time and cannot be explained by simple population distribution measures alone. Distances to the nearest public library are higher in the South compared to other regions, with statistically significant clusters of states with lower accessibility than average. The national average population-weighted distance to the nearest public library is 2.1 miles. While this supports the use of a two-mile buffer employed in many LIS studies to measure library service areas, the degree of variation that exists between regions and states suggests that local measures should be applied to local areas.”

Purpose

I’m not going to repeat all the findings, but will provide some context.

As a follow-up to my earlier work, I was interested in trying an alternate approach for measuring public library spatial equity. I previously used the standard container approach – draw a buffer at some fixed distance around a library and count whether people are in or out – using population centroids for census tracts as an approximation for individuals. In my second approach, I used straight-line distance measurements from census block groups (smaller than tracts) to the nearest public library so I could compare average distances for regions and states; I also summed the population within one-mile rings of the nearest library to calculate the percentage of people living at each distance. I weighted the distances by population to account for the fact that census areas vary in population size (tracts and block groups are designed to fall within an ideal population range – for block groups it's between 600 and 3,000 people).
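To illustrate the method, here's a minimal sketch (with made-up distances and populations, not my actual data) of how the population-weighted mean distance and the one-mile ring percentages can be computed:

```python
import numpy as np

# Hypothetical inputs, one entry per census block group: straight-line
# distance (miles) from the block group centroid to the nearest public
# library, and the block group's 2010 Census population.
dist = np.array([0.4, 1.2, 2.7, 5.3, 0.8])
pop = np.array([1500, 900, 2200, 650, 2800])

# Population-weighted mean: each block group counts in proportion to
# how many people live in it, rather than one vote per geography.
weighted_mean = (dist * pop).sum() / pop.sum()
print(f"Population-weighted mean distance: {weighted_mean:.1f} miles")

# Total population within one-mile rings of the nearest library.
ring_edges = np.arange(0, 7)  # rings: 0-1, 1-2, ..., 5-6 miles
ring_pops, _ = np.histogram(dist, bins=ring_edges, weights=pop)
for lo, ring_pop in zip(ring_edges[:-1], ring_pops):
    print(f"{lo}-{lo + 1} miles: {ring_pop / pop.sum():.0%} of population")
```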

Despite the difference in approach, the outcome was similar. Using the earlier approach (census tract centroids that fell within a library buffer that varied from 1 to 3 miles based on urban or rural setting), two-thirds of Americans fell within a "library service area," meaning they lived within a reasonable distance of a library based on standard practices in LIS research. Using the latest approach (block group centroids and the distance to the nearest library), two-thirds of Americans lived within two miles of a public library – the average population-weighted distance was 2.1 miles. Both studies illustrate that there is a great deal of variation by geographic region – people in the South consistently lived further away from public libraries compared to the national average, while people in the Northeast lived closer. A spatial autocorrelation analysis (LISA) revealed a cluster of states in the South with high distances and a cluster in the Northeast with low distances.

The idea in doing this research was not to model actual travel behavior to measure accessibility. People in rural areas may be accustomed to traveling greater distances, public transportation can be a factor, people may not visit the library that's close to their home for several reasons, measuring distance along a network is more precise than Euclidean distance, etc. The point is that libraries are a public good that provide tangible benefits to communities. People who live in close proximity to a public library are more likely to reap the benefits that it provides relative to those living further away. Communities that have libraries will benefit more than communities that don't. The distance measurements serve as a basic metric for evaluating spatial equity. So, if someone lives more than six miles away from a library, that does not mean they don't have access; it does mean they are less likely to utilize it or realize its benefits compared to someone who lives a mile or two away.

Data

I used the 2010 Census at the block group level, and obtained the location of public libraries from the 2010 IMLS survey. I improved the latter by geocoding libraries that did not have address-level coordinates, so that I had street matches for 95% of the 16,720 libraries in the dataset. The tables that I'm providing here were not published in the original article, but were tacked on as supplementary material in appendices. I wanted to share them so others could incorporate them into local studies. In most LIS research the prevailing approach for measuring library service areas is to use a buffer of 1 to 2 miles for all locations. Given the variation between states, you could instead consider using the state-level averages in these tables for library planning in your own state.
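The article doesn't hinge on any particular geocoding tool; as a stand-in to illustrate the gap-filling step, here's a sketch using geopy's Nominatim wrapper (the address and user agent are just examples):

```python
from geopy.geocoders import Nominatim

# A stand-in geocoder for illustration; any street-level geocoding
# service would do for filling in missing library coordinates.
geolocator = Nominatim(user_agent="library-distance-study")

def geocode_library(address):
    """Return (latitude, longitude) for a street address, or None."""
    location = geolocator.geocode(address)
    if location is None:
        return None  # no street match; fall back to a coarser method
    return (location.latitude, location.longitude)

print(geocode_library("476 5th Ave, New York, NY 10018"))
```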

To provide some context, the image below shows public libraries (red stars) in relation to census block group centroids (white circles) for northern Delaware (primarily suburban) and surrounding areas (a mix of suburban and rural). The line drawn between the Swedesboro and Woodstown libraries in New Jersey is five miles long. I used QGIS and Spatialite for most of the work, along with Python for processing the data and GeoDa for the spatial autocorrelation.

Map Example - Northern Delaware
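The nearest-library measurements themselves were done in Spatialite, but for a quick sense of the computation here's a rough equivalent (not my actual workflow) using scipy's cKDTree over hypothetical projected coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected coordinates (units assumed to be miles); real
# data would come from library points and block group centroids
# projected into a common CRS.
libraries = np.array([[10.0, 10.0], [14.0, 12.0], [9.0, 15.0]])
centroids = np.array([[10.3, 10.2], [12.5, 9.0], [9.2, 14.1]])

# Build a k-d tree on library locations, then find the straight-line
# distance from each block group centroid to its nearest library.
tree = cKDTree(libraries)
dist, idx = tree.query(centroids)
for c, d, i in zip(centroids, dist, idx):
    print(f"centroid {c} -> library {i}, {d:.2f} miles away")
```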

The three tables I'm posting on the resources page are for states: one counts the 2010 Census population within one- to six-mile rings of the nearest public library, the second gives the percentage of the total state population that falls within each ring, and the third is a summary table with the mean and population-weighted distance to the nearest library by state. One set of tables is formatted text (for printing or just looking up numbers), while the other set consists of CSV files that you can use in a spreadsheet. I've included a metadata record with some methodological info, but you can read the full details in the article.

In the article itself I tabulated and examined data at a broader, regional level (Northeast, Midwest, South, and West), and also broke it down into metropolitan and non-metropolitan areas for the regions. Naturally, people in non-metropolitan areas lived further away, but the same regional patterns held: more people in the South, in both metro and non-metro areas, lived further away compared to their counterparts in other parts of the country. This weekend I stumbled across this article in the Washington Post about troubles in the Deep South, and was struck by how its maps mirrored the low library accessibility maps in my past two articles.

Update Your Links to the New Baruch Geoportal

Thursday, August 13th, 2015

A few weeks ago I launched a new version of our college's GIS data repository, the Baruch Geoportal. On the back end I have a simplified process for getting data onto our server, and on the front end we did away with manually updating HTML and CSS webpages in favor of using a Confluence wiki. My college has a subscription to Confluence, and I've been using an internal wiki for documenting and administering all aspects of our projects. A public, external wiki for providing our data seemed like a nice way to go – we can focus more on the content, and it's easier for my team and me to collaborate.

Since it’s a new site with a new address, many of the links to projects I’ve referred to throughout the years on this blog are no longer valid. Redirects are in place, but they won’t last forever. Some notable links to update:

The new site has a dedicated blog that you can follow (via RSS) for the latest updates to the portal. The portal also has a number of relatively new and publicly accessible datasets that we’ve posted over the last year (but that I haven’t had time to post about). These include the NYC Mass Transit Spatial Layers series and population centroids for US census geographies. We’ve been creating ISO spatial metadata for all of our new layers, but we still need to create XML stylesheets to make them more human-readable. That will be one of many projects to do for this academic year.


Census Proposes to Cut 3-year ACS in Fiscal 2016

Friday, February 6th, 2015

I’m coming out of my blog hibernation for this announcement – the US Census Bureau is proposing that they drop the 3-year series of the American Community Survey in fiscal year 2016. A colleague mentioned that he overheard this at a meeting yesterday. Searching the web, I found a post at the Free Government Information site which points to this Census Bureau Press release. The press release cites the predictable reasons (budget constraints, funding priorities, etc.) for dropping the series. Oddly, the news comes through some random site and not through the Census Bureau’s website, where there’s no mention of it. I saw that Stanford also had a post, where they shared the same press release.

I kept searching for some definitive proof, and through someone's tweet I found a link to a PDF of the US Census Bureau's Budget Estimates for Fiscal Year 2016, presented to Congress in February 2015. I found confirmation buried on page CEN-106 (the 100th page of a 190-page document):

Data Products

Restoration of ACS Data Products ($1.5 million): Each year, the ACS releases a wide range of data products widely used by policymakers, Federal, state and local governments, businesses and the public to make decisions on allocation of taxpayer funds, the location of businesses and the placement of products, emergency management plans, and a host of other matters. Resource constraints have led to the cancellation of data products for areas with populations between 20 and 60 thousand based on 3-year rolling averages of ACS data (known as the "3-Year Data" product). They have also resulted in delays in the release of the 1- and 5-year Public Use Microdata Sample (PUMS) data files and canceled the release of the 5-year Comparison Profile data product and the Spanish translation of the 1- and 5-year Puerto Rico data products.

The Census Bureau proposes to terminate permanently the 3-Year Data Product. The Census Bureau intended to produce this data product for a few years when the ACS was a new survey. Now that the ACS has collected data for nearly a decade, this product can be discontinued without serious impacts on the availability of the estimates for these communities.

The ACS would like to restore the timely release of the other essential products in FY2016. The continued absence of these data products will impact the availability of data – especially for Puerto Rico – to public and private sector decision makers.

So at this point it's still just a proposal. The benefits, besides the ability to release other datasets in a timely fashion, would be simplification for users: instead of choosing between three datasets, there would only be two – the 1-year and the 5-year. You choose the 1-year for large areas and the 5-year for everywhere else. In terms of disadvantages, consider this example – here is the number of children enrolled in nursery school in NY State PUMA 03808, which covers Murray Hill, Gramercy, and Stuyvesant Town in the eastern half of Midtown Manhattan:

PUMA NY 03808

Population Over 3 Years Old Enrolled in Nursery / Pre-school

  • 1-year, 2013: 1,166 +/- 609
  • 3-year, 2011-2013: 1,549 +/- 530
  • 5-year, 2009-2013: 1,819 +/- 409

Since PUMAs are statistical areas built to contain 100k people, data for all of them is available in each series. Like all ACS estimates, these have a 90% confidence interval. Look at the data for the 1-year series. The margin of error (ME) is so large that it's approximately 50% of the estimate, which in my opinion makes it worthless for just about any application. The estimate itself is also much lower than the estimates for the other two series. It's true that it's only capturing the latest year, but administrative data and news reports suggest that the number of nursery school children in the district that covers this area has been relatively stable over time, with modest increases (geographically the district covers an area much larger than this PUMA). This suggests that the estimate itself is not so great.

The 5-year estimate may be closer to reality, and its ME is only 20% of the estimate, but it covers five years in time. If you wanted a compromise – more timely than the 5-year but with a lower ME than the 1-year – then the 3-year series was your choice, in this case with an ME that's about 33% of the estimate. Under this proposal that choice goes away, and you have to make do with either 1-year estimates (which will be lousy for geographies that aren't far above the 65k population threshold, and lousy for small population groups wherever they are located) or the more reliable 5-year estimates that cover a greater time span.
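To make the comparison concrete, here's a quick sketch that computes the ME-to-estimate ratio for each series, using the figures above:

```python
# Nursery school enrollment for PUMA NY 03808 (estimate, ME) from above.
series = {
    "1-year, 2013": (1166, 609),
    "3-year, 2011-2013": (1549, 530),
    "5-year, 2009-2013": (1819, 409),
}

for name, (estimate, me) in series.items():
    # The ME-to-estimate ratio is a rough gauge of reliability; ACS
    # margins of error are published at the 90% confidence level.
    print(f"{name}: {estimate} +/- {me} (ME is {me / estimate:.0%} of estimate)")
```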

Article on Processing Government Data With Python

Thursday, August 28th, 2014

Last month I had an article published in the code{4}lib journal, about a case study using Python to process IRS data on tax-exempt organizations (non-profits). It includes a working Python script that can be used by anyone who wishes to make a place-based extract of that dataset for their geographic area of interest. The script utilizes the ZIP to ZCTA masterfile that I've mentioned in a previous post, and I include a discussion on wrestling with ZIP Code data. Both the script and the database are included in the download files at the bottom of the article.
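The published script and database are the ones to use; purely to sketch the general idea of a place-based extract, here's a minimal example with hypothetical file and column names:

```python
import csv

# Hypothetical names for illustration only; the real script and the
# ZIP-to-ZCTA database are in the article's download files.
ZCTAS_OF_INTEREST = {"10010", "10016", "10017"}

# Build a ZIP -> ZCTA lookup from a crosswalk file.
zip_to_zcta = {}
with open("zip_to_zcta.csv", newline="") as f:
    for row in csv.DictReader(f):
        zip_to_zcta[row["ZIP"]] = row["ZCTA"]

# Keep only the organizations whose ZIP maps to a ZCTA in the study area.
with open("irs_orgs.csv", newline="") as src, \
        open("irs_orgs_extract.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        zip5 = row["ZIP"][:5]  # normalize ZIP+4 codes to five digits
        if zip_to_zcta.get(zip5) in ZCTAS_OF_INTEREST:
            writer.writerow(row)
```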

I also provide a brief explanation of using OpenRefine to clean data with its text facet tools. One thing I forgot to mention in the article is that after you apply your data fixes, OpenRefine records the history. So if you have to process an update of the same file in the future (which I'll have to do repeatedly), you can simply re-apply all the fixes you made in the past (which are saved in a JSON file).

While the article is pragmatic in nature, I did make an attempt to link this example to the bigger picture of data librarianship, advocating that data librarians can work to add value to datasets for their users, rather than simply pointing them to unrefined resources that many won’t be able to use.

The citation and link:

Donnelly, F. P. (2014). Processing government data: ZIP Codes, Python, and OpenRefine. code{4}lib Journal, 25 (2014-07-21). http://journal.code4lib.org/articles/9652.

As always the journal has a great mix of case studies, and this issue included an article on geospatial metadata.

While I’ve used Python quite a bit, this is the first time that I’ve written anything serious that I’ve released publicly. If there are ways I could improve it, I’d appreciate your feedback. Other than a three-day workshop I took years ago, I’m entirely self-taught and seldom have the opportunity to bounce ideas off people for this type of work. I’ve disabled the blog comments here a long time ago, but feel free to send me an email. If there’s enough interest I’ll do a follow-up post with the suggestions – mail AT gothos DOT info.

Government Shutdown: Alternate Resources for US Census Data

Sunday, October 13th, 2013

As the US government shutdown continues (thanks to a handful of ideological nutcases in Congress), those of us who work with and rely on government data are relearning the lesson of why it's important to keep copies of things. This includes having alternate sources of information floating around on the web and in the cloud, as well as the tried and true approach of downloading and saving datasets locally. There have been a number of good posts (like this succinct one) pointing users to alternatives to the federal sources that many of us rely on. I'll go into more detail here with my suggestions on where to access US Census data, based on user level and need.

  • The Social Explorer: this web-mapping resource for depicting and accessing US Census data from 1790 to present (including the 2010 Census and the latest American Community Survey data) is intuitive and user-friendly. Many academic and public libraries subscribe to the premium edition that provides full access to all features and datasets (so check with yours to see if you have access), while a basic free version is available online. Given the current circumstances, the Social Explorer team has announced that it will open the hatch and provide free access to users who request it.
  • The NHGIS (National Historical GIS): this project is managed by the Minnesota Population Center and also provides access to all US Census data from 1790 to present. While it's a little more complex than the Social Explorer, the NHGIS is the better option for downloading lots of data en masse, and is the go-to place if you need access to all datasets in their entirety, including all the detail from the American Community Survey (the Social Explorer does not include margins of error for any of the ACS estimates) or if you need access to other datasets like the County Business Patterns. Lastly, it is the alternative to the TIGER site for GIS users who need shapefiles of census geography. You have to register to use NHGIS, but it's free. For users who need microdata (decennial census, ACS, Current Population Survey), you can visit a related MPC project: IPUMS.
  • The Missouri Census Data Center (MCDC): I've mentioned a number of their tools in the past; they provide easy-to-access profiles from the 2010 Census and American Community Survey, as well as historical trend reports for the ACS. For intermediate users they provide extract applications for the 2010 Census and ACS for creating spreadsheets and SAS files for download, and for advanced users the Dexter tool for downloading data en masse from 1980 to present. Unlike the other resources, no registration or sign-up is required. I also recommend the MCDC's ACS and 2010 Census profiles to web designers and web mappers; if you've created online resources that tapped directly into the American FactFinder via deep links (like I did), you can use the MCDC's profiles as an alternative. The links to their profiles are persistent and use a logical syntax (as it looks like there's no end in sight to this shutdown, I may make the changeover this week). Lastly, the MCDC is a great resource for technical documentation about geography and datasets.
  • State and local government: thankfully many state and local governments have taken subsets of census data of interest to people in their areas and recompiled and republished them on the web. These past few weeks I've been constantly sending students to the NYC Department of City Planning's population resources. Take a look at your state data center's resources, as well as local county or city planning departments, transportation agencies, or economic development offices to see what they provide.

Article on Working With the American Community Survey

Monday, June 17th, 2013

I’ve got another article that’s just hit the presses. In this one I discuss the American Community Survey: how it differs from the Decennial Census, when you should use it versus other summary data sets, how to work with the different period estimates, and how to create derived estimates and calculate their margins of error. For that last piece I’ve essentially done an extended version of this old post on Excel formulas, with several different and updated examples.

The article is available via Emerald’s journal database. If you don’t have access to it from your library feel free to contact me and I’ll send you a copy (can’t share this one freely online).

Title: The American Community Survey: practical considerations for researchers
Author(s): Francis P. Donnelly
Citation: Francis P. Donnelly (2013), "The American Community Survey: practical considerations for researchers", Reference Services Review, Vol. 41, Iss. 2, pp. 280-297
Keywords: American Community Survey, Census, Census geography, Data handling, Decennial census, Demographic data, Government data processing, Government information, Margins of error, Sample-based data, United States of America, US Census Bureau
Article type: Technical paper
DOI: 10.1108/00907321311326228 (Permanent URL)
Publisher: Emerald Group Publishing Limited

The Geography of US Public Libraries

Monday, March 18th, 2013

Last month my article on the geographic distribution of US public libraries was pre-published online in JOLIS, with a print date pending. I can't share this one freely online, so if you don't have access via a library database (Sage Journals, ERIC, various LIS dbs) you can contact me if you're interested in getting a copy.

Title: The geographic distribution of United States public libraries: An analysis of locations and service areas
Author: Francis P. Donnelly
Journal: Journal of Librarianship and Information Science (JOLIS)
Year: 2014, Volume 46, Issue 2, Pages 110-129
ISSN: 0961-0006
DOI: 10.1177/0961000612470276
Publisher: Sage

Abstract

This article explores the geography of public libraries in the United States. The distribution of libraries is examined using maps and spatial statistics to analyze spatial patterns. Methods for delineating and studying library service areas used in previous LIS research are modified and applied using geographic information systems to study variations in library accessibility by state and by socio-economic group at the national level. A history of library development is situated within the broader economic and demographic history of the US to provide insight to contemporary patterns, and Louis Round Wilson’s Geography of Reading is used as a focal point for studying historical trends. Findings show that while national library coverage is extensive, the percentage of the population that lives in a library’s geographic service area varies considerably by region and state, with Southern and Western states having lower values than Northeastern and Midwestern states.

Keywords

Geographic information systems, geography, public libraries, service areas, spatial equity, United States

This OCLC flier (How Public Libraries Stack Up) piqued my interest in public libraries as community resources, public goods, and placemaking institutions. If the presence of a public library brings so much value to a community, then by extension the lack of a public library could leave a community at a disadvantage. This led to the next set of logical questions: how are libraries distributed across the country, and which people and communities are being served and which aren't?

I took a few different approaches to answer these questions. The first was to use (and learn) spatial statistics to characterize the overall distribution. The second was to use spatial extraction methods to select the census areas and populations that fell within each library's service area, to see how well different populations were served and to compare these differences across states. The LIS literature is rich with research that uses GIS to study library use, so I provide a thorough summary of what's come before. Then, after I had the results, I spent a good deal of time researching how the contemporary pattern came to be, coupling the research on the history of public libraries with the broader history of urban and economic development in the United States.
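As a minimal sketch of that second, container-style extraction (with made-up coordinates and populations – the real analysis ran against GIS layers of every library and census area):

```python
from shapely.geometry import Point

# Hypothetical projected coordinates, with units assumed to be miles;
# in practice everything would be in a common projected CRS.
library = Point(10.0, 10.0)
service_area = library.buffer(1.0)  # container approach: 1-mile buffer

# (centroid, population) pairs for hypothetical census areas.
tracts = [(Point(10.3, 10.2), 4200),
          (Point(12.5, 9.0), 3100),
          (Point(10.8, 10.5), 2600)]

# Sum the population whose centroid falls inside the service area.
served = sum(pop for pt, pop in tracts if service_area.contains(pt))
total = sum(pop for _, pop in tracts)
print(f"{served / total:.0%} of the population is within the service area")
```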

I had a few unstated motives – one of them was to learn spatial statistics, with the help of OpenGeoda and its documentation, this excellent book on Spatial Data Analysis (for theory), these great examples from Spatial Justice, and invaluable advice from Deborah Balk, a professor of spatial demography with the CUNY Institute for Demographic Research.

One of my other goals was to use only open source software – QGIS, GRASS, and OpenGeoda – which was also a success. In my next study, though, I'll probably rely on QGIS and Spatialite; I found I was doing a lot of attribute data crunching in the SQLite Manager, since the attributes of GRASS vectors can be stored in SQLite, and I could probably save time (and frustration) by using Spatialite's features instead. I did get to learn a lot about GRASS, but for my purposes it was overkill, and I would have been just fine with a spatial database. I was definitely able to sharpen my Python skills, as processing the American Community Survey data for every census tract in the US manually would have been crazy.

In a project this size there are always some pieces that end up on the cutting room floor, so I thought I'd share one here – a dot map that shows where all 16,700 public libraries are. In the article I went with a county choropleth map to show the distribution, because I was doing other county-level work and because the dimension restrictions on the graphic made it a more viable option. The dot map reveals that libraries are typically where people are, except that the South looks emptier and the Midwest fuller than they should if libraries were in fact evenly distributed by population. As my research shows – they're not.

US Public Libraries

New Version of Introductory GIS Tutorial Now Available

Sunday, October 7th, 2012

The latest version of my Introduction to GIS tutorial using QGIS is now available. I've completely revised how it's organized and presented; I wrote the first two manuals in HTML, since I wanted something that gave me flexibility with inserting many images in a large document (word processors are notoriously poor at this). Over the summer I learned how to use LaTeX, and the result for this third edition is an infinitely better document, for classroom use or self-study.

I also updated the manual for use with QGIS 1.8. I'm thinking that the addition of the Data Browser, and the ability to simply select the CRS of the current layer or project when doing a Save As (rather than having to pick the CRS from the master list), will save a lot of valuable time in class. With every operation we perform we're constantly creating new files as the result of selections and geoprocessing, and I always lose a few people each time we're forced to crawl through the file system to add the new layers we've created. These simple changes should speed things up. I've updated the manual throughout to reflect these changes, and have also updated the datasets to reflect what's currently available. I provide a summary of the most salient changes in the introduction.

