Archive for the ‘Announcements’ Category

Average Distance to Public Libraries in the US

Monday, February 22nd, 2016

A few months ago I had a new article published in LISR, but given the absurd restrictions of academic journal publishing I’m not allowed to publicly post the article, and have to wait 12 months before sharing my post-print copy. It is available via your local library if they have a subscription to the ScienceDirect database (you can also email me to request a copy). I am sharing some of the unpublished state-level data that was generated for the project here.

Citation and Abstract

Regional variations in average distance to public libraries in the United States
F. Donnelly
Library & Information Science Research
Volume 37, Issue 4, October 2015, Pages 280–289


“There are substantive regional variations in public library accessibility in the United States, which is a concern considering the civic and educational roles that libraries play in communities. Average population-weighted distances and the total population living within one mile segments of the nearest public library were calculated at a regional level for metropolitan and non-metropolitan areas, and at a state level. The findings demonstrate significant regional variations in accessibility that have been persistent over time and cannot be explained by simple population distribution measures alone. Distances to the nearest public library are higher in the South compared to other regions, with statistically significant clusters of states with lower accessibility than average. The national average population-weighted distance to the nearest public library is 2.1 miles. While this supports the use of a two-mile buffer employed in many LIS studies to measure library service areas, the degree of variation that exists between regions and states suggests that local measures should be applied to local areas.”


I’m not going to repeat all the findings, but will provide some context.

As a follow-up to my earlier work, I was interested in trying an alternate approach for measuring public library spatial equity. I previously used the standard container approach – draw a buffer at some fixed distance around a library and count whether people are in or out, and as an approximation for individuals I used population centroids for census tracts. In my second approach, I used straight-line distance measurements from census block groups (smaller than tracts) to the nearest public library so I could compare average distances for regions and states; I also summed populations for these areas by calculating the percentage of people that lived within one-mile rings of the nearest library. I weighted the distances by population, to account for the fact that census areas vary in population size (tracts and block groups are designed to fall within an ideal population range – for block groups it’s between 600 and 3000 people).
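The population weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow behind the article (which relied mostly on QGIS and Spatialite): the function names and sample coordinates are made up, and the haversine formula stands in for a straight-line measurement you would normally make in a projected coordinate system.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_distance(centroid, libraries):
    """Distance from one centroid (lat, lon) to the closest library (lat, lon)."""
    lat, lon = centroid
    return min(haversine_miles(lat, lon, llat, llon) for llat, llon in libraries)


def weighted_mean_distance(block_groups, libraries):
    """Population-weighted mean distance to the nearest library.

    block_groups is a list of (lat, lon, population) tuples.
    """
    total_pop = sum(pop for _, _, pop in block_groups)
    weighted = sum(
        nearest_distance((lat, lon), libraries) * pop
        for lat, lon, pop in block_groups
    )
    return weighted / total_pop


# Made-up coordinates and populations, for illustration only
libraries = [(39.75, -75.55), (39.60, -75.30)]
block_groups = [(39.75, -75.55, 1000), (39.70, -75.50, 3000)]
print(round(weighted_mean_distance(block_groups, libraries), 2))
```

Because each distance is multiplied by the block group's population, a large block group far from a library pulls the average up more than a small one does, which is the point of weighting.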

Despite the difference in approach, the outcome was similar. Using the earlier approach (census tract centroids that fell within a library buffer that varied from 1 to 3 miles based on urban or rural setting), two-thirds of Americans fell within a “library service area”, which means that they lived within a reasonable distance to a library based on standard practices in LIS research. Using the latest approach (using block group centroids and measuring the distance to the nearest library) two-thirds of Americans lived within two miles of a public library – the average population-weighted distance was 2.1 miles. Both studies illustrate that there is a great deal of variation by geographic region – people in the South consistently lived farther away from public libraries compared to the national average, while people in the Northeast lived closer. Spatial autocorrelation analysis (LISA) revealed a cluster of states in the South with high distances and a cluster in the Northeast with low distances.
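For readers unfamiliar with LISA, here is a rough sketch of the local Moran's I statistic that underlies it, written in plain Python. The area names, values, and neighbor lists are invented for illustration, the weights are assumed to be row-standardized, and the permutation test that a package like GeoDa uses to judge significance is left out entirely.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each area, using row-standardized weights.

    values: dict mapping area name -> measured value (e.g. average distance)
    neighbors: dict mapping area name -> list of neighboring area names
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {k: v - mean for k, v in values.items()}    # deviations from the mean
    m2 = sum(d ** 2 for d in z.values()) / n        # variance of the values
    result = {}
    for area, nbrs in neighbors.items():
        # Spatial lag: average deviation among the area's neighbors
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result[area] = z[area] * lag / m2
    return result


# Hypothetical areas: A and B have high values, C and D have low values
values = {"A": 4.0, "B": 4.0, "C": 1.0, "D": 1.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
for area, score in local_morans_i(values, neighbors).items():
    print(area, round(score, 2))
```

A positive score means an area resembles its neighbors (high next to high, or low next to low), which is what a cluster looks like; a negative score flags an area that differs from its surroundings.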

The idea in doing this research was not to model actual travel behavior to measure accessibility. People in rural areas may be accustomed to traveling greater distances, public transportation can be a factor, people may not visit the library that’s close to their home for several reasons, measuring distance along a network is more precise than Euclidean distance, etc. The point is that libraries are a public good that provide tangible benefits to communities. People that live in close proximity to a public library are more likely to reap the benefits that it provides relative to those living farther away. Communities that have libraries will benefit more than communities that don’t. The distance measurements serve as a basic metric for evaluating spatial equity. So, if someone lives more than six miles away from a library that does not mean that they don’t have access; it does mean they are less likely to utilize it or realize its benefits compared to someone who lives a mile or two away.


I used the 2010 Census at the block group level, and obtained the location of public libraries from the 2010 IMLS. I improved the latter by geocoding libraries that did not have address-level coordinates, so that I had street matches for 95% of the 16,720 libraries in the dataset. The tables that I’m providing here were not published in the original article, but were tacked on as supplementary material in appendices. I wanted to share them so others could incorporate them into local studies. In most LIS research the prevailing approach for measuring library service areas is to use a buffer of 1 to 2 miles for all locations. Given the variation between states, if you wanted to use the state-average for library planning in your own state you can consider using these figures.

To provide some context, the image below shows public libraries (red stars) in relation to census block group centroids (white circles) for northern Delaware (primarily suburban) and surrounding areas (mix of suburban and rural). The line drawn between the Swedesboro and Woodstown libraries in New Jersey is 5 miles in length. I used QGIS and Spatialite for most of the work, along with Python for processing the data and Geoda for the spatial autocorrelation.

Map Example - Northern Delaware

The three tables I’m posting on the resources page are for states: one counts the 2010 Census population within one to six mile rings of the nearest public library, the second is the percentage of the total state population that falls within that ring, and the third is a summary table that calculates the mean and population-weighted distance to the nearest library by state. One set of tables is formatted text (for printing or just looking up numbers) while the other set are CSV files that you can use in a spreadsheet. I’ve included a metadata record with some methodological info, but you can read the full details in the article.
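To show how tables like these can be built from a list of distances, here is a minimal sketch. It assumes you already have a distance and a population for each block group; the function names and sample records are hypothetical, and the choice of half-open rings (a block group exactly 1.0 mile away falls in the second ring) is an assumption of this example, not a detail taken from the article.

```python
import math


def tabulate_rings(records, max_miles=6):
    """Sum population by one-mile ring of distance to the nearest library.

    records: iterable of (distance_miles, population)
    Returns a list where index 0 is the 0-1 mile ring, index 1 is 1-2 miles,
    and so on; the last entry collects everyone beyond max_miles.
    """
    counts = [0] * (max_miles + 1)
    for dist, pop in records:
        ring = min(int(math.floor(dist)), max_miles)
        counts[ring] += pop
    return counts


def ring_percentages(counts):
    """Convert ring populations to percentages of the total population."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Made-up block group records: (distance in miles, population)
records = [(0.4, 1000), (1.7, 500), (2.1, 250), (7.5, 250)]
counts = tabulate_rings(records)
print(counts)
print(ring_percentages(counts))
```

Adding up the first two percentages gives the share of people living within two miles, which is the figure to compare against the common two-mile buffer.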

In the article itself I tabulated and examined data at a broader, regional level (Northeast, Midwest, South, and West), and also broke it down into metropolitan and non-metropolitan areas for the regions. Naturally people that live in non-metropolitan areas lived further away, but the same regional patterns existed: more people in the South in both metro and non-metro areas lived further away compared to their counterparts in other parts of the country. This weekend I stumbled across this article in the Washington Post about troubles in the Deep South, and was struck by how these maps mirrored the low library accessibility maps in my past two articles.

Update Your Links to the New Baruch Geoportal

Thursday, August 13th, 2015

A few weeks ago I launched a new version of our college’s GIS data repository, the Baruch Geoportal. At the back end I have a simplified process for getting data onto our server, and on the front end we did away with manually updating HTML and CSS webpages in favor of using a Confluence wiki. My college has a subscription to Confluence, and I’ve been using an internal wiki for documenting and administering all aspects of our projects. A public, external wiki for providing our data seemed like a nice way to go – we can focus more on the content and it’s easier for my team and me to collaborate.

Since it’s a new site with a new address, many of the links to projects I’ve referred to throughout the years on this blog are no longer valid. Redirects are in place, but they won’t last forever. Some notable links to update:

The new site has a dedicated blog that you can follow (via RSS) for the latest updates to the portal. The portal also has a number of relatively new and publicly accessible datasets that we’ve posted over the last year (but that I haven’t had time to post about). These include the NYC Mass Transit Spatial Layers series and population centroids for US census geographies. We’ve been creating ISO spatial metadata for all of our new layers, but we still need to create XML stylesheets to make them more human-readable. That will be one of many projects to do for this academic year.


Census Proposes to Cut 3-year ACS in Fiscal 2016

Friday, February 6th, 2015

I’m coming out of my blog hibernation for this announcement – the US Census Bureau is proposing that they drop the 3-year series of the American Community Survey in fiscal year 2016. A colleague mentioned that he overheard this at a meeting yesterday. Searching the web, I found a post at the Free Government Information site which points to this Census Bureau Press release. The press release cites the predictable reasons (budget constraints, funding priorities, etc.) for dropping the series. Oddly, the news comes through some random site and not through the Census Bureau’s website, where there’s no mention of it. I saw that Stanford also had a post, where they shared the same press release.

I kept searching for some definitive proof, and through someone’s tweet I found a link to a PDF of the US Census Bureau’s Budget Estimates for Fiscal Year 2016, presented to Congress this February 2015. I found confirmation buried on page CEN – 106 (the 100th page in a 190 page doc):

Data Products

Restoration of ACS Data Products ($1.5 million): Each year, the ACS releases a wide range of data products widely used by policymakers, Federal, state and local governments, businesses and the public to make decisions on allocation of taxpayer-funds, the location of businesses and the placement of products, emergency management plans, and a host of other matters. Resource constraints have led to the cancellation of data products for areas with populations between 20 and 60 thousand based on 3-year rolling averages of ACS data (known as the “3-Year Data” Product). They have also resulted in delays in the release of the 1- and 5-year Public Use Macro Sample (PUMS) data files and canceled the release of the 5-year Comparison Profile data product and the Spanish Translation of the 1- and 5-year Puerto Rico data products.

The Census Bureau proposes to terminate permanently the 3-Year Data Product. The Census Bureau intended to produce this data product for a few years when the ACS was a new survey. Now that the ACS has collected data for nearly a decade, this product can be discontinued without serious impacts on the availability of the estimates for these communities.

The ACS would like to restore the timely release of the other essential products in FY2016. The continued absence of these data products will impact the availability of data – especially for Puerto Rico – to public and private sector decision makers.

So at this point it’s still just a proposal. The benefits, besides the ability to release other datasets in a timely fashion, would be simplification for users. Instead of choosing among three datasets, there would be only two – the one year and the five year. You choose the one year for large areas and the five year for every place else. In terms of disadvantages, consider this example – here is the number of children enrolled in nursery school in NY State PUMA 03808, which covers Murray Hill, Gramercy, and Stuyvesant Town in the eastern half of Midtown Manhattan:

PUMA NY 03808

Population Over 3 Years Old Enrolled in Nursery / Pre-school

  • 1 year 2013: 1,166 +/- 609
  • 3 year 2011-2013: 1,549 +/- 530
  • 5 year 2009-2013: 1,819 +/- 409

Since PUMAs are statistical areas built to contain 100k people, data for all of them is available in each series. Like all the ACS estimates these have a 90% confidence interval. Look at the data for the 1-year series. The margin of error (ME) is so large that it’s approximately 50% of the estimate, which in my opinion makes it worthless for just about any application. The estimate itself is much lower than the estimate for the other two series. It’s true that it’s only capturing the latest year, but administrative data and news reports suggest that the number of nursery school children in the district that covers this area has been relatively stable over time, with modest increases (geographically the district covers an area much larger than this PUMA). This suggests that the estimate itself is not so great.

The 5 year estimate may be closer to reality, and its ME is only about 22% of the estimate. But it covers five years in time. If you wanted something that was a compromise – more timely than the five year but with a lower ME than the one year, then the three year series was your choice, in this case with an ME that’s about 34% of the estimate. But under this proposal, this choice goes away and you have to make do with either 1-year estimates (which will be lousy for geographies that aren’t far above the 65k population threshold, and lousy for small population groups wherever they are located), or better 5-year estimates that cover a greater time span.
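The comparison above is easy to reproduce. The sketch below takes the three published estimates for PUMA 03808 and computes the margin of error as a share of the estimate, plus the coefficient of variation (the standard error divided by the estimate, where the standard error is the 90% margin of error divided by 1.645). The function names are my own for this example.

```python
Z_90 = 1.645  # z-score behind the ACS 90% margins of error


def moe_ratio(estimate, moe):
    """Margin of error as a fraction of the estimate."""
    return moe / estimate


def coefficient_of_variation(estimate, moe):
    """Standard error (MOE / 1.645) as a fraction of the estimate."""
    return (moe / Z_90) / estimate


# Nursery / pre-school enrollment for PUMA NY 03808: (estimate, margin of error)
series = {
    "1-year 2013": (1166, 609),
    "3-year 2011-2013": (1549, 530),
    "5-year 2009-2013": (1819, 409),
}

for label, (est, moe) in series.items():
    print(
        f"{label}: MOE is {moe_ratio(est, moe):.0%} of the estimate, "
        f"CV is {coefficient_of_variation(est, moe):.1%}"
    )
```

Running the same two functions over any table of estimates is a quick way to decide which series is usable for a given geography.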

Article on Processing Government Data With Python

Thursday, August 28th, 2014

Last month I had an article published in the code{4}lib journal, about a case study using Python to process IRS data on tax-exempt organizations (non-profits). It includes a working Python script that can be used by anyone who wishes to make a place-based extract of that dataset for their geographic area of interest. The script utilizes the ZIP to ZCTA masterfile that I’ve mentioned in a previous post, and I include a discussion on wrestling with ZIP Code data. Both the script and the database are included in the download files at the bottom of the article.
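To give a flavor of what a place-based extract involves, here is a small sketch. It is not the script from the article: the column names (`zip`, `zcta`, `name`), the helper functions, and the sample rows are all hypothetical, and the real IRS file and ZIP to ZCTA masterfile have their own layouts.

```python
import csv
import io


def normalize_zip(raw):
    """Reduce a ZIP or ZIP+4 value to five digits, restoring leading zeros."""
    return raw.strip().split("-")[0].zfill(5)[:5]


def load_crosswalk(crosswalk_file):
    """Build a ZIP -> ZCTA lookup from a CSV with 'zip' and 'zcta' columns."""
    return {row["zip"]: row["zcta"] for row in csv.DictReader(crosswalk_file)}


def extract_by_zcta(records_file, zip_to_zcta, target_zctas):
    """Yield records whose ZIP Code maps to a ZCTA in the area of interest."""
    for row in csv.DictReader(records_file):
        zcta = zip_to_zcta.get(normalize_zip(row["zip"]))
        if zcta in target_zctas:
            yield {**row, "zcta": zcta}


# Invented sample data; a PO box ZIP is assigned to some enclosing ZCTA
crosswalk = io.StringIO("zip,zcta\n10010,10010\n10159,10001\n07030,07030\n")
records = io.StringIO(
    "name,zip\nOrg A,10010-1234\nOrg B,10159\nOrg C,7030\nOrg D,99999\n"
)

lookup = load_crosswalk(crosswalk)
for row in extract_by_zcta(records, lookup, {"10010", "10001"}):
    print(row["name"], row["zip"], row["zcta"])
```

The normalizing step matters because ZIP Codes in administrative files often arrive as ZIP+4 values or as numbers that have lost their leading zeros, and neither will match a crosswalk keyed on five-character strings.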

I also provide a brief explanation of using OpenRefine to clean data using their text facet tools. One thing I forgot to mention in the article is that after you apply your data fixes with OpenRefine, it records the history. So if you have to process an update of the same file in the future (which I’ll have to do repeatedly), you can simply re-apply all the fixes you made in the past (which are saved in a JSON file).

While the article is pragmatic in nature, I did make an attempt to link this example to the bigger picture of data librarianship, advocating that data librarians can work to add value to datasets for their users, rather than simply pointing them to unrefined resources that many won’t be able to use.

The citation and link:

Donnelly, F. P. (2014). Processing government data: ZIP Codes, Python, and OpenRefine. code{4}lib Journal, 25 (2014-07-21).

As always the journal has a great mix of case studies, and this issue included an article on geospatial metadata.

While I’ve used Python quite a bit, this is the first time that I’ve written anything serious that I’ve released publicly. If there are ways I could improve it, I’d appreciate your feedback. Other than a three-day workshop I took years ago, I’m entirely self-taught and seldom have the opportunity to bounce ideas off people for this type of work. I disabled the blog comments here a long time ago, but feel free to send me an email. If there’s enough interest I’ll do a follow-up post with the suggestions – mail AT gothos DOT info.

Government Shutdown: Alternate Resources for US Census Data

Sunday, October 13th, 2013

As the US government shutdown continues (thanks to a handful of ideological nutcases in Congress) those of us who work with and rely on government data are re-learning the lesson of why it’s important to keep copies of things. This includes having alternate sources of information floating around on the web and in the cloud, as well as the tried and true approach of downloading and saving datasets locally. There have been a number of good posts (like this succinct one) to point users to alternatives to the federal sources that many of us rely on. I’ll go into more detail here with my suggestions on where to access US Census data, based on user-level and need.

  • The Social Explorer: this web-mapping resource for depicting and accessing US Census data from 1790 to present (including the 2010 Census and the latest American Community Survey data) is intuitive and user-friendly. Many academic and public libraries subscribe to the premium edition that provides full access to all features and datasets (so check with yours to see if you have access), while a basic free version is available online. Given the current circumstances the Social Explorer team has announced that it will open the hatch and provide free access to users who request it.
  • The NHGIS (National Historical GIS): this project is managed by the Minnesota Population Center and also provides access to all US Census data from 1790 to present. While it’s a little more complex than the Social Explorer, the NHGIS is the better option for downloading lots of data en masse, and is the go-to place if you need access to all datasets in their entirety, including all the detail from the American Community Survey (as the Social Explorer does not include margins of error for any of the ACS estimates) or if you need access to other datasets like the County Business Patterns. Lastly – it is the alternative to the TIGER site for GIS users who need shapefiles of census geography. You have to register to use NHGIS, but it’s free. For users who need microdata (decennial census, ACS, Current Population Survey), you can visit a related MPC project to the NHGIS: IPUMS.
  • The Missouri Census Data Center (MCDC): I’ve mentioned a number of their tools in the past; they provide easy-to-access profiles from the 2010 Census and American Community Survey, as well as historical trend reports for the ACS. For intermediate users they provide extract applications for the 2010 Census and ACS for creating spreadsheets and SAS files for download, and for advanced users the Dexter tool for downloading data en masse from 1980 to present. Unlike the other resources no registration or sign-up is required. I also recommend the MCDC’s ACS and 2010 Census profiles to web designers and web mappers; if you’ve created online resources that tapped directly into the American Factfinder via deep links (like I did), you can use the MCDC’s profiles as an alternative. The links to their profiles are persistent and use a logical syntax (as it looks like there’s no end in sight to this shutdown I may make the change-over this week). Lastly, the MCDC is a great resource for technical documentation about geography and datasets.
  • State and local government: thankfully many state and local governments have taken subsets of census data of interest to people in their areas and have recompiled and republished it on the web. These past few weeks I’ve been constantly sending students to the NYC Department of City Planning’s population resources. Take a look at your state data center’s resources, as well as local county or city planning departments, transportation agencies, or economic development offices to see what they provide.

Article on Working With the American Community Survey

Monday, June 17th, 2013

I’ve got another article that’s just hit the presses. In this one I discuss the American Community Survey: how it differs from the Decennial Census, when you should use it versus other summary data sets, how to work with the different period estimates, and how to create derived estimates and calculate their margins of error. For that last piece I’ve essentially done an extended version of this old post on Excel formulas, with several different and updated examples.
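For anyone who prefers code to spreadsheet formulas, here is a minimal Python sketch of two common derived-estimate calculations, following the approximation formulas the Census Bureau publishes for ACS users: the margin of error for a sum of estimates, and the margin of error for a proportion. The function names and the numbers in the usage lines are made up for illustration; this is not taken from the article.

```python
import math


def moe_sum(moes):
    """Approximate MOE for a sum (or difference) of estimates:
    the square root of the sum of the squared MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


def moe_proportion(num, num_moe, den, den_moe):
    """Approximate MOE for a proportion p = num / den, where num is a
    subset of den. If the value under the square root is negative,
    fall back to the ratio formula, which adds the terms instead."""
    p = num / den
    radicand = num_moe ** 2 - (p ** 2) * (den_moe ** 2)
    if radicand < 0:
        radicand = num_moe ** 2 + (p ** 2) * (den_moe ** 2)
    return math.sqrt(radicand) / den


# Hypothetical estimates: combine two age groups, then take a share
combined = 120 + 80
combined_moe = moe_sum([30, 40])
share_moe = moe_proportion(combined, combined_moe, 1000, 90)
print(combined, round(combined_moe, 1), round(share_moe, 4))
```

Both formulas are approximations that ignore covariance between the estimates, so the results get rougher as more estimates are combined.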

The article is available via Emerald’s journal database. If you don’t have access to it from your library feel free to contact me and I’ll send you a copy (can’t share this one freely online).

Title: The American Community Survey: practical considerations for researchers
Author(s): Francis P. Donnelly
Citation: Francis P. Donnelly, (2013) “The American Community Survey: practical considerations for researchers”, Reference Services Review, Vol. 41 Iss: 2, pp.280 – 297
Keywords: American Community Survey, Census, Census geography, Data handling, Decennial census, Demographic data, Government data processing, Government information, Margins of error, Sample-based data, United States of America, US Census Bureau
Article type: Technical paper
DOI: 10.1108/00907321311326228 (Permanent URL)
Publisher: Emerald Group Publishing Limited

The Geography of US Public Libraries

Monday, March 18th, 2013

Last month my article on the geographic distribution of US public libraries was pre-published online in JOLIS, with a print date pending. I can’t share this one freely online, so if you don’t have access via a library database (Sage Journals, ERIC, various LIS dbs) you can contact me if you’re interested in getting a copy.

Title: The geographic distribution of United States public libraries : An analysis of locations and service areas
Author: Francis P. Donnelly
Journal: Journal of Librarianship and Information Science (JOLIS)
Year: 2014, Volume 46, Issue 2, Pages 110-129
ISSN: 0961-0006
DOI: 10.1177/0961000612470276
Publisher: Sage


Abstract: This article explores the geography of public libraries in the United States. The distribution of libraries is examined using maps and spatial statistics to analyze spatial patterns. Methods for delineating and studying library service areas used in previous LIS research are modified and applied using geographic information systems to study variations in library accessibility by state and by socio-economic group at the national level. A history of library development is situated within the broader economic and demographic history of the US to provide insight to contemporary patterns, and Louis Round Wilson’s Geography of Reading is used as a focal point for studying historical trends. Findings show that while national library coverage is extensive, the percentage of the population that lives in a library’s geographic service area varies considerably by region and state, with Southern and Western states having lower values than Northeastern and Midwestern states.


Keywords: Geographic information systems, geography, public libraries, service areas, spatial equity, United States

This OCLC flier (How Public Libraries Stack Up) piqued my interest in public libraries as community resources, public goods, and placemaking institutions. If the presence of a public library brings so much value to a community, then by extension the lack of a public library could leave a community at a disadvantage. This led to the next set of logical questions: how are libraries distributed across the country, and which people and communities are being served and which aren’t?

I took a few different approaches to answer these questions. The first approach was to use (and learn) spatial statistics so the overall distribution could be characterized, and the second was to use spatial extraction methods to select census areas and populations that were within the service areas of each library, to see differences in how populations were served and to study these differences across different states. The LIS literature is rich with research that uses GIS to study library use, so I provide a thorough summary of what’s come before. Then after I had the results I spent a good deal of time researching how the contemporary pattern came to be, and coupled the research on the history of public libraries with the broader history of urban and economic development in the United States.

I had a few unstated motives – one of them was to learn spatial statistics, with the help of: OpenGeoda and its documentation, this excellent book on Spatial Data Analysis (for theory), these great examples from Spatial Justice, and invaluable advice from Deborah Balk, a Professor of Spatial Demography with the CUNY Institute for Demographic Research.

One of my other goals was to use only open source software (QGIS, GRASS, and OpenGeoda), which was also a success, although in my next study I’ll probably rely on QGIS and Spatialite. I found I was doing a lot of attribute data crunching using the SQLite Manager, since the attributes of GRASS vectors can be stored in SQLite, and I could probably save time (and frustration) by using Spatialite’s features instead. I did get to learn a lot about GRASS, but for my purposes it was overkill and I would have been just fine with a spatial database. I was definitely able to sharpen my Python skills, as processing the American Community Survey data for every census tract in the US manually would have been crazy.

In a project this size there are always some pieces that end up on the cutting room floor, so I thought I’d share one here – a dot map that shows where all 16,700 public libraries are. In the article I went with a county choropleth map to show the distribution, because I was doing other county-level stuff and because the dimension restrictions on the graphic made it a more viable option. The dot map reveals that libraries are typically where people are, except that the South looks emptier and the Midwest fuller than it should be, if libraries were in fact evenly distributed by population. As my research shows – they’re not.

US Public Libraries

New Version of Introductory GIS Tutorial Now Available

Sunday, October 7th, 2012

The latest version of my Introduction to GIS tutorial using QGIS is now available. I’ve completely revised how it’s organized and presented; I wrote the first two manuals in HTML, since I wanted something that gave me flexibility with inserting many images in a large document (word processors are notoriously poor at this). Over the summer I learned how to use LaTeX, and the result for this 3rd edition is an infinitely better document, for classroom use or self study.

I also updated the manual for use with QGIS 1.8. I’m thinking that the addition of the Data Browser and the ability to simply select the CRS of the current layer or project when you’re doing a Save As (rather than having to select the CRS from the master list) will save a lot of valuable time in class. With every operation that we perform we’re constantly creating new files as the result of selections and geoprocessing, and I always lose a few people each time we’re forced to crawl through the file system to add new layers we’ve created. These simple changes should speed things up. I’ve updated the manual throughout to reflect these changes, and have also updated the datasets to reflect what’s currently available. I provide a summary of the most salient changes in the introduction.

American Factfinder Tutorial & Census Geography Updates

Monday, July 23rd, 2012

I’ve been enmeshed in the census lately as I’ve been writing a paper about the American Community Survey. Here are a few things to share:

  • Since I frequently receive questions about how to use the American Factfinder, I’ve created a brief tutorial with screenshots demonstrating a few ways to navigate it. I illustrate how to download a profile for a single census tract from the American Community Survey, and how to download a table for all ZIP Code Tabulation Areas (ZCTAs) in a county using the 2010 Census.
  • New boundaries for PUMAs based on 2010 census geography have been released; they’re not available from the TIGER web-based interface yet but you can get state-based files from the FTP site. I’ve downloaded the boundaries for New York and there are small changes here and there from the 2000 Census boundaries; not surprising as PUMAs are built from tracts and tract boundaries have changed. One big bonus is that PUMAs now have names associated with them, based on local government suggestions. In NY State they either take the name of counties with some directional element (east, central, south, etc), or the name of MCDs that are contained within them. In NYC they’ve been given the names of community districts.
  • I’ve done some digging through the FAQs and discovered that the census is going to stick with the old 2000 PUMA boundaries for the next release of the American Community Survey – the 2011 ACS will be released at the end of this year. 2010 PUMAs won’t be used until the 2012 ACS, to be released at the end of 2013.
  • Urban Areas are the other holdovers in the ACS that use 2000 vintage boundaries. The ACS will also transition to the 2010 boundaries for urban areas in the 2012 ACS.
  • In the course of my digging I discovered that the census will begin including ZCTA-level data as part of the 5-year ACS estimates, beginning with the 2011 release this year. 2010 ZCTA boundaries are already available, and 2010 Census data has already been released for ZCTAs. The ACS will use the 2010 vintage ZCTAs for each release until they’re redrawn for 2020.

GIS Workshops This Apr & May

Sunday, March 25th, 2012

This semester I’ll be teaching three workshops with Prof. Deborah Balk in spatial tools and analysis. Sponsored by the CUNY Institute of Demographic Research (CIDR), the workshops will be held on Baruch College’s campus in midtown NYC on Friday afternoons. The course is primarily intended for data and policy analysts who want to gain familiarity with the basics of map making and spatial analysis; registration is open to anyone. The workshops progress from basic to intermediate skills that cover making a map (Apr 27th), geospatial calculations (May 4th), and geospatial analysis (May 11th). We’ll be using QGIS and participants will work off of their own laptops; we’ll also be demonstrating some of the processes in ArcGIS and participants will receive an evaluation copy of that software. Each workshop is $300 or you can register for all three for $750.

For full details check out this flier. You can register via the College’s CAPS website; do a search for DEM and register for each session (DEM0003, DEM0004, and DEM0005).

Copyright © 2017 Gothos. All Rights Reserved.