Article on Processing Government Data With Python

Last month I had an article published in the code{4}lib journal, about a case study using Python to process IRS data on tax-exempt organizations (non-profits). It includes a working Python script that can be used by any one who wishes to make a place-based extract of that dataset for their geographic area of interest. The script utilizes the ZIP to ZCTA masterfile that I’ve mentioned in a previous post, and I include a discussion on wrestling with ZIP Code data. Both the script and the database are included in the download files at the bottom of the article.

I also provide a brief explanation of using OpenRefine to clean data using their text facet tools. One thing I forgot to mention in the article is that after you apply your data fixes with OpenRefine, it records the history. So if you have to process an update of the same file in the future (which I’ll have to do repeatedly), you can simply re-apply all the fixes you made in the past (which are saved in a JSON file).

While the article is pragmatic in nature, I did make an attempt to link this example to the bigger picture of data librarianship, advocating that data librarians can work to add value to datasets for their users, rather than simply pointing them to unrefined resources that many won’t be able to use.

The citation and link:

Donnelly, F. P. (2014). Processing government data: ZIP Codes, Python, and OpenRefine. code{4}lib Journal, 25 (2014-07-21). http://journal.code4lib.org/articles/9652.

As always the journal has a great mix of case studies, and this issue included an article on geospatial metadata.

While I’ve used Python quite a bit, this is the first time that I’ve written anything serious that I’ve released publicly. If there are ways I could improve it, I’d appreciate your feedback. Other than a three-day workshop I took years ago, I’m entirely self-taught and seldom have the opportunity to bounce ideas off people for this type of work. I’ve disabled the blog comments here a long time ago, but feel free to send me an email. If there’s enough interest I’ll do a follow-up post with the suggestions – mail AT gothos DOT info.

Tags: , , , , , , , , , ,

Comments are closed.


Copyright © 2017 Gothos. All Rights Reserved.
No computers were harmed in the 0.516 seconds it took to produce this page.

Designed/Developed by Lloyd Armbrust & hot, fresh, coffee.