<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gothos &#187; Data Processing</title>
	<atom:link href="http://gothos.info/category/data-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://gothos.info</link>
	<description>A Geospatial Librarian's World</description>
	<lastBuildDate>Mon, 26 Jul 2010 13:18:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Learning Python at PyCamp</title>
		<link>http://gothos.info/2010/06/learning-python-at-pycamp/</link>
		<comments>http://gothos.info/2010/06/learning-python-at-pycamp/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 19:39:09 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[pycamp]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=513</guid>
		<description><![CDATA[I got back from leave a couple of weeks ago, and spent part of it at a Python boot camp. I&#8217;ve gotten tired of hacking away at data in spreadsheets and read in several places that Python is a good language to learn for beginning programmers &#8211; it&#8217;s also open source, flexible, and is used by [...]]]></description>
			<content:encoded><![CDATA[<p>I got back from leave a couple of weeks ago, and spent part of it at a Python boot camp. I&#8217;ve gotten tired of hacking away at data in spreadsheets and read in several places that Python is a good language to learn for beginning programmers &#8211; it&#8217;s also open source, flexible, and is used by many in the GIS community for processing data and building plugins and software (the instructor for the camp, Chris Calloway, pointed me to this presentation on <a href="http://proceedings.esri.com/library/userconf/devsummit09/papers/pythonscriptingadvancedtechniques.pdf" target="_blank">Python scripting techniques for ArcGIS</a>).</p>
<p>The workshop was a three-day event hosted at Penn State by the Triangle Zope and Python Users Group (<a href="http://trizpug.org/" target="_blank">TriZPUG</a>). It was geared towards beginners and non-programmers (although many of my fellow classmates were IT and systems people) and provided a pretty thorough review of all of the elements of the language &#8211; now it&#8217;s up to me to tie it all together! The price was extremely reasonable (only $300 for a 3 day class!) and I&#8217;d certainly recommend it if there&#8217;s a camp in your area. I would also recommend reading a book or taking a tutorial to familiarize yourself with the basics BEFORE attending the class; I did, and as a result I think I got more out of it than I would have going in cold.</p>
<p>The next <a href="http://trizpug.org/boot-camp/" target="_blank">PyCamp</a> is being held in LA in a few days, and the following one will be in Toronto from Aug 30th to Sept 3rd (although this isn&#8217;t posted on the website yet). The normal workshop is a five day affair; the one I attended was a mini 3 day version, which suited my needs pretty well.</p>
<p>There are tons of Python tutorials on the web and Python&#8217;s site is pretty definitive. If you&#8217;re looking for a book, I&#8217;d recommend <a href="http://www.worldcat.org/title/practical-programming-an-introduction-to-computer-science-using-python/oclc/251217630" target="_blank">Practical Programming: An Introduction to Computer Science Using Python</a>. Unlike the &#8220;Learn Language X&#8221; books, this one introduces you to general theory and practice in programming, and the authors illustrate the applications with practical examples using Python &#8211; it&#8217;s been immensely helpful to me. Now that I&#8217;m around the initial learning curve, I&#8217;ve been relying more on <a href="http://www.worldcat.org/title/beginning-python-from-novice-to-professional/oclc/62150088" target="_blank">Beginning Python: From Novice to Professional</a>, which is better as a reference book and good for illustrating many of the uses for individual objects, methods, etc (which I had a hard time grasping before I covered the basics of programming).</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2010/06/learning-python-at-pycamp/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Maps to Create a Census Finding Aid</title>
		<link>http://gothos.info/2010/05/google-maps-to-create-a-census-finding-aid/</link>
		<comments>http://gothos.info/2010/05/google-maps-to-create-a-census-finding-aid/#comments</comments>
		<pubDate>Thu, 13 May 2010 20:19:17 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[american community survey]]></category>
		<category><![CDATA[census data]]></category>
		<category><![CDATA[google maps]]></category>
		<category><![CDATA[kml]]></category>
		<category><![CDATA[new york city]]></category>
		<category><![CDATA[pumas]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=472</guid>
		<description><![CDATA[Yikes! It&#8217;s been quite a while since my last post (the past couple months have been a little tough for me), but I just finished an interesting project that I can share.
I constantly get questions from students who are interested in getting recent demographic and socio-economic profiles for neighborhoods in New York City. The problem is [...]]]></description>
			<content:encoded><![CDATA[<p>Yikes! It&#8217;s been quite a while since my last post (the past couple months have been a little tough for me), but I just finished an interesting project that I can share.</p>
<p>I constantly get questions from students who are interested in getting recent demographic and socio-economic profiles for neighborhoods in New York City. The problem is that neighborhoods are not officially defined, so we have to look for a surrogate. The City has created neighborhood-like areas out of census tracts called community districts and they publish profiles for them, but this data is from the decennial census  and not current enough for their needs.  ZIP code data is also only available from the decennial census.</p>
<p>We can use PUMAs (Public Use Microdata Areas) to approximate neighborhoods in large cities, and they are published as part of the 3 year estimates of the American Community Survey. The problem is, in order to look up the data from the census you need to search by PUMA number &#8211; there are no qualitative place names. The city and the census have worked together to assign names to neighborhoods as part of the <a href="http://www.census.gov/hhes/www/housing/nychvs/nychvs.html" target="_blank">NYC Housing and Vacancy Survey</a>, but this is the only place (I&#8217;ve found) that uses these names. You need to look in several places to figure out what the PUMA number and boundaries for an area are and then navigate through the census site to find it. Too much for the average student who visits me at the reference desk or emails me looking for data.</p>
<p>My solution was to create a finding aid in Google Maps that tied everything together:<br />
<iframe width="425" height="350" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" src="http://maps.google.com/maps?f=q&amp;source=embed&amp;hl=en&amp;geocode=&amp;q=http:%2F%2Fwww.baruch.cuny.edu%2Fgeoportal%2Fkml%2Fpumas_nyc.kml&amp;sll=37.0625,-95.677068&amp;sspn=31.922255,56.513672&amp;ie=UTF8&amp;ll=40.697488,-73.979681&amp;spn=0.364413,0.583649&amp;output=embed"></iframe><br /><small><a href="http://maps.google.com/maps?f=q&amp;source=embed&amp;hl=en&amp;geocode=&amp;q=http:%2F%2Fwww.baruch.cuny.edu%2Fgeoportal%2Fkml%2Fpumas_nyc.kml&amp;sll=37.0625,-95.677068&amp;sspn=31.922255,56.513672&amp;ie=UTF8&amp;ll=40.697488,-73.979681&amp;spn=0.364413,0.583649" style="color:#0000FF;text-align:left">View Larger Map</a></small></p>
<p><a href="http://gothos.info/wp-content/uploads/2010/05/puma_kml_4.png"><img class="size-thumbnail wp-image-491  alignleft" style="margin: 10px;" title="puma_kml_4" src="http://gothos.info/wp-content/uploads/2010/05/puma_kml_4-150x150.png" alt="" width="150" height="150" /></a> I downloaded PUMA boundaries from the Census TIGER file site in a shapefile format. I opened them up in ArcGIS and used an excellent script that I downloaded called <a href="http://arcscripts.esri.com/details.asp?dbid=14273" target="_blank">Export to KML</a>. ArcGIS 9.3 does support KML exports via the toolbox, and there are a number of other scripts and stand-alone programs that can do this (I tried several) but Export to KML was best (assuming you have access to ArcGIS) in terms of the level of customization and the thoroughness of the user documentation. I symbolized the PUMAs in ArcGIS using the colors and line thickness that I wanted and fired up the tool. It allows you to automatically group and color features based on the layer&#8217;s symbology. I was able to add a &#8220;snippet&#8221; to each feature to help identify it (I used the PUMA number as the attribute name and the neighborhood name as my snippet, so both appear in the legend) and added a description that would appear in the pop up window when that feature is clicked. In that description, I added the URL from the ACS census profile page for a particular PUMA &#8211; the cool part here is that the URL is consistent and contains the PUMA number. So, I replaced the specific number and inserted the [field] name from the PUMAs attribute table that contained the number. When I did the export, the URLs for each individual feature were created with their PUMA number inserted into the link.</p>
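<p>The field substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Export to KML tool itself: the URL pattern (example.org), the field name PUMA5ID00, and the attribute values are placeholder assumptions.</p>

```python
# Hypothetical URL pattern: example.org stands in for the real ACS profile address.
URL_TEMPLATE = "http://example.org/acs/profile?geo_id=[PUMA5ID00]"

# Made-up attribute rows; in practice these come from the PUMA attribute table.
features = [
    {"PUMA5ID00": "3603701", "NAME": "Neighborhood A"},
    {"PUMA5ID00": "3603702", "NAME": "Neighborhood B"},
]


def fill_template(template, attributes):
    """Replace each [FIELD] token with that feature's attribute value."""
    result = template
    for field, value in attributes.items():
        result = result.replace("[" + field + "]", str(value))
    return result


# One link per feature, each carrying that feature's own PUMA number.
links = [fill_template(URL_TEMPLATE, feature) for feature in features]
for feature, link in zip(features, links):
    print(feature["NAME"], link)
```

<p>Because the pattern is identical for every PUMA, one template plus the attribute table is enough to produce a working link for each feature.</p>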
<p>There were a few quirks &#8211; I discovered that you can&#8217;t automatically display labels on a Google Map without subterfuge, like creating the <a href="http://www.google.com/support/forum/p/maps/thread?tid=489ca54d8665228a&amp;hl=en" target="_blank">labels as images and not text</a>. Google Earth (but not Maps) supports labels if you <a href="http://www.google.com/support/forum/p/earth/thread?tid=5b0422ebdb13f5ea&amp;hl=en" target="_blank">create multi-geometry</a> where you have a point for a label and a polygon for the feature. If you select a labeling attribute on the initial options screen of the Export to KML tool, you create an icon in the middle of each polygon that has a different description pop-up (which I didn&#8217;t want so I left it to none and lived without labels). I made my features 75% transparent (a handy feature of Export to KML) so that you could see the underlying Google Map features through the PUMA, but this made the fill AND the lines transparent, making the features too difficult to see. After the export I opened the KML in a text editor and changed the color values for the lines / boundaries by hand, which was easy since the styles are saved by feature group (boroughs) and not by individual feature (pumas). I also manually changed the value of the folder open element (from 0 to 1) so that the feature and feature groups (pumas and boroughs) are expanded by default when someone opens the map.</p>
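<p>The two hand edits could also be scripted. The sketch below uses Python&#8217;s standard ElementTree module on a tiny stand-in KML document; the style id, the color values, and the choice of opaque black for the lines are assumptions for illustration. KML colors are written as aabbggrr hex, so a leading ff means fully opaque.</p>

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Keep the KML namespace as the default one when writing the document back out.
ET.register_namespace("", KML_NS)

# Tiny stand-in document; a real export has one Style per feature group.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Style id="manhattan">
      <LineStyle><color>40ffffff</color><width>2</width></LineStyle>
      <PolyStyle><color>400000ff</color></PolyStyle>
    </Style>
    <Folder>
      <name>Manhattan</name>
      <open>0</open>
    </Folder>
  </Document>
</kml>
"""


def opaque_lines_and_open_folders(kml_text, line_color="ff000000"):
    """Make boundary lines opaque and expand every folder by default."""
    root = ET.fromstring(kml_text)
    ns = {"k": KML_NS}
    # Only LineStyle colors change; the PolyStyle fill keeps its transparency.
    for color in root.findall(".//k:LineStyle/k:color", ns):
        color.text = line_color
    # Switch the folder open element from 0 to 1.
    for open_flag in root.findall(".//k:Folder/k:open", ns):
        open_flag.text = "1"
    return ET.tostring(root, encoding="unicode")


edited = opaque_lines_and_open_folders(SAMPLE_KML)
print(edited)
```

<p>Since the styles are stored per feature group rather than per feature, only a handful of color elements need to change even in a large file.</p>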
<p>After making the manual edits, I uploaded the KML to my webserver and pasted the url for it into the Google Maps search box, which overlaid my KML on the map. Then I was able to get a persistent link to the map and code for embedding it into websites via the Google Maps interface. No need to add it to Google My Maps, as I have my own space. One big quirk &#8211; it&#8217;s difficult to make changes to an existing KML once you&#8217;ve uploaded and displayed it. After I uploaded what I thought would be my final version I noticed a typo. So I fixed it locally, uploaded the KML and overwrote the old one. But &#8211; the changes I made didn&#8217;t appear. I tried reloading and clearing the cache in my browser, but no good &#8211; once the KML is uploaded and Google caches it, you won&#8217;t see any of your changes until Google re-caches. The conventional wisdom is to change the name of the file every single time &#8211; which is pretty dumb as you&#8217;ll never be able to have a persistent link to anything. There are <a href="http://www.google.com/support/forum/p/maps/thread?tid=5a442227daf9e5fd&amp;hl=en" target="_blank">ways to circumvent the problem</a>, or you can just wait it out. I waited one day and by the next the file was updated; good enough for me, as I&#8217;ll only need to update it once a year.</p>
<p>I&#8217;m hosting the map, along with some static PDF maps and a spreadsheet of PUMA numbers and neighborhood names, from the <a href="http://guides.newman.baruch.cuny.edu/content.php?pid=95819&amp;sid=985209" target="_blank">NYC Data LibGuide</a> I created (part of my college&#8217;s collection of <a href="http://guides.newman.baruch.cuny.edu/index.php" target="_blank">research guides</a>). If you&#8217;re looking for neighborhood names to associate with PUMA numbers for your city, you&#8217;ll have to hunt around and see if a local planning agency or non-profit has created them for a project or research study (as the Census Bureau does not create them). For example, the County of Los Angeles Department of Mental Health uses PUMAs in a <a href="http://dmh.lacounty.gov/AboutDMH/MHSA/MHSA_Plans/PEI/Data/service_area_profiles.html" target="_blank">large study</a> they did where they associated local place names with each PUMA.</p>
<p>If you&#8217;re interested in dabbling in some KML, there&#8217;s <a href="http://code.google.com/apis/kml/documentation/kml_tut.html" target="_blank">Google&#8217;s KML tutorial</a>. I&#8217;d also recommend <a href="http://www.worldcat.org/title/kml-handbook-geographic-visualization-for-the-web/oclc/227921862" target="_blank">The KML Handbook</a> by Josie Wernecke. The catch for any guide to KML is that while all KML elements are supported by Google Earth, there&#8217;s only <a href="http://code.google.com/apis/kml/documentation/mapsSupport.html" target="_blank">partial support for Google Maps</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2010/05/google-maps-to-create-a-census-finding-aid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calculated Fields in SpatiaLite / SQLite</title>
		<link>http://gothos.info/2010/02/calculated-fields-in-spatialite-sqlite/</link>
		<comments>http://gothos.info/2010/02/calculated-fields-in-spatialite-sqlite/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 14:22:48 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[calculated fields]]></category>
		<category><![CDATA[geodatabase]]></category>
		<category><![CDATA[joining tables]]></category>
		<category><![CDATA[new york city]]></category>
		<category><![CDATA[pumas]]></category>
		<category><![CDATA[qgis]]></category>
		<category><![CDATA[spatialite]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sqlite]]></category>
		<category><![CDATA[Tutorial]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=418</guid>
		<description><![CDATA[After downloading data, it&#8217;s pretty common that you&#8217;ll want to create calculated fields, such as percent totals or change, to use for analysis and mapping. The next step in my QGIS / SpatiaLite experiment was to create a calculated field (aka derived field). I&#8217;ll run through three ways of accomplishing this, using my subway commuter [...]]]></description>
			<content:encoded><![CDATA[<p>After downloading data, it&#8217;s pretty common that you&#8217;ll want to create calculated fields, such as percent totals or change, to use for analysis and mapping. The next step in my QGIS / SpatiaLite experiment was to create a calculated field (aka derived field). I&#8217;ll run through three ways of accomplishing this, using my subway commuter data to calculate the percentage of workers in each NYC PUMA who commute to work by subway. Just to keep everything straight:</p>
<blockquote>
<ul>
<li> sub_commuters is a census data table for all PUMAs in NY State
<ul>
<li>[SUBWAY] field that has the labor force that commutes by subway</li>
<li>[WORKERS_16] field with the total labor force</li>
<li>[SUB_PER] a calculated field with the % of labor force that commutes by subway</li>
<li>[GEO_ID2] the primary key field, FIPS code that is the unique identifier</li>
</ul>
</li>
<li> nyc_pumas is a feature class with all PUMAs in NYC
<ul>
<li>[PUMA5ID00] is the primary key field, FIPS code that is the unique identifier</li>
</ul>
</li>
<li>pumas_nyc_subcom is the data table that results from joining sub_commuters and nyc_pumas; it can be converted to a feature class for mapping</li>
</ul>
</blockquote>
<h3>Spreadsheet</h3>
<p>The first method would be to add the calculated field to the data after downloading it from the census in a spreadsheet, as part of the cleaning / preparation stage. You could then save it as a delimited text file for import to SpatiaLite. No magic there, so I&#8217;ll skip to the second method.</p>
<h3>SpatiaLite</h3>
<p>The second method would be to create the calculated field in the SpatiaLite database. I&#8217;ll go through the steps I used to figure this out. The basic SQL select query:</p>
<blockquote><p>SELECT *, (SUBWAY / WORKERS_16) AS SUB_PER FROM sub_commuters</p></blockquote>
<p>This query runs, but there are two problems. First, the data in my SUBWAY and WORKERS_16 fields are stored as integers, and when SQLite divides one integer by another it drops the fractional part of the result. Not very helpful here, as my percentage results all come out as 0 (or 1 where the two values are equal). There are several ways to work around this: set the numeric fields as double, real, or float in the spreadsheet before import (didn&#8217;t work for me), specify the field types when importing (didn&#8217;t get that option with the SpatiaLite GUI, but maybe you can with the command line), multiply the numerator by 100 before dividing (SUBWAY * 100 / WORKERS_16) to get the percentage as a whole number (ok unless you need decimals in your result), or use the CAST operator. CAST converts the current data type of a field to a specified data type in the result of the expression. So:</p>
<blockquote><p>SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER FROM sub_commuters</p></blockquote>
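<p>To see the difference between the three expressions side by side, here is a minimal sketch using Python&#8217;s built-in sqlite3 module rather than the SpatiaLite GUI. The table and field names follow the post; the single row of numbers is made up.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sub_commuters (GEO_ID2 TEXT, SUBWAY INTEGER, WORKERS_16 INTEGER)"
)
# Made-up values: 250 subway commuters out of 1000 workers.
conn.execute("INSERT INTO sub_commuters VALUES ('3603701', 250, 1000)")

int_result, pct_result, real_result = conn.execute(
    "SELECT SUBWAY / WORKERS_16, "            # integer division
    "SUBWAY * 100 / WORKERS_16, "             # multiply first, then divide
    "CAST(SUBWAY AS REAL) / CAST(WORKERS_16 AS REAL) "  # floating point
    "FROM sub_commuters"
).fetchone()

print(int_result, pct_result, real_result)  # 0 25 0.25
```

<p>The plain division loses everything after the decimal point, the multiply-first version keeps whole percentage points, and only the CAST version preserves the full fraction.</p>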
<p><a href="http://gothos.info/wp-content/uploads/2010/02/sql1.png"><img class="alignleft size-thumbnail wp-image-424" style="margin-left: 20px; margin-right: 20px;" title="sql1" src="http://gothos.info/wp-content/uploads/2010/02/sql1-150x150.png" alt="" width="90" height="90" /></a>The CAST query gave me the percentages with several decimal places (since we&#8217;re casting the fields as real instead of integer), which is what I needed. The second problem is that this query just produces a temporary view; in order to map this data, we need to create a new table to make the calculated field permanent and join it to a feature class. Here&#8217;s how we do that:</p>
<blockquote><p>CREATE TABLE pumas_nyc_subcom AS<br />
SELECT *, (CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL)) AS SUB_PER<br />
FROM sub_commuters, nyc_pumas<br />
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2</p></blockquote>
<p>The CREATE TABLE AS statement lets us create a new table from the existing two tables &#8211; the data table of subway commuters and the feature class table for NYC PUMAs. We select all the fields in both while throwing in the new calculated field, and we join the data table to the feature class all in one step, and via the join we end up with just data from NYC (the data for the rest of the state gets dropped). After that, it&#8217;s just a matter of taking our new table and enabling the geometry to make it a feature class (as explained in the <a href="http://gothos.info/?p=378">previous post</a>).</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/02/sql2.png"><img class="alignleft size-thumbnail wp-image-425" style="margin-left: 20px; margin-right: 20px;" title="sql2" src="http://gothos.info/wp-content/uploads/2010/02/sql2-150x150.png" alt="" width="90" height="90" /></a>This seems like it should work &#8211; but I discovered another problem. The resulting calculated field that has the percentage of subway commuters per PUMA, SUB_PER, has no data type associated with it. Looking at the schema for the table in SpatiaLite shows that the data type is blank. If I bring this into QGIS, I&#8217;m not able to map this field as a numeric value, because QGIS doesn&#8217;t know what it is. I have to define the data type for this field. SpatiaLite (SQLite really) doesn&#8217;t allow you to re-define an existing field &#8211; we have to create and define a new blank field, and then set that field equal to our calculated expression. Here are the SQL statements to make it all happen:</p>
<blockquote><p>ALTER TABLE sub_commuters ADD SUB_PER REAL</p></blockquote>
<blockquote><p>UPDATE sub_commuters SET SUB_PER=(CAST (SUBWAY AS REAL)/ CAST(WORKERS_16 AS REAL))</p></blockquote>
<blockquote><p>CREATE TABLE pumas_nyc_subcom AS<br />
SELECT * FROM sub_commuters, nyc_pumas<br />
WHERE nyc_pumas.PUMA5ID00=sub_commuters.geo_id2</p></blockquote>
<p><a href="http://gothos.info/wp-content/uploads/2010/02/sql4.png"><img class="alignleft size-thumbnail wp-image-426" style="margin-left: 20px; margin-right: 20px;" title="sql4" src="http://gothos.info/wp-content/uploads/2010/02/sql4-150x150.png" alt="" width="90" height="90" /></a>So, we add a new blank field to our data table and define it as real. Then we update our data table by setting that blank field equal to our expression, thus filling the field with the result of our expression. Once we have the defined calculated field, we can create a new table from the data plus the features based on the ID they share in common. Once the table is created, then we can activate the geometry (right click on geometry field in the feature class and activate &#8211; see <a href="http://gothos.info/?p=378">previous post</a> for details) so we can map it in QGIS. Phew!</p>
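<p>The same three statements can be tried outside the GUI. The sketch below runs them with Python&#8217;s built-in sqlite3 module against two toy tables; plain SQLite has no spatial functions, so the Geometry column is just an empty placeholder and all values are made up. It then checks the declared type of SUB_PER in the joined table.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for the data table and the feature class.
    CREATE TABLE sub_commuters (GEO_ID2 TEXT, SUBWAY INTEGER, WORKERS_16 INTEGER);
    CREATE TABLE nyc_pumas (PUMA5ID00 TEXT, Geometry BLOB);
    INSERT INTO sub_commuters VALUES ('3603701', 250, 1000);
    INSERT INTO sub_commuters VALUES ('3600100', 5, 2000);
    INSERT INTO nyc_pumas VALUES ('3603701', NULL);

    -- Add a typed, empty field, then fill it with the calculated value.
    ALTER TABLE sub_commuters ADD SUB_PER REAL;
    UPDATE sub_commuters
        SET SUB_PER = CAST(SUBWAY AS REAL) / CAST(WORKERS_16 AS REAL);

    -- Join data to features; rows without a matching NYC PUMA drop out.
    CREATE TABLE pumas_nyc_subcom AS
        SELECT * FROM sub_commuters, nyc_pumas
        WHERE nyc_pumas.PUMA5ID00 = sub_commuters.GEO_ID2;
""")

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
column_types = {
    row[1]: row[2] for row in conn.execute("PRAGMA table_info(pumas_nyc_subcom)")
}
rows = conn.execute("SELECT GEO_ID2, SUB_PER FROM pumas_nyc_subcom").fetchall()

print(column_types["SUB_PER"])  # REAL
print(rows)
```

<p>Because SUB_PER is now a real column in sub_commuters rather than an expression in the SELECT, its declared type carries over into the new table.</p>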
<h3>QGIS</h3>
<p><a href="http://gothos.info/wp-content/uploads/2010/02/sql3.png"><img class="alignleft size-thumbnail wp-image-428" style="margin-left: 20px; margin-right: 20px;" title="sql3" src="http://gothos.info/wp-content/uploads/2010/02/sql3-150x150.png" alt="" width="90" height="90" /></a>The third method is to create the calculated field within QGIS, using the new field calculator. It&#8217;s pretty easy to do &#8211; you select the layer in the table of contents and go into an edit mode. Open the attribute table for the features and click the last button in the row of buttons underneath the table &#8211; this is the field calculator button. Once we&#8217;re in the field calculator window, we can choose to update an existing field or create a new field. We give the output field a name and a data type, enter our expression SUBWAY / WORKERS_16, hit OK, and we have our new field. Save the edits and we should be good to go. HOWEVER &#8211; I wasn&#8217;t able to add calculated fields to features in a SpatiaLite geodatabase without getting errors. I posted to the QGIS forum &#8211; initially it was thought that the SpatiaLite driver was read only, but it turns out that&#8217;s not the case, so the developers are investigating a possible bug. The investigation continues &#8211; stay tuned. I have tried the field calculator with shapefiles and it works perfectly (incidentally, you can export SpatiaLite features out of the database as shapefiles).</p>
<p>I&#8217;m providing the database I created <a href="http://gothos.info/lib3010/subway_test.zip">here</a> for download, if anyone wants to experiment.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2010/02/calculated-fields-in-spatialite-sqlite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SpatiaLite and QGIS: Loading, Joining, Mapping Shapefiles and Tables</title>
		<link>http://gothos.info/2010/01/spatialite-and-qgis-loading-joining-mapping-shapefiles-and-tables/</link>
		<comments>http://gothos.info/2010/01/spatialite-and-qgis-loading-joining-mapping-shapefiles-and-tables/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 21:52:30 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[joining data]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[qgis]]></category>
		<category><![CDATA[shapefile]]></category>
		<category><![CDATA[spatialite]]></category>
		<category><![CDATA[sqlite]]></category>
		<category><![CDATA[Tutorial]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=378</guid>
		<description><![CDATA[I stuck with the Long Term Support Version of QGIS (1.02) last semester while I was teaching, but now I finally have had a chance to experiment with the latest version (1.4) which has a lot of great new features including: improved symbolization, labeling, print layouts, and support for SpatiaLite &#8211; a personal (single [...]]]></description>
			<content:encoded><![CDATA[<p>I stuck with the Long Term Support Version of QGIS (1.02) last semester while I was teaching, but now I finally have had a chance to experiment with the latest version (1.4) which has a lot of great new features including: improved symbolization, labeling, print layouts, and support for SpatiaLite &#8211; a personal (single file) geodatabase based on SQLite. For a summary of the new QGIS features check out <a href="http://blog.qgis.org/node/142" target="_blank">the QGIS blog</a> and <a href="http://linfiniti.com/2009/11/some-more-sneak-peeks-into-qgis-trunk/" target="_blank">this developer&#8217;s blog</a>, and for an overview of SpatiaLite you can go <a href="http://www.gaia-gis.it/spatialite/docs.html" target="_blank">to the official docs page</a> and <a href="http://bostongis.org/?content_name=spatialite_tut01#195" target="_blank">this tutorial</a>. The latter will show you the obvious strengths of SpatiaLite &#8211; the ability to store features and attributes in one container, with the ability to run standard SQL and spatial queries on both. Since that&#8217;s covered pretty well, I thought I&#8217;d run through a basic operation &#8211; how to load a shapefile and an attribute table in SpatiaLite, join them, connect to the database in QGIS and map the data. I&#8217;m using the SpatiaLite GUI, but for those so inclined you could use the command line tool instead.</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite2.png"><img class="size-thumbnail wp-image-382  alignleft" style="margin-left: 20px; margin-right: 20px;" title="spatlite2" src="http://gothos.info/wp-content/uploads/2010/01/spatlite2-150x150.png" alt="Loading shapefile in Spatialite" width="90" height="90" /></a>Fire up the GUI, and create a new, empty geodatabase under the File menu. Once we have a container, we can hit the load a shapefile button. I have a census PUMA layer for NYC that I&#8217;ve formatted by erasing water features. Click load, go to the path, give the file a nice brief name, and specify the SRID &#8211; the EPSG code that specifies what coordinate system my shapefile is in. In this case, it&#8217;s 4269 as the layer is in NAD83 (you can check your files by opening the prj file in a text editor or by using the OGR tools).</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite3.png"><img class="size-thumbnail wp-image-383  alignleft" style="margin-left: 20px; margin-right: 20px;" title="spatlite3" src="http://gothos.info/wp-content/uploads/2010/01/spatlite3-150x150.png" alt="Table view" width="90" height="90" /></a>Once it&#8217;s loaded, you can expand the listing in the table of contents to see all the field names of the feature, and you can right click on it and choose the edit option to see all of the data in the attribute table.</p>
<p>Next we can load a data table. I have a 2006-2008 ACS census table in tab-delimited text format that I&#8217;ve pre-formatted. The table has the number of workers (labor force age 16+) and number of workers who commuted to work via the subway for every PUMA in the State of New York (it&#8217;s faster to download the whole state and filter out the city PUMAs later). Hit the load txt/csv button, specify a path, a new table name (sub_commuters), the delimiter used, and load the table. It&#8217;s given a different icon in the table of contents (toc), to distinguish a regular data table from a feature class.</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite4.png"><img class="alignleft size-thumbnail wp-image-384" style="margin-left: 20px; margin-right: 20px;" title="spatlite4" src="http://gothos.info/wp-content/uploads/2010/01/spatlite4-150x150.png" alt="spatlite4" width="90" height="90" /></a>The next step is to join them together; I already ensured that they both share a common, unique identifier: a FIPS code that has a state and PUMA code. If I run a standard SELECT query I can join the tables in a temporary view &#8211; but that&#8217;s not what I want. I can save the query as a view, but I won&#8217;t be able to access the view within QGIS (at least not with this current stable version of SpatiaLite, 2.31). What we have to do here is create a brand new table that combines both the puma feature class and the subway commuter table (referred to in Microsoft Access land as a Make Table Query). Here&#8217;s the SQL that we type in the command window:</p>
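<p>For anyone who would rather script the import than use the GUI button, here is a rough sketch with Python&#8217;s csv and sqlite3 modules. It is not what the SpatiaLite GUI does internally; the in-memory text stands in for the real ACS file and the values are made up. Declaring the column types up front keeps the FIPS code as text and the counts as integers.</p>

```python
import csv
import io
import sqlite3

# Stand-in for the tab-delimited ACS extract; the values are made up.
SAMPLE_TXT = "GEO_ID2\tSUBWAY\tWORKERS_16\n3603701\t250\t1000\n3600100\t5\t2000\n"

conn = sqlite3.connect(":memory:")
# Declare types explicitly instead of letting the importer guess them.
conn.execute(
    "CREATE TABLE sub_commuters ("
    "GEO_ID2 TEXT PRIMARY KEY, SUBWAY INTEGER, WORKERS_16 INTEGER)"
)

# DictReader uses the header row for keys, which match the named placeholders.
reader = csv.DictReader(io.StringIO(SAMPLE_TXT), delimiter="\t")
conn.executemany(
    "INSERT INTO sub_commuters VALUES (:GEO_ID2, :SUBWAY, :WORKERS_16)",
    list(reader),
)

loaded = conn.execute(
    "SELECT GEO_ID2, SUBWAY, typeof(SUBWAY) FROM sub_commuters ORDER BY GEO_ID2"
).fetchall()
print(loaded)
```

<p>With a real file, io.StringIO(SAMPLE_TXT) would be replaced by an open file handle for the downloaded table.</p>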
<blockquote><p>CREATE TABLE pumas_nyc_subcom AS<br />
SELECT *<br />
FROM nyc_pumas, sub_commuters<br />
WHERE PUMA5ID00=GEO_ID2</p></blockquote>
<p>Execute the query, and we get a message that an empty results set was generated. Uh, ok. But then if we select the database path at the top of the TOC, right-click, and refresh, we&#8217;ll see our new combined table, pumas_nyc_subcom, and we can expand it and take a look at the data. The join worked, but we&#8217;re not done yet. Right now this is just a regular old data table (notice the icon?). We have to turn this into a feature class next.</p>
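<p>The &#8220;empty results set&#8221; message makes sense once you see that a CREATE TABLE AS statement writes a table instead of returning rows. Here is a small sketch with Python&#8217;s built-in sqlite3 module, using toy tables with made-up values and an empty placeholder for the geometry:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nyc_pumas (PUMA5ID00 TEXT, Geometry BLOB)")
conn.execute(
    "CREATE TABLE sub_commuters (GEO_ID2 TEXT, SUBWAY INTEGER, WORKERS_16 INTEGER)"
)
# Two city PUMAs in the feature table, three PUMAs in the state-wide data table.
conn.executemany(
    "INSERT INTO nyc_pumas VALUES (?, NULL)", [("3603701",), ("3603702",)]
)
conn.executemany(
    "INSERT INTO sub_commuters VALUES (?, ?, ?)",
    [("3603701", 250, 1000), ("3603702", 400, 1200), ("3600100", 5, 2000)],
)

cursor = conn.execute(
    "CREATE TABLE pumas_nyc_subcom AS "
    "SELECT * FROM nyc_pumas, sub_commuters "
    "WHERE PUMA5ID00 = GEO_ID2"
)
returned_rows = cursor.fetchall()  # the statement itself hands back no rows

# The new table shows up in the catalog, holding only the matched PUMAs.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
joined_count = conn.execute("SELECT COUNT(*) FROM pumas_nyc_subcom").fetchone()[0]
print(returned_rows, tables, joined_count)
```

<p>The non-matching row from the rest of the state never makes it into pumas_nyc_subcom, which is the filtering effect the join is relied on for here.</p>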
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite6.png"><img class="size-thumbnail wp-image-385  alignleft" style="margin-left: 20px; margin-right: 20px;" title="spatlite6" src="http://gothos.info/wp-content/uploads/2010/01/spatlite6-150x150.png" alt="Joined and created feature class" width="90" height="90" /></a>Expand the fields for the new table in the TOC, select the Geometry field, right click, and check the geometry. We&#8217;ll see that it&#8217;s MULTIPOLYGON geometry, the projection is still NAD83, and there are 55 features (the non-NYC PUMAS were filtered out during the join, leaving us just with NYC data). Right click on Geometry again, choose the option to Recover Geometry. Specify the geometry type and the SRID, run, refresh the database, and success. A little globe appears next to pumas_nyc_subcom, indicating that it&#8217;s now a feature class.</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite7.png"><img class="aligncenter size-full wp-image-408" style="margin-top: 20px; margin-bottom: 20px;" title="spatlite7" src="http://gothos.info/wp-content/uploads/2010/01/spatlite7.png" alt="spatlite7" width="258" height="77" /></a></p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite8.png"><img class="size-thumbnail wp-image-386  alignleft" style="margin-left: 20px; margin-right: 20px;" title="spatlite8" src="http://gothos.info/wp-content/uploads/2010/01/spatlite8-150x150.png" alt="QGIS Spatialite connection interface" width="90" height="90" /></a>At this point we can fire up QGIS. In the toolbar for versions post 1.02, there should be a connect to SpatiaLite button. Click it, add a New database, and browse to the database file. Once it&#8217;s loaded, we can hit connect, and we&#8217;ll be able to see all feature classes (but NOT data tables, which is why we had to go through the join). Select pumas_nyc_subcom, which has features and data, and click add.</p>
<p>As with any GIS, now we have to symbolize the features to map the subway commuters. Right click on the layer in the table of contents, select properties, and you&#8217;ll get to the recently redesigned properties menu. Go to Symbology, map the subway commuters field by graduated values, change some colors, and voila, a map!</p>
<p><a href="http://gothos.info/wp-content/uploads/2010/01/spatlite12.png"><img class="size-thumbnail wp-image-387  alignleft" style="margin-left: 20px; margin-right: 20px;" title="spatlite12" src="http://gothos.info/wp-content/uploads/2010/01/spatlite12-150x150.png" alt="QGIS map with data and new labels" width="90" height="90" /></a>The developers are still experimenting with improvements &#8211; there&#8217;s a button in the upper right-hand corner of the symbology tab that asks you if you want to try the New Symbology &#8211; this is a new layout, with the introduction of graduated color palettes. It&#8217;s pretty slick, but still a work in progress (color ranges are assigned from dark to light, with the lowest values getting the darkest color; the opposite of cartographic convention). The same label properties are there too, but you can experiment with the improved labeling engine under the Plugins menu. The automatic placement of labels is vastly improved.</p>
<p>Mapping totals for subway commuters isn&#8217;t as interesting as mapping the percentage of commuters in each PUMA who ride the subway. So I&#8217;ll share my experiments working with calculated fields (in SpatiaLite and QGIS) in my next post.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2010/01/spatialite-and-qgis-loading-joining-mapping-shapefiles-and-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UNdata Processing, Calc Data Pilot</title>
		<link>http://gothos.info/2009/09/undata-processing-calc-data-pilot/</link>
		<comments>http://gothos.info/2009/09/undata-processing-calc-data-pilot/#comments</comments>
		<pubDate>Sun, 27 Sep 2009 21:27:02 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Data Sources]]></category>
		<category><![CDATA[calc]]></category>
		<category><![CDATA[country codes]]></category>
		<category><![CDATA[data pilot]]></category>
		<category><![CDATA[open office]]></category>
		<category><![CDATA[pivot table]]></category>
		<category><![CDATA[UNdata]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=322</guid>
		<description><![CDATA[I&#8217;d downloaded some data from the UNdata website and cleaned it up so I could use it for my class, and thought I&#8217;d share some tips here. In many cases when you download data from UNdata you get multiple records for each country; one record for each year for each data point. In order to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d downloaded some data from the <a href="http://data.un.org/" target="_blank">UNdata</a> website and cleaned it up so I could use it for my class, and thought I&#8217;d share some tips here. In many cases when you download data from UNdata you get multiple records for each country; one record for each year for each data point. In order to bring this data into GIS, I needed to re-arrange it to move the years from rows to columns, so that I&#8217;d have one record for each country with multiple columns for years.</p>
<p>You can do this in Excel using a pivot table, but since I was working off of my Linux notebook, I accomplished this using the Data Pilot tool in Open Office 3.0&#8217;s Calc spreadsheet. Here&#8217;s what I did:</p>
<ul>
<li> <a href="http://gothos.info/wp-content/uploads/2009/09/screenshot1.png"><img class="size-thumbnail wp-image-324 alignleft" style="margin-left: 30px; margin-right: 30px;" title="screenshot1" src="http://gothos.info/wp-content/uploads/2009/09/screenshot1-150x150.png" alt="screenshot1" width="105" height="105" /></a>I opened the csv file I downloaded from UNdata in Calc, and accepted the defaults on the text import screen. Once it was imported, I saved the file in a spreadsheet format &#8211; you can use Calc&#8217;s ods format, or you can save it in Excel xls format if you need to use Excel later. But you have to get out of the csv format &#8211; Calc crashed on me a couple of times when I was running the Pilot and creating multiple sheets in the csv.</li>
</ul>
<ul>
<li> I went up to the Data menu, selected Data Pilot, and Start, which opens the Data Pilot Menu. I clicked the More button to see the full range of options.</li>
</ul>
<ul>
<li> <a href="http://gothos.info/wp-content/uploads/2009/09/screenshot3.png"><img class="alignleft size-thumbnail wp-image-326" style="margin-left: 30px; margin-right: 30px;" title="screenshot3" src="http://gothos.info/wp-content/uploads/2009/09/screenshot3-150x150.png" alt="screenshot3" width="105" height="105" /></a>Then it was a simple matter of dragging the field names into the right places. I dragged the country code and name fields into the rows box, the year into the column box (as I wanted to move years to columns), and the actual data field into the values box. Under the options listed under More, I changed the Results drop down box to save the table in a new sheet, and I unchecked all of the boxes listed below (for adding filters, creating totals rows and columns, etc). Then clicked OK.</li>
</ul>
<ul>
<li> <a href="http://gothos.info/wp-content/uploads/2009/09/screenshot4.png"><img class="alignleft size-thumbnail wp-image-346" style="margin-left: 30px; margin-right: 30px;" title="screenshot4" src="http://gothos.info/wp-content/uploads/2009/09/screenshot4-150x150.png" alt="screenshot4" width="105" height="105" /></a>Voila! I had my newly formatted table, with one row for each country and one column for each year. But since I&#8217;ll be bringing this data into GIS (and will have to save the data in DBF format as I want my students to bring it into QGIS), I need to make sure that my data doesn&#8217;t have any funky formatting that may mess up joining my data to a shapefile. So I added a blank worksheet, copied my new pilot table, and did a Paste Special into the blank worksheet and pasted only text and numbers &#8211; with formatting, formulas, and anything else funky left out.</li>
</ul>
<ul>
<li> <a href="http://gothos.info/wp-content/uploads/2009/09/screenshot5.png"><img class="alignleft size-thumbnail wp-image-347" style="margin-left: 30px; margin-right: 30px;" title="screenshot5" src="http://gothos.info/wp-content/uploads/2009/09/screenshot5-150x150.png" alt="screenshot5" width="105" height="105" /></a>Once I had my plain, reformatted data in my new sheet, I deleted the top row (which had labels for Sum Value and Year) so I&#8217;d be left with only one header row, and I changed the field names to something more database friendly (truncating names and removing spaces). Lastly, I deleted the original data sheet and the formatted data pilot table sheet, so I was left with just the final copy.</li>
</ul>
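<p>If you&#8217;d rather script the rows-to-columns step than click through the Data Pilot, the same reshaping can be sketched in a few lines of Python. This is a minimal sketch, not what I actually ran: the function name, the field layout (code, country, year, value) and the sample numbers are all assumptions for illustration.</p>

```python
from collections import defaultdict


def pivot_years_to_columns(records):
    """Turn (code, country, year, value) rows into one row per country."""
    years = sorted({year for _, _, year, _ in records})
    by_country = defaultdict(dict)
    for code, country, year, value in records:
        by_country[(code, country)][year] = value
    # Prefix year columns with a letter so they make legal field names.
    header = ["code", "country"] + [f"y{year}" for year in years]
    rows = [
        [code, country] + [values.get(year) for year in years]
        for (code, country), values in sorted(by_country.items())
    ]
    return header, rows


# Made-up sample data in the long layout, one record per country per year.
long_records = [
    (1, "Country A", 2005, 10.5),
    (1, "Country A", 2006, 11.0),
    (2, "Country B", 2005, 7.2),
]
header, rows = pivot_years_to_columns(long_records)
print(header)
print(rows)
```

<p>A country with no record for a given year ends up with an empty cell (None) in that column, which is the same gap you&#8217;d see in the pilot table.</p>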
<p>That&#8217;s it! Sort of. Since I now have the years in columns, I could create a few calculated fields to show change over time.</p>
<p>But the last piece will be dealing with the country codes. To get a data table with codes from the UNdata website, you have to choose an Add Columns option from their data browser page before you download, as you don&#8217;t get the country codes by default. Then, the codes you get could be anything. Since these data tables are coming from dozens of different organizations, agencies, and bureaus within the UN, the country codes will vary based on what that agency did. In some cases I&#8217;ve downloaded data that had the ISO two-letter alpha codes, and in other cases I had three-digit numeric ISO codes (stored incorrectly as numbers, so leading zeros were dropped).</p>
<p>Most of the tables I&#8217;ve been downloading come from the World Health Organization (WHO), and came with no standardized country codes. Instead, the codes are sequential numbers assigned to the countries in alphabetical order from 1 to 193. Doh! Then, if a new country gets added they tack on the next available number regardless of the alphabet &#8211; so the country of Montenegro is assigned 194, after Zambia which is 193. Typically, countries that are not WHO member states (like Liechtenstein and Vatican City) and dependencies (Greenland, Falkland Islands, French Polynesia, etc) are not included in the data sets.</p>
<p>So, I&#8217;ll be typing ISO alpha-2 codes into one of my data tables and will end up with a table that connects their sequential number system to ISO. Then I can bring this bridge and all of the other tables into a database, relate them to the bridge based on the sequential number, and create new tables out of them that have ISO codes, so I can join them to my GIS file based on ISO. Or I guess I could add the sequential number field to my countries shapefile and join each table to it based on the sequential number.</p>
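<p>Here is a rough sketch of both fixes in Python: padding numeric ISO codes that lost their leading zeros, and using a bridge table to swap sequential numbers for ISO alpha-2 codes. The function names are hypothetical, and apart from Montenegro (194, as mentioned above) the sequential numbers and data values in the sample are invented for illustration.</p>

```python
def pad_iso_numeric(code):
    """Restore leading zeros on a three-digit ISO numeric code."""
    return str(code).zfill(3)


def attach_iso(rows, bridge):
    """Swap each row's sequential number for an ISO alpha-2 code.

    Rows whose number is missing from the bridge are returned separately
    so they can be checked by hand.
    """
    matched, unmatched = [], []
    for seq, value in rows:
        if seq in bridge:
            matched.append((bridge[seq], value))
        else:
            unmatched.append((seq, value))
    return matched, unmatched


# Illustrative bridge: sequential number to ISO alpha-2 code.
bridge = {1: "AF", 2: "AL", 194: "ME"}
# Made-up data rows keyed on the sequential number.
data_rows = [(1, 28.1), (194, 74.0), (500, 3.3)]

matched, unmatched = attach_iso(data_rows, bridge)
print(pad_iso_numeric(4))
print(matched)
print(unmatched)
```

<p>Keeping the unmatched rows in their own list makes it easy to spot a country that was added after the bridge table was typed up.</p>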
<p>Anyway &#8211; happy Data Piloting (or Pivoting, if you prefer).</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/09/undata-processing-calc-data-pilot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Formulas for Working With Census ACS Data in Excel / Calc</title>
		<link>http://gothos.info/2009/06/formulas-for-working-with-census-acs-data-in-excel-calc/</link>
		<comments>http://gothos.info/2009/06/formulas-for-working-with-census-acs-data-in-excel-calc/#comments</comments>
		<pubDate>Fri, 26 Jun 2009 21:19:24 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[acs]]></category>
		<category><![CDATA[american community survey]]></category>
		<category><![CDATA[calc formulas]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[census data]]></category>
		<category><![CDATA[excel formulas]]></category>
		<category><![CDATA[margin of error]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=216</guid>
		<description><![CDATA[After downloading US census data, you often need to reformat it before using it. It&#8217;s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty [...]]]></description>
			<content:encoded><![CDATA[<p>After downloading US census data, you often need to reformat it before using it. It&#8217;s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty straightforward if you&#8217;re working with decennial census data, but data from the American Community Survey (ACS) is a little trickier to deal with since you&#8217;re working with estimates that have a margin of error. When creating new data, you also have to calculate what the margin of error is for your derived numbers. I&#8217;ll walk through some examples of how you would do this in a spreadsheet (the formulas below will work in either Excel or Calc).</p>
<p><strong>Creating an Aggregate</strong></p>
<p>We&#8217;ll use the following data in our example:</p>
<p><a href="http://gothos.info/wp-content/uploads/2009/06/screenshot1.png"><img class="size-full wp-image-220 alignnone" title="spreadsheet 1" src="http://gothos.info/wp-content/uploads/2009/06/screenshot1.png" alt="screenshot1" width="767" height="167" /></a></p>
<p>We have the total population of people three years and older who are enrolled in school, and a breakdown of this population enrolled in grades 1 through 4 and grades 5 through 8 in a few counties in New York, with margins of error for each data point. Our data is from the three-year 2005-2007 American Community Survey estimates.</p>
<p>Let&#8217;s say we want to create a total for students who are enrolled in grades 1 through 8 for each county. We create a new column and sum the estimates for each county with the formula =E3+G3, or =SUM(E3,G3). Don&#8217;t use the range SUM(E3:G3), since that would also add in column F, which holds a margin of error and not an estimate.</p>
<p>To calculate a margin of error (MOE) for our grade 1 to 8 data, first we have to use the find and replace command to get rid of the &#8220;+/-&#8221; signs in the MOE column, so our spreadsheet will treat our values as numbers and not text (this is an issue if you downloaded the data as an Excel file &#8211; if you download a txt file the +/- is not included). Depending on the dataset you&#8217;re working with, you may also need to replace dashes, which represent data that was null or not estimated.</p>
<p>Once the data is cleaned up, we can insert a new column with this formula:</p>
<blockquote><p>=SQRT((F3^2)+(H3^2))</p></blockquote>
<p>This calculates our new margin of error by squaring the MOEs for each of our data points, summing the results together, and taking the square root of that sum. In other words,</p>
<blockquote><p>=SQRT((MOE1^2)+(MOE2^2))</p></blockquote>
<p>Once that&#8217;s done, you may want to round the new MOE to a whole number.</p>
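<p>For anyone who would rather check the spreadsheet formula in code, here is the same calculation as a short Python sketch. The function name and the sample MOE values are my own placeholders, not figures from the table above.</p>

```python
import math


def moe_sum(*moes):
    """MOE of a sum of estimates: square root of the sum of squared MOEs."""
    return math.sqrt(sum(moe ** 2 for moe in moes))


# Made-up MOEs for two estimates being added together.
combined = moe_sum(3, 4)
print(combined)
print(round(moe_sum(120, 95)))
```

<p>The function takes any number of MOEs, so the same sketch covers adding three or more columns together.</p>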
<p><strong>Creating a Percent Total</strong></p>
<p>Let&#8217;s calculate the percentage of the population 3 years and older enrolled in school that are in grades 1 through 8. Based on what we have thus far (I hid the columns E,F,G, and H for grades 1-4 and 5-8 in this screenshot, as we don&#8217;t need them):</p>
<p><a href="http://gothos.info/wp-content/uploads/2009/06/screenshot-2.png"><img class="size-full wp-image-223 alignnone" title="spreadsheet 2" src="http://gothos.info/wp-content/uploads/2009/06/screenshot-2.png" alt="screenshot-2" width="597" height="160" /></a></p>
<p>We insert a new column where we divide our subgroup by the total, as you would expect &#8211; I3/C3. In the next column we insert the following formula to create a MOE for our new percent total:</p>
<blockquote><p>=(SQRT((J3^2)-((K3^2)*(D3^2))))/C3</p></blockquote>
<p>This one&#8217;s a little weightier than our last formula. We&#8217;re taking the square of our percent total (K3) and the square of the MOE of the total population (D3), multiplying them together, then subtracting that number from the square of the MOE of our subgroup (J3). Then we take the square root of the whole thing, then divide it by our total population (C3). If you&#8217;re saying &#8211; HUH? Maybe this is clearer:</p>
<blockquote><p>=(SQRT((MOEsubset^2)-((PercentTotal^2)*(MOEtotalpop^2))))/TotalPop</p></blockquote>
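<p>The same formula as a Python sketch, with hypothetical names and made-up numbers. One thing the spreadsheet version doesn&#8217;t handle: the value under the square root can come out negative. As I understand the Census guidance, the workaround in that case is to add the two terms instead of subtracting them (the formula for a ratio), so the sketch does that, but check the handbook before relying on it.</p>

```python
import math


def moe_proportion(subset, total, moe_subset, moe_total):
    """MOE of the proportion subset / total, per the formula above."""
    proportion = subset / total
    scaled_total = (proportion ** 2) * (moe_total ** 2)
    under_root = moe_subset ** 2 - scaled_total
    if under_root >= 0:
        return math.sqrt(under_root) / total
    # Assumed fallback for a negative value under the root:
    # add the terms instead of subtracting them.
    return math.sqrt(moe_subset ** 2 + scaled_total) / total


# Made-up estimates and MOEs, not values from the screenshots.
normal_case = moe_proportion(50, 200, 10, 20)
fallback_case = moe_proportion(100, 200, 5, 20)
print(round(normal_case, 4))
print(round(fallback_case, 4))
```

<p>Multiply the result by 100 if you want the MOE in percentage points rather than as a fraction.</p>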
<p>Finally, we have something like this:</p>
<p><a href="http://gothos.info/wp-content/uploads/2009/06/screenshot-3.png"><img class="size-full wp-image-224 alignnone" title="speadsheet 3" src="http://gothos.info/wp-content/uploads/2009/06/screenshot-3.png" alt="screenshot-3" width="764" height="155" /></a></p>
<p>Based on our data, we can say things like &#8220;There were approximately 30,556 students enrolled in 1st through 8th grade per year in Dutchess County, NY between 2005 and 2007, plus or minus 1,184 students. An estimated 37% of the population enrolled in school in the county was in the 1st through 8th grade, plus or minus 1%.&#8221; The ACS margins of error are published at the 90% confidence level.</p>
<p><strong>Wrap Up</strong></p>
<p>In this example we worked with aggregating and calculating percentages based on characteristics. We could also use these same formulas to aggregate data by geography, if we wanted to add the characteristics for all the counties together.</p>
<p>For the full documentation on working with ACS data, take a look at the appendix in the Census&#8217; <a href="http://www.census.gov/acs/www/UseData/Compass/handbook_def.html" target="_blank">ACS Compass Guide, What General Data Users Need to Know</a>. It provides you with the formulas in their proper statistical notation (for those of you more mathematically inclined than I) and includes formulas for calculating other kinds of numbers, such as ratios and percent change. It does provide you with worked-through examples, but not with spreadsheet formulas. I used their examples when I created formulas the first time around, so I could compare my formula results to their examples to ensure that I was getting it right. I&#8217;d strongly recommend doing that before you start plugging away with your own data &#8211; one misplaced parenthesis and you could end up with a different (and incorrect) result.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/06/formulas-for-working-with-census-acs-data-in-excel-calc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a New Shapefile in ArcGIS: Part II</title>
		<link>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-ii/</link>
		<comments>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-ii/#comments</comments>
		<pubDate>Fri, 15 May 2009 18:53:32 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[ArcGIS]]></category>
		<category><![CDATA[id fields]]></category>
		<category><![CDATA[identifiers]]></category>
		<category><![CDATA[new york city]]></category>
		<category><![CDATA[shapefile]]></category>
		<category><![CDATA[spatial join]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=180</guid>
		<description><![CDATA[In my previous post I gave an overview of how to create a shapefile from scratch, where we created a point layer to identify places and neighborhoods in NYC. In this post, I&#8217;ll pick up where we left off.
Whenever you create new features in a shapefile, ArcGIS automatically adds a couple of fields, including an [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post I gave an overview of how to create a shapefile from scratch, where we created a point layer to identify places and neighborhoods in NYC. In this post, I&#8217;ll pick up where we left off.</p>
<p>Whenever you create new features in a shapefile, ArcGIS automatically adds a couple of fields, including an auto-number ID field that uniquely identifies each feature. This was sufficient for our example as the 291 place names we were working with do not have a standard ID number that represents them. If we were creating features that did have a recognized ID number or code, we certainly would want to add an additional field to hold that number. This would allow us to share and relate our data to other datasets that use that conventional ID. For example, if we had a layer with the 50 states, we would want to have a FIPS number or the two-letter postal abbreviation for each state in the attribute table, so we could relate our states feature to the zillions of other state-based data tables out there that also use these codes.</p>
<p>It&#8217;s also helpful to add other identifiers to relate our place names to some larger geographic area. Why? Let&#8217;s say we want to filter our neighborhoods by borough &#8211; perhaps we just want to label neighborhoods in Manhattan or calculate distances only between places that appear in the Bronx. It would be useful to have a borough code or some other code associated with each of our place names for running queries.</p>
<p><a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot6.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot6-150x150.png" alt="scrnshot6" title="scrnshot6" width="150" height="150" class="alignleft size-thumbnail wp-image-186" /></a>As it turns out, the City of New York does use a standardized system of three digit codes to identify all boroughs and community districts in the city. In our example, the code for Manhattan Community District 12, which contains Inwood and Washington Heights, is 112. The first digit identifies the borough and the last two digits identify the district. It would be a good idea to assign each of our neighborhoods this district code, so we could filter our features by either borough or district.</p>
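<p>Because the borough and district are packed into a single number, pulling them apart for a query is simple arithmetic. A small Python sketch (the function name is hypothetical, and the borough lookup reflects my understanding of the city&#8217;s numbering, so double-check it against the district layer):</p>

```python
# Assumed borough numbering; only Manhattan = 1 is confirmed by the example above.
BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


def split_district_code(code):
    """Split a three digit code such as 112 into (borough, district)."""
    borough, district = divmod(int(code), 100)
    return borough, district


borough, district = split_district_code(112)
print(BOROUGHS[borough], district)
```

<p>Filtering neighborhoods by borough then comes down to comparing the first element of that pair.</p>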
<p>When we create each feature, we could manually type in the code in its own field just like we added the neighborhood names, but that would be rather tedious &#8211; and unnecessary. A better choice would be to do a spatial join. Whereas a &#8220;regular&#8221; join allows us to join attribute tables based on a common ID field, a spatial join allows us to assign attributes to one layer based on their geographical relationship to another layer. </p>
<p><a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot7.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot7-150x150.png" alt="scrnshot7" title="scrnshot7" width="150" height="150" class="alignleft size-thumbnail wp-image-187" /></a>In the Table of Contents, right click on the neighborhoods layer and choose Joins and Relates &#8211; Joins. We&#8217;ll get the familiar Join dialog box. However, if you hit the first drop down box that says Join Attributes From a Table and choose Join Data Based on Spatial Location, we&#8217;ll get the options for doing a spatial join. Choose the community districts as the layer to join to the neighborhoods, and since we&#8217;re joining points to polygons we&#8217;ll choose parameters that are relevant for relating these two features. In this case, give each point (neighborhood / place) the attributes of the polygons (districts) that it falls inside. ArcGIS will create a new point layer with the joined fields when you hit OK. Open the attribute table of the new point layer, and you&#8217;ll see the additional fields, including the community district numbers. You&#8217;ll also get some rather useless fields from the district layer, like the length and area of each district, which you can safely delete.</p>
<p>So instead of tediously entering these numbers by hand for each neighborhood, we simply run the spatial join process once (after we&#8217;ve finished adding the points for all 291 neighborhoods) and the IDs are automatically added.</p>
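<p>For the curious, the idea behind a points-in-polygons spatial join can be sketched in plain Python with a ray-casting test. This is only an illustration of the concept under simple assumptions (flat coordinates, single-ring polygons without holes), not how ArcGIS implements it; the function names, square &#8220;districts&#8221; and coordinates are invented.</p>

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does this edge straddle the horizontal line through the point?
        crosses = (yi > y) != (yj > y)
        if crosses and (xj - xi) * (y - yi) / (yj - yi) + xi > x:
            inside = not inside
        j = i
    return inside


def spatial_join(points, districts):
    """Give each named point the code of the first district containing it."""
    joined = {}
    for name, (x, y) in points.items():
        joined[name] = None
        for code, polygon in districts.items():
            if point_in_polygon(x, y, polygon):
                joined[name] = code
                break
    return joined


# Made-up square districts and point locations.
districts = {
    112: [(0, 0), (10, 0), (10, 10), (0, 10)],
    111: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"Inwood": (5, 5), "Somewhere else": (15, 5), "Offshore": (30, 5)}
joined = spatial_join(points, districts)
print(joined)
```

<p>A point that falls outside every polygon comes back with no code, which is worth checking for after any spatial join.</p>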
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a New Shapefile in ArcGIS: Part I</title>
		<link>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-i/</link>
		<comments>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-i/#comments</comments>
		<pubDate>Thu, 14 May 2009 15:28:57 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[ArcGIS]]></category>
		<category><![CDATA[labels]]></category>
		<category><![CDATA[layers]]></category>
		<category><![CDATA[new york city]]></category>
		<category><![CDATA[shapefile]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=134</guid>
		<description><![CDATA[I&#8217;m working with a grad student who needs to create a new shapefile from scratch, and thought I&#8217;d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m working with a grad student who needs to create a new shapefile from scratch, and thought I&#8217;d turn the instructions for doing this in ArcGIS into a tutorial / post for creating new point layers. The idea in this example is to create a point layer that shows the relative center of 291 neighborhoods in New York City. Since many of these neighborhoods are place names without finite boundaries, we&#8217;ll have to use various sources (<a href="http://www.nyc.gov/html/dcp/pdf/neighbor/neighbor.pdf" target="_blank">NYC Planning map</a> and Rand McNally street maps) to pinpoint the relative center of each neighborhood. </p>
<p>These points will be used for labeling each neighborhood. In this case, creating a new, georeferenced layer is preferable to creating 291 text labels on a map that are not tied to geography in any way.</p>
<ul>
<li>The first step is to download some layers from the <a href="http://www.nyc.gov/html/dcp/html/bytes/applbyte.shtml" target="_blank">NYC Department of Planning</a> to use for reference, such as a layer for boroughs and community districts. Community districts are used by the city to approximate neighborhoods. Many of the neighborhoods that we are trying to plot are smaller areas or places within these boundaries.
</li>
</ul>
<ul>
<li>
<a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot1.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot1-150x150.png" alt="scrnshot1" title="scrnshot1" width="150" height="150" style="padding: 10px; margin: 10px" class="alignleft size-thumbnail wp-image-142" /></a>Next, open ArcCatalog and create a folder to store the data. Then, right click on the folder in the table of contents and select New &#8211; Shapefile. In the Create New Shapefile window, we give the shapefile a name, select Point as the feature type, and hit Edit to change the coordinate system. In the Spatial Reference Properties menu, we&#8217;ll import a coordinate system from one of the files we downloaded from NYC Planning, which uses New York State Plane for Long Island. Click OK and OK again, and we&#8217;ll have a new shapefile.
</li>
</ul>
<ul>
<li>
<a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot2.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot2-150x150.png" alt="scrnshot2" title="scrnshot2" width="150" height="150" style="padding: 10px; margin: 10px" class="alignleft size-thumbnail wp-image-145" /></a>Right now, our new shapefile isn&#8217;t very exciting because it&#8217;s empty &#8211; you can preview it in the catalog to see for yourself. If you preview the table, you&#8217;ll see that Arc created three fields &#8211; FID, Shape, and ID, which it will automatically fill in when we start creating features. Before we do that, we&#8217;ll have to add an additional column to store the name of the neighborhood. To do that, open ArcMap and add the neighborhood layer to the map. Then, right click on the layer in the Table of Contents and open the attribute table. Hit the Options button and choose Add Field. In the Add Field menu, name the new field, choose Text as the type, and change the length to 80 (in case we have some neighborhoods with long names). Hit OK, and you&#8217;ll have a new field.
</li>
</ul>
<ul>
<li>
<a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot3.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot3-150x150.png" alt="scrnshot3" title="scrnshot3" width="150" height="150" style="padding: 10px; margin: 10px" class="alignleft size-thumbnail wp-image-148" /></a>Let&#8217;s add our reference layers next. Hit the Add Data button (or File &#8211; Add Data), and add the borough boundaries and community districts (if you don&#8217;t see anything after you add them, right click on one of these layers and choose Zoom to Layer). Go into the symbology tab for each layer and change their display to make the areas appear more distinctive. Make sure your neighborhood layer is on top of your other layers.
</li>
</ul>
<ul>
<li>
Now it&#8217;s time to start plotting neighborhoods. Go to the Selection menu &#8211; Set selectable Layers, and turn off all the layers except the neighborhood layer. Then, use the dropdown on the Editor Toolbar and Select Start Editing (if you don&#8217;t see the Editor Toolbar, make sure it&#8217;s activated by going to View &#8211; Toolbars and select it). <a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot4.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot4-150x150.png" alt="scrnshot4" title="scrnshot4" width="150" height="150" style="padding: 10px; margin: 10px" class="alignleft size-thumbnail wp-image-149" /></a>On the Editor Toolbar, make sure the Create New Feature task is activated and that the target layer is the neighborhood layer, and not any of the reference layers. Zoom in to the top of Manhattan. With the Pencil tool selected in the toolbar, and using your sources (NYC planning map, Rand McNally street map, whatever), click on the map to approximate where the center of the Inwood neighborhood would be. A blue dot should appear on the map. Then right-click on the neighborhoods layer in the Table of Contents and open the attributes table. You&#8217;ll see a brand new record for your new dot. Click in the empty field for Name, type in the name of the neighborhood, and press enter.
</li>
</ul>
<ul>
<li>
That&#8217;s the process! Next, locate the area for Washington Heights and click on the map to create the point for that neighborhood. The new dot will appear highlighted, while the previous dot for Inwood will now appear as a regular point symbol. Now it&#8217;s just a matter of plugging away. Make sure to occasionally save your edits by clicking Editor and choosing Save Edits. If you make a mistake, you can delete a feature: choose the Select Feature tool in the regular tool bar (white arrow with a blue and white feature box next to it), select the particular point, and hit the delete key. If you&#8217;re having trouble pinpointing the right location for the neighborhood, try downloading additional reference layers to guide you. The <a href="http://www.nyc.gov/html/doitt/html/eservices/eservices_gis_agreement.shtml" target="_blank">NYC DOITT</a> also has a page with GIS layers for the city with features like parks and streets that may be helpful. When you&#8217;re finished editing, choose Stop Editing under the Editor Toolbar.
</li>
</ul>
<ul>
<p><a href="http://gothos.info/wp-content/uploads/2009/05/scrnshot5.png"><img src="http://gothos.info/wp-content/uploads/2009/05/scrnshot5-150x150.png" alt="scrnshot5" title="scrnshot5" width="150" height="150" style="padding: 10px; margin: 10px" class="alignleft size-thumbnail wp-image-150" /></a>
<li>The ultimate goal of this exercise was to get neighborhood labels to appear without the actual point. To accomplish this, change the point symbol for the neighborhood to nothing by going into the Symbology tab for the layer and reducing the fill to no color, the outline to nothing, and the size to zero. Then open the Labels tab under the Properties menu, turn labels on using the name field as the label field, select Placement Properties and choose the setting to place the labels on top of the point, hit ok, and voila! Perfectly centered neighborhood names that are part of a georeferenced layer.
</li>
</ul>
<p>This covers the basics. In the next post, I&#8217;ll go a little further and discuss adding additional fields to the new file, without having to type them in manually.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/05/creating-a-new-shapefile-in-arcgis-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transform Projections with GDAL / OGR</title>
		<link>http://gothos.info/2009/04/transform-projections-with-gdal-ogr/</link>
		<comments>http://gothos.info/2009/04/transform-projections-with-gdal-ogr/#comments</comments>
		<pubDate>Tue, 14 Apr 2009 22:32:50 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[coordinate systems]]></category>
		<category><![CDATA[gdal / ogr]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[projections]]></category>
		<category><![CDATA[reproject]]></category>
		<category><![CDATA[transformations]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=116</guid>
		<description><![CDATA[The GDAL / OGR tools are an open source, cross platform, command-line toolkit that can be used for viewing GIS metadata, performing attribute queries, and converting file formats, among other things. It can also be used for transforming coordinate systems and projections for GIS files. I&#8217;ll demonstrate in this brief tutorial how to accomplish this [...]]]></description>
			<content:encoded><![CDATA[<p>The GDAL / OGR tools are an open source, cross-platform, command-line toolkit that can be used for viewing GIS metadata, performing attribute queries, and converting file formats, among other things. They can also be used for transforming coordinate systems and projections for GIS files. I&#8217;ll demonstrate in this brief tutorial how to accomplish this using the OGR tools, which are for vector-based GIS. The raster-based GDAL tools work in a similar fashion.</p>
<p><strong>Viewing basic coordinate system / projection info:</strong></p>
<blockquote><p>ogrinfo -al -so world_wgs.shp</p></blockquote>
<p>Where ogrinfo is the name of the tool, -al is a switch to report on all layers in the file (so you don&#8217;t have to name one), -so is a switch to display summary info only instead of listing every feature, and world_wgs.shp is the name of our file. Run that command and we&#8217;ll get something that looks like this, with info about the features, coordinate system, and attribute fields of our shapefile:</p>
<blockquote><p>
INFO: Open of `world_wgs.shp&#8217;<br />
      using driver `ESRI Shapefile&#8217; successful.</p>
<p>Layer name: world_wgs<br />
Geometry: Polygon<br />
Feature Count: 243<br />
Extent: (-179.808664, -89.677397) - (179.808664, 83.435942)<br />
Layer SRS WKT:<br />
GEOGCS["GCS_WGS_1984",<br />
    DATUM["WGS_1984",<br />
        SPHEROID["WGS_1984",6378137,298.257223563]],<br />
    PRIMEM["Greenwich",0],<br />
    UNIT["Degree",0.017453292519943295]]<br />
CNTRY_NAME: String (254.0)<br />
FIPS_CNT_1: String (254.0)<br />
ISO_2DIGIT: String (254.0)<br />
ISO_3DIGIT: String (254.0)<br />
STATUS: String (254.0)<br />
COLORMAP: Real (18.6)<br />
CONTINENT: String (254.0)<br />
UN_CONTINE: String (254.0)<br />
REGION: String (254.0)<br />
UN_REGION: String (254.0)
</p></blockquote>
<p><strong>Convert coordinate systems supported by EPSG</strong></p>
<p>GDAL / OGR and most open source GIS software support projections and coordinate systems that are part of the EPSG library. If you want to do a conversion between two coordinate systems and they are both supported by EPSG, you just have to reference the EPSG code that&#8217;s used to identify the system that you want to project to. You can look up codes using <a href="http://spatialreference.org/" target="_blank">spatialreference.org</a>.</p>
<p>Let&#8217;s say we want to convert our shapefile that&#8217;s in WGS 84 (common lat and long) to NAD 83 (used frequently in North America):</p>
<blockquote><p>
ogr2ogr -t_srs EPSG:4269 world_new.shp world_wgs.shp
</p></blockquote>
<p>Where ogr2ogr is the name of the tool, -t_srs is the option for transforming from one coordinate system to another, EPSG:4269 is the code that identifies the coordinate system we want the new file to have (NAD83), world_new.shp is the name of the output file that will have the new coordinate system, and world_wgs.shp is our input file. Note that the output file comes before the input file. If you run the command and get no error message, you&#8217;re in good shape. Just run the ogrinfo command on the new file to verify that it&#8217;s been re-projected.</p>
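<p>If you have many files to convert, the command above can be wrapped in a script. What follows is only a minimal Python sketch, not part of the GDAL / OGR toolkit: the function names are made up for illustration, and it assumes ogr2ogr is installed and on your PATH. The first function just assembles the arguments in the order described above, so you can check the command before anything is run.</p>

```python
import shutil
import subprocess


def build_reproject_command(src, dst, target_srs):
    """Return the ogr2ogr argument list; the output file comes before the input."""
    return ["ogr2ogr", "-t_srs", target_srs, dst, src]


def reproject(src, dst, target_srs):
    """Run ogr2ogr, raising an error if the tool is missing or reports a failure."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr was not found on the PATH")
    subprocess.run(build_reproject_command(src, dst, target_srs), check=True)


if __name__ == "__main__":
    # Print the command from this post without running it.
    command = build_reproject_command("world_wgs.shp", "world_new.shp", "EPSG:4269")
    print(" ".join(command))
```

<p>Building the command as a list rather than one long string avoids quoting problems with file names that contain spaces.</p>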
<p><strong>Convert coordinate system not supported by EPSG</strong></p>
<p>The EPSG library is extensive, but doesn&#8217;t contain everything, particularly some global and continental map projections. GDAL / OGR can still do the job, but you&#8217;ll have to provide the tool with the proper frame of reference since the EPSG library doesn&#8217;t have the info. Let&#8217;s say we want to project our WGS file to the Robinson Projection, which is not part of EPSG.</p>
<p>First, go back to <a href="http://spatialreference.org/" target="_blank">spatialreference.org</a> and search for Robinson. Its ID code is ESRI 54030 &#8211; not part of the EPSG library. Click on the link for the projection to open its window. You&#8217;ll be able to look at the projection data in a number of standard file formats. Select OGC_WKT from the list, and it will open the text in a new window, showing you the parameters of that projection. In your browser, go to File, Save As, and save the file as robinson_ogcwkt.txt in the same directory as the shapefile you want to reproject.</p>
<p>Now that you have the projection info stored in the text file, run the following command to make the conversion:</p>
<blockquote><p>
ogr2ogr -t_srs robinson_ogcwkt.txt world_rob.shp world_wgs.shp
</p></blockquote>
<p>It&#8217;s the same command as our previous one, except that you&#8217;re referencing the text file with your data instead of an EPSG code.</p>
<p><strong>Define an undefined coordinate system</strong></p>
<p>If you run the ogrinfo command and your coordinate system is undefined, you should define it before doing anything else, and you must define an undefined projection before converting to another projection. Look at the metadata that came with your file or go back to the source to figure out what it is. For example, the US Census Bureau Generalized Cartographic Boundary Files for 2000 are in NAD83 according to their metadata, but the files lack a projection definition.</p>
<p>To define one, use the following command:</p>
<blockquote><p>ogr2ogr -a_srs EPSG:4269 states_nad83.shp states_unknown.shp</p></blockquote>
<p>The only difference here is that the -a_srs option assigns a coordinate system to a file without changing the coordinates themselves &#8211; the rest of the parameters are the same. If you&#8217;re defining a non-EPSG projection, use the same method from the previous example &#8211; download a definition file from spatialreference.org and use the file name in place of the EPSG code.</p>
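<p>The define-then-convert order can be sketched in Python as well. This is an illustration under stated assumptions, not a tested utility: the function names and the intermediate file name are hypothetical, and the check assumes that ogrinfo prints "(unknown)" on the line after "Layer SRS WKT:" when a file has no definition, which you should confirm against the output of your own GDAL version.</p>

```python
def srs_is_undefined(ogrinfo_summary):
    """Guess whether `ogrinfo -al -so` output reports no coordinate system.

    Assumption: an undefined layer shows "(unknown)" (or nothing) on the
    line that follows "Layer SRS WKT:".
    """
    lines = [line.strip() for line in ogrinfo_summary.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("Layer SRS WKT:"):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            return following in ("", "(unknown)")
    return True  # no SRS section at all: treat the file as undefined


def plan_reprojection(src, dst, target_srs, summary, actual_srs, defined_copy):
    """Return the ogr2ogr commands to run, defining the source first if needed."""
    commands = []
    if srs_is_undefined(summary):
        # Step 1: assign the system the data is really in (from its metadata).
        commands.append(["ogr2ogr", "-a_srs", actual_srs, defined_copy, src])
        src = defined_copy
    # Step 2: transform to the system we want.
    commands.append(["ogr2ogr", "-t_srs", target_srs, dst, src])
    return commands


if __name__ == "__main__":
    summary = "Layer name: states_unknown\nLayer SRS WKT:\n(unknown)\n"
    for command in plan_reprojection(
        "states_unknown.shp", "states_wgs.shp", "EPSG:4326",
        summary, "EPSG:4269", "states_nad83.shp",
    ):
        print(" ".join(command))
```

<p>The value passed as actual_srs has to come from the file&#8217;s metadata; the script cannot work out which coordinate system an undefined file is in.</p>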
<p><strong>More help and where to download:</strong></p>
<p><a href="http://help.nceas.ucsb.edu/GDAL" target="_blank">UC Santa Barbara NCEAS</a> and the <a href="http://casoilresource.lawr.ucdavis.edu/drupal/node/98" target="_blank">UC Davis Soil Lab</a> both have short tutorials and sample commands for GDAL / OGR.</p>
<p>If you want to thumb through the world&#8217;s map projections, the folks at radicalcartography have a nice <a href="http://www.radicalcartography.net/?projectionref" target="_blank">projection reference</a> page with visuals and brief descriptions.</p>
<p>Visit the <a href="http://www.gdal.org/" target="_blank">GDAL / OGR</a> page for downloading, or if you&#8217;re a Windows or Mac user, you can download QGIS and GDAL / OGR together from the <a href="http://www.qgis.org/" target="_blank">QGIS</a> download page. Linux users can get GDAL / OGR through their package manager &#8211; depending on your distro, you may have it already.</p>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/04/transform-projections-with-gdal-ogr/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>QGIS: Data Defined Labeling and Table Joins</title>
		<link>http://gothos.info/2009/03/qgis-data-defined-labeling-and-table-joins/</link>
		<comments>http://gothos.info/2009/03/qgis-data-defined-labeling-and-table-joins/#comments</comments>
		<pubDate>Sat, 07 Mar 2009 16:32:49 +0000</pubDate>
		<dc:creator>Frank</dc:creator>
				<category><![CDATA[Data Processing]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[ftools]]></category>
		<category><![CDATA[labels]]></category>
		<category><![CDATA[latitude]]></category>
		<category><![CDATA[longitude]]></category>
		<category><![CDATA[qgis]]></category>
		<category><![CDATA[table joins]]></category>
		<category><![CDATA[xy coordinates]]></category>

		<guid isPermaLink="false">http://gothos.info/?p=72</guid>
		<description><![CDATA[A little while ago I posted a text file with geographic centroids (centers) for each of the world&#8217;s countries. The reason why I put this together was that I wanted to test the data defined labeling features in QGIS. While automatic labeling in QGIS isn&#8217;t so hot (overlapping labels, multiple labels for each polygon), there [...]]]></description>
			<content:encoded><![CDATA[<p>A little while ago I posted a text file with geographic centroids (centers) for each of the world&#8217;s countries. The reason why I put this together was that I wanted to test the data defined labeling features in QGIS. While automatic labeling in QGIS isn&#8217;t so hot (overlapping labels, multiple labels for each polygon), there are some powerful features for storing and referencing columns for annotation within the attribute table of shapefiles. One of the neat features is the ability to place labels based on coordinates stored in the attribute table.</p>
<p>The first step was to take the centroids file and join it to a shapefile of the world&#8217;s countries based on a common ID field, in this case FIPS country codes. QGIS doesn&#8217;t support table joins directly, but you can accomplish this with a good plugin called fTools, which includes a lot of additional and useful features. The instructions for getting fTools up and running are available on the <a href="http://www.ftools.ca/" target="_blank">fTools website</a>; the installation doesn&#8217;t require you to download any files, you just handle everything through the QGIS plugin manager (if you have trouble seeing the plugin manager or getting fTools to appear, check to make sure that you have Python installed on your machine). Once fTools is up and running, you&#8217;ll see a Tools dropdown menu next to your other menus &#8211; drop it down, select data management tools and join attribute tables. You&#8217;ll get a dialog box asking which shapefile and field you want to join and which shapefile or table you want to join to it. The plugin only supports joins from other shapefiles and dbf tables, so you have to save the country centroids text file as a dbf before you do the join (you can do this in Calc or a pre-2007 version of Excel). These aren&#8217;t dynamic joins; fTools will create a new shapefile with the table fields attached.</p>
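<p>If it helps to see what a join on a common ID field does, here is a small Python sketch of the idea. It is not how fTools is implemented: the function name, the field names, and the sample values are all made up for illustration, and it keeps features that have no match, which may or may not be what the plugin does.</p>

```python
import csv
import io


def join_by_key(features, table_rows, feature_key, table_key):
    """Return copies of the feature records with matching table fields appended.

    Features without a match are kept unchanged (a left join).
    """
    lookup = {row[table_key]: row for row in table_rows}
    joined = []
    for feature in features:
        merged = dict(feature)
        match = lookup.get(feature[feature_key])
        if match is not None:
            for field, value in match.items():
                if field != table_key:
                    merged[field] = value
        joined.append(merged)
    return joined


# Hypothetical sample data standing in for the shapefile attributes
# and the centroids text file; the coordinates are illustrative only.
countries = [
    {"FIPS_CNT_1": "FR", "CNTRY_NAME": "France"},
    {"FIPS_CNT_1": "XX", "CNTRY_NAME": "Nowhere"},
]
centroids_text = "FIPS,LAT,LONG\nFR,46.0,2.0\n"
centroids = list(csv.DictReader(io.StringIO(centroids_text)))

for record in join_by_key(countries, centroids, "FIPS_CNT_1", "FIPS"):
    print(record)
```

<p>Like the fTools join, this produces new records instead of a live link, so later changes to the centroids table would not show up until the join is run again.</p>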
<p>Once the join is complete, you can add the new shapefile with the new fields, click on the layer, and navigate to the labels tab. Hit the checkbox to turn the labels on, select the field that contains the label in the dropdown box at the top, then select data defined position from the menu below. You&#8217;ll see a new series of dropdowns on the right, and you can select your longitude column for the X coordinate and latitude column for the Y coordinate. Hit OK, and voila! You&#8217;ll have labels that are centered in each country.</p>
<p>Of course, the label placement will not be perfect in every case. There will be label overlap in areas with small countries, areas with many countries clustered together, and with countries that have long names. The scale and size of the font will also be a factor, and placing the country name in the center is not always ideal for small island nations. However, you can easily change the label placement by switching to edit mode and changing the coordinates in the attribute table. You can mouse over the map and use the coordinate information that&#8217;s displayed beside the scale in the lower right-hand corner of the window to determine which coordinates work best for a given situation. If you produce several maps of the same area and scale, you can use the same settings over and over again. You can also globally change the placement of all the labels using some of the other label options, such as placing all labels above or to the top-right of the centroid.</p>
<p>Now in order for all of this to work, the coordinates in the country centroid file must be in the same coordinate system as the shapefile. Since the country centroid file uses basic latitude and longitude, I was able to do this with a shapefile that was in the basic WGS 84 geographic coordinate system. If you&#8217;re using a different geographic coordinate system or a projected coordinate system, you&#8217;ll have to convert the coordinates in the centroid file to match that system. I haven&#8217;t delved into this too deeply yet, but there are a number of free tools that you can download that should do this &#8211; one of them is called <a href="http://earth-info.nga.mil/GandG/geotrans/" target="_blank">GEOTRANS</a>, and it&#8217;s available for free download from the NGA. It can handle batch transformations of coordinate data stored in text files, and supports conversions to several different geographic and projected systems.</p>
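<p>To give a sense of what such a conversion involves, here is a minimal Python sketch that converts longitude and latitude to spherical Mercator coordinates. Spherical Mercator is chosen here only as a simple example of a projected system and is an assumption of this sketch, as is the function name. For real work, a tool such as GEOTRANS or ogr2ogr is the safer choice, since most projections are more involved than this one.</p>

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, in meters


def lonlat_to_spherical_mercator(lon_deg, lat_deg):
    """Convert longitude / latitude in degrees to spherical Mercator x, y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # Convert one hypothetical centroid; the input values are illustrative only.
    print(lonlat_to_spherical_mercator(2.0, 46.0))
```

<p>Note that longitude goes in as X and latitude as Y, the same order used for the data defined label position. The formula is undefined at the poles, so latitudes of exactly 90 or -90 degrees will raise an error.</p>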
<div id="attachment_74" class="wp-caption alignleft" style="width: 310px"><a href="http://gothos.info/wp-content/uploads/2009/03/labels2.png"><img src="http://gothos.info/wp-content/uploads/2009/03/labels2-300x217.png" alt="QGIS Label Placement With XY Coordinates" title="labels2" width="300" height="217" class="size-medium wp-image-74" /></a><p class="wp-caption-text">QGIS Label Placement With XY Coordinates</p></div>
]]></content:encoded>
			<wfw:commentRss>http://gothos.info/2009/03/qgis-data-defined-labeling-and-table-joins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
