Linux.com

Feature: Internet & WWW

OpenStreetMap project completes import of United States TIGER data

By Nathan Willis on January 23, 2008 (11:00:00 PM)

OpenStreetMap (OSM) has completed the bulk import of comprehensive street and highway data for the United States, months ahead of the project's original estimates. The massive data set originated with the US Census Bureau's public domain map database, and importing it required a dedicated upload process running around the clock since August 2007. The imported data will still require human editing and error-correction, but the completed task is a major milestone for the OSM project.

As we reported in October, OSM's Dave Hansen retrieved the data from the Census Bureau's Topologically Integrated Geographic Encoding and Referencing (TIGER) system and converted it into an OSM-friendly format offline.

Once the results looked right, the project began importing the data into the main OSM system using three dedicated daemons running concurrently. Routing all of the imported data through the OSM server's API, just like manually collected GPS trace data, made the process time-consuming, but it was safer than bypassing the API and altering the database directly.

Quadtiles to the rescue

At the beginning of the process, the predicted completion date hovered between late May and early June 2008. Luckily, OSM administrator Tom Hughes found a way to re-index the database using quadtiles, which greatly shortened database lookup times.

Quadtiles recursively split each quadrant of the map into four subquadrants. This saves space because only the quadrants that need more detail are subdivided: a quadrant containing only ocean, and therefore no roads, would not require subdivision, whereas a metropolitan city center would.
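The adaptive subdivision described above can be sketched as a point-region quadtree. This is a minimal illustration under stated assumptions, not OSM's implementation: the `Node` class, the `build`, `leaves`, and `depth_of` helpers, the limit of four points per quadrant, and the unit-square coordinates are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A quadrant: its bounding box plus either points (leaf) or four children."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def quadrant_index(self, x: float, y: float) -> int:
        """0 = south-west, 1 = south-east, 2 = north-west, 3 = north-east."""
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        return (2 if y >= my else 0) + (1 if x >= mx else 0)


def build(node, points, capacity=4, max_depth=16, depth=0):
    """Split a quadrant into four only while it holds more than `capacity` points."""
    if len(points) <= capacity or depth == max_depth:
        node.points = list(points)  # sparse area ("ocean"): stop subdividing
        return node
    mx = (node.x0 + node.x1) / 2
    my = (node.y0 + node.y1) / 2
    node.children = [
        Node(node.x0, node.y0, mx, my),  # south-west
        Node(mx, node.y0, node.x1, my),  # south-east
        Node(node.x0, my, mx, node.y1),  # north-west
        Node(mx, my, node.x1, node.y1),  # north-east
    ]
    buckets = [[] for _ in range(4)]
    for x, y in points:
        buckets[node.quadrant_index(x, y)].append((x, y))
    for child, bucket in zip(node.children, buckets):
        build(child, bucket, capacity, max_depth, depth + 1)
    return node


def leaves(node):
    """All undivided quadrants under `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth_of(node, x, y, depth=0):
    """Number of subdivisions above the leaf quadrant containing (x, y)."""
    if not node.children:
        return depth
    return depth_of(node.children[node.quadrant_index(x, y)], x, y, depth + 1)


# A dense 8x8 "city center" in one corner and a single isolated "rural" point.
city = [(i / 100, j / 100) for i in range(8) for j in range(8)]
rural = [(0.9, 0.9)]
root = build(Node(0.0, 0.0, 1.0, 1.0), city + rural)
print(depth_of(root, 0.9, 0.9), depth_of(root, 0.0, 0.0))
```

The isolated point ends up in a leaf just one level below the root, while the corner holding the dense cluster is subdivided several levels deeper, which is the space saving the paragraph describes.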

Indexing the database with quadtile keys brought two benefits. First, the quadtile keys are shorter (32 bits, as opposed to 16 bytes for the old latitude/longitude indices), so the index requires less memory. Second, because of quadtiles' hierarchical nature, geographically close nodes tend to sit close together in the database index, which improves cache performance.
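To make the key idea concrete, here is a minimal sketch of a 32-bit quadtile key built as a Z-order (Morton) code: each coordinate is scaled onto a 16-bit grid and the bits of the two are interleaved. The scaling constants, the x-before-y bit order, and the function names `lon_to_x`, `lat_to_y`, and `quadtile` are assumptions for illustration and may differ from OSM's production code.

```python
import math


def lon_to_x(lon: float) -> int:
    """Scale a longitude in [-180, 180] onto a 16-bit grid (0..65535), rounding half up."""
    return math.floor((lon + 180.0) * 65535 / 360.0 + 0.5)


def lat_to_y(lat: float) -> int:
    """Scale a latitude in [-90, 90] onto a 16-bit grid (0..65535), rounding half up."""
    return math.floor((lat + 90.0) * 65535 / 180.0 + 0.5)


def quadtile(lat: float, lon: float) -> int:
    """Interleave the bits of x and y, most significant first, into a 32-bit key.

    Each pair of bits picks one of four quadrants at the next level down, so
    the key spells out the path from the whole world to a small tile.
    """
    x, y = lon_to_x(lon), lat_to_y(lat)
    key = 0
    for i in range(15, -1, -1):
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key


# Sorting hypothetical nodes by key places the two nearby ones next to each other.
nodes = {"a": (10.0, 20.0), "b": (-10.0, -20.0), "c": (10.001, 20.001)}
order = sorted(nodes, key=lambda name: quadtile(*nodes[name]))
print(order)
```

Nearby points usually share a long key prefix, but two points on opposite sides of a tile boundary can receive keys that are far apart, which is why this locality is a strong tendency rather than a guarantee.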

The speed gains resulting from the new database index affected all API requests, not just imports, and earned Hughes a lolcat of awesomeness award from the other OSM participants.

Now the real work begins, and you can help

The TIGER data set covers roughly 6% of the Earth's land surface, but its successful import does not mean the work is finished. Users who have collected their own GPS logs in areas covered by the TIGER maps and uploaded the resulting data report sporadic problems with TIGER's information. These include misaligned roads, missing features (notably the frequent absence of on-ramps and access roads, and the representation of divided highways as a single road), and occasional confusion over features such as cul-de-sacs. Since the TIGER map data was produced from aerial photography and was originally intended to assist Census Bureau officials in the field, such problems are bound to occur, and they are unlikely to have undergone official correction.

But even with its faults, the TIGER data is far better than no data at all and serves as an excellent baseline for community improvement. OSM provides tutorials and helpful hints to aid users in correcting the problems they encounter. Users collecting their own GPS routes can fix mistakes in the imported TIGER data using the Java OpenStreetMap Editor (JOSM) developed by the project.

OSM has performed one other bulk data import. In July 2007, AND Automotive Navigation Data donated a comprehensive road map of the Netherlands and highway system maps of India and China. The Netherlands import was completed last fall, but the India and China data has yet to be released to the project.

Although other prospects for large-scale map donations have been discussed, none are on the horizon. The addition of the AND and TIGER information gives the OSM project a helpful boost, but the bulk of the future work remains in the hands of individual users, each contributing their input toward the whole.

Comments on "OpenStreetMap project completes import of United States TIGER data"

Note: Comments are owned by the poster. We are not responsible for their content.

OpenStreetMap project completes import of United States TIGER data

Posted by: Anonymous [ip: 35.9.55.67] on January 24, 2008 01:13 PM
>sigh< "Quad-tiles", better known as quadtrees, have been used to represent geographical data... let's see... the first time I heard of them in that context was about 20 years ago. For open source project workers to become aware of the 'prior art' in the application domain seems like a useful thing to do before charging off and spending a lot of time and effort re-discovering what's already known.

Re: OpenStreetMap project completes import of United States TIGER data

Posted by: Anonymous [ip: 129.240.235.122] on January 24, 2008 03:35 PM
Sigh, indeed. But where does it say that they weren't aware of the prior art before "charging off and [...] re-discovering what's already known"?

Re(1): OpenStreetMap project completes import of United States TIGER data

Posted by: Anonymous [ip: 86.18.224.33] on January 25, 2008 08:39 AM
And why do you assume it wasn't already known? And by *whom*!

Documenting something in an easy-to-consume wiki format is the first step in communicating between the people who know about such things and the people who don't.

You'd be surprised at the number of alternative, mad schemes that get proposed. But you have to build a consensus for things to move forward.

OpenStreetMap project completes import of United States TIGER data

Posted by: Anonymous [ip: 70.66.195.112] on January 30, 2008 03:58 AM
It's not complete yet. There's no data for Skagit County, WA in there yet.
