News that the FCC was investigating possible impropriety by Google in monitoring Wi-Fi networks comes at a time when the National Legal and Policy Center (NLPC) is calling for a probe into the company’s ‘unusually close’ ties to President Obama. The group cited contributions from the software giant to the Democratic Party. But the real currency of Google, one which it possesses in greater abundance than nearly anyone, is data. What kind of data, even open source data, can Google potentially make available to the Democratic Party, and how?
The post Database of Databases examined the increasing role of databases in investigative journalism. As the Woodward-Bernstein model of access reporting becomes harder to sustain among hard-pressed newspapers, data mining and analysis are growing in importance. Organizations like the Sunlight Foundation have been attempting to link publicly available data together to discover patterns. A few of us have even made a rudimentary attempt at it ourselves. Given the growing importance of finding patterns in data in order to control the narrative, it comes as no surprise that Google has entered this arena. Check out this video from Google Refine. It is pitched explicitly at data journalists.
The Google Refine application shown above allows relatively unsophisticated users to conform data so that meaningful joins between heterogeneous sets can be made. In other words, you can take one dataset and find foreign keys to other datasets by applying human intelligence. Once this is done on a widespread basis, everything which has left a data trail can potentially be related to anything else. Since every modern human being or organization leaves a data trail, this can reveal a great deal about what anybody does. It can be used for good or for ill. It can uncover featherbedded contracts, secret payments, sweetheart deals. It can uncover cabals, reveal conspiracies, unearth terrorists.
Or it can reveal only what the underlying system chooses to make available.
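To make the idea of "conforming" data concrete, here is a minimal sketch in Python of joining two unrelated datasets once their keys have been cleaned up. It is not how Google Refine works internally. The `conform` rule, the `join_on_conformed_key` helper, and the sample contract and donation records are all hypothetical, invented purely for illustration.

```python
import re

def conform(name: str) -> str:
    """Reduce a free-text organization name to a crude join key.

    Hypothetical cleanup rule: lowercase, drop punctuation, and drop
    common corporate suffixes so that spelling variants collapse together.
    """
    key = name.lower()
    key = re.sub(r"[^a-z0-9 ]", " ", key)
    key = re.sub(r"\b(inc|corp|corporation|llc|co|ltd)\b", " ", key)
    return " ".join(key.split())

# Two made-up datasets that share no explicit key.
contracts = [
    {"vendor": "Acme Corp.", "amount": 120000},
    {"vendor": "Globex, Inc.", "amount": 45000},
]
donations = [
    {"donor": "ACME Corporation", "recipient": "Committee A"},
    {"donor": "Initech LLC", "recipient": "Committee B"},
]

def join_on_conformed_key(left, left_field, right, right_field):
    """Join two lists of dicts wherever their conformed names agree."""
    # Index the right-hand rows by their conformed key.
    index = {}
    for row in right:
        index.setdefault(conform(row[right_field]), []).append(row)
    # Pair each left-hand row with every right-hand row sharing its key.
    joined = []
    for row in left:
        for match in index.get(conform(row[left_field]), []):
            joined.append({**row, **match})
    return joined

matches = join_on_conformed_key(contracts, "vendor", donations, "donor")
for row in matches:
    print(row)
```

In this toy case "Acme Corp." and "ACME Corporation" both reduce to the same key, so the contract and the donation end up in one joined row. The human intelligence lies in deciding which cleanup rules are safe: too loose a rule manufactures connections that do not exist, and too strict a rule hides the ones that do.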
Capturing user intelligence and applying it to the product is essentially the secret sauce of Google’s PageRank algorithm. Google’s great fortune rests on capturing the intelligence behind what its users do and embedding it in its data.
With Google Refine, the software giant is laying the groundwork for getting millions of users to connect all the dots in the public domain. Although the video above goes to great lengths to say that the data thereby “refined” will remain on the desktop, and therefore out of Google’s hands, there can be no serious doubt that the next step will be to offer a YouTube-like service in which a user can check in his “refined” data and have it connect up with other “refined” sets. From there it is but a short step to providing online journalists with an interface to discover patterns for their next exposé.
The existence of this kind of data in Google’s server farms will give it enormous political power and influence, far exceeding anything it could gain through donations to candidates. Just saying.