Thursday, October 9, 2008

Google Discovery Timeline

I found an eye-opening use of the news archive feature of news.google.com. I typed in the search terms "oil", "discovery", and "timeline", and when I invoked the timeline view, I saw this:



This search essentially looks for references to dates in archived news articles over the years and automatically creates a histogram of the relative counts. Interestingly, it shows peaks in much the same places as the classic oil discovery curve.
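To make the idea concrete, here is a minimal sketch of that kind of date histogram. I don't know how Google actually builds its timeline, so treat this as my own guess at the general approach: the function names, the year-matching pattern, and the handful of sample sentences are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: a "date reference" is any standalone four-digit year from 1800 to 2099.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")

def year_histogram(articles):
    """Count how often each year is mentioned across a list of article texts."""
    counts = Counter()
    for text in articles:
        for match in YEAR_PATTERN.findall(text):
            counts[int(match)] += 1
    return counts

def relative_counts(counts):
    """Scale the counts so that the most-mentioned year has height 1.0."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {year: n / peak for year, n in sorted(counts.items())}

# Hypothetical sample sentences, not real archive data.
sample_articles = [
    "Drake struck oil at Titusville in 1859.",
    "The Spindletop gusher of 1901 changed the Texas oil industry.",
    "Historians compare the 1901 Spindletop find with the 1859 well.",
    "A look back at Spindletop, 1901.",
]

hist = year_histogram(sample_articles)
for year, share in relative_counts(hist).items():
    # Crude text bar chart, one row per year.
    print(year, "#" * round(share * 20))
```

Note that this counts every year mentioned, whether or not the sentence concerns an oil discovery made in that year.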



I believe that the curve has a USA bias, as many of the large peaks correlate with specific lower-48 discoveries, such as the historically significant field found at Spindletop right after the year 1900. And of course it nails the original discovery in 1859. Yet it predicts pretty accurately the accepted worldwide peak discovery date in the early 1960s.

Of course, Google doesn't use an AI program specifically tailored to oil discovery, so it cannot tell whether a date mentioned in an article actually refers to an oil discovery made in that year. As a result we see many references to recent dates, which probably have more to do with the explosion of news sources during the information age.

I would classify this as another example of the "Wisdom of the Crowds" data mining approach. This works because the dates get reported by masses of people, and this large sample collectively improves the statistical accuracy of the estimate.
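A small simulation illustrates why the large sample helps. This is only a sketch under assumptions of my own: each "report" is the true year plus independent Gaussian noise, and the true year (1962), the noise level (5 years), and the function name are hypothetical values chosen for illustration. Real news reports are neither independent nor unbiased, so take this as the idealized case.

```python
import random
import statistics

def spread_of_estimates(true_year, noise_years, reports_per_estimate, trials, rng):
    """Standard deviation of the averaged estimate over many repeated trials."""
    estimates = []
    for _ in range(trials):
        # Each report is the true year plus independent Gaussian noise (an assumption).
        reports = [rng.gauss(true_year, noise_years) for _ in range(reports_per_estimate)]
        estimates.append(statistics.fmean(reports))
    return statistics.stdev(estimates)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (10, 100, 1000):
    spread = spread_of_estimates(1962, 5.0, n, 200, rng)
    print(f"{n:>5} reports per estimate: spread of about {spread:.2f} years")
```

Under these assumptions the spread of the averaged estimate shrinks roughly as one over the square root of the number of reports, so a hundredfold increase in reports buys about a tenfold gain in precision.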

Update: The Google discovery peak for coal: