A while back (2007 to be exact, an eternity in Internet years), Google released a product called Google 411. You could call either 1-800-GOOG-411 or 1-877-GOOG-411 and search for businesses by city and state, category, or other criteria. It was a direct competitor to the expensive local 411 services, and it was completely free.
I remember that at the time a lot of people I worked with scoffed at Google. It can’t be monetized, they said, just as people once said about Google’s 95%-blank home page. What could Google possibly gain from launching a free 411 phone service, one that obviously cost money to run?
The answer, of course, is data. Google 411 has been shut down since 2010, but while it was around, Google was able to harvest millions of examples of dialects, speech patterns, and vocal nuances. By launching a free service for everyone to use and enjoy, Google was collecting valuable data for a phoneme database that would play a key role in audio indexing.
To me, this is a fine example of how big data knows no bounds. For some companies, big data isn’t just about the data they feel they may need, but also the data they may never need. It means collecting both the signal and the noise, because even the noise is valuable. Even small, inconsequential bits of log data could one day save your company, much as the horseshoe nail in Ben Franklin’s proverb could have saved the kingdom. And you’ll never know unless you collect it.
Think of it this way: every time a log file ages out, every time you delete or truncate a table, every time a single bit is wiped away, a piece of information is lost. Who knows what that information might have taught us once combined and visualized? Lost information is lost measurement: whatever it described returns to an unknown state. And what is unknown to your data is unknown to your company. Data is corporate memory. At the very least, storing and tracking data could help prepare your company for an uncertain future.
So should I just save everything?
Well, no; that would be crazy. If you haven’t noticed, I get a little overzealous when it comes to the possibilities of mass storage and correlation of data. But that’s on a futuristic and unrealistic level. At the corporate level, storing every single bit of user, server, network, customer, and business data and retaining it ad infinitum could easily run into the exabytes.
It’s called Big Data, not Biggest Data. What is big data to your company might be peanuts to someone else (like Google and its phoneme database). If you decide a foray into big data is worth it to your company, and if you can assemble the team, the tools, and the information necessary to get it going, then how far you take it is up to you. You get to pick what to keep and what to discard, how long to keep it, and what you hope to learn from it. If you’re not getting the results you hoped for, you can broaden your horizons, or hunker down and keep storing data.
With the awesome compute power of software and hardware like Hadoop, Hive, HBase, MongoDB, Exadata, Teradata, Netezza, Greenplum, etc., and the commercially driven ‘resurgence’ of data visualization and statistical computing, the beauty of Big Data is that it can be as small or as big as you want it to be.