Is your company data-driven? Thought so. Data and “data-drivenness” have become so integral to companies’ success that it would feel weird to hear somebody say “oh, our company doesn’t care about data”. In fact, it sounds almost like a crime to disrespect data like that. So, for those of you who want to do the morally right thing and get the most out of your data, let’s go over what you need to consider when making the technological investments that support your data infrastructure.
#0 Necessity
Before we even get to point #1, let’s take a step back and consider this question: Do you really need that much data? This question often goes unanswered, but it’s a fair one to ask. “Big Data” is sometimes referred to as “dumb data” when it is collected without a strategy. You can collect all the data in the world, only to add to your confusion about what to do with it. For companies in certain stages or industries, the cost of data collection or data management outweighs the expected returns from big data analysis. Or, you simply might not have that much data.
You might be asking why a big data integration company like ours would say such a thing, but our point is that the best solution really depends on the situation. Sometimes big data helps. Sometimes it does not. Sometimes all you need is smarter data. On the other hand, if you are a leading tech company like Lumosity or Airbnb with millions of users, you do want to take full advantage of your data. The 10% optimization you might find from, say, correlation analysis could mean millions of dollars in additional revenue and customer value. Among other things, if data collection is easy/cheap, improvements scale across your entire service, and you have a good budget for data analysis, then you are probably in a good position to benefit from big data.
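To make the cost-versus-return question above concrete, here is a back-of-envelope sketch in Python. The function name and every figure in it are hypothetical, chosen only for illustration; plug in your own revenue, expected improvement, and data infrastructure cost.

```python
def net_annual_benefit(annual_revenue, expected_uplift, annual_data_cost):
    """Expected extra revenue from data analysis minus the yearly cost of
    collecting and managing the data. All inputs are rough estimates."""
    return annual_revenue * expected_uplift - annual_data_cost


# Hypothetical figures, for illustration only.
# A large service: a 10% improvement easily outweighs the data cost.
large = net_annual_benefit(annual_revenue=50_000_000,
                           expected_uplift=0.10,
                           annual_data_cost=500_000)

# A small service: the same 10% improvement does not cover the cost.
small = net_annual_benefit(annual_revenue=200_000,
                           expected_uplift=0.10,
                           annual_data_cost=100_000)

print(f"large service: {large:,.0f}")
print(f"small service: {small:,.0f}")
```

A positive result suggests the investment is worth a closer look; a negative one suggests that smarter, smaller data may serve you better for now.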
#1 Performance and Scalability
Once you decide that big data matters to you, the first thing to consider is performance and scalability. Many companies have their own in-house database or data warehousing solution to handle their data management needs. However, as the pace of data collection increases, they tend to hit performance or scalability roadblocks. As data volume grows, data warehouses need to be upgraded in order to collect and process data efficiently, and that requires a considerable amount of time and resources. If your data warehousing solution isn’t prepared, even a slight delay might have negative consequences.
In many cases, we build a solution suitable for our current needs, hoping that it will somehow scale (or hoping someone else will make it scale for us). Without some prior planning, though, this could create nightmares in the future. Thankfully, many data warehousing vendors are specifically prepared for these kinds of problems. Data warehouses such as Amazon Redshift are easily scalable to the petabyte level, with just a few clicks. Using such managed services is one way to alleviate these issues. Many vendors offer free trials so that you can test what they can and cannot do.
#2 Accessibility and Availability
A data warehouse is of no use if no one can access it. Likewise, if only a few people have access to the data, the company has that much less opportunity to find meaningful information in it.
You might also have data from multiple sources and locations that you want to query in one place, or you might have so much data that it cannot be loaded into your data warehouse fast enough. When your data isn’t there for you to query, you miss out on valuable insight and, even worse, on information that could help you improve your organization. This defeats the whole purpose of the data warehouse. It’s important to plan for the accessibility and availability of your data, so that you get the maximum benefit from your big data infrastructure.
#3 Cost
When it comes to outsourcing data warehousing needs, you may find yourself in this dilemma: many third-party data warehouses cost too much money, while building, upgrading, and maintaining an in-house solution requires lots of engineers and engineering time (which equals a lot of money as well!). Cloud-based solutions have the huge advantage of a very low upfront cost for getting started, which we think is one of the major factors behind their rising popularity (besides the fact that they are getting much cheaper). Unless you have sunk costs in in-house infrastructure/employees, cloud-based solutions like Amazon Redshift should look pretty attractive. These data infrastructure costs are, however, investments that allow your company to take advantage of insight from your data. The outcome is the difference between making decisions based on sound data analysis and making decisions blindfolded. In today’s rapidly changing environment, it is harder to rely on gut instinct alone, so the importance of such investments cannot be overstated.