There is a big barrier to learning big data technology such as Hadoop. With most new technology, a developer just downloads the software to his laptop and starts hacking. You can’t do that with Hadoop. It requires a minimum of four servers to work properly. Most developers do not have four servers lying around that they can play with to learn a new technology.
In a Gartner survey, companies said that their primary concern about using big data software is that “big data expertise is scarce and expensive.” Their second concern is that “data appliances platforms are expensive.” I believe that these two concerns are interdependent; the cost of hardware is the barrier to learning and the barrier to learning makes big data expertise scarce and thus expensive.
The supply of developers is not meeting the exploding demand for big data expertise. The proof is in the market. Visit any job board and look for the key word “Hadoop”. You will see an impressive list of openings. It is economics 101; low supply and high demand means high prices. The current rate for big data consultants is $150 to $300 per hour. That may be fine for small projects like proof of concepts, but even large companies have a hard time building scale out projects at these rates. For small and midsize companies, the price is insurmountable. In the same Gartner survey, 79% of midsize businesses stated that big data is a “significant challenge”, compared to only 55% of large corporations having the same response. One can interpret from that response that the deep pockets of larger companies makes big data less of a challenge.
The big barrier to learning is the developers’ access to hardware. Cloud is an option, but not a great option for three reasons. First, cloud is not that cheap. The free version for Amazon Web Services, for instance, will not get you four servers. Plus, cloud providers charge you for network traffic and Hadoop processing inherently generates a ton of it. Second, learning requires experimenting. For most people, the cloud is new and not a comfortably place to experiment. Learning Hadoop is hard enough. Why add the complexity of cloud computing? Hit the wrong button and accidentally spend a bundle of money? No thank you. Third, Hadoop does not work as well on cloud servers. Cloud servers are virtual servers that share hardware such as disk drives and network connections. Hadoop is built to optimize hardware usage. If you are running on virtual servers, then those optimizations will not work and performance will be unsatisfactory to say the least.
Hadoop developers should be able to build a cluster on their desktop. Software developers have always had their own personal development environment on their desktop. The challenge today is that they need not just one but at least four to eight computers. Software developers need small, affordable servers that they can use for their own personal use for experimenting, learning and developing Hadoop and other parallel processing technologies. These do not need to be monster servers. All that is required is a processor, a hard drive and network connection. The key is to have a development environment that is really distributed.
The world needs small servers for big data. Such a product would lower the barriers to learning Hadoop. That would increase the number of developers who know Hadoop, which would in turn lower the costs of hiring big data developers and building big data solutions. Small servers for big data could unleash the full potential of the big data revolution.