A distributed file system, a MapReduce programming framework, and an extended family of tools for processing huge data sets on large clusters of commodity hardware, Hadoop has been synonymous with “big data” for more than a decade. But no technology can hold the spotlight forever.
While Hadoop remains an essential part of the big data platforms, and the major Hadoop vendors—namely Cloudera, Hortonworks, and MapR—have changed their platforms dramatically. Once-peripheral projects like Apache Spark and Apache Kafka have become the new stars, and the focus has turned to other ways to drill into data and extract insight.
Let’s take a brief tour of the three leading big data platforms, what each adds to the mix of Hadoop technologies to set it apart, and how they are evolving to embrace a new era of containers, Kubernetes, machine learning, and deep learning.
Cloudera Enterprise Data Hub
Cloudera was the first to market with a Hadoop distribution—not surprising given that its core team consisted of engineers who had leveraged Hadoop in places like Yahoo, Google, and Facebook. Hadoop co-creator Doug Cutting serves as chief architect.