A CDE Definition
An open source Big Data framework from the Apache Software Foundation designed to handle huge amounts of data on clusters of servers. The storage is handled by the Hadoop Distributed File System (HDFS), and the data are sorted and summarized in parallel by Hadoop MapReduce, a version of Google's MapReduce. Required Java files are included in Hadoop Common, and Hadoop YARN provides the cluster management.
Originally written for the Nutch Web crawler for spidering the Web, in 2008, Yahoo's Search Webmap was the first very large implementation of Hadoop running on 10,000 Linux servers. Search Webmap ran in a third less time than Yahoo's previous search engine.
The Hadoop name comes from a favorite stuffed elephant of the son of the developer Doug Cutting. See Google File System, MapReduce and Spark.
Before/After Your Search Term
Terms By Topic
Click any of the following categories for a list of fundamental terms.