Alan Freedman -- The Computer Language Company - Computer Desktop Encyclopedia

data lake

A large storage repository that holds data in their original format prior to being parsed and analyzed. The term is often associated with Hadoop, which was designed to hold huge amounts of data. See Hadoop.
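As a rough illustration of holding data in their original format until analysis time, the sketch below keeps raw payloads untouched and parses them only when they are read. The `lake` dictionary, the object names and the record fields are all hypothetical stand-ins; a real data lake would sit on a distributed store such as HDFS rather than in memory.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": raw payloads kept exactly as they arrived.
lake = {
    "orders.json": '{"id": 1, "total": 19.5}\n{"id": 2, "total": 5.0}\n',
    "orders.csv": "id,total\n3,7.25\n",
}

def read_orders(name):
    """Parse one raw object only when it is read, not when it is stored."""
    raw = lake[name]
    if name.endswith(".json"):
        # One JSON document per line.
        return [json.loads(line) for line in raw.splitlines() if line]
    if name.endswith(".csv"):
        rows = csv.DictReader(io.StringIO(raw))
        return [{"id": int(r["id"]), "total": float(r["total"])} for r in rows]
    raise ValueError(f"unknown format: {name}")

def total_sales():
    """Analyze across formats by parsing every object on demand."""
    return sum(order["total"] for name in lake for order in read_orders(name))
```

Because nothing is converted on the way in, a new analysis can apply a different interpretation to the same raw objects later.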


Hadoop

An open source Big Data framework from the Apache Software Foundation designed to handle huge amounts of data on clusters of servers. Storage is handled by the Hadoop Distributed File System (HDFS), and the data are sorted and summarized in parallel by Hadoop MapReduce, an open source implementation of Google's MapReduce programming model. The shared Java libraries that the other modules require are included in Hadoop Common, and Hadoop YARN provides the cluster resource management.
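The sort-and-summarize pattern can be sketched in a few lines of plain Python. This is not the Hadoop API: the function names are hypothetical, the list of strings stands in for blocks of a file stored in HDFS, and a local thread pool stands in for the servers of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(mapped):
    """Group the emitted values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Summarize all values that share a key."""
    return key, sum(values)

def word_count(blocks):
    # Each block is mapped independently, so the map calls can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, blocks))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped))
```

Calling `word_count(["big data big", "data lake"])` counts each word across both blocks. In a real cluster the map and reduce steps would run on the machines that hold the data, which is what lets the approach scale.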

Hadoop was originally written for the Nutch Web crawler, which spiders the Web. In 2008, Yahoo's Search Webmap became the first very large implementation of Hadoop, running on 10,000 Linux servers. Search Webmap ran in a third less time than Yahoo's previous search engine.

The name Hadoop comes from a favorite stuffed elephant that belonged to the son of developer Doug Cutting. See Google File System, MapReduce and Spark.
