The Apache™ Hadoop® project develops open-source software for
reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed to scale up
from single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver
high-availability, the library itself is designed to detect and handle
failures at the application layer, thereby delivering a highly available
service on top of a cluster of computers, each of which may be prone to
failures.
Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache License 2.0. It supports running applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
The Hadoop framework transparently provides both reliability and data
motion to applications. Hadoop implements a computational paradigm
named MapReduce,
where the application is divided into many small fragments of work,
each of which may be executed or re-executed on any node in the cluster.
In addition, it provides a distributed file system that stores data on
the compute nodes, providing very high aggregate bandwidth across the
cluster. Both MapReduce and the distributed file system are designed so
that node failures are handled automatically by the framework, which
enables applications to work with thousands of independent machines and
petabytes of data. The entire Apache Hadoop “platform” is now commonly
considered to consist of the Hadoop kernel, MapReduce, and the Hadoop
Distributed File System (HDFS), as well as a number of related projects,
including Apache Hive, Apache HBase, and others.
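
To make the paradigm concrete, here is a minimal sketch of the canonical word-count job written against Hadoop's Java MapReduce API; the class names and the input/output paths are illustrative placeholders, not anything prescribed by the text above:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // The "map" fragment of work: runs on whichever node holds a block of
      // the input, emitting (word, 1) for every word it sees.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // The "reduce" fragment: the framework groups all counts emitted for a
      // given word, possibly from many nodes, and this sums them.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If a node running one of these map or reduce tasks fails, the framework re-executes that fragment of work on another node, so the job as a whole still completes. A job like this is typically packaged into a jar and launched with something like "hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output", where the last two arguments name HDFS directories.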
Hadoop is written in the Java programming language and is an Apache top-level project built and used by a global community of contributors. Hadoop and its related projects (Hive, HBase, ZooKeeper, and so on) draw contributors from across the ecosystem.
Though Java code is most common, any programming language can be used
with "streaming" to implement the "map" and "reduce" parts of the
system.
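
As a sketch of how streaming works, the following job uses ordinary Unix executables as the map and reduce steps; the input and output paths are placeholders, and the exact location and version of the streaming jar differ between Hadoop releases:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input /user/hadoop/input \
        -output /user/hadoop/output \
        -mapper /bin/cat \
        -reducer /usr/bin/wc

Each mapper and reducer here is just a process that reads records from standard input and writes key/value pairs (tab-separated by default) to standard output, which is what lets any language plug into the framework.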
So what's the problem? As Big Data science increases our ability to model
or simulate complex systems, these models, ironically, become as
complex as the real world. But they are not the real world. Whether it's
astrophysics or the economy, building a computer model still demands
leaving some aspects of the problem out. More importantly, the very act
of bringing the equations over to digital form changes them in subtle
ways, which means you are solving a slightly different problem than the
real-world version.
Read more: http://www.npr.org/blogs/13.7/2012/09/18/161334704/big-data-and-its-big-problems