I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes surrounding Hadoop:
What is Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop is not just 1 thing, rather the term “Hadoop” has come to refer to the “ecosystem”, or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.
Hadoop provides a range of data processing options which sits on top of a distributed file system, with redundancy. Batching is Hadoop's strongest component, but also includes interactive SQL, and Streaming/Events data. Hadoop is ever evolving.
What Hadoop is used for?
Hadoop is not the best fit for consumption of analytics.
Architecture Patterns for Analytics on Hadoop
The Logical Data Warehouse
The Logical Data Warehouse (LDW) is a data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategies. It is a clear demarcation between centralized repository approaches and managed data services for analytics.
Hadoop adoption recommendations
Business Intelligence Lead.