Big Data Overview

Categories: BigData

The subject of “big data” has been very topical in the last few years - rivalling (and having something in common with) “clouds”. It’s not something I’ve personally been involved with though, and my knowledge of the area is somewhat lacking. I’ve just spent a few days reading up on the core concepts and the relevant (mostly open-source) products available. Here are my notes - a quick overview of everything “big”.

Note that the “storage” article listed below also covers NoSQL data storage in general, not just big data.

  • Big Data Storage – how to store data when plain relational databases just won’t handle it! Covers key-value stores, graph databases, document databases, BigTable/Hadoop/HBase/Cassandra/etc - plus a quick introduction to MapReduce.
  • Big Data Processing – how to analyse large amounts of data (from the programming/architecture perspective). Hadoop, Spark, Hive, Hama/BSP, Giraph/Pregel, Storm.

It is interesting that the majority of the important open-source implementations in this area are hosted by the Apache Software Foundation, and that so many of them are implemented in Java.