Apache HBase Overview

I’ve just completed an article on HBase, in case anyone is interested. Feedback welcome.

Apache Hive v2.0 Released

Apache Hive 2.0 was released in Feb 2016. The most significant changes are:

  • Hive metadata can be stored in HBase rather than in a relational database (alpha)
  • long-lived worker processes, i.e. somewhat like Impala (beta)
  • built-in support for HPL/SQL, a “procedural SQL” language mostly compatible with Oracle PL/SQL, Teradata BTEQ, etc. (see the HPL/SQL site)
  • performance improvements, particularly when using Spark as the back-end execution engine
  • web-based admin interface for the HiveServer2 daemon

Hive 2.0 is not included in Hortonworks HDP 2.4 (released 2016-03-01); perhaps it will be in the next version. It isn’t in the current Cloudera release either.

In related news, Kafka 0.9 has a new Java API and initial support for “native streaming”.

Update 2016-07-03: I’ve added an article on Hive to this site.

Dropwizard and Hystrix

I’ve recently been doing some maintenance work on an existing Java application (a server providing REST APIs to clients, and talking to a database). Two of the libraries used there (Dropwizard, Hystrix) were new to me, and interesting - worth a brief post at least.

The Kafka Message Broker

OrangeFS

Linux kernel version 4.6 has just been released, and one of the items mentioned is kernel drivers for the distributed filesystem OrangeFS.

OrangeFS certainly does look interesting. However they claim that “using OrangeFS instead of HDFS … can improve MapReduce performance and …”. Having looked at the OrangeFS docs, this seems somewhat overstated.

Apache Tez - an Overview

I’ve recently spent some time investigating Apache Tez, which is used by several higher-level tools (including Hive and Pig) to execute logic in parallel on a cluster of servers. If you’re interested in that sort of thing, you can find my notes here.

NoSQL - an Overview

It’s been a while since the last new posting here. I’ve been busy - learning lots of big-data-related stuff, and updating some existing big-data articles on this site to expand details (and fix some errors).

However, I’ve also written a short new article giving a quick overview of NoSQL.

I’ve also completed a set of pages on Apache Hadoop, if you’re interested in that sort of stuff.

Your comments are very welcome!

SQL Nulls and Tristate Logic - Fooled Again

I work only intermittently with SQL, and every once in a while I fall into the old SQL tristate logic trap, as I did yesterday. It’s just not natural.

For those who don’t know, here’s a quick recap…
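As a rough illustration of the trap, here is a minimal Python sketch of SQL’s three-valued logic. It assumes Python’s None stands in for SQL’s UNKNOWN (the result of any comparison involving NULL); the helper names (sql_eq, sql_not, sql_and, sql_not_in, where) are hypothetical and are not part of any library. The key point it shows is that a WHERE clause keeps only rows whose predicate is TRUE, so FALSE and UNKNOWN rows both disappear.

```python
from typing import Optional

# None plays the role of SQL's UNKNOWN (an assumption of this sketch).
Tri = Optional[bool]

def sql_eq(a, b) -> Tri:
    """SQL '=': any comparison involving NULL yields UNKNOWN."""
    if a is None or b is None:
        return None
    return a == b

def sql_not(x: Tri) -> Tri:
    """NOT UNKNOWN is still UNKNOWN."""
    return None if x is None else not x

def sql_and(x: Tri, y: Tri) -> Tri:
    """FALSE wins over UNKNOWN; otherwise UNKNOWN wins over TRUE."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def sql_not_in(x, values) -> Tri:
    """x NOT IN (v1, v2, ...) behaves like x <> v1 AND x <> v2 AND ..."""
    result: Tri = True
    for v in values:
        result = sql_and(result, sql_not(sql_eq(x, v)))
    return result

def where(rows, predicate):
    """Keep only rows whose predicate is exactly TRUE."""
    return [r for r in rows if predicate(r) is True]

rows = [1, 2, None]
print(where(rows, lambda x: sql_eq(x, 1)))                # WHERE x = 1
print(where(rows, lambda x: sql_not(sql_eq(x, 1))))       # WHERE NOT (x = 1): NULL row still missing
print(where(rows, lambda x: sql_eq(x, None)))             # WHERE x = NULL: matches nothing
print(where(rows, lambda x: sql_not_in(x, [1, None])))    # WHERE x NOT IN (1, NULL): matches nothing
```

The last query is the classic surprise: because the list contains a NULL, every row evaluates to either FALSE or UNKNOWN, so no rows come back at all. In real SQL, IS NULL / IS NOT NULL are the tests that always return TRUE or FALSE.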