Apache NiFi

Recently I posted some information about using Kafka Connect to import data into Kafka message-broker topics. That article included a brief review of some alternative tools for solving the same problem, e.g. Flume, Logstash and NiFi.

Since then I have spent more time looking into NiFi in detail. If you are interested in “data integration” (whether from/into Kafka or not) then my notes might be useful.

Threadsafe Variable Access in Java

Kafka Connect

I’ve already written about the Apache Kafka scalable and high-performance message broker. It is a fine tool, and now very widely used.

Kafka Connect is another component of the Apache Kafka project, dedicated to importing data into Kafka from external systems or exporting data from Kafka into external systems. I am currently setting up ETL (extract, transform, load) for a client using Kafka Connect and have written up some notes on it here.

The Java8 Optional Class

The Lambda and Kappa Design Patterns for Persistent State

A while ago, I received a bunch of books on “big data” themes, one of which was about something called the Lambda Architecture. It presents a design pattern useful for IT systems that have large amounts of persistent data representing “current state”: stored data such as users, devices or bank accounts which is mutable (updateable). I found it interesting but too complex for the issues I needed to deal with at the time.

Recently one of my work colleagues mentioned something called the Kappa Architecture which seemed to be related. I did some reading on that topic too, and found it to be a simplification of Lambda which is applicable to many more cases. Here are my notes on the topics for those interested.

A Simple Wrapper for Starting Processes from Java

Sometimes it is necessary, from within a Java application, to start an external process on the host operating system.

The facilities for this in the standard Java library are fairly primitive. There are some full-featured libraries that help with this case, but I have recently developed a simple alternative (just a few classes); you can read about it here.

Elasticsearch5 TransportClient Mode

Gradle Internals

I am a long-time user of the Maven tool for building/packaging Java applications. However, I recently worked on a client project that used Gradle, and liked it. I have written up some introductory notes on Gradle here.