Kafka Connect

I’ve already written about Apache Kafka, the scalable, high-performance message broker. It is a fine tool, and is now very widely used.

Kafka Connect is another component of the Apache Kafka project, dedicated to importing data into Kafka from external systems and exporting data from Kafka into external systems. I am currently setting up ETL (extract, transform, load) for a client using Kafka Connect, and have written up some notes on it here.

The Java8 Optional Class

The Lambda and Kappa Design Patterns for Persistent State

A while ago, I received a bunch of books on “big data” themes, one of which was about something called the Lambda Architecture. It presents a design pattern useful for IT systems that hold large amounts of persistent data representing “current state”: stored records such as users, devices or bank accounts which are mutable (updateable). I found it interesting but too complex for the issues I needed to deal with at the time.

Recently one of my work colleagues mentioned something called the Kappa Architecture which seemed to be related. I did some reading on that topic too, and found it to be a simplification of Lambda which is applicable to many more cases. Here are my notes on the topics for those interested.

A Simple Wrapper for Starting Processes from Java

Sometimes it is necessary, from within a Java application, to start an external process on the host operating system.

The facilities for this in the standard Java library are fairly primitive. There are some full-featured libraries that help with this case, but I have recently developed a simple alternative (just a few classes); you can read about it here.
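For context, the standard-library facility referred to above is java.lang.ProcessBuilder. The following is a minimal sketch of using it directly; it is not the wrapper from the linked notes, and the names ProcessSketch, Result and run are hypothetical, chosen only for illustration. It assumes the child process needs no input on stdin and that its output is small enough to hold in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
final class ProcessSketch {

    /** Exit code plus the combined stdout/stderr text of a finished process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Starts the command, collects its output, and waits for it to exit.
     * Note: the timeout only applies after the process has closed its output
     * stream, because the read loop below blocks until end-of-stream.
     */
    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both streams;
        // an undrained stream can make the child block when its pipe buffer fills.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Process did not exit within " + timeoutSeconds + "s");
        }
        return new Result(process.exitValue(), output.toString());
    }

    public static void main(String[] args) throws Exception {
        // Run the same JVM binary that is executing this program, so the
        // example does not depend on anything being on the PATH.
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        Result result = run(Arrays.asList(javaBin, "-version"), 30);
        System.out.println("exit code: " + result.exitCode);
        System.out.print(result.output);
    }
}
```

Even this small case needs explicit stream draining, error-stream handling and a timeout, which is the kind of boilerplate a wrapper can hide.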

Elasticsearch5 TransportClient Mode

Gradle Internals

I am a long-time user of the Maven tool for building/packaging Java applications. However, I recently worked on a client project that used Gradle, and liked it. I have written up some introductory notes on Gradle here.

Some JUnit Rules

Maven Random Tips

Some random Maven build-tool tips.

For transitive dependencies, where multiple versions of the same dependency are transitively required, Maven takes the version that is shallowest (nearest the root) in the “dependency:tree” output. When two are at equal depth, the first one declared wins. But when a dependencyManagement declaration exists for that artifact, then the version it specifies is used regardless of depth.

When a pom with direct dependencies is included as a <dependency> in another pom, those artifacts also become dependencies of the including pom - in whichever scope the includer specifies.

When such a pom is included from within <dependencyManagement> in another pom, and type=pom and scope=import are specified, then any dependencyManagement declarations in the imported pom become dependencyManagement declarations in the including pom.
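As a rough illustration of the import case, the fragment below shows what such a declaration looks like in the including pom. The coordinates (com.example, example-bom, 1.0.0) are hypothetical placeholders, not a real artifact.

```xml
<dependencyManagement>
  <dependencies>
    <!-- Hypothetical pom whose dependencyManagement section is pulled in -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>example-bom</artifactId>
      <version>1.0.0</version>
      <!-- scope=import is only valid together with type=pom -->
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

With this in place, dependencies managed by the imported pom can be declared in the including pom without a <version> element.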