During a recent code review I noticed a colleague had used the parallelStream method in Java code, and realized I didn’t know much about this method or the fork/join framework it relies on. The results of my research on the Java fork/join framework can be found here.
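As a quick illustration of the kind of call that prompted the question, here is a minimal sketch of parallelStream in action (the class name and data are my own invention, not from the post): it maps over a list using the common fork/join pool under the hood.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        // parallelStream() splits the work across the common ForkJoinPool,
        // but for an ordered source the collected result keeps encounter order.
        List<Integer> squares = List.of(1, 2, 3, 4, 5).parallelStream()
                .map(n -> n * n)
                .collect(Collectors.toList());
        System.out.println(squares); // prints [1, 4, 9, 16, 25]
    }
}
```

Note that the result order is deterministic here even though execution is parallel, because collect on an ordered stream preserves encounter order.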
I recently had to process multiple messages in parallel from a single RabbitMQ queue. The official docs are sadly lacking in this area, so I did some investigation and found out some interesting things.
As you may have noticed, I’ve been doing a lot with Kafka Connect recently.
In order to test a custom Kafka Connect connector and connector configurations, I needed a suitable environment with the services installed. I set this up via Vagrant, and have now documented how this is done - mostly for myself in case I need to do something similar later, but maybe it is also useful to you.
Recently I posted some information about using Kafka Connect to import data into Kafka message-broker topics. That article included a brief review of some alternative tools for solving the same problem - e.g. Flume, Logstash and NiFi.
Since then I have spent some more time looking into NiFi in more detail. If you are interested in “data integration” (whether from/into Kafka or not) then my notes might be of interest.
I’ve already written about Apache Kafka, a scalable and high-performance message broker. It is a fine tool, and now very widely used.
Kafka Connect is another component of the Apache Kafka project, dedicated to importing data into Kafka from external systems or exporting data from Kafka into external systems. I am currently setting up ETL (extract, transform, load) for a client using Kafka Connect and have written up some notes on it here.
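To give a flavour of what configuring Kafka Connect looks like, here is a minimal standalone source-connector configuration using the FileStreamSource connector that ships with Kafka; the file path and topic name are placeholders of my own choosing, not from any particular post.

```properties
# Example standalone connector config (assumed file path and topic name)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
# File to tail; each line becomes a record on the target topic
file=/tmp/input.txt
topic=connect-demo
```

A config like this would typically be passed to connect-standalone.sh alongside a worker properties file; distributed mode instead takes the same settings as JSON via a REST API.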