Apache Beam and Google Dataflow Overview

Apache Beam and CSV Headers

As noted in the previous post, I’m working with GCP at the moment - and with Apache Beam on Google Dataflow.

Beam has many cool features - but sometimes things that should be trivial are unexpectedly complicated - like reading the ‘header’ line of a CSV file.

Google Cloud Platform and AppEngine

I’ve been doing a lot of work on the Google Cloud Platform recently. As a result, here are two articles: an overview of GCP Identities and Resources (which might be a good place to start if you don’t know GCP) and an overview of GCP AppEngine. There will be more articles to come.

Accessing HDFS from Spark in Zeppelin

When using a Zeppelin notebook for data analysis (e.g. on a Hortonworks platform), files stored in HDFS can be manipulated via a paragraph that uses the shell interpreter (%sh header) and the hdfs command-line tool. However, it is sometimes nicer to access HDFS directly from Spark/Scala code rather than requiring a separate block.

More Spring Quirks

Two more weird problems encountered recently with the Spring framework for Java (v4.3.x):

The Spring bean name “securityProperties” appears to be magic/reserved. I created a class named SecurityProperties, annotated with @Component as usual for Spring. At runtime, instantiating a class that should have been injected with the singleton of that type failed with “no such bean”. Specifying an explicit bean name via @Component(value="someOtherBeanName") fixed the problem; so did changing the class name (i.e. changing the default derived bean name).

I also had problems with a REST endpoint annotated with @RequestMapping(value="/some/path/{id}"), i.e. one using a path parameter. When the requested URL was of the form “/some/path/a.b.c.d”, the annotated method was invoked with param id set to “a.b.c”: the last dot and everything following it was stripped. Some googling quickly found the answer (other people appear to hit this regularly too): Spring webmvc processes URLs like “/some/file.html” by stripping the file suffix. As noted in this stackoverflow thread, the mapping annotation can be modified to use a regular-expression match, e.g. @RequestMapping(value="/some/path/{id:.*}"). However, I just wanted to turn off Spring’s suffix processing completely. It took me some time to figure out how; my solution (for annotation-based configuration) is:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.PathMatchConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

// Class name is illustrative; the method body is an assumed completion of the truncated listing.
@Configuration
public class WebConfig {
    /**
     * Allow rest-entry-points to accept path-params like "a.b.c".
     * <p>
     * By default, given a rest-endpoint mapping of form "/some/endpoint/{id}" and
     * a URL of form "/some/endpoint/aresource.html", Spring sets param id to just
     * "aresource", ie the suffix is stripped. This is not desired in this app, so
     * disable this default behaviour...
     * </p>
     */
    @Bean
    public WebMvcConfigurer disablePathSuffixAdapter() {
        return new WebMvcConfigurerAdapter() {
            @Override
            public void configurePathMatch(PathMatchConfigurer configurer) {
                configurer.setUseSuffixPatternMatch(false);
            }
        };
    }
}

Jetty Amusing Stacktrace

I recently set a breakpoint on a spring-boot app, to diagnose a problem occurring during an http request. While looking at the stacktrace, I noticed an entertainingly-named class from the jetty framework:

	  at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:289)
	  at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:149)
	  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:104)
	  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
==>	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:243)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:679)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:597)
	  at java.lang.Thread.run(Thread.java:745)

Remapping DNS Lookups in a JRE

Normally, when Java code opens a socket to some host by name, the name is looked up in the host system’s /etc/hosts file, and if it is not found there then a DNS server is consulted.

I recently had a problem where a Java library I was using was trying to connect to a host whose IP address I already knew, but whose name was NOT available using the normal lookups. This problem was only occurring in a development environment, and during integration tests, so the solution was clear: hijack the usual JRE name resolution to force my desired lookup.

Do you sometimes get the feeling that code you have written is so ugly that it is somehow beautiful? I think this qualifies, and it solves the problem.
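The hijack itself is not reproduced in this summary. For comparison, here is a much tamer sketch for the case where your own code (rather than a third-party library) creates the connection: keep a small table of hostname-to-IP overrides and build the InetAddress directly, which skips /etc/hosts and DNS for those names. The class name HostOverrides, its methods, and the host name and address in the usage note are all hypothetical; this does not change JRE-wide resolution and is not the approach described above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: resolves selected hostnames from a fixed table. */
class HostOverrides {
    // Hostname (lower-cased) -> raw IP address bytes.
    private final Map<String, byte[]> table = new ConcurrentHashMap<>();

    /** Registers a hostname with a literal IP address such as "10.1.2.3". */
    void put(String host, String ipLiteral) throws UnknownHostException {
        // For an IP literal, getByName() only parses the text; no lookup is done.
        byte[] raw = InetAddress.getByName(ipLiteral).getAddress();
        table.put(host.toLowerCase(Locale.ROOT), raw);
    }

    /** Returns the overridden address if one is registered, else does a normal lookup. */
    InetAddress resolve(String host) throws UnknownHostException {
        byte[] raw = table.get(host.toLowerCase(Locale.ROOT));
        if (raw != null) {
            // getByAddress() pairs the name with the bytes without consulting a name service.
            return InetAddress.getByAddress(host, raw);
        }
        return InetAddress.getByName(host);
    }
}
```

A caller would register a mapping once, e.g. put("internal-service.example", "10.1.2.3"), and then pass resolve("internal-service.example") to a constructor such as Socket(InetAddress, int). The limitation is the reason the original problem needed something uglier: a library that resolves names internally never sees this table.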

Spring Forward

While working on a spring-boot application being deployed to Google AppEngine, I enabled CORS (Cross-Origin Resource Sharing) checks, and everything turned nasty. Why do apparently easy tasks turn out to be so complicated sometimes?

The cause turned out to be a combination of issues in Chrome, Google IAP, and Spring CORS support. A description of the problem, and the solution I eventually developed, can be found here.