Accessing HDFS from Spark in Zeppelin

When using a Zeppelin notebook for data analysis (e.g. on a Hortonworks platform), files stored in HDFS can be manipulated from a paragraph that uses the shell interpreter (%sh header) and the hdfs command-line tool. However, it is sometimes nicer to access HDFS directly from Spark/Scala code rather than requiring a separate paragraph.

More Spring Quirks

Two more weird problems I encountered recently with the Spring framework for Java (v4.3.x):

The Spring bean name “securityProperties” appears to be magic/reserved. I created a class named SecurityProperties, annotated with @Component as usual for Spring, and at runtime the instantiation of a class that should have been injected with the singleton of that type failed with “no such bean”. Specifying an explicit bean name via @Component(value="someOtherBeanName") fixed the problem; so did renaming the class (i.e. changing the default derived bean name).
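The “default derived bean name” comes from the class name. As far as I know, Spring 4.x's AnnotationBeanNameGenerator takes the short class name and passes it to java.beans.Introspector.decapitalize. The stdlib-only sketch below mimics just that step, so you can check which name a top-level class will get and whether it collides with a name such as “securityProperties”. DefaultBeanNames is a hypothetical helper, not Spring code.

```java
import java.beans.Introspector;

// Hypothetical helper (not part of Spring) mimicking the assumed default-bean-name step:
// simple class name -> java.beans.Introspector.decapitalize.
class DefaultBeanNames {

    /** Name assumed to be given by default to a top-level @Component class with this simple name. */
    static String defaultBeanName(String simpleClassName) {
        // decapitalize lowercases the first character, unless the first two characters
        // are both upper case (e.g. "URLResolver"), in which case the name is left unchanged.
        return Introspector.decapitalize(simpleClassName);
    }

    public static void main(String[] args) {
        String[] names = {"SecurityProperties", "AppSecurityProperties", "URLResolver"};
        for (String name : names) {
            System.out.println(name + " -> " + defaultBeanName(name));
        }
    }
}
```

Running it prints securityProperties, appSecurityProperties and URLResolver respectively, which is why renaming the class changes the bean name and sidesteps the clash.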

I also had problems with a REST endpoint annotated with @RequestMapping(value="/some/path/{id}"), i.e. one using a path parameter. When the requested URL had the form “/some/path/a.b.c.d”, the annotated method was invoked with id set to “a.b.c”: the last dot and everything after it was stripped. Some googling quickly found the answer (other people appear to hit this regularly): Spring Web MVC handles URLs like “/some/file.html” by stripping the file suffix. As noted in this Stack Overflow thread, the mapping annotation can be changed to use a regular-expression match, e.g. @RequestMapping(value="/some/path/{id:.*}"). However, I wanted to turn off Spring’s suffix processing completely. It took me some time to figure out how; my solution (for annotation-based configuration) is:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.PathMatchConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter;

// Any @Configuration class will do; the name WebConfig is just illustrative.
@Configuration
public class WebConfig {

    /**
     * Allow rest-entry-points to accept path-params like "a.b.c".
     * <p>
     * By default, given a rest-endpoint mapping of form "/some/endpoint/{id}" and
     * a URL of form "/some/endpoint/aresource.html", Spring sets param id to just
     * "aresource", ie the suffix is stripped. This is not desired in this app, so
     * disable this default behaviour...
     * </p>
     */
    @Bean
    public WebMvcConfigurer disablePathSuffixAdapter() {
        return new WebMvcConfigurerAdapter() {
            @Override
            public void configurePathMatch(PathMatchConfigurer configurer) {
                // Stop treating a trailing ".xyz" in the path as a file extension.
                configurer.setUseSuffixPatternMatch(false);
            }
        };
    }
}
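To see why “a.b.c.d” became “a.b.c”, here is a rough stdlib-only model; it is not Spring's code. It assumes that, with suffix pattern matching enabled, Spring first tries the mapping as if it were “/some/path/{id}.*”, with {id} matching greedily within one path segment, so the capture stops at the last dot. A pattern like {id:.*} already contains a dot, so (as I understand it) no suffix variant is tried. SuffixMatchDemo and extractId are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough model (not Spring code) of matching "/some/path/{id}" with and without suffix pattern matching.
class SuffixMatchDemo {

    // Suffix matching on: mapping assumed to be tried first as "/some/path/{id}.*",
    // where {id} is a greedy match within one path segment, followed by a literal dot.
    private static final Pattern WITH_SUFFIX = Pattern.compile("/some/path/([^/]*)\\.[^/]*");

    // Suffix matching off (or "{id:.*}"): {id} takes the whole final segment.
    private static final Pattern PLAIN = Pattern.compile("/some/path/([^/]*)");

    /** Returns the value bound to {id}, or null if the URL does not match the mapping. */
    static String extractId(String url, boolean useSuffixPatternMatch) {
        if (useSuffixPatternMatch) {
            Matcher suffixMatch = WITH_SUFFIX.matcher(url);
            if (suffixMatch.matches()) {
                return suffixMatch.group(1);
            }
        }
        Matcher plainMatch = PLAIN.matcher(url);
        return plainMatch.matches() ? plainMatch.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "/some/path/a.b.c.d";
        System.out.println("suffix matching on:  id = " + extractId(url, true));
        System.out.println("suffix matching off: id = " + extractId(url, false));
    }
}
```

With suffix matching on, the greedy group backtracks only as far as the last dot, giving “a.b.c”; with it off, the whole segment “a.b.c.d” is bound, matching the behaviour after setUseSuffixPatternMatch(false).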

Jetty Amusing Stacktrace

I recently set a breakpoint in a Spring Boot app to diagnose a problem occurring during an HTTP request. While looking at the stacktrace, I noticed an entertainingly named class from the Jetty framework:

	  at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:289)
	  at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:149)
	  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:104)
	  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
==>	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:243)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:679)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:597)
	  at java.lang.Thread.run(Thread.java:745)

Remapping DNS Lookups in a JRE

Normally, when Java code opens a socket to a named host, the JRE resolves the name through the host system: typically /etc/hosts is checked first, and if the name is not found there, a DNS server is consulted.

I recently had a problem where a Java library I was using tried to connect to a host whose IP address I already knew, but whose name was NOT resolvable through the normal lookups. The problem only occurred in a development environment and during integration tests, so the solution was clear: hijack the usual JRE name resolution to force my desired lookup.

Do you sometimes get the feeling that code you have written is so ugly that it is somehow beautiful? I think this qualifies, and it solves the problem.
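A different technique (not the code referred to above) that may be less ugly on JDK 9 and later: as far as I know, the jdk.net.hosts.file system property makes the JDK's built-in resolver read a hosts-format file instead of asking the operating system, and names missing from that file then fail to resolve rather than falling back to DNS. The sketch below generates such a file and sets the property at runtime; HostsFileOverride, the host name and the IP address are all hypothetical. Because the property is, I believe, only read when InetAddress is first initialised, passing -Djdk.net.hosts.file=/path/to/hosts on the command line is the safer option.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: redirect the JDK's built-in name resolution to a generated hosts file (assumes JDK 9+).
class HostsFileOverride {

    /** One entry in standard hosts-file syntax: "<ip> <hostname>". */
    static String hostsLine(String ip, String hostname) {
        return ip + " " + hostname;
    }

    /**
     * Writes the host-to-IP mappings to a temporary hosts file and points the
     * jdk.net.hosts.file system property at it. Assumed to take effect only if this
     * runs before InetAddress is first used in the JVM.
     */
    static Path install(Map<String, String> hostToIp) {
        List<String> lines = new ArrayList<>();
        hostToIp.forEach((host, ip) -> lines.add(hostsLine(ip, host)));
        try {
            Path hostsFile = Files.createTempFile("hosts-override", ".txt");
            Files.write(hostsFile, lines, StandardCharsets.UTF_8);
            hostsFile.toFile().deleteOnExit();
            System.setProperty("jdk.net.hosts.file", hostsFile.toString());
            return hostsFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("build-server.internal", "10.0.0.5"); // hypothetical host name and address
        install(overrides);
        // Code in this JVM, including third-party libraries, should now resolve the name from the file.
        System.out.println(InetAddress.getByName("build-server.internal").getHostAddress());
    }
}
```

Since the override applies JVM-wide, it also affects lookups made inside libraries you cannot modify, which was the point of the original hack; keep it confined to dev and test runs.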

Spring Forward

While working on a Spring Boot application being deployed to Google App Engine, I enabled CORS (Cross-Origin Resource Sharing) checks, and everything turned nasty. Why do apparently easy tasks turn out to be so complicated sometimes?

The cause turned out to be a combination of issues in Chrome, Google IAP, and Spring CORS support. A description of the problem, and the solution I eventually developed, can be found here.

Cloud Basics

I haven’t posted anything significant for a while, not because I have nothing to say, but because I have been rather busy.

I’m currently deep in the middle of a project that uses Google Cloud Platform heavily, and so have had to make a sideways leap from learning Hadoop/Scala/Spark etc. (see recent postings) into cloud-based tech instead. They are related, but not quite the same.

Here are the first couple of articles in what is likely to be a long series about cloud processing; more will follow as I learn and find time to write up my conclusions. I hope you find them helpful.

LEDE on a TP-Link WDR4300 Router

For networking at home I currently use a TP-Link WDR4300 router; four years ago I installed OpenWRT, a Linux-based operating system for routers, on it. I last updated the OS on the router in 2016. Given the number of significant security holes found recently in various security protocols, updating it again has been on my to-do list for a few months.

Sadly, the OpenWRT site is dormant, possibly dead: there have been no updates since 2016. Fortunately, the LEDE Project is continuing work on the same code base. Installing the latest LEDE release on my router went extremely smoothly; it took just 5 minutes.

I have updated my openwrt-on-tplink-wdr4300 article to point to the LEDE project site.

Typesafe Config

Just wanted to point out a Java library which is actually reasonably well known anyway: Typesafe’s config library.

This provides an API for loading configuration data from external files. Among other things, it allows configuration files to:

  • include references to variables (whose values can be defined as system properties or environment variables, in code, or in the config files themselves)
  • include the contents of other files
  • express durations with syntax such as “10 seconds” and memory sizes such as “512k”

More interestingly, it supports a superset of JSON called HOCON, which allows comments and removes the verbosity and unforgiving punctuation requirements of JSON while retaining its powerful nested structure.