Jetty Amusing Stacktrace

I recently set a breakpoint in a spring-boot app to diagnose a problem occurring during an HTTP request. While looking at the stacktrace, I noticed an entertainingly-named class from the Jetty framework:

==>	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(
	  at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(
	  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$
	  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$

Remapping DNS Lookups in a JRE

Normally, when Java code opens a socket to a named host, the name is looked up in the host system's /etc/hosts file; if it is not found there, a DNS server is consulted.

I recently had a problem where a Java library I was using was trying to connect to a host whose IP address I already knew, but whose name was NOT available via the normal lookups. This problem was only occurring in a development environment, and during integration tests, so the solution was clear: hijack the usual JRE name resolution to force my desired lookup.

Do you sometimes get the feeling that code you have written is so ugly that it is somehow beautiful? I think this qualifies .. and it solves the problem.
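For code you control directly, the JDK does offer one clean way to pin a hostname to a known IP address without any lookup at all: InetAddress.getByAddress(host, addr), which builds an address object carrying both the name and the raw bytes. This is only a minimal sketch of the idea (the hostname and IP below are invented); hijacking lookups made deep inside a third-party library needs the uglier surgery on the JRE's resolver alluded to above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PinnedLookup {
    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical host/IP pair: the name is not resolvable via
        // /etc/hosts or DNS, but we already know its address.
        byte[] ip = {10, 1, 2, 3};
        InetAddress addr = InetAddress.getByAddress("internal-service.test", ip);

        // No lookup is performed: the name and address were supplied
        // directly, so both accessors just return what we gave them.
        System.out.println(addr.getHostName());    // internal-service.test
        System.out.println(addr.getHostAddress()); // 10.1.2.3
    }
}
```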

Spring Forward

While working on a spring-boot application being deployed to Google AppEngine, I enabled CORS (Cross-Origin Resource Sharing) checks, and everything turned nasty. Why do apparently easy tasks sometimes turn out to be so complicated?

The cause turned out to be a combination of issues in Chrome, Google IAP, and Spring CORS support. A description of the problem, and the solution I eventually developed, can be found here.

Cloud Basics

I haven’t posted anything significant for a while, not because I have nothing to say, but because I have been rather busy.

I’m currently deep in the middle of a project that uses Google Cloud Platform heavily, and so have had to make a sideways leap from learning Hadoop/Scala/Spark etc (see recent postings) into cloud-based tech instead. They are related, but not quite the same.

Here are the first couple of what is likely to be a long series of articles about cloud processing, as I learn and have time to write up my conclusions. I hope you find them helpful..

LEDE on a TPLink-WDR4300 Router

For networking at home I currently use a TP-Link WDR4300 router; four years ago I installed OpenWRT on it, a Linux-based operating system for routers. I last updated the OS on the router in 2016. Given the number of significant holes found recently in various security protocols, updating it again has been on my to-do list for a few months.

Sadly the OpenWRT site is dormant/dead - no updates since 2016. Fortunately the LEDE Project is continuing work on the same code-base. Installing the latest LEDE release on my router went extremely smoothly - just five minutes' work.

I have updated my openwrt-on-tplink-wdr4300 article to point to the LEDE project site.

Typesafe Config

Just wanted to point out a Java library which is already reasonably well known - Typesafe's config library.

This provides an API for loading configuration data from external files. Among other things, it allows properties files to:

  • include references to variables (whose values can be defined as system properties, in code, or in the config files)
  • include the contents of other files
  • define times with syntax such as “10 seconds” and memory-sizes such as “512k”

More interestingly, it supports a superset of JSON called HOCON, which allows comments and removes the verbosity and unforgiving punctuation requirements of JSON while retaining its powerful nested structure.
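To give a flavour of what that looks like, here is a small HOCON sketch; the file name, keys, and values are invented for illustration:

```hocon
# HOCON allows comments, and relaxes JSON's quoting and comma rules
include "defaults.conf"             # pull in the contents of another file

app {
  name = demo
  data-dir = /var/lib/${app.name}   # substitute another config value
  timeout = 10 seconds              # duration with units
  cache-size = 512k                 # memory size with units
  user = ${?USER}                   # optional override from the environment
}
```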

Accessing Hive via JDBC

Hive container is running beyond physical memory limits

I use the hive command-line tool to run queries against hive tables. Recently, a query failed with the error message “container is running beyond physical memory limits”.

It took me quite a while to figure out what was happening, and how to work around it. My notes can be found here.
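For reference, the knobs usually involved in this error are the Tez container size requested from YARN and the JVM heap inside that container; the numbers below are purely illustrative, not the values from my notes, and the heap must be kept comfortably below the container limit:

```sql
-- request bigger YARN containers for this session (in MB)
set hive.tez.container.size=4096;
-- cap the JVM heap below the container limit to leave room for overhead
set hive.tez.java.opts=-Xmx3276m;
```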

It’s a shame that Tez/Hive don’t handle this automatically. Relational databases never report “out of memory” when running a query just because the source table is particularly large. On the other hand, this table was so large that no relational database could ever have held it…

UPDATE: Shortly after solving the above problem, I struck another out-of-memory problem in Hive which is discussed here. Fun, fun, fun…