A JUnit Rule for Elasticsearch Integration Testing

Categories: Java

I have recently written about both JUnit rules and Elasticsearch. Here is a JUnit rule that supports integration tests which interact with a real (but single-node) Elasticsearch 5.1.x instance running in the same JVM. Indexes and mappings can be defined, and the standard APIs can then be used for saving, reading, searching and otherwise working with that instance.

Elasticsearch does provide its own integration-test framework, ESIntegTestCase. However, this is IMO exceedingly complex to use: it starts a cluster of multiple nodes, creates indices with random numbers of shards, allows simulated failure of nodes in a cluster, etc. As far as I can see, it is useful for testing Elasticsearch itself, but not so useful for testing client applications that use Elasticsearch.

Using ESRule

The ESRule class can be used as follows:

public class SomeTest {
  @Rule
  public ESRule esRule = new ESRule();

  @Test
  public void testSomething() {
    Client client = esRule.getClient();
    // and here, perform any operations supported by an Elasticsearch client object.
  }
}

This is particularly useful for testing code that uses the Elasticsearch “transport client” interface rather than the REST interface. It is probably possible to adapt this rule to allow testing via the REST interface, but I haven’t tried it.

An embedded Elasticsearch instance is started before each test, and stopped/deleted when the test completes. Starting up the embedded Elasticsearch instance takes around 5 seconds. It is therefore reasonable to use even for tests that run during a normal build of the application. However, testing should generally be done in a few larger test methods rather than many small independent ones, as the startup cost is paid once per test method.

Just as a side-note, somewhat related to test startup time: some purists argue that “each test should test just one thing”, but I personally take a more pragmatic approach - what makes the application quality better and developer lives easier with minimal cost and wasted time? As long as a test failure still points the finger clearly at the change that broke the test, I see no problem with a single method that verifies multiple features of the application.

Implementation

The implementation of ESRule is:

// Author: Simon Kitching
// This code is in the public domain
package net.vonos.testsupport.elastic;

import org.elasticsearch.client.Client;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

/**
 * A JUnit rule which starts an embedded elastic-search instance.
 * <p>
 * Tests which use this rule will run relatively slowly, so the rule should only be used when more conventional
 * unit tests are not sufficient, e.g. when testing DAO-specific code.
 * </p>
 */
public class ESRule implements TestRule {
    /** An elastic-search cluster consisting of one node. */
    private EmbeddedElasticsearchServer eserver;

    /** The internal-transport client that talks to the local node. */
    private Client client;

    /**
     * Return a closure which starts an embedded ES instance, executes the unit-test, then shuts down the
     * ES instance.
     */
    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                eserver = new EmbeddedElasticsearchServer();
                eserver.start();

                client = eserver.getClient();
                try {
                    base.evaluate(); // execute the unit test
                } finally {
                    eserver.shutdown();
                }
            }
        };
    }

    /** Return the object through which operations can be performed on the ES cluster. */
    public Client getClient() {
        return client;
    }

    /**
     * When data is added to an index, it is not visible in searches until the next "refresh" has been performed.
     * Refreshes are normally done every second; this method forces one immediately so that a test can search
     * for data it has just written.
     */
    public void refresh(String index) {
        try {
            client.admin().indices().prepareRefresh(index).execute().get();
        } catch(Exception e) {
            throw new RuntimeException("Failed to refresh index", e);
        }
    }
}
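
To make the lifecycle in apply() easier to see, here is a minimal sketch that needs neither JUnit nor Elasticsearch on the classpath. Step, FakeServer and LifecycleSketch are hypothetical stand-ins invented for this illustration (Step plays the role of JUnit's Statement, FakeServer the role of EmbeddedElasticsearchServer); only the start / try / finally shape is taken from the rule above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for JUnit's Statement: a unit of work that may throw. */
interface Step {
    void evaluate() throws Throwable;
}

/** Hypothetical stand-in for the embedded server; it only records lifecycle calls. */
final class FakeServer {
    final List<String> events = new ArrayList<>();

    void start() {
        events.add("start");
    }

    void shutdown() {
        events.add("shutdown");
    }
}

final class LifecycleSketch {
    /** Wraps a test body in the same shape as ESRule.apply: start, run, always shut down. */
    static Step wrap(Step base, FakeServer server) {
        return () -> {
            server.start();
            try {
                base.evaluate();
            } finally {
                server.shutdown();
            }
        };
    }

    /** Runs a deliberately failing test body and returns the recorded lifecycle events. */
    static List<String> runFailingTest() {
        FakeServer server = new FakeServer();
        Step failing = () -> {
            server.events.add("test");
            throw new AssertionError("simulated test failure");
        };
        try {
            wrap(failing, server).evaluate();
        } catch (Throwable expected) {
            // The failure still reaches the caller, but only after shutdown has run.
            server.events.add("failure propagated");
        }
        return server.events;
    }

    public static void main(String[] args) {
        System.out.println(runFailingTest());
    }
}
```

Running main prints [start, test, shutdown, failure propagated]: the server is shut down even though the test body threw, and the original failure is still reported afterwards. That is the reason for the try/finally in the rule.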

The majority of the logic is implemented in class EmbeddedElasticsearchServer:

// Author: Simon Kitching
// This code is in the public domain
package net.vonos.testsupport.elastic;

import org.apache.commons.io.FileUtils;
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.ClusterName;
import org.elasticsearch.common.network.NetworkModule;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.node.Node;

import java.io.File;
import java.io.IOException;

/**
 * Test helper class which starts up an Elasticsearch instance in the current JVM.
 */
public class EmbeddedElasticsearchServer {

    // Suitable location for use with Maven
    private static final String DEFAULT_HOME_DIRECTORY = "target/elasticsearch-home";

    // The embedded ES instance
    private final Node node;

    // Setting "path.home" should point to the directory in which Elasticsearch is installed.
    private final String homeDirectory;

    /**
     * Default Constructor.
     */
    public EmbeddedElasticsearchServer() {
        this(DEFAULT_HOME_DIRECTORY);
    }

    /**
     * Explicit Constructor.
     */
    public EmbeddedElasticsearchServer(String homeDirectory) {
        try {
            FileUtils.deleteDirectory(new File(homeDirectory));
        } catch(IOException e) {
            throw new RuntimeException("Unable to clean embedded elastic-search home dir", e);
        }

        this.homeDirectory = homeDirectory;

        Settings.Builder elasticsearchSettings = Settings.builder()
            .put(Node.NODE_NAME_SETTING.getKey(), "testNode")
            .put(NetworkModule.TRANSPORT_TYPE_KEY, "local")
            .put(ClusterName.CLUSTER_NAME_SETTING.getKey(), "testCluster")
            .put(Environment.PATH_HOME_SETTING.getKey(), homeDirectory)
            .put(NetworkModule.HTTP_ENABLED.getKey(), false)
            .put("discovery.zen.ping_timeout", 0); // make startup faster

        this.node = new Node(elasticsearchSettings.build());
    }

    public void start() throws Exception {
        this.node.start();
    }

    public Client getClient() {
        return node.client();
    }

    public void shutdown() throws IOException {
        node.close();

        try {
            FileUtils.deleteDirectory(new File(homeDirectory));
        } catch (IOException e) {
            throw new RuntimeException("Could not delete home directory of embedded elasticsearch server", e);
        }
    }
}
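
The home directory is wiped with commons-io's FileUtils.deleteDirectory, both before startup and after shutdown. If pulling in commons-io just for that is undesirable, something like the following JDK-only sketch should do the same job. HomeDirCleaner and its method names are made up for this example, and the demo runs against a throwaway temp directory rather than target/elasticsearch-home.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class HomeDirCleaner {
    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) {
        if (Files.notExists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before the directories that contain them.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small throwaway tree, deletes it, and reports whether it is gone. */
    static boolean demo() {
        try {
            Path home = Files.createTempDirectory("elasticsearch-home");
            Files.createDirectories(home.resolve("data/nodes/0"));
            Files.write(home.resolve("data/nodes/0/segment.bin"), new byte[] {1, 2, 3});
            deleteRecursively(home);
            return Files.notExists(home);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted: " + demo());
    }
}
```

In EmbeddedElasticsearchServer the call would replace FileUtils.deleteDirectory(new File(homeDirectory)) with deleteRecursively(Paths.get(homeDirectory)); the surrounding catch blocks would then no longer be needed, since failures surface as UncheckedIOException.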

Other Notes

The ESRule class is used in production code; there the class has some additional helper methods for defining indices and mappings based on JSON files in the application resources. That may be the subject of a future article.

The above implementation works with Elasticsearch 5.1 (current version at this point in time). It was originally written for Elasticsearch 2.3, and the changes needed were minimal (in case you need to back-port it again…).

Credits

Developing this simple test framework would have been vastly more effort without a number of other blog and Stack Overflow postings. Sorry I didn’t track all the places I gathered information from, but thanks to you all.