An Elasticsearch5 Transport Client

Categories: BigData

Overview

Elasticsearch client applications implemented in Java can communicate with an Elasticsearch cluster in three different ways:

  • REST: send REST requests over HTTP to one or more nodes;
  • Transport-client: send binary requests in round-robin order to one or more “coordinating” nodes of the cluster, without joining the cluster; or
  • Node-client: become a “non-data-node” member of the cluster itself.

The REST approach can be used from any language. The Elasticsearch project provides a Java-based client library for this mode; external projects provide alternative Java libraries and libraries for other languages. One of the major advantages of the REST approach is that the client is only loosely coupled to the Elasticsearch server version; the REST API is fairly stable even across major releases. Another is that it pulls in very few dependencies (other libraries).
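
As a rough illustration of how little the REST approach needs, the following sketch queries the cluster-health endpoint using only the JDK. The class name RestHealthCheck and the host/port values are assumptions made for this example; it is not taken from any Elasticsearch client library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper for illustration only; uses nothing beyond the JDK.
final class RestHealthCheck {

    /** Builds the URL of the cluster-health endpoint on one node. */
    static String healthUrl(String host, int port) {
        return "http://" + host + ":" + port + "/_cluster/health";
    }

    /** Sends a GET request and returns the response body (JSON) as a string. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            // The "\\A" delimiter makes the scanner return the whole stream as one token.
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumption: a node is listening on localhost:9200 (the default HTTP port).
        try {
            System.out.println(get(healthUrl("localhost", 9200)));
        } catch (IOException e) {
            System.err.println("Could not reach Elasticsearch: " + e.getMessage());
        }
    }
}
```

Because only HTTP and JSON are involved, the same request works unchanged from any language or from a tool such as curl.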

Transport client mode is currently only possible for Java client applications, as it reuses code that Elasticsearch nodes themselves use - maintaining the same code in a different language would be significant work. This mode has a few limitations, including:

  • The client application either has a fixed set of nodes it talks to, or uses “sniffing” to determine the full set of “coordinating” nodes based on the initially configured set. The client must have network access to all the relevant nodes;
  • The client must use an Elasticsearch library version which closely matches the version of the cluster itself (in practice, the same major version);
  • More dependencies: code wanting to connect this way requires the entire Elasticsearch codebase on the classpath, as the Elasticsearch team does not provide a standalone client library;
  • No security: the transport protocol offers no access control, whereas REST over HTTPS allows gateways to enforce access rules; and
  • No encryption, unless you set up IPsec or similar inter-host VPNs, or buy a licence for the Shield extension.
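
The version constraint in the list above is easy to violate by accident when the cluster is upgraded independently of the application. A minimal sketch of a startup guard follows; it assumes that "similar" means "same major version", and the class and method names are hypothetical. How the cluster version is obtained (for example from the REST root endpoint) is left out.

```java
// Hypothetical startup guard; the "same major version" rule is an assumption of this sketch.
final class VersionCheck {

    /** Extracts the major component from a version string such as "5.1.1". */
    static int major(String version) {
        int dot = version.indexOf('.');
        String head = dot < 0 ? version : version.substring(0, dot);
        return Integer.parseInt(head.trim());
    }

    /** Returns true when client library and cluster share the same major version. */
    static boolean sameMajor(String clientVersion, String clusterVersion) {
        return major(clientVersion) == major(clusterVersion);
    }

    public static void main(String[] args) {
        String clientVersion = "5.1.1";   // version of the library on the classpath (example value)
        String clusterVersion = "5.4.0";  // version reported by the cluster (example value)
        if (!sameMajor(clientVersion, clusterVersion)) {
            throw new IllegalStateException("Client " + clientVersion
                    + " is not compatible with cluster " + clusterVersion);
        }
        System.out.println("Versions look compatible");
    }
}
```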

Transport client mode does have several advantages, including:

  • High performance (requests go directly to the relevant nodes without proxying, and the internal binary network protocol is used), and
  • High availability (the internal Elasticsearch code has robust failover handling, and clients inherit this implementation)
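
To make the round-robin and failover behaviour concrete, here is a small sketch of the general idea: requests rotate over the known nodes, and nodes marked as failed are skipped. This is an illustration under simplifying assumptions, not the code Elasticsearch itself uses; the class RoundRobinNodes and its methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a simplified model of round-robin node selection with failover.
final class RoundRobinNodes {
    private final List<String> nodes;
    private final Set<String> failed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    void markFailed(String node) {
        failed.add(node);
    }

    void markAlive(String node) {
        failed.remove(node);
    }

    /** Returns the next live node in rotation, skipping nodes marked as failed. */
    String next() {
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            // floorMod keeps the index non-negative even after the counter overflows.
            int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
            String candidate = nodes.get(index);
            if (!failed.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live nodes available");
    }

    public static void main(String[] args) {
        RoundRobinNodes rr = new RoundRobinNodes(Arrays.asList("es1", "es2", "es3"));
        System.out.println(rr.next());
        rr.markFailed("es2");
        System.out.println(rr.next());
    }
}
```

A real client also has to detect failures and periodically retry failed nodes; inheriting that logic from the Elasticsearch codebase is precisely the advantage described above.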

Node client mode is also only possible for Java client applications. It has the same advantages and disadvantages as transport-client mode, but in addition:

  • Performance is even higher than in transport-client mode, as requests can be sent directly to non-coordinating nodes;
  • Network traffic is higher (cluster topology changes are also propagated to the client node);
  • No security: Elasticsearch nodes trust each other, and the client acts like a node in the cluster; and
  • Even higher availability (the client is notified of node failures and performs failover just like other nodes in the cluster).

Using Transport Client Mode in ES5

In Elasticsearch 2.x, the standard ES libraries include a TransportClient class; there is plenty of documentation available on how to use this.

In Elasticsearch 5.x the ready-to-use client class (PreBuiltTransportClient) lives in a separate library. Unfortunately, that library sets up a lot more infrastructure than the old client did, and so pulls in a number of transitive dependencies which were not needed when using transport client mode in Elasticsearch 2.x.

Fortunately, it is possible to write an ES5 client app without the full transport-client library, just by copying and lightly modifying two classes.

Here is the transport-client itself:

// Author: Simon Kitching
// This code is in the public domain
package net.vonos.elasticsearch;

import io.netty.util.ThreadDeathWatcher;
import io.netty.util.concurrent.GlobalEventExecutor;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.network.NetworkModule;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.transport.Netty4Plugin;

import java.util.Collections;
import java.util.concurrent.TimeUnit;

/**
 * An Elasticsearch Client implementation which communicates over TCP using the binary transport protocol.
 * <p>
 * This class is derived from org.elasticsearch.transport.client.PreBuiltTransportClient in artifact
 * "org.elasticsearch.client:transport:5.1.1".
 * </p>
 */
public class BasicTransportClient extends TransportClient {

    public BasicTransportClient(Settings settings) {
        super(settings, Settings.EMPTY, addPlugins(Collections.singletonList(Netty4Plugin.class)), null);
    }

    @Override
    public void close() {
        super.close();
        if (NetworkModule.TRANSPORT_TYPE_SETTING.exists(settings) == false
            || NetworkModule.TRANSPORT_TYPE_SETTING.get(settings).equals(Netty4Plugin.NETTY_TRANSPORT_NAME)) {
            try {
                GlobalEventExecutor.INSTANCE.awaitInactivity(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            try {
                ThreadDeathWatcher.awaitInactivity(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

And here is a factory class for the BasicTransportClient; it is spring-specific but that can easily be removed or ported to other frameworks if needed:

// Author: Simon Kitching
// This code is in the public domain
package net.vonos.elasticsearch;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
// CommonMessages is a project-specific class holding error message templates; substitute your own.
import vwg.audi.tracestore.common.exception.CommonMessages;

import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * A factory for Elasticsearch TransportClient instances.
 * <p>
 * Normally, the getClient() method is called only once, during application startup.
 * </p>
 */
@Configuration
public class ESTransportClientFactory {
    private static final Logger LOG = LoggerFactory.getLogger(ESTransportClientFactory.class);

    @Value("${es.hostList}")
    private String[] esHostList;

    @Value("${es.port}")
    private int esPort;

    @Value("${es.clusterName}")
    private String esClusterName;

    @Value("${es.sniff}")
    private boolean esSniff;

    /**
     * Return a singleton connection to Elasticsearch.
     */
    @Bean
    public Client getClient() {
        Settings settings = Settings.builder()
            .put("cluster.name", esClusterName)
            .put("client.transport.sniff", esSniff)
            .build();
        try {
            TransportClient client = new BasicTransportClient(settings);
            LOG.info("Created client instance '{}'", System.identityHashCode(client));
            for (String host : esHostList) {
                client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host), esPort));
            }
            return client;
        } catch (UnknownHostException uhe) {
            throw new RuntimeException(String.format(CommonMessages.GENERIC_LOG_MSG_WITH_ORIG_EXC,
                    CommonMessages.ES_UNKNOWN_HOST_ERR_CODE, CommonMessages
                            .ES_UNKNOWN_HOST_ERR_MSG), uhe);
        }
    }
}
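
For applications that do not use Spring, the four injected values can be read from a plain java.util.Properties object instead. The sketch below shows one possible way to do that, reusing the property names from the factory above. The class EsClientConfig and the fallback defaults (port 9300, cluster name "elasticsearch", sniffing off) are assumptions of this example.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical Spring-free holder for the settings used by the factory above.
final class EsClientConfig {
    final List<InetSocketAddress> addresses;
    final String clusterName;
    final boolean sniff;

    private EsClientConfig(List<InetSocketAddress> addresses, String clusterName, boolean sniff) {
        this.addresses = Collections.unmodifiableList(addresses);
        this.clusterName = clusterName;
        this.sniff = sniff;
    }

    /** Parses es.hostList (comma-separated), es.port, es.clusterName and es.sniff. */
    static EsClientConfig from(Properties props) {
        // Assumed defaults for this sketch; adjust to your environment.
        int port = Integer.parseInt(props.getProperty("es.port", "9300"));
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String host : props.getProperty("es.hostList", "").split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                // createUnresolved avoids a DNS lookup at parse time.
                addresses.add(InetSocketAddress.createUnresolved(trimmed, port));
            }
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("es.hostList must name at least one host");
        }
        String clusterName = props.getProperty("es.clusterName", "elasticsearch");
        boolean sniff = Boolean.parseBoolean(props.getProperty("es.sniff", "false"));
        return new EsClientConfig(addresses, clusterName, sniff);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("es.hostList", "es1, es2");
        props.setProperty("es.clusterName", "demo");
        EsClientConfig config = EsClientConfig.from(props);
        System.out.println(config.addresses + " " + config.clusterName + " " + config.sniff);
    }
}
```

The parsed values would then be passed to Settings.builder() and addTransportAddress() in the same way as in getClient() above.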

References