Talend Basic Install on Linux - Wizard

Categories: BigData

Overview

This page describes how to install the core components of the Talend software suite on Linux using the official Talend Installation Wizard application. It is intended to be read only after the parent article on installing Talend.

I personally do not recommend using the Talend install-wizard for several reasons:

  • the install approach does not result in a production install (poor security, no high-availability, etc)
  • the install (in version 7.0.1 at least) is just broken in too many ways, and fixing it is hard as you don’t know what the wizard has done.

In my opinion, installing “manually” produces a far better result (see parent article). However, as this was the first approach I tried, I have at least partial instructions anyway - so here they are.

See comments in the parent page for more details on the problems with the wizard-based install process.

Using the Wizard

The instructions below use the Talend “installer wizard” in text mode to install software on the “master” node. The wizard is reasonably easy to use. However it must be run as root - which is bad for security.

On the positive side, even when the installer is run as root, it creates a user “talenduser” for executing all services. Sadly not a user-per-service, but perhaps acceptable when the jobserver/runtime/esb components are manually set up to run as dedicated users.

The installer (based on the commercial Bitrock installer platform) can be used non-interactively by providing a config-file with settings for all the necessary config-options. This allows the installer to be executed from system configuration tools such as Puppet.

Note that the following instructions set up a mysql database rather than using the “toy” H2 database that is included with Talend. However in general, this page still describes how to set up a “play” environment rather than a production-ready setup. Note also that the “Data Quality Portal” component absolutely requires an external database (mysql/oracle/postgres) - it does not have any support for a bundled DB.

The actual installer binary used (platform-specific) is the one listed in the original email received from Talend after purchasing a license as:

Platform Installer (without download manager):
www.opensourceetl.net/tis/tdf_701/Talend-Installer-Starter-20180411_1414-V7.0.1-installer.zip

Even Talend do not recommend the installer wizard for production use; the following comment is present in the Talend installer guide on page 39:

Talend Installer allows you to get out-of-the-box Talend solutions that do not require any manual installation. However, these solutions are not provided in a production-ready environment as they may require additional configurations or optimizations according to your specific needs.

I would agree with this statement. Concerns about the installer for production use include:

  • installer running as root
  • all services running as the same user
  • no SSL enabled for services (all web services are http-only)
  • no high-availability features enabled
  • only one zookeeper node

Given that a development environment should at least somewhat resemble a production environment, these are (in my opinion) good reasons to also avoid the wizard for non-production installs.

Note that the real information on using this installer is present in the installer guide starting at page 39; earlier pages can be skipped.

Install Java

Java is needed - but sadly the installer of Talend 7.0.1 does not support the OpenJDK distribution of Java - the installer simply exits with “supported java version not found”. And Ubuntu 16.04 bundles only OpenJDK, not the Oracle JDK. It is therefore necessary to download JDK 1.8 for Linux x86-64 (as a tar.gz archive) from java.oracle.com, unpack it and then use it. Note that later versions of the Talend installer do support OpenJDK - in which case installing an OpenJDK package with apt (for example openjdk-8-jdk) will hopefully be enough.

Assuming that Oracle Java (tar-gzip-file) has been copied to the target machine, do the following as root:

tar zxf jdk-8*.tar.gz
JBASE=$(basename jdk1.8*/)  # name of the directory the archive was unpacked into
mkdir -p /opt/oracle/java
mv "$JBASE" /opt/oracle/java/
# use ">" for the first line so that re-running does not duplicate entries
echo "export JAVA_HOME=/opt/oracle/java/$JBASE" > ~/setjava.sh
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/setjava.sh
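
Before running the installer it is worth confirming that the directory recorded in setjava.sh really contains a usable JDK. A minimal sketch, assuming the layout created above; the function name check_java_home is my own and not something Talend or the JDK provides:

```shell
#!/bin/sh
# check_java_home DIR
# Hypothetical helper: succeeds only if DIR contains an executable bin/java.
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable java binary under $dir/bin" >&2
        return 1
    fi
    echo "java found at $dir/bin/java"
}
```

After sourcing ~/setjava.sh, calling check_java_home "$JAVA_HOME" followed by java -version should show the Oracle JDK rather than OpenJDK.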

Install MySQL JDBC Driver

Sadly, the talend installer does not include the mysql JDBC driver - they claim that is forbidden due to mysql licensing conditions. There are two ways to ensure the driver is available to Talend services that need it:

  • enter the path to the driver jarfile when prompted by the installer wizard, or
  • install the services first, then copy the file in to the appropriate directory within the service and restart it

The first is obviously easiest.

The easiest way to get the driver itself is “apt install libmysql-java”. This creates:

  • /usr/share/java/mysql-connector-java-5.1.38.jar
  • two additional symbolic links in the same dir that point to the above:
    • mysql-connector-java.jar
    • mysql.jar

Note that if you instead download the driver manually from the mysql website, then the jar-file may be missing the class “org.gjt.mm.mysql.Driver” - that is an old and obsolete class name, but it is the one that the Talend mysql configuration defaults to.
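
To check whether a particular jar contains that class, its contents can be listed and searched. A sketch, assuming unzip is installed; both function names are invented for this example:

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from Talend or MySQL).

# Convert a Java class name into the path it has inside a jar,
# e.g. a.b.C becomes a/b/C.class
class_to_path() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jar_has_class JAR CLASSNAME
# A jar is a zip archive, so "unzip -l" lists its entries.
# Succeeds only if the listing contains the class file.
jar_has_class() {
    unzip -l "$1" 2>/dev/null | grep -qF "$(class_to_path "$2")"
}
```

For example, jar_has_class /usr/share/java/mysql-connector-java.jar org.gjt.mm.mysql.Driver returns success when the class is present.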

If you prefer to do the “install after” approach (or miss a service) then:

# find apps
find /opt/Talend-7.0.1 -name "apache-tomcat"

# For each talend app that needs it, install and restart service. Example for talend-tac:
mkdir -p /opt/Talend-7.0.1/tac/apache-tomcat/endorsed
cp /usr/share/java/mysql-connector-java.jar /opt/Talend-7.0.1/tac/apache-tomcat/endorsed/
systemctl restart talend-tac-7.0.1.service
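
If several services need the driver, the copy step can be looped over every Tomcat directory that find reports. This is only a sketch: install_driver_everywhere is a made-up name, it copies into every apache-tomcat directory whether or not that application actually uses mysql, and restarting the matching services is left to you.

```shell
#!/bin/sh
# install_driver_everywhere ROOT JAR
# Hypothetical helper: copies JAR into an "endorsed" directory under every
# directory named apache-tomcat found below ROOT.
install_driver_everywhere() {
    root="$1"
    jar="$2"
    find "$root" -type d -name "apache-tomcat" | while IFS= read -r tomcat; do
        mkdir -p "$tomcat/endorsed"
        cp "$jar" "$tomcat/endorsed/" && echo "installed into $tomcat/endorsed"
    done
}
```

A call would look like install_driver_everywhere /opt/Talend-7.0.1 /usr/share/java/mysql-connector-java.jar.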

Warning: strangely, after install you can see that the tdqp (Data Quality Portal) component does include its own copy of the mysql driver (unlike the tac component) - it MUST, as the installer checks database connectivity during install time.

It isn’t clear why Talend recommend putting the mysql jarfile in an “apache-tomcat/endorsed” directory - endorsed jars are usually only needed to override code bundled in the JDK itself.

Download Required Talend Files

Ensure all necessary files are on the target server in the root user’s home directory, either by downloading them there directly (with wget or similar), or by downloading them on a non-server machine and then copying them to the target server (with scp or similar). The required files are:

  • Talend-Installer-*-linux64-installer.run
  • dist.dms (renamed to just dist)
  • license

Run Installer

Log into the target machine as user root and run the installer in text mode:

source setjava.sh   # script created when installing Java above
chmod +x Talend-Installer-*.run
./Talend-Installer-*.run --mode text

Note that the JAVA_HOME and PATH changes defined in “setjava.sh” do not need to be in root’s “.profile” file; these values are inserted into the startup scripts generated by the install process, i.e. they only need to be defined while the installer runs. The Talend executable files run as user “talenduser”, not as root, so changes to root’s profile will not affect them after the install completes.

The installer takes a long time. It is therefore recommended to run the installer using the “screen” linux tool so that if the SSH session dies, the install continues and you can reconnect to it later.
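
One way to do this is sketched below. The session name talend-install and the function names are arbitrary choices for this example, and screen is assumed to be installed already:

```shell
#!/bin/sh
SESSION=talend-install   # arbitrary session name, chosen for this example

# Start the installer inside a detached (-d -m), named (-S) screen session.
# Run this from the directory that holds setjava.sh and the installer.
start_install_session() {
    screen -dmS "$SESSION" sh -c '. ./setjava.sh && ./Talend-Installer-*.run --mode text'
}

# Attach to the session to answer the installer prompts.
# Detach again with Ctrl-a d; the installer keeps running.
attach_install_session() {
    screen -r "$SESSION"
}
```

After a dropped SSH connection, log in again and call attach_install_session (or run screen -ls to list the sessions that are still alive).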

When the installer prompts for install-type, select “advanced”. Then select “custom”.

It would be easier to just choose “server install” - but I think it is important to NOT install the runtime and jobserver features on the master node:

  • because it would overload the master, which also hosts lots of other things, and
  • because it can mask errors where workloads are not properly distributed across all available workers

The “server install” option would be appropriate if we wanted an “all on one server” installation.

When prompted for license, enter “license” (name of file in current working dir). Note that accepting the license takes about 20 seconds. The installer then prompts for a list of components to install. Accept the defaults except for:

  • runtime, jobserver, ESB - these will be installed on worker nodes
  • SAP RFC server – not needed
  • studio - will be installed separately on developer desktops

Note that “Server Services” is a pseudo-component that registers the other components for auto-start on server boot. It is this component that requires the installer to be run as root. There are more secure ways to design an installer, obviously.

The installer then prompts for lots of info, but usually the default is appropriate. Where appropriate, choose “mysql” as the database to use, and enter the mysql account details created earlier (see parent page).

For MDM, do not choose “$container” - instead just enter a fixed database “talend_mdm” (created in mysql earlier); the alternative requires giving Talend the mysql admin user account password so it can dynamically create databases. This is more complex than we need.

Once all data has been entered, the actual install takes 5-10 minutes.

The “server services” component will fail during install. Choosing “don’t remove existing services” at least leaves you with a partly-functioning system. However see the “problems” section later.

Check Services

Check service status in systemd:

systemctl status 'talend-*'   # quoted so the shell passes the pattern to systemctl unchanged

Don’t get too excited about seeing talend actually integrated into systemd - the systemd unit files simply point at old-fashioned sysv-init startup scripts.
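
To narrow the output to the units that actually failed, and to see why, something like the following can be used. The function names are illustrative, and the unit name to pass in is whatever systemctl reports on your machine:

```shell
#!/bin/sh
# List only the talend units that are in the failed state.
list_failed_talend_units() {
    systemctl list-units --all --state=failed --no-pager 'talend-*'
}

# show_unit_log UNIT: print the most recent journal lines for one unit.
show_unit_log() {
    journalctl -u "$1" -n 50 --no-pager
}

# show_unit_file UNIT: print the unit file, which shows the init script
# that the unit delegates to.
show_unit_file() {
    systemctl cat "$1"
}
```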

Fix Startup Failures

Several of the services listed by systemctl will be marked as “failed”:

  • talend-zookeeper
  • talend-kafka
  • talend-nexus

The cause appears to just be a buggy installer; hopefully this will be fixed in future versions, but for v7.0.1 I found it necessary to fix the problem manually.

The problem is that although the value of $JAVA_HOME defined at the time the installer runs is inserted into talend startup scripts, it is not inserted into the startup scripts of the above non-talend products. They then fail to start as they cannot find the java binary.

The solution I recommend is to:

  • create a file “/home/talenduser/systemd-env” containing the line “JAVA_HOME=/opt/oracle/java/...”
  • for each failing service, insert the following line into the “[Service]” section of file “/etc/systemd/system/talend-{service}”
    • EnvironmentFile=/home/talenduser/systemd-env
  • run “systemctl daemon-reload”
  • run “systemctl restart talend-*”

Note that systemd runs the services as user “talenduser” but does not load “~talenduser/.profile”.
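
The manual edit described above can also be scripted. The following is a sketch using GNU sed, with hypothetical function names; the JAVA_HOME value and the unit file paths are passed as arguments because they depend on your install:

```shell
#!/bin/sh
# write_env_file ENV_FILE JAVA_HOME_DIR
# Create the environment file with a single JAVA_HOME line.
write_env_file() {
    printf 'JAVA_HOME=%s\n' "$2" > "$1"
}

# add_env_file UNIT_FILE ENV_FILE
# Insert an EnvironmentFile= line directly after the [Service] header,
# unless the unit already references that file (safe to run twice).
add_env_file() {
    unit="$1"
    env_file="$2"
    if grep -qF "EnvironmentFile=$env_file" "$unit"; then
        return 0
    fi
    sed -i "/^\[Service\]/a EnvironmentFile=$env_file" "$unit"
}
```

Run write_env_file once, then add_env_file for each failing unit file, and finish with systemctl daemon-reload and a restart of the services.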

Problems

At this point, you will notice that a whole bunch of stuff is not there - most importantly the logging components. I have no idea what to do at this point - these are really necessary. It was here that I decided to switch to a “manual” install - which was itself complicated, but less complicated than trying to fix the installer. See the parent page for instructions on doing a Talend install “manually”.

Useful Files

Here are some filesystem locations used by Talend which might be useful to know:

  • /opt/Talend-7.0.1 - the main install directory
  • /opt/Talend-7.0.1/utils - contains all startup/shutdown scripts, plus systemd unit files
  • /etc/systemd/system/talend-* - installed service files for talend (ie copies of those in the above utils dir)
  • /etc/talend-mdm - config settings for MDM
  • <tomcat_path>/WEB-INF/classes/configuration.properties - the configuration settings for each app
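
Since the tomcat path differs per application, find can be used to locate every such configuration file. A small sketch (the function name is made up for this example):

```shell
#!/bin/sh
# find_app_configs ROOT
# List every configuration.properties that sits in a WEB-INF/classes directory.
find_app_configs() {
    find "$1" -type f -path '*/WEB-INF/classes/configuration.properties'
}
```

For the install described here the call would be find_app_configs /opt/Talend-7.0.1.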