A Simple Implementation of Cluster-lock/Leader-election

Categories: Programming

When running multiple instances of an application, it is sometimes necessary to ensure that a particular task is run on only one instance at a time. This is typical for scheduled background tasks, but may apply to other operations such as listening on a message-queue.

One option is to just run a dedicated instance responsible for all such background tasks, ie to configure one instance of the cluster to be the “batch” instance. However this can complicate deployment processes and configuration. The alternative is for the instances to dynamically decide which of them is the “leader” or, equivalently, which is the owner of a “cluster lock”.

The Apache Zookeeper project provides a server that supports distributed configuration and coordination, including exclusive locks and leader election. However it is a heavyweight solution; it’s software to install, configure, and keep updated, and it’s complicated to call from applications.

The Shedlock library for Java implements basic cluster locking using a wide variety of optional strategies - including wrapping calls to Zookeeper.

However if the application that needs cluster-locking/leader-election is already using a relational database (and well, most applications are) then it’s really trivial to implement this yourself. It’s just a matter of defining a single table and building a wrapper around a single SQL statement (or two, if lock release is required).

This assumes that each instance in the cluster tries to run the scheduled task but simply “skips” the work if the lock cannot be obtained.
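
To make that concrete, here is a minimal sketch in Java (the lockDao object, its tryAcquireLock method and the task itself are hypothetical names, not from any library; a fuller sketch of the helper follows further below):

// Every instance runs this on the same schedule; only the instance that
// wins the lock does the work, the others return and try again next time.
public void nightlyCleanupTask() {
    if (!lockDao.tryAcquireLock("nightly_cleanup", Duration.ofMinutes(10))) {
        return; // some other instance holds the lock - skip this run
    }
    doCleanup();
}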

The required table declaration is something like:

-- define table
create table cluster_lock (
  lock_name varchar(32) primary key,
  expiry_date_time timestamp not null,
  owner_id varchar(32),
  owner_name varchar(32));

-- define all possible locks
insert into cluster_lock (lock_name, expiry_date_time) values ('some_lock_name', current_timestamp);

The corresponding (pseudocode) SQL command is:

update cluster_lock
set expiry_date_time = current_timestamp + :lockDuration, owner_id = :ownerId, owner_name = :ownerName
where lock_name = :lockName and (expiry_date_time < current_timestamp or owner_id = :ownerId)

An update command returns the number of rows modified. When the value is zero, the attempt to obtain the lock failed (some other instance has ownership of the lock). When the value is one, the attempt succeeded and whatever task is protected by the lock should now be executed by the calling application.
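
As one possible implementation of that wrapper, here is a minimal JDBC sketch (class, method and lock names are hypothetical; the cast(? as interval) expression is PostgreSQL-flavoured, so other databases will need a slightly different way of adding :lockDuration to the current time):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.Duration;
import java.util.UUID;
import javax.sql.DataSource;

public class ClusterLockDao {

    // Acquire (or renew) the lock by updating the row, but only when the lock
    // has expired or this instance already owns it.
    private static final String ACQUIRE_SQL =
        "update cluster_lock " +
        "set expiry_date_time = current_timestamp + cast(? as interval), " +
        "    owner_id = ?, owner_name = ? " +
        "where lock_name = ? " +
        "and (expiry_date_time < current_timestamp or owner_id = ?)";

    private final DataSource dataSource;
    // Random id chosen once at startup; 32 hex chars fits the varchar(32) column.
    private final String ownerId = UUID.randomUUID().toString().replace("-", "");
    private final String ownerName;

    public ClusterLockDao(DataSource dataSource, String ownerName) {
        this.dataSource = dataSource;
        this.ownerName = ownerName;
    }

    /** Returns true if this instance now holds the named lock. */
    public boolean tryAcquireLock(String lockName, Duration lockDuration) {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(ACQUIRE_SQL)) {
            ps.setString(1, lockDuration.getSeconds() + " seconds");
            ps.setString(2, ownerId);
            ps.setString(3, ownerName);
            ps.setString(4, lockName);
            ps.setString(5, ownerId);
            return ps.executeUpdate() == 1; // one row updated = lock obtained
        } catch (SQLException e) {
            throw new RuntimeException("cluster-lock acquisition failed", e);
        }
    }
}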

Each instance in the cluster does need to allocate itself a unique owner_id - but a random value (eg a UUID) chosen on startup is fine. An integer could be used here rather than a string, but these locks are not expected to be checked more often than every few seconds by each instance; even with a few hundred instances in the cluster, scalability won’t be a concern here.

Column owner_name is optional, but nice to have. It gives some indication of which instance in the cluster is the current owner, which makes any locking-related problems easier to diagnose.

The lock has an expiry-time in order to handle crashes of the current lock owner; choose a value that is suitable for whatever task is being done. If the task is a very long-running one, then simply ensure the above SQL statement is repeated before the expiry time in order to “renew” the lock.
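
For such a long-running task, the renewal can be as simple as re-calling the same acquisition wrapper from within the processing loop (a sketch, reusing the hypothetical lockDao from above; WorkItem and process are placeholders):

// Re-acquiring a lock this instance already owns just pushes its expiry time
// forward, so the lock stays held for as long as the batch keeps running.
Duration lockDuration = Duration.ofMinutes(5);
for (WorkItem item : workItems) {
    if (!lockDao.tryAcquireLock("big_batch", lockDuration)) {
        break; // lock was lost (eg after a long stall) - stop processing
    }
    process(item);
}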

If lock release is required (see later) then either the above SQL should be modified to take curr_owner_id and new_owner_id parameters (which are identical for locking, but where new_owner_id is null for release) or a separate SQL statement can be used:

update cluster_lock
set expiry_date_time = current_timestamp, owner_id = null, owner_name = null
where lock_name = :lockName and owner_id = :ownerId
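
The corresponding wrapper method, added to the hypothetical ClusterLockDao sketched earlier, is equally small; the where clause means that releasing a lock this instance no longer owns is simply a no-op:

// Release by expiring the row so any instance can claim it immediately.
private static final String RELEASE_SQL =
    "update cluster_lock " +
    "set expiry_date_time = current_timestamp, owner_id = null, owner_name = null " +
    "where lock_name = ? and owner_id = ?";

public void releaseLock(String lockName) {
    try (Connection con = dataSource.getConnection();
         PreparedStatement ps = con.prepareStatement(RELEASE_SQL)) {
        ps.setString(1, lockName);
        ps.setString(2, ownerId);
        ps.executeUpdate(); // zero rows updated just means we did not own it
    } catch (SQLException e) {
        throw new RuntimeException("cluster-lock release failed", e);
    }
}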

The SQL updates can be used in three ways:

  • cluster-lock mode: after task is completed, explicitly release the lock
  • cluster-lock auto mode: for short tasks that are run infrequently, don’t bother to release the lock; it will expire before the next task run anyway
  • leader election mode: use a lockDuration which is longer than the scheduled task interval (eg 2x the expected interval) and do not release the lock on task completion (see the sketch after this list)
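
For example, leader-election mode is purely a matter of parameter choice when calling the same hypothetical helper (the scheduling mechanism is whatever your application already uses):

// Scheduled every 60 seconds on every instance, but the lock lasts 120 seconds.
// The current leader re-acquires (and so renews) the lock on each run before it
// can expire, so it keeps doing all the work until it crashes or shuts down.
public void pollQueueTask() {
    if (!lockDao.tryAcquireLock("queue_poller", Duration.ofSeconds(120))) {
        return; // not the current leader
    }
    pollTheQueue();
}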

In the cluster-lock modes, which instance next obtains the lock is effectively random, ie work should be evenly distributed across the cluster. This approach also has the fastest recovery after a crash of the lock owner.

In leader-election mode, because the next task execution (including the attempt to get the lock) occurs before the lock has expired, the existing owner retains ownership of the lock, ie all work will be done on one node (unless it crashes). This does mean that after a crash there will be a somewhat longer delay before another node takes over.

When an instance shuts down, it should ideally release any lock it holds, to reduce the interval until another node takes over. Whether this is really important, however, depends upon your use-case.
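
A JVM shutdown hook is one simple way to do that (a sketch; a framework-managed shutdown callback works just as well):

// Release the lock on orderly shutdown so another instance can take over
// immediately instead of waiting for the lock to expire.
Runtime.getRuntime().addShutdownHook(new Thread(() ->
        lockDao.releaseLock("queue_poller")));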

This simple algorithm has some great advantages over other solutions, assuming the app already uses a relational DB:

  • no extra processes needed
  • no extra libraries required
  • no extra background threads (other than the scheduled task thread itself)
  • very easy to understand
  • very easy to debug

The clocks of the components don’t even need to be synchronized; all timestamp comparisons are done within the database, ie using db server time.