A JobScheduler backup cluster ensures fail-safe operation of a primary JobScheduler. The cluster comprises this primary JobScheduler and one or more reserve (backup) JobSchedulers; for fail-safe operation, the primary JobScheduler and at least one backup must run on different computers.
All the JobSchedulers in a backup cluster signal their own availability by sending out "heartbeats" and, at the same time, monitor the "heartbeats" of the other JobSchedulers in the cluster. Should one of the backup JobSchedulers detect the absence of the primary JobScheduler's heartbeat over a longer period of time (approx. one to two minutes), it takes over processing. This means that it continues to process the open orders and jobs started by the primary JobScheduler and, if required, starts new jobs.
At most one JobScheduler in a cluster is active - the primary JobScheduler - which starts jobs and processes orders. The backup JobSchedulers are inactive - that is, they wait for the primary JobScheduler to fail before becoming active and taking over processing.
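The heartbeat and takeover behaviour described above can be sketched as follows. This is an illustrative model only: the 90-second staleness threshold (within the "approx. one to two minutes" window) and the rule that the backup with the lowest backup-precedence value takes over first are assumptions made for this sketch, not JobScheduler internals.

```python
import time

# Illustrative model of heartbeat-based failover; the threshold and the
# "lowest backup-precedence wins" rule are assumptions for this sketch.
HEARTBEAT_TIMEOUT = 90  # seconds

class ClusterMember:
    def __init__(self, name, backup_precedence=None):
        self.name = name
        self.backup_precedence = backup_precedence  # None marks the primary
        self.last_heartbeat = time.time()
        self.active = backup_precedence is None     # only the primary starts active

    def heartbeat(self, now):
        """Record that this member signalled its availability at time `now`."""
        self.last_heartbeat = now

def elect_active(members, now):
    """Return the member that should be active at time `now`.

    While the primary's heartbeat is fresh, nothing changes; once the
    heartbeat has been absent for longer than HEARTBEAT_TIMEOUT, the
    backup with the lowest backup-precedence value takes over.
    """
    primary = next(m for m in members if m.backup_precedence is None)
    if now - primary.last_heartbeat <= HEARTBEAT_TIMEOUT:
        return primary
    takeover = min((m for m in members if m.backup_precedence is not None),
                   key=lambda m: m.backup_precedence)
    primary.active = False
    takeover.active = True
    return takeover
```

While the primary keeps sending heartbeats, repeated calls always return the primary; only a prolonged silence triggers the takeover.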
The requirements for the operation of a backup JobScheduler cluster are shown schematically in the following diagram and described in detail in the next section.
The diagram below shows schematically the situation where a backup JobScheduler has become active and taken over the processing of jobs and orders:
The JobSchedulers which form the cluster can be started in any order.
The active (primary) JobScheduler is the first one to be started without the -backup parameter.
The following command line parameters configure a JobScheduler as a member of a backup cluster:
-backup
starts the JobScheduler as a backup JobScheduler; it remains inactive until the primary JobScheduler fails.
backup-precedence [n]
sets the order of precedence among several backup JobSchedulers, determining which of them will become active should the primary JobScheduler fail.
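As a sketch, a three-node cluster might be brought up as follows; the script location and the exact option spelling are assumptions for illustration and should be checked against the installed release:

```shell
# Host A: the primary JobScheduler is started without the -backup option
./jobscheduler.cmd start

# Hosts B and C: backup JobSchedulers; backup-precedence determines
# which of them takes over first should the primary fail
./jobscheduler.cmd start -backup -backup-precedence=1
./jobscheduler.cmd start -backup -backup-precedence=2
```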
Job processes which are still running are allowed to finish when a JobScheduler is stopped.
New processes are not started.
A cluster is stopped by calling the "terminate cluster" command from the JobScheduler Web Interface. This command stops all the JobSchedulers in the cluster.
The corresponding XML command is <terminate all_schedulers="yes"/>
The JobScheduler Web Interface "terminate cluster within 60s" command likewise stops all the JobSchedulers in the cluster; in addition, all job processes still running are stopped within 60 seconds.
The corresponding XML command is <terminate all_schedulers="yes" timeout="60"/>
All the JobSchedulers in a cluster are stopped and then restarted when the "terminate and restart cluster" command is called from the JobScheduler Web Interface. After all the JobSchedulers have been restarted, the primary JobScheduler is the active JobScheduler.
The corresponding XML command is <terminate all_schedulers="yes" restart="yes"/>
All the JobSchedulers in a cluster are likewise stopped and then restarted when the "terminate and restart cluster" command is called with a timeout from the JobScheduler Web Interface; all job processes still running are stopped after 60 seconds. After all the JobSchedulers have been restarted, the primary JobScheduler - the JobScheduler which was active before the restart - once again becomes the active JobScheduler.
The corresponding XML command is <terminate all_schedulers="yes" restart="yes" timeout="60"/>
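Taken together, the cluster-wide stop and restart variants above differ only in their attributes. Written out as well-formed XML commands (attributes exactly as quoted above; the comments are annotations added here):

```xml
<terminate all_schedulers="yes"/>                             <!-- stop the cluster -->
<terminate all_schedulers="yes" timeout="60"/>                <!-- stop; remaining job processes are stopped after 60 s -->
<terminate all_schedulers="yes" restart="yes"/>               <!-- stop and restart the cluster -->
<terminate all_schedulers="yes" restart="yes" timeout="60"/>  <!-- stop and restart with a 60 s timeout -->
```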
An active JobScheduler is stopped by calling the "terminate" command from the JobScheduler Web Interface. This command has no effect on backup JobSchedulers: they will not take over operation, because no failure of the primary JobScheduler has occurred.
The corresponding XML command is <terminate/>
A JobScheduler is stopped and then restarted by entering the "terminate and restart" command in the JobScheduler Web Interface.
The corresponding XML command is <terminate restart="yes"/>
A backup JobScheduler restarted in this way will remain inactive after the restart. However, an inactive primary JobScheduler running in a cluster will become active after this command.
The primary JobScheduler can be stopped "fail-safe" from the Web Interface: a running backup JobScheduler then becomes active and takes over processing. When, however, the primary JobScheduler is stopped with restart, it is not defined whether a backup JobScheduler will become active or whether the primary JobScheduler will remain active.
When an active backup JobScheduler has been stopped and is then restarted, it will be inactive. Should the primary JobScheduler be unavailable for a longer period of time in this situation, the backup JobScheduler must be started as the primary JobScheduler. This is done by using the start_exclusive parameter instead of start when calling the jobscheduler.cmd shell script.
When the start parameter is given without any further information, the JobScheduler starts as specified in the Setup.
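For example (the script location is assumed; the parameter names are as given above):

```shell
# Regular start: the JobScheduler comes up with the role configured in Setup
./jobscheduler.cmd start

# Exclusive start: force this (backup) JobScheduler to run as the primary
# after the original primary has been unavailable for a longer period
./jobscheduler.cmd start_exclusive
```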
The following additional commands are available for the operation of a JobScheduler in a backup cluster:
backup-precedence [n]
sets the precedence of this JobScheduler among the backup JobSchedulers in the cluster.