A JobScheduler backup cluster ensures fail-safe operation of a primary JobScheduler. The cluster comprises this primary JobScheduler and one or more reserve (backup) JobSchedulers; for fail-safe operation, the primary JobScheduler and at least one backup must run on different computers.
All the JobSchedulers in a backup cluster signal their own availability by sending out "heartbeats" and, at the same time, monitor the "heartbeats" of the other JobSchedulers in the cluster. Should one of the backup JobSchedulers detect the absence of the primary JobScheduler's heartbeat over a longer period of time (approx. one to two minutes), it takes over processing. This means that it continues to process the open orders and jobs started by the primary JobScheduler and, if required, starts new jobs.
At most one JobScheduler in a cluster is active - the primary JobScheduler - which starts jobs and processes orders. The backup JobSchedulers are inactive - that is, they wait for the primary JobScheduler to fail before becoming active and taking over processing.
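The heartbeat and takeover behaviour described above can be sketched as follows. This is an illustrative model only: the 90-second staleness threshold (within the "approx. one to two minutes" window) and the rule that the backup with the lowest backup-precedence value takes over first are assumptions made for this sketch, not JobScheduler internals.

```python
import time

# Illustrative model of heartbeat-based failover; the threshold and the
# "lowest backup-precedence wins" rule are assumptions for this sketch.
HEARTBEAT_TIMEOUT = 90  # seconds

class ClusterMember:
    def __init__(self, name, backup_precedence=None):
        self.name = name
        self.backup_precedence = backup_precedence  # None marks the primary
        self.last_heartbeat = time.time()
        self.active = backup_precedence is None     # only the primary starts active

    def heartbeat(self, now):
        """Record that this member signalled its availability at time `now`."""
        self.last_heartbeat = now

def elect_active(members, now):
    """Return the member that should be active at time `now`.

    While the primary's heartbeat is fresh, nothing changes; once the
    heartbeat has been absent for longer than HEARTBEAT_TIMEOUT, the
    backup with the lowest backup-precedence value takes over.
    """
    primary = next(m for m in members if m.backup_precedence is None)
    if now - primary.last_heartbeat <= HEARTBEAT_TIMEOUT:
        return primary
    takeover = min((m for m in members if m.backup_precedence is not None),
                   key=lambda m: m.backup_precedence)
    primary.active = False
    takeover.active = True
    return takeover
```

While the primary keeps sending heartbeats, repeated calls always return the primary; only a prolonged silence triggers the takeover.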
The requirements for the operation of a backup JobScheduler cluster are shown schematically in the following diagram and described in detail in the next section.
The diagram below shows schematically the situation where a backup JobScheduler has become active and taken over the processing of jobs and orders:
The JobSchedulers which form the cluster can be started in any order.
The active (primary) JobScheduler is the first one to be started without the -backup parameter.
The following command line parameters configure a JobScheduler as a member of a backup cluster:
-backup
starts the JobScheduler as a backup JobScheduler; it remains inactive until the primary JobScheduler fails.
backup-precedence [n]
sets the order of precedence among several backup JobSchedulers, determining which of them will become active should the primary JobScheduler fail.
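As a sketch, a three-node cluster might be brought up as follows; the script location and the exact option spelling are assumptions for illustration and should be checked against the installed release:

```shell
# Host A: the primary JobScheduler is started without the -backup option
./jobscheduler.cmd start

# Hosts B and C: backup JobSchedulers; backup-precedence determines
# which of them takes over first should the primary fail
./jobscheduler.cmd start -backup -backup-precedence=1
./jobscheduler.cmd start -backup -backup-precedence=2
```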
Job processes which are still running are allowed to finish when a JobScheduler is stopped.
New processes are not started.
A cluster is stopped by calling the "terminate cluster" command from the JobScheduler Web Interface. This command stops all the JobSchedulers in the cluster.
The corresponding XML command is <terminate all_schedulers="yes"/>
The JobScheduler Web Interface "terminate cluster within 60s" command likewise stops all the JobSchedulers in the cluster; in addition, all job processes still running are stopped within 60 seconds.
The corresponding XML command is <terminate all_schedulers="yes" timeout="60"/>
All the JobSchedulers in a cluster are stopped and then restarted when the "terminate and restart cluster" command is called from the JobScheduler Web Interface. After all the JobSchedulers have been restarted, the primary JobScheduler is the active JobScheduler.
The corresponding XML command is <terminate all_schedulers="yes" restart="yes"/>
All the JobSchedulers in a cluster are likewise stopped and then restarted when the "terminate and restart cluster" command is called with a timeout from the JobScheduler Web Interface; all job processes still running are stopped after 60 seconds. After all the JobSchedulers have been restarted, the primary JobScheduler - the JobScheduler which was active before the restart - once again becomes the active JobScheduler.
The corresponding XML command is <terminate all_schedulers="yes" restart="yes" timeout="60"/>
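Taken together, the cluster-wide stop and restart variants above differ only in their attributes. Written out as well-formed XML commands (attributes exactly as quoted above; the comments are annotations added here):

```xml
<terminate all_schedulers="yes"/>                             <!-- stop the cluster -->
<terminate all_schedulers="yes" timeout="60"/>                <!-- stop; remaining job processes are stopped after 60 s -->
<terminate all_schedulers="yes" restart="yes"/>               <!-- stop and restart the cluster -->
<terminate all_schedulers="yes" restart="yes" timeout="60"/>  <!-- stop and restart with a 60 s timeout -->
```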
An active JobScheduler is stopped by calling the "terminate" command from the JobScheduler Web Interface. This command has no effect on backup JobSchedulers: they will not take over operation, because no failure of the primary JobScheduler has occurred.
The corresponding XML command is <terminate/>
A JobScheduler is stopped and then restarted by entering the "terminate and restart" command in the JobScheduler Web Interface.
The corresponding XML command is <terminate restart="yes"/>
A backup JobScheduler restarted in this way will remain inactive after the restart. However, an inactive primary JobScheduler running in a cluster will become active after this command.
The primary JobScheduler can be stopped "fail-safe" from the Web Interface: a running backup JobScheduler then becomes active and takes over processing. When, however, the primary JobScheduler is stopped with restart, it is not defined whether a backup JobScheduler will become active or whether the primary JobScheduler will remain active.
When an active backup JobScheduler has been stopped and is then restarted, it will be inactive. Should the primary JobScheduler be unavailable for a longer period of time in this situation, the backup JobScheduler must be started as the primary JobScheduler. This is done by using the start_exclusive parameter instead of start when calling the jobscheduler.cmd shell script.
When the start parameter is given without any further information, the JobScheduler starts as specified in the Setup.
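For example (the script location is assumed; the parameter names are as given above):

```shell
# Regular start: the JobScheduler comes up with the role configured in Setup
./jobscheduler.cmd start

# Exclusive start: force this (backup) JobScheduler to run as the primary
# after the original primary has been unavailable for a longer period
./jobscheduler.cmd start_exclusive
```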
The following additional commands are available for the operation of a JobScheduler in a backup cluster:
backup-precedence [n]
sets the precedence of this JobScheduler among the backup JobSchedulers in the cluster.