Job Managers

By default the Pulsar will maintain its own queue of jobs. Under Linux however, Pulsar can be configured to maintain multiple such queues with different properties or to delegate to external job queues (via DRMAA, qsub/qstat CLI commands, or Condor).

To configure job managers, uncomment the managers section of app.yml and modify it as needed. For instance, the default job manager corresponds to a configuration of

managers:
  _default_:
    type: queued_python
    num_concurrent_jobs: 1

The type of queued_python is indicating that the jobs are queued but that the queue is managed locally by Pulsar. Other possible values for type include queued_drmaa, queued_condor, queued_cli, queued_external_drmaa (examples of each follow).

Named Managers

The managers section can contain any number of named managers. For example:

managers:
  _default_:
    type: queued_python
    num_concurrent_jobs: 1

  example:
    type: queued_python
    num_concurrent_jobs: "*"

In this instance, Pulsar creates a second named queued (example) that will run as many concurrent jobs as the server has cores. The Galaxy Pulsar url should have /managers/example appended to it to use a named manager such as this.

DRMAA

The queued_python manager type is easy to configure but has serious limitations - for instance jobs running when Pulsar is restarted will be lost. For these reasons it is best to configure a real external job manager when possible.

Likely the cleanest way to interface with an external queueing system is going to be DRMAA. This method will likely work with Slurm, PBS Torque, LSF, etc…. In this case, one should likely setup a local_env.sh file and update it to set DRMAA_LIBRARY_PATH to point to the correct libdrmaa.so file. Also, the Python drmaa module must be installed (e.g. via pip install drmaa).:

managers:
  _default_:
    type: queued_drmaa
    native_specification: "-P bignodes -R y -pe threads 8"

Here the optional native_specification is going to depend on the underlying job manager.

In addition to the default dependencies described in the installation documentation, a DRMAA library will need to be installed and the python dependency drmaa will need to be installed as well to use the queued_drmaa manager. This can be done by activating Pulsar’s virtual environment and running:

pip install drmaa

If you are using DRMAA, be sure to define DRMAA_LIBRARY_PATH in Pulsar’s local_env.sh file.

Condor

Condor can also be used as a backend.

managers:
  _default_:
    type: queued_condor
    # Optional attributes...
    submit_universe: vanilla
    submit_request_memory: 32
    submit_requirements: 'OpSys == "LINUX" && Arch =="INTEL"'
    submit_rank: "Memory >= 64"

This would set universe, request_memory, requirements, and rank in the condor submission file to the specified values. For more information on condor submission files see the HTCondor quickstart for more information.

CLI

Pulsar can manage jobs via command-line execution of qsub, qdel, stat on the local machine.

managers:
  _default_:
    type: queued_cli
    job_plugin: Torque

job_plugin can also be Slurm (to use srun, etc…) or SlurmTorque (to use the Slurm varaint of qsub, etc…).

Pulsar can also login into a remote host before executing these commands if the job manager is not accessible from the Pulsar host.

managers:
  _default_:
    type: queued_cli
    job_plugin: Torque
    shell_plugin: SecureShell
    shell_hostname: queuemanager
    shell_username: queueuser

This will login to queuemanager as user queueuser to submit jobs. Be sure keyless SSH between Pulsar and the remote host is configured in this case.

Run-As-Real User DRMAA

All of the proceeding will run jobs as the same operating system user that Pulsar is running as. The queued_external_drmaa manager type will actually run DRMAA jobs via the user requested by the client (e.g. the Galaxy user).

managers:
  _default_:
    type: queued_external_drmaa
    production: true
    # Following are optional - should leave as defaults in most cases.
    #chown_working_directory_script: scripts/chown_working_directory.bash
    #drmaa_kill_script: scripts/drmaa_kill.bash
    #drmaa_launch_script: scripts/drmaa_launch.bash

For more information on running jobs as the real user, check out this discussion from the Galaxy mailing list.

More Options

Most managers can override the staging_directory used by setting this property in its configuration section.

The min_polling_interval: 0.5 option can be set on any manager to control how frequently Pulsar will poll the resource manager for job updates.

For staging actions initiated by Pulsar (e.g. when driving Pulsar by message queue) - the following parameters can be set to control retrying these actions (if they) fail. (XXX_max_retries=-1 => no retry, XXX_max_retries=0 => retry forever - this may be a bit counter-intuitive but is consistent with Kombu.

preprocess_action_max_retries: -1
preprocess_action_interval_start: 2
preprocess_action_interval_step: 2
preprocess_action_interval_max: 30
postprocess_action_max_retries: -1
postprocess_action_interval_start: 2
postprocess_action_interval_step: 2
postprocess_action_interval_max: 30