.. _configure:

------------------
Configuring Pulsar
------------------

If either installation procedure has been followed, your Pulsar directory
should contain two files of interest: ``app.yml`` to configure the Pulsar
application and ``server.ini`` to configure the web server (unless you are
running Pulsar without a web server).

Default values are specified for all configuration options, and these defaults
will work if Pulsar is running on the same host as Galaxy (e.g. for testing and
development). Otherwise, the ``host`` setting of ``server.ini`` will need to be
modified to listen for external requests.

``app.yml`` settings can be overridden by setting environment variables, just as
with Galaxy, by prefixing the config setting name with
``PULSAR_CONFIG_OVERRIDE_``. For example::

    $ export PULSAR_CONFIG_OVERRIDE_PRIVATE_TOKEN=changed
    $ pulsar

Defaults can also be set via environment variables by prefixing them with
``PULSAR_CONFIG_``. For example, ``PULSAR_CONFIG_PRIVATE_TOKEN``.

Security
--------

Out of the box, **Pulsar essentially allows anyone with network access to the
Pulsar server to execute arbitrary code and read and write any files the web
server can access.** Hence, in most settings, steps should be taken to secure
the Pulsar server.

Private Token
`````````````

If running Pulsar with a web server, you *must* specify a private token (a
shared secret between Pulsar and the Galaxy server) to prevent unauthorized
access. This is done by setting ``private_token`` in ``app.yml`` to some long
random string.

Once a private token is configured, Galaxy job destinations should include a
``private_token`` parameter to authenticate these jobs.

Pulsar Web Server
`````````````````

The default Pulsar web server, `Paste`_, can be configured to use SSL and to
require the client (i.e. Galaxy) to pass along a private token authorizing use.

.. tip::

    SSL support is built in to `uWSGI`_, an alternate web server that can be
    installed (see :ref:`install`).

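
The ``private_token`` discussed above only needs to be a long random string. As
a minimal sketch, one way to produce such a value is Python's standard
``secrets`` module. The helper name ``generate_private_token`` and the 32-byte
length are illustrative choices here, not Pulsar requirements.

.. code-block:: python

    import secrets


    def generate_private_token(num_bytes: int = 32) -> str:
        """Return a URL-safe random string built from ``num_bytes`` random bytes."""
        return secrets.token_urlsafe(num_bytes)


    if __name__ == "__main__":
        # Print a line that can be pasted into app.yml.
        print(f'private_token: "{generate_private_token()}"')

The same value then needs to be supplied as the ``private_token`` parameter of
the Galaxy job destination so that both sides share the secret.
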

``pyOpenSSL`` is required to configure a Pulsar web server to serve content via
HTTPS/SSL. This dependency can be difficult to install. Under Linux you will
want to ensure the dependencies needed to compile pyOpenSSL are available; for
instance, in a fresh Ubuntu image you will likely need::

    $ sudo apt-get install libffi-dev python3-dev libssl-dev

Then pyOpenSSL can be installed with the following command (be sure to source
your virtualenv if one was set up above)::

    $ pip install pyOpenSSL

Once installed, you will need to set the option ``ssl_pem`` in ``server.ini``.
This parameter should reference an OpenSSL certificate file for use by the
Paste_ server. This parameter can be set to ``*`` to automatically generate such
a certificate. A self-signed certificate for testing purposes can be manually
generated by the following method (a 2048-bit key and SHA-256 are used because
current OpenSSL defaults commonly reject 1024-bit keys and SHA-1 signatures)::

    $ openssl genrsa 2048 > host.key
    $ chmod 400 host.key
    $ openssl req -new -x509 -nodes -sha256 -days 365  \
              -key host.key > host.cert
    $ cat host.cert host.key > host.pem
    $ chmod 400 host.pem

More information can be found in the Paste httpserver documentation.

Message Queue
`````````````

If Pulsar is processing requests via a message queue instead of a web server,
the underlying security mechanisms of the message queue should be used to
secure communication; deploying Pulsar with SSL and a ``private_token`` as
described above is not applicable.

This can be done via two (not mutually exclusive) methods: client SSL
certificates or password authentication. In either case, you should configure
your AMQP server with SSL.

If using client certificates, you will likely need to set the appropriate (for
your PKI) combination of ``amqp_connect_ssl_ca_certs``,
``amqp_connect_ssl_keyfile``, ``amqp_connect_ssl_certfile``, and
``amqp_connect_ssl_cert_reqs`` in Pulsar's ``app.yml`` file. See
``app.yml.sample`` for more details.

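
To make the relationship between these four options and an SSL client
configuration more concrete, the following sketch groups them into keyword
arguments of the kind Python's standard ``ssl`` module understands. It is
illustrative only: the helper ``collect_ssl_params``, the file paths, and the
``cert_required`` value are assumptions, and the sketch does not claim to
reproduce how Pulsar itself reads these settings. ``app.yml.sample`` remains the
authority on accepted values.

.. code-block:: python

    import ssl

    # Prefix shared by the client-certificate options named above.
    PREFIX = "amqp_connect_ssl_"


    def collect_ssl_params(app_conf: dict) -> dict:
        """Gather ``amqp_connect_ssl_*`` settings into ssl-style keyword arguments."""
        params = {
            key[len(PREFIX):]: value
            for key, value in app_conf.items()
            if key.startswith(PREFIX)
        }
        if "cert_reqs" in params:
            # Assumption: the setting names an ``ssl`` constant, e.g. "cert_required".
            params["cert_reqs"] = getattr(ssl, str(params["cert_reqs"]).upper())
        return params


    # Hypothetical values; the paths are placeholders, not Pulsar defaults.
    example_conf = {
        "amqp_connect_ssl_ca_certs": "/path/to/cacert.pem",
        "amqp_connect_ssl_keyfile": "/path/to/key.pem",
        "amqp_connect_ssl_certfile": "/path/to/cert.pem",
        "amqp_connect_ssl_cert_reqs": "cert_required",
        "message_queue_url": "amqps://mqserver.example.org:5671//",
    }

    ssl_params = collect_ssl_params(example_conf)
    print(ssl_params)

Settings without the prefix, such as ``message_queue_url``, are left out of the
resulting dictionary.
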

If using password authentication, this information can be set in the
``message_queue_url`` setting in ``app.yml``, e.g., with SSL::

    message_queue_url: amqps://user:password@mqserver.example.org:5671//

You can consult the Kombu documentation for even more information.

User Authentication/Authorization
`````````````````````````````````

You can configure Pulsar to authenticate the user during request processing and
to check whether this user is allowed to run a job.

Various authentication/authorization plugins can be configured in ``app.yml`` to
do that, and the plugin parameters depend on the auth type. For example, the
following configuration uses the ``oidc`` plugin for authentication and the
``userlist`` plugin for authorization::

    user_auth:
      authentication:
        - type: oidc
          oidc_jwks_url: https://login.microsoftonline.com/xxx/discovery/v2.0/keys
          oidc_provider: azure
          oidc_username_in_token: preferred_username
          oidc_username_template: *.
      authorization:
        - type: userlist
          userlist_allowed_users:
            - xxx

See the plugins folder for available plugins and their parameters.

Customizing the Pulsar Environment (\*nix only)
-----------------------------------------------

For many deployments, Pulsar's environment will need to be tweaked, for instance
to define a ``DRMAA_LIBRARY_PATH`` environment variable for the ``drmaa`` Python
module, or to define the location of Galaxy (via ``GALAXY_HOME``) if certain
Galaxy tools require it or if Galaxy metadata is being set by Pulsar.

The file ``local_env.sh`` (created automatically by ``pulsar-config``) will be
sourced by ``pulsar`` before launching the application and by child processes
created by Pulsar that require this configuration.

Job Managers (Queues)
---------------------

By default, Pulsar will maintain its own queue of jobs.
While this is ideal for simple deployments such as those targeting a single
Windows instance, Pulsar can also be configured for more sophisticated clusters:
it can maintain multiple such queues with different properties or delegate to
external job queues (via DRMAA, qsub/qstat CLI commands, or Condor).

For more information on configuring external job managers, see
:ref:`job_managers`.

Galaxy Tools
------------

Some Galaxy tool wrappers require a copy of the Galaxy codebase itself to run.
Such tools will not run under Windows, but on \*nix hosts Pulsar can be
configured to add the required Galaxy code to a job's ``PYTHONPATH`` by setting
the ``GALAXY_HOME`` environment variable in Pulsar's ``local_env.sh`` file
(described above).

Most Galaxy tools require external command-line tools, known as *Galaxy Tool
Dependencies*, to execute correctly. In Galaxy, these are provided by its
`Dependency Resolution`_ system. Pulsar uses this same system, which can be
configured via the ``dependency_resolution`` option in ``app.yml``. See the
example in `app.yml.sample`_ for additional information.

In its default configuration, Pulsar will automatically install Conda but will
not automatically install missing tool dependencies. Administrators sending
large numbers of tools to Pulsar most likely want to enable the ``auto_install``
option on the ``conda`` dependency resolver or the ``conda_auto_install`` global
option so that it is not necessary to manually install dependencies for tools
sent to Pulsar. Both options are documented in the `app.yml.sample`_ file.

Message Queue (AMQP)
--------------------

Galaxy and Pulsar can be configured to communicate via a message queue instead
of a Pulsar web server. In this mode, Pulsar and Galaxy will send and receive
job control and status messages via an external message queue server using the
`AMQP`_ protocol. This is sometimes referred to as running Pulsar "webless".
Information on configuring `RabbitMQ`_, one such compatible message queue, can
be found in :ref:`galaxy_with_rabbitmq_conf`.

In addition, when using a message queue, Pulsar will download files from and
upload files to Galaxy instead of the inverse. Message queue mode may be very
advantageous if Pulsar needs to be deployed behind a firewall or if the Galaxy
server is already set up (via proxy web server) for large file transfers.

A template configuration for using Galaxy with a message queue can be created by
``pulsar-config``::

    $ pulsar-config --mq

You will also need to ensure that the ``kombu`` Python dependency is installed
(``pip install kombu``). Once this is available, set the ``message_queue_url``
property in ``app.yml`` to the correct URL of your configured `AMQP`_ endpoint.

AMQP does not guarantee message receipt. It is possible to have Pulsar (and
Galaxy) require acknowledgement of receipt and resend messages that have not
been acknowledged, using the ``amqp_ack*`` options documented in
`app.yml.sample`_, but beware that enabling this option can give rise to the
`Two Generals Problem`_, especially when Galaxy or the Pulsar server is down
(and thus not draining the message queue).

In the event that the connection to the AMQP server is lost during message
publish, the Pulsar server can retry the connection, governed by the
``amqp_publish*`` options documented in `app.yml.sample`_.

Caching (Experimental)
----------------------

Pulsar and its client can be configured to cache job input files. For some
workflows this can result in a significant decrease in data transfer and greater
throughput. On the Pulsar server side, the property ``file_cache_dir`` in
``app.yml`` must be set. See Galaxy's ``job_conf.xml`` example file for
information on configuring the client.

More discussion on this can be found in a galaxy-dev mailing list thread, and
future plans and progress can be tracked on a Trello card.

.. _Dependency Resolution: https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
.. _Paste: https://pythonpaste.readthedocs.io/en/latest/
.. _uWSGI: https://uwsgi-docs.readthedocs.io/
.. _AMQP: http://en.wikipedia.org/wiki/AMQP
.. _RabbitMQ: https://www.rabbitmq.com/
.. _app.yml.sample: https://github.com/galaxyproject/pulsar/blob/master/app.yml.sample
.. _Two Generals Problem: https://en.wikipedia.org/wiki/Two_Generals%27_Problem