Installation#
Compatible Job Schedulers#
For optimal performance the flux framework is recommended as job scheduler. Even when the Simple Linux Utility for Resource Management (SLURM) or any other job scheduler is already installed on the HPC cluster flux framework can be installed as a secondary job scheduler to leverage flux framework for the distribution of resources within a given allocation of the primary scheduler.
Alternatively, pympipool
can directly create job steps in a SLURM allocation using the srun
command. Still this always
queries the central database of the SLURM job scheduler which can decrease the performance of the job scheduler and is
not recommended.
pympipool with Flux Framework#
The flux framework uses libhwloc
and pmi
to understand the hardware it is running on and to booststrap MPI.
libhwloc
not only assigns CPU cores but also GPUs. This requires libhwloc
to be compiled with support for GPUs from
your vendor. In the same way the version of pmi
for your queuing system has to be compatible with the version
installed via conda. As pmi
is typically distributed with the implementation of the Message Passing Interface (MPI),
it is required to install the compatible MPI library in your conda environment as well.
AMD GPUs with mpich / cray mpi#
For example the Frontier HPC cluster at Oak Ridge National Laboratory uses
AMD MI250X GPUs with cray mpi version which is compatible to mpich 4.X
. So the corresponding versions can be installed
from conda-forge using:
conda install -c conda-forge flux-core flux-sched mpich>=4 pympipool
Nvidia GPUs with mpich / cray mpi#
For example the Perlmutter HPC at the National Energy Research Scientific
Computing (NERSC) uses Nvidia A100 GPUs in combination with cray mpi which is compatible to mpich 4.X
. So the
corresponding versions can be installed from conda-forge using:
conda install -c conda-forge flux-core flux-sched libhwloc=*=cuda* mpich>=4 pympipool
When installing on a login node without a GPU the conda install command might fail with an Nvidia cuda related error, in this case adding the environment variable:
CONDA_OVERRIDE_CUDA="11.6"
With the specific Nvidia cuda library version installed on the cluster enables the installation even when no GPU is present on the computer used for installing.
Intel GPUs with mpich / cray mpi#
For example the Aurora HPC cluster at Argonne National Laboratory uses Intel Ponte
Vecchio GPUs in combination with cray mpi which is compatible to mpich 4.X
. So the corresponding versions can be
installed from conda-forge using:
conda install -c conda-forge flux-core flux-sched mpich=>4 pympipool
Alternative Installations#
Flux is not limited to mpich / cray mpi, it can also be installed in compatibility with openmpi or intel mpi using the openmpi package:
conda install -c conda-forge flux-core flux-sched openmpi pympipool
Test Flux Framework#
To validate the installation of flux and confirm the GPUs are correctly recognized, you can start a flux session on the login node using:
flux start
This returns an interactive shell which is connected to the flux scheduler. In this interactive shell you can now list the available resources using:
flux resource list
The output should return a list compareable to the following example output:
STATE NNODES NCORES NGPUS NODELIST
free 1 6 1 ljubi
allocated 0 0 0
down 0 0 0
As flux only lists physical cores rather than virtual cores enabled by hyper-threading the total number of CPU cores might be half the number of cores you expect.
Flux Framework as Secondary Scheduler#
When the flux framework is used inside an existing queuing system, you have to communicate the available resources to
the flux framework. For SLURM this is achieved by calling flux start
with srun
. For an interactive session use:
srun --pty flux start
Alternatively, to execute a python script <script.py>
which uses pympipool
you can call it with:
srun flux start python <script.py>
PMI Compatibility#
When pmi version 1 is used rather than pmi version 2 then it is possible to enforce the usage of pmi-2
during the
startup process of flux using:
srun –mpi=pmi2 flux start python <script.py>
Without Flux Framework#
It is possible to install pympipool
without flux, for example for using it on a local workstation or in combination
with the Simple Linux Utility for Resource Management (SLURM). While this is not recommended
in the high performance computing (HPC) context as pympipool
with block_allocation=False
is going to create a SLURM
job step for each submitted python function.
In this case pympipool
can be installed using:
conda install -c conda-forge pympipool
This also includes workstation installations on Windows and MacOS.