Hardware management
Overview
Due to the heavy computational load of the method, the software requires at least one NVIDIA GPU. The spring module is built upon three software layers:
- The upper layer is written in Python and is directly importable by the user via the objects spring.Pattern(), spring.Settings(), spring.Result() and spring.MPR().
- Internally, the spring.MPR() object calls functions written in C++, accessible from Python thanks to the PyBind11 library.
- Computationally heavy tasks, most of which are related to the execution of iterative phase retrieval algorithms, are offloaded to GPUs with source code written in the CUDA language.
Hardware settings
The spring module parallelizes computations across multiple CPU cores and multiple GPUs, as long as they belong to the same computing node (i.e., they share a memory space). Parallelization on distributed-memory systems is currently not supported.
There are two parameters in spring.Settings() that can be tuned to optimize the performance, i.e. the time to solution: threads and gpus in the global section of the spring.Settings() object (see also Setting up the parameters).
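For concreteness, here is a minimal sketch of how the two values combine. The nested dict below is only a stand-in for the global section of spring.Settings(), whose actual interface may differ.

```python
# Stand-in for the "global" section of spring.Settings();
# the dict layout is illustrative, not the real object.
settings = {"global": {"threads": 8, "gpus": 2}}

# Total CPU threads actually used: threads per GPU times number of GPUs.
total_threads = settings["global"]["threads"] * settings["global"]["gpus"]
print(total_threads)  # 16
```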
threads: This value indicates the number of CPU threads to use per GPU. For example, if threads=8 and two GPUs are selected for computation, the total number of CPU threads actually used is 16. Having more than a single thread per GPU is convenient for two reasons:
- While most of the work is offloaded to GPUs, some calculations are still performed on the CPU side and can be sped up by parallel CPU execution.
- The offloading of calculations to GPUs is controlled by the CPU, which has to prepare the data and transfer it back and forth between the system memory and the GPU memory. Data transfer can then be optimized by having multiple threads handling calculations on a single GPU, as communication and calculations can be overlapped.
Depending on the GPU and CPU models, the optimal number of threads may vary. Eight threads per GPU is typically a good compromise for high-end GPUs.
Note
The user should ensure that the total number of threads used does not exceed the number of physical computing cores available. Setting a higher number of threads typically brings a significant drop in performance due to overheads.
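The overlap of communication and computation can be pictured with a toy Python pipeline. The time.sleep calls merely stand in for host-to-device transfers and GPU kernels, so this is an illustration of the idea, not spring code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    time.sleep(0.01)  # stand-in for a host-to-device transfer
    time.sleep(0.01)  # stand-in for the GPU computation
    return chunk * 2

chunks = list(range(8))

# Several CPU threads feed the same (simulated) GPU: while one thread
# waits on a transfer, another can already be driving a computation.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_chunk, chunks))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```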
gpus: This parameter is important in the case of a system equipped with multiple GPUs. It can be either an integer number or a list of positive integers. The behavior is the following:
- value:
Example: gpus=2. In this case, two GPUs are used, and they are selected by their GPU id (i.e. GPU-0 and GPU-1). If gpus<0 or gpus exceeds the number of available GPUs, all detected GPUs are selected for computation.
Note
This behavior can be changed by activating a special flag at compilation time, to select the N GPUs with the lowest current computing load instead of the first N GPUs available. Further details are given in Unmanaged multi-GPU systems.
- list:
Example: gpus=[0,2,3]. In this case, the listed GPU ids are selected, i.e. GPU-0, GPU-2 and GPU-3.
Note
The settings gpus=2 and gpus=[2] have different outcomes. In the first case, the first two GPUs are selected (GPU-0 and GPU-1); in the second, only GPU-2 (typically the third one, after GPU-0 and GPU-1) is selected.
For computing environments managed by job schedulers (like SLURM), it is often convenient to set gpus=-1, because even on a multi-GPU computing node only the requested GPUs are visible.
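The selection rules above can be condensed into a small helper. This is only a sketch of the documented behavior; the function name is hypothetical and not part of spring.

```python
def resolve_gpus(gpus, n_available):
    """Return the list of GPU ids selected by the documented rules."""
    if isinstance(gpus, list):
        # A list selects exactly the given GPU ids, e.g. [0, 2, 3].
        return gpus
    if gpus < 0 or gpus > n_available:
        # Negative or out-of-range values select all detected GPUs.
        return list(range(n_available))
    # A positive integer N selects the first N GPU ids.
    return list(range(gpus))

print(resolve_gpus(2, 4))        # [0, 1]
print(resolve_gpus(-1, 4))       # [0, 1, 2, 3]
print(resolve_gpus([2], 4))      # [2]  (only GPU-2, unlike gpus=2)
```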
Unmanaged multi-GPU systems
For unmanaged computing nodes, i.e. those where the distribution of resources is not delegated to a job scheduler (e.g. SLURM), the users have to agree on the use of computing resources. This may be tricky, especially in the case of a multi-GPU system with multiple users.
The outcome of the gpus=N parameter in spring.Settings() can be changed to select the N least loaded GPUs instead of those with the first N GPU ids. This allows for a more flexible and optimal sharing of computing resources on a single multi-GPU computing node.
This option has to be enabled at installation time (see also Installation) by setting the environment variable WITH_NVCC=On. The installation via pip is then
$ WITH_NVCC=On python3 -m pip install .
In this way, when the reconstruction is launched via spring.MPR.run() or spring.MPR.runasync(), the current load on the available GPUs is inspected and those with the lowest computing load are selected.
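The load-based choice can be pictured with a small sketch. In practice the per-GPU load would be queried through the NVIDIA Management Library; here it is passed in as a plain dict, and the helper is hypothetical, not spring's API.

```python
def select_least_loaded(load_by_gpu, n):
    """Pick the n GPU ids with the lowest current load.

    load_by_gpu maps GPU id -> utilization (e.g. percent busy),
    as the NVIDIA Management Library would report it.
    """
    ranked = sorted(load_by_gpu, key=load_by_gpu.get)
    return sorted(ranked[:n])  # return the chosen ids in ascending order

# Example: GPU-1 and GPU-3 are the least busy of four GPUs.
loads = {0: 80, 1: 10, 2: 55, 3: 5}
print(select_least_loaded(loads, 2))  # [1, 3]
```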
Note
The option WITH_NVML=On requires the presence of the NVIDIA Management Library on the system. This means that the header files must be reachable at compilation time and the corresponding shared library must be reachable at runtime.
Systems without GPUs
At present, the core of spring is implemented only as GPU code, so it is not possible to run the imaging algorithm on systems without GPUs.
For testing purposes, there are however two options that, while not allowing actual reconstructions on CPU-only systems, enable the user to test the installation process and use all spring objects apart from spring.MPR.
- It is possible to install the CUDA compiler even on systems not equipped with GPUs.
- It is possible to compile the code without a proper CUDA compiler, by setting the environment variable WITH_CUDA=Off before installation with pip, i.e. WITH_CUDA=Off python3 -m pip install .
The user will get an error at runtime if a reconstruction is attempted.