Partitions
Partitions group nodes with similar characteristics, such as resources, priorities, or run-time limits.
What is a Partition?
In Slurm, a partition is essentially a named group of compute nodes that acts like a queue you submit jobs to, and from a user’s perspective it defines both where your job can run and under what conditions. When you submit a job, you specify a partition (for example with --partition=long), and instead of choosing a specific machine, you are telling Slurm to run your job on any available node within that group.
Each partition is configured with its own rules and characteristics, such as maximum runtime limits, available hardware (like CPUs, GPUs, or high-memory nodes), job priority, and sometimes access restrictions based on user groups. This means partitions are often organized by purpose: for example, a debug partition for quick test jobs or a long partition for extended computations.
From a practical standpoint, choosing the right partition is important because it affects how long your job can run, how quickly it may start, and what resources it can use, but you do not need to worry about the exact node your job runs on, since Slurm automatically handles that selection within the chosen partition.
| Partition | Use when |
|---|---|
| debug | Testing job scripts, quick experiments (max 30 min) |
| main | Standard production jobs up to 8 hours |
| long | Jobs that need more than 8 hours (up to 7 days) |
| highmem | Jobs requiring higher memory capacity per CPU than available in main |
Available Partitions
When in doubt, start with debug to verify that your job works, then move to main or long for production runs. If you can estimate how long your job will run, pass that estimate plus a reasonable safety margin with the --time option. A realistic time limit improves the chances that the job starts quickly and lets the scheduler optimize the allocation of resources.
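As a minimal sketch of this advice, the batch script below requests the main partition with a two-hour limit for a job expected to take about 90 minutes. The job name and the 90-minute estimate are assumptions made for illustration; `SLURM_JOB_PARTITION` is set by Slurm inside a running job, so the fallback value only appears when the script is run outside the scheduler.

```shell
#!/bin/bash
#SBATCH --job-name=example       # hypothetical job name
#SBATCH --partition=main         # main allows up to 8 hours
#SBATCH --time=02:00:00          # assumed ~90 min estimate plus a safety margin

# Report where the job landed; falls back to "unknown" outside of Slurm.
partition=${SLURM_JOB_PARTITION:-unknown}
echo "Running on $(uname -n) in partition: $partition"
```

For a first test run, the same script could be pointed at debug by changing the two directives to `--partition=debug` and a limit of at most 30 minutes.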
The sinfo command [1] lists partitions and their states:
| Option | Description |
|---|---|
| -s, --summarize | Lists only a partition state summary with no node state details. |
| -o, --format | Specifies the output columns to print. Refer to the manual page for more details. |
The following example shows the overall node allocation per partition. In the “NODES(A/I/O/T)” column, the capital letters stand for Allocated, Idle, Other, and Total:
>>> sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug up 30:00 0/6/0/6 ccsub[0001-0006]
work up 8:00:00 0/10/0/10 ccwks[0001-0010]
highmem up 7-00:00:00 35/4/3/42 ccexe[0327-0346,0348-0369]
main* up 8:00:00 245/20/3/268 ccexe[0001-0049,0056-0065,0068-0083,0090-0191,0233-0277,0280,0282-0326]
long up 7-00:00:00 245/20/3/268 ccexe[0001-0049,0056-0065,0068-0083,0090-0191,0233-0277,0280,0282-0326]
grid up 3-00:00:00 8/25/7/40 ccexe[0192-0199,0201-0232]
Show the default runtime together with the time limit:
>>> sinfo -o "%9P %6g %11L %10l %5D %20C"
PARTITION GROUPS DEFAULTTIME TIMELIMIT NODES CPUS(A/I/O/T)
debug all 10:00 30:00 6 1044/1260/0/2304
work all 2:00:00 8:00:00 10 0/960/0/960
highmem all 1:00:00 7-00:00:00 42 2306/7422/1024/10752
main* all 2:00:00 8:00:00 268 48390/27202/3256/788
long all 2:00:00 7-00:00:00 268 48390/27202/3256/788
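grid all 2:00:00 3-00:00:00 40 2048/6400/1792/10240
Note that the 20-character CPUS column cuts off the totals for main and long: 48390 + 27202 + 3256 = 78848, which is printed as 788.
List the CPU configuration and memory per node: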
>>> sinfo -o "%9P %6g %4c %10z %8m %5D %20C"
PARTITION GROUPS CPUS S:C:T MEMORY NODES CPUS(A/I/O/T)
debug all 384 2:96:2 772000 6 520/1784/0/2304
work all 96 2:24:2 191000 10 0/960/0/960
highmem all 256 2:64:2 1031000 42 2306/7422/1024/10752
main* all 256+ 2:64+:2 514000+ 268 48234/27358/3256/788
long all 256+ 2:64+:2 514000+ 268 48234/27358/3256/788
grid all 256 2:64:2 515000 40 2048/6400/1792/10240
Print a comprehensive list of idle nodes, including available resources:
sinfo -Nel -t idle
An asterisk suffix (‘*’) marks the default partition. Compute jobs are sent to the default partition unless a different partition is selected with the -p, --partition option.
Test your application launch in the debug partition first. Its short runtime limit usually allows resources to be allocated quickly, which avoids long waiting times in the scheduler queue.
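To see at a glance which partitions have idle nodes, the NODES(A/I/O/T) column of the `sinfo -s` summary can be split on the slashes. The following is a rough sketch rather than a supported tool: `idle_nodes` is a hypothetical helper name, and the sample lines are copied from the summary output shown earlier so that the snippet can be tried without cluster access.

```shell
# Print "<partition> <idle-node-count>" for each line of `sinfo -s` output.
# Reads from stdin, so it also works on saved output.
idle_nodes() {
    awk 'NR > 1 {
        split($4, n, "/")      # field 4 is NODES(A/I/O/T)
        sub(/\*$/, "", $1)     # drop the default-partition marker
        print $1, n[2]         # n[2] is the idle count
    }'
}

# Sample input taken from the summary output above
sample='PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug up 30:00 0/6/0/6 ccsub[0001-0006]
highmem up 7-00:00:00 35/4/3/42 ccexe[0327-0346,0348-0369]
main* up 8:00:00 245/20/3/268 ccexe[0001-0049,0056-0065,0068-0083,0090-0191,0233-0277,0280,0282-0326]'

printf '%s\n' "$sample" | idle_nodes
# prints:
# debug 6
# highmem 4
# main 20
```

On a cluster, the same function can be fed live data with `sinfo -s | idle_nodes`.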
Select a Partition
salloc, srun, and sbatch support the following command-line option to select a partition. It is typically combined with other options related to resource allocation:
| Option | Description |
|---|---|
| -p, --partition | Request a specific partition for the resource allocation. |
For example, to request resources from the debug partition:
sbatch --partition=debug ...
You can also override the default partition with the following environment variables:
| Variable | Description |
|---|---|
| SLURM_PARTITION | Interpreted by the srun command |
| SALLOC_PARTITION | Interpreted by the salloc command |
| SBATCH_PARTITION | Interpreted by the sbatch command |
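The sketch below shows how such a variable might be used as a session-wide default for sbatch. It assumes a job script named `job.sh`, which is a placeholder and not part of this documentation. According to the sbatch manual page, environment variables take precedence over `#SBATCH` directives in the script, and command-line options take precedence over environment variables.

```shell
# Default partition for every sbatch call in this shell session
export SBATCH_PARTITION=debug

# Only attempt the submissions where Slurm is installed
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh                     # uses debug, taken from the environment
    sbatch --partition=main job.sh    # the command-line option wins: uses main
fi
```

Run `unset SBATCH_PARTITION` to return to the cluster's default partition.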
Footnotes
[1] sinfo manual page, SchedMD, https://slurm.schedmd.com/sinfo.html