Partitions
Partitions group nodes with similar characteristics, such as resources, priorities, or run-time limits.
What is a Partition?
In Slurm, a partition is essentially a named group of compute nodes that acts like a queue you submit jobs to, and from a user’s perspective it defines both where your job can run and under what conditions. When you submit a job, you specify a partition (for example with --partition=long), and instead of choosing a specific machine, you are telling Slurm to run your job on any available node within that group.
Each partition is configured with its own rules and characteristics, such as maximum runtime limits, available hardware (like CPUs, GPUs, or high-memory nodes), job priority, and sometimes access restrictions based on user groups. This means partitions are often organized by purpose—for example, a debug partition for quick test jobs, a long partition for extended computations, or a gpu partition for jobs requiring accelerators.
From a practical standpoint, choosing the right partition is important because it affects how long your job can run, how quickly it may start, and what resources it can use, but you do not need to worry about the exact node your job runs on, since Slurm automatically handles that selection within the chosen partition.
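As a sketch of this workflow, a batch script selects a partition with an #SBATCH directive (the script below is illustrative; the requested time must fit within the partition's limit):

```shell
#!/bin/bash
#SBATCH --partition=long   # run on any available node in the long partition
#SBATCH --time=24:00:00    # requested runtime; must not exceed the partition limit
#SBATCH --ntasks=1

# Slurm chooses a suitable node inside the partition automatically.
srun hostname
```

Submitted with sbatch, the job waits in the long queue until Slurm finds a free node within that partition.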
| Partition | Use when |
|---|---|
| debug | Testing job scripts, quick experiments (max 30 min) |
| main | Standard production jobs up to 8 hours |
| long | Jobs that need more than 8 hours (up to 7 days) |
| high_mem | Jobs requiring higher memory capacity per CPU than available in main |
| gpu | Jobs using GPU accelerators |
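The limits above can be checked in scripts before submitting. As an illustrative sketch (a hypothetical helper, with the maximum walltimes hard-coded from the table above rather than queried from Slurm), each partition's limit expressed in minutes:

```shell
#!/bin/sh
# Illustrative helper: maximum walltime, in minutes, per partition.
# Values are taken from the table above, not queried from Slurm.
max_minutes() {
    case "$1" in
        debug)             echo 30 ;;                  # 30 minutes
        main)              echo $((8 * 60)) ;;         # 8 hours
        long|high_mem|gpu) echo $((7 * 24 * 60)) ;;    # 7 days
        *)                 echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

max_minutes debug   # prints 30
max_minutes long    # prints 10080
```

A request exceeding the partition limit is rejected by the scheduler, so a quick check like this can save a failed submission.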
Available Partitions
When in doubt, start with debug to verify your job works, then move to main or long for production runs.
The sinfo command¹ lists partitions and their states:
| Option | Description |
|---|---|
| -s, --summarize | List only a partition state summary with no node state details. |
| -o, --format | Specify the output columns to print; refer to the manual page for details. |
The following shows an example of the overall resource allocation on partitions. The column "NODES(A/I/O/T)" indicates the node state; the capital letters are abbreviations for Allocated, Idle, Other, and Total:
>>> sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug up 30:00 0/7/3/10 lxbk[0719-0722,1130-1135]
main* up 8:00:00 254/129/57/440 lxbk[0724-1033,1136-1265]
grid up 3-00:00:00 148/106/56/310 lxbk[0724-1033]
high_mem up 7-00:00:00 24/3/19/46 lxbk[1034-1079]
gpu up 7-00:00:00 14/3/33/50 lxbk[1080-1129]
long up 7-00:00:00 197/89/56/342 lxbk[0717-0718,0824-1033,1136-1265]

Show the default runtime together with the time limits:
>>> sinfo -o "%9P %6g %11L %10l %5D %20C"
PARTITION GROUPS DEFAULTTIME TIMELIMIT NODES CPUS(A/I/O/T)
debug all 5:00 30:00 10 0/1664/384/2048
main* all 2:00:00 8:00:00 440 23058/33838/6144/630
grid all 1:00:00 3-00:00:00 310 10380/14004/5376/297
high_mem all 1:00:00 7-00:00:00 46 2296/4616/4864/11776
gpu all 2:00:00 7-00:00:00 50 1202/430/3168/4800
long all 2:00:00 7-00:00:00 342 19074/28574/6048/536

List the CPU configuration and memory per node:
>>> sinfo -o "%9P %6g %4c %10z %8m %5D %20C"
PARTITION GROUPS CPUS S:C:T MEMORY NODES CPUS(A/I/O/T)
debug all 128+ 2:32+:2 257500+ 10 0/1664/384/2048
main* all 96+ 2:24+:2 191388+ 440 23056/33840/6144/630
grid all 96 2:24:2 191388 310 10378/14006/5376/297
high_mem all 256 8:16:2 1031342 46 2296/4616/4864/11776
gpu all 96 2:24:2 515451 50 1202/430/3168/4800
long all 96+ 2:24+:2 191388+ 342 19072/28576/6048/536

Print a comprehensive list of idle nodes including their available resources:
sinfo -Nel -t idle

An asterisk suffix '*' indicates the default partition. Compute jobs will be sent to the default partition unless a specific partition is selected by option.
It is recommended to test your application launch in the debug partition first. This partition has a very short runtime limit and therefore allows very quick resource allocation, which prevents long waiting times in the scheduler queue.
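The A/I/O/T fields are easy to post-process in scripts. As a sketch, the following awk pipeline extracts the idle-node count per partition; it uses a canned copy of the sinfo -s output from above as sample input, whereas in practice you would pipe sinfo -s directly into awk:

```shell
# Sample input: the first lines of the "sinfo -s" output shown above.
sinfo_sample='PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
debug up 30:00 0/7/3/10 lxbk[0719-0722,1130-1135]
main* up 8:00:00 254/129/57/440 lxbk[0724-1033,1136-1265]'

# Split the 4th column (A/I/O/T) and report the idle count per partition;
# skip the header line (NR > 1).
echo "$sinfo_sample" | awk 'NR > 1 {
    split($4, n, "/")   # n[1]=allocated n[2]=idle n[3]=other n[4]=total
    printf "%s %s idle of %s\n", $1, n[2], n[4]
}'
# prints:
#   debug 7 idle of 10
#   main* 129 idle of 440
```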
Select a Partition
salloc, srun, and sbatch support the following command option to select a partition, which is typically used in conjunction with other options related to resource allocation:
| Option | Description |
|---|---|
| -p, --partition | Request a specific partition for the resource allocation. |
For example, request resources from the debug partition:

sbatch --partition=debug ...

Override the default partition with the following environment variables:
| Variable | Description |
|---|---|
| SLURM_PARTITION | Interpreted by the srun command |
| SALLOC_PARTITION | Interpreted by the salloc command |
| SBATCH_PARTITION | Interpreted by the sbatch command |
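For example, to make sbatch target the long partition by default (the variable affects subsequent submissions in the same shell; an explicit --partition option on the command line still takes precedence):

```shell
# Make sbatch submit to the long partition by default.
export SBATCH_PARTITION=long

# Subsequent submissions now go to long unless --partition is given,
# i.e. "sbatch job.sh" behaves like "sbatch --partition=long job.sh".
```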
Footnotes

1. sinfo manual page, SchedMD, https://slurm.schedmd.com/sinfo.html