Cluster

Modified

March 20, 2026

Abstract

The Virgo cluster uses Slurm for workload management. This section covers how to configure, submit, and monitor compute jobs.

Introduction

Think of an HPC cluster as a shared supercomputer that many people use at the same time. A scheduler like Slurm[1] decides who gets to use which part of the machine, when, and for how long. The basic idea: you don't run programs directly as on your laptop. Instead, you ask the scheduler for resources, and it runs your job when those resources become available.
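As a sketch of this request-based model, an interactive allocation might look like the following (the partition name and resource values are illustrative placeholders, not Virgo-specific settings):

```shell
# Request an interactive allocation: 1 task, 2 CPUs, 4 GiB memory, 30 minutes.
# "debug" is a placeholder partition name -- see the Partitions page for real ones.
srun --partition=debug --ntasks=1 --cpus-per-task=2 --mem=4G --time=00:30:00 --pty bash

# Inside the allocation, commands run on the assigned compute node,
# not on the submit node you logged in to:
hostname
```

The shell only starts once Slurm has granted the requested resources; until then, the request waits in the queue like any other job.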

The following steps illustrate the typical workflow:

  1. Login — Connect via SSH to a submit node.
  2. Batch script — Write a batch job script that defines which program to run, which resources (CPUs, memory) are required, and how long the job may run.
  3. Submit job — Send the script to Slurm; your job is placed in a queue.
  4. Wait in the queue — Slurm keeps all user jobs in a queue and starts them as soon as free resources are available.
  5. Job runs — Slurm allocates compute resources and starts your job using the batch script.
  6. Access output — After the job finishes, access the output data and log files.
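The steps above can be sketched as a minimal batch script (the script name, partition, and resource values are illustrative assumptions, not Virgo defaults):

```shell
#!/bin/bash
#SBATCH --job-name=hello          # job name shown in the queue
#SBATCH --partition=main          # placeholder partition name
#SBATCH --ntasks=1                # one task
#SBATCH --cpus-per-task=1         # one CPU core
#SBATCH --mem=1G                  # 1 GiB of memory
#SBATCH --time=00:10:00           # wall-clock limit of 10 minutes
#SBATCH --output=hello-%j.out     # log file; %j expands to the job ID

echo "Running on $(hostname)"
```

Submit it with `sbatch hello.sh`, watch its queue state with `squeue -u $USER`, and read `hello-<jobid>.out` once the job has finished.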

Overview

Overview of sections in this chapter
Page                     Description
Environment              Slurm commands, meta-commands, working directory, I/O redirection
Partitions               Available partitions and their resource limits
Resource Allocation      Interactive jobs, batch jobs, and recurring jobs with scrontab
Resource Constraints     Runtime, memory, CPU, and feature constraints
Scheduler Queue          Job states and job queue priority
Monitoring & Efficiency  Monitoring resource usage, analysing job failures and efficiency
Accounts                 Slurm accounts, coordinators, and fair-share
Reservations             Requesting and using resource reservations

Footnotes

  1. Slurm Overview, SchedMD Documentation,
     https://slurm.schedmd.com/overview.html