First Job

Modified: March 20, 2026

Abstract

The examples introduced in this section use the debug partition to test an application launch. Remember that the debug partition provides only a limited runtime: its short wall-clock and response times exist so that jobs can be executed on short notice.

Before You Begin

Make sure you have completed the following steps before continuing:

  • You can log in to a submit node via SSH
  • You have a Slurm account association (see Accounts)
  • You know the path to your Lustre working directory (see Lustre)

Display an overview of the allocatable resources in the debug partition with the sinfo 1 command:

sinfo -lN -p debug

Setup

The examples on this page use the following environment variables. Adjust LUSTRE_HOME to match your group’s directory on shared storage:

# shared storage on the cluster (adjust to your group's path)
export LUSTRE_HOME=/lustre/$(id -g -n)/$USER

# use the debug partition for quick allocation times in this tutorial
export SBATCH_PARTITION=debug
export SALLOC_PARTITION=debug
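You can sanity-check how the path resolves before proceeding; id -g -n prints your primary group name and id -u -n your user name. The snippet below only prints the path, it does not create anything:

```shell
# show the path that LUSTRE_HOME from the Setup above resolves to
echo "LUSTRE_HOME resolves to: /lustre/$(id -g -n)/$(id -u -n)"
```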

First Job

If you are interested in more elaborate examples of how to run applications on the cluster, we would like to draw your attention to The Virgo Blog. The cluster group plans to continually publish articles there illustrating common use cases and best practices.

Let’s walk through the steps required to execute your first application (cf. batch jobs) on the cluster. Slurm expects an executable as an argument to the sbatch command. Typically this is a wrapper script containing Slurm meta-commands that set runtime configuration options, plus everything needed to launch the user application.

The following minimal wrapper script identifies the compute node executing the job. Create it in your $LUSTRE_HOME working directory:

cat > $LUSTRE_HOME/sleep.sh <<'EOF'
#!/bin/bash
#SBATCH --output %j_%N.out
hostname ; sleep ${1:-30}  # first positional argument, default: 30
EOF
chmod +x $LUSTRE_HOME/sleep.sh
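The ${1:-30} expansion in sleep.sh substitutes the first positional argument and falls back to 30 when none is given; you can verify the pattern directly in your shell without submitting anything:

```shell
# simulate calling the script with and without an argument
set -- 300        # as if invoked: sleep.sh 300
echo "${1:-30}"   # prints 300
set --            # as if invoked with no arguments
echo "${1:-30}"   # prints 30
```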

Once you have created the file above, submit it to the Slurm workload management system using the sbatch command. To simplify monitoring of this job, specify a job name with the command-line option --job-name. We will use your $LUSTRE_HOME as the working directory:

# submit a job to sleep for 300 seconds
sbatch --job-name sleep --chdir $LUSTRE_HOME -- $LUSTRE_HOME/sleep.sh 300

If the job has been accepted, the system answers with its JOBID. The job name can be used as an option to the squeue command, which prints the state of the scheduling queue:

# list all jobs with a given name
squeue --name sleep

In the debug queue, the job state should quickly become R for running. It is of course possible to submit multiple jobs with the same name; each remains identifiable by its unique JOBID. Go on and submit some more jobs with different sleep times. The scontrol command shows details about the runtime configuration of a job, and jobs can be removed from the system using scancel 2.

# show the runtime configuration of the latest sleep job
scontrol show job $(squeue -h -o %A -n sleep | tail -n1)

Once a job disappears from squeue, it has finished. Use sacct to check completed jobs and their exit status:

# show recent jobs for your user
sacct --starttime now-1hour --format=JobID,JobName,State,ExitCode,Elapsed

The output file specified by #SBATCH --output (in this example %j_%N.out) will be written to the working directory.

First Issue

In case of a failure during job execution, it is important to distinguish between a problem internal to the application and an issue with the job execution environment. If you want to report a problem with the runtime environment on the cluster, follow the guidelines in report-issues. For this example, we work with a deliberately “broken” program called segfaulter, a variant of the famous “Hello World” program.

// segfaulter.c
int main(void)
{
    char *s = "hello world";
    *s = 'H';  // writing to a string literal is undefined behavior; typically crashes
}

The following commands prepare $LUSTRE_HOME/bin and $LUSTRE_HOME/src; save the program above as $LUSTRE_HOME/src/segfaulter.c before continuing:

export PATH=$LUSTRE_HOME/bin:$PATH
mkdir -p $LUSTRE_HOME/bin $LUSTRE_HOME/src

Build an executable binary from this small C program using the following commands. Executing the program yields a segmentation fault 3 with a non-zero exit code:

# Compile the program...
» gcc -g $LUSTRE_HOME/src/segfaulter.c -o $LUSTRE_HOME/bin/segfaulter

# ...and execute it
» $LUSTRE_HOME/bin/segfaulter ; echo $?
zsh: segmentation fault  $LUSTRE_HOME/bin/segfaulter
139            # <-- exit code
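The exit code 139 is not arbitrary: when a process is killed by a signal, the shell reports 128 plus the signal number, and 139 − 128 = 11. The kill -l builtin translates a signal number to its name:

```shell
# look up signal 11 (= 139 - 128) by number
kill -l 11    # prints: SEGV
```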

In the execution environment of a compute cluster, it is important to monitor both the runtime requirements of an application and its runtime behavior. This information helps to diagnose problems occurring during job execution. Users can implement environment checks and basic application monitoring in an application wrapper script.

#!/bin/bash

puts() {
  echo "[$(date +%Y/%m/%dT%H:%M:%S)]" "$@"
}

uid=$(id -u)

puts JOB CONFIGURATION ----------------------
scontrol show job -d -o $SLURM_JOB_ID
puts JOB CONFIGURATION END ------------------
puts JOB ENVIRONMENT ------------------------
env | grep ^SLURM_
puts JOB ENVIRONMENT END --------------------
puts NUMA CONFIGURATION ----------------------
lscpu | grep -e '^Model name' -e '^NUMA node[0-9]'
puts NUMA CONFIGURATION END ------------------

#####################################################
## APPLICATION
#####################################################

# Launch the user application via srun; an array keeps arguments intact
command=(srun -- "$@")

#####################################################

puts EXEC "${command[@]}"
"${command[@]}" &

# The process ID of the last spawned child process
child=$!
sleep 1 # wait for start-up
puts PROCESSES -------------------------------
ps -u $USER -o user,pid,cpuid,args -H
puts PROCESSES END ---------------------------

puts WAIT PID $child
wait $child
# Exit status of the child process
state=$?
puts EXIT $state
# Propagate the child's exit status to Slurm
exit $state

For the following example, store the code above in a file called generic-wrapper and use it to monitor the segfaulter program during execution. The segfaulter program is passed as the first argument to generic-wrapper, which executes it using srun 4.
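The $!/wait pattern at the core of the wrapper can be tried in isolation on the submit node; wait returns the exit status of the awaited child process:

```shell
# background a child that exits with status 3, then collect its status
( sleep 0.1; exit 3 ) &
child=$!
wait $child
echo "child exited with status $?"   # prints: child exited with status 3
```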

# make the wrapper script executable
chmod +x $LUSTRE_HOME/generic-wrapper

# use the wrapper script to launch the segfaulter program
sbatch --partition debug \
       --job-name segfaulter \
       --chdir $LUSTRE_HOME \
       --no-requeue \
       -- $LUSTRE_HOME/generic-wrapper \
          $LUSTRE_HOME/bin/segfaulter

In the job’s standard output (cf. I/O redirection) you will find the following information:

  • The runtime configuration of the job in Slurm including:
    • Submit node and submit time
    • Slurm job id with partition and account details
    • Start time, working directory, output streams
    • Resource allocation details, including execution node(s)
  • The Slurm environment variables available during job execution
  • The hardware NUMA architecture
  • The application launch command executed
  • The application process ID, and a process tree during runtime
  • Exit code of the program

The information described above will be instrumental when reporting issues (cf. report-issues) to the cluster support group.

Footnotes

  1. sinfo Manual Page, SchedMD https://slurm.schedmd.com/sinfo.html↩︎

  2. scancel Manual Page, SchedMD https://slurm.schedmd.com/scancel.html↩︎

  3. Segmentation Fault, Wikipedia https://en.wikipedia.org/wiki/Segmentation_fault↩︎

  4. srun Manual Page, SchedMD https://slurm.schedmd.com/srun.html↩︎