Slurm

Slurm frequent commands

Table. List of common commands for Slurm job management

Command Description
sbatch Submit script file for execution (man pages)
scancel Signal (kill) jobid for termination (man pages)
squeue View information about a user's jobs (man pages)
srun Allocate resources and execute jobs (man pages)
sinfo View information about nodes and partitions (man pages)
sacct Display accounting data for running/past jobs (man pages)
scontrol View/modify jobs, configurations, ... (man pages)

Running jobs

Below are the main commands that can be used for job execution:

  • salloc, to allocate resources. Allows for subsequent execution of an application;
  • srun, to allocate resources and execute an application interactively;
  • sbatch, to submit a script for batch (non-interactive) execution.

Batch execution

Batch jobs are, by far, the most common type of jobs in HPC systems. Batch jobs are resource provisions that run applications on compute nodes and do not require supervision or user interaction. Batch jobs are commonly used for applications that run for long periods of time.

Executing an application in batch mode requires construction of a job script, say slurm-script.sh, which can later be submitted to Slurm using the sbatch command:

user@hsn:~$ sbatch slurm-script.sh
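
When a wrapper script needs the ID of the job it has just submitted (for example to pass it to scancel or sacct later), the ID can be picked out of the confirmation line that sbatch prints. The sketch below assumes the usual "Submitted batch job <id>" message; the helper name extract_jobid and the sample ID are illustrative only.

```shell
# Pick the job ID out of sbatch's confirmation message.
# Assumes the usual form: "Submitted batch job <id>".
extract_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Stand-in for real sbatch output; the ID 2433 is made up.
sample="Submitted batch job 2433"
jobid=$(printf '%s\n' "$sample" | extract_jobid)
echo "Job ID: $jobid"

# On the cluster the same helper would be used as:
#   jobid=$(sbatch slurm-script.sh | extract_jobid)
```

sbatch also has a --parsable option that prints only the job ID (followed by the cluster name when one applies), which avoids the parsing step.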

Making a Job Script

Although there can be a variety of different scripts users can utilize when running their own jobs, knowing how to draft a job script can be quite handy if you need to debug any errors in your jobs or you need to make substantial changes to a script.

A job script looks something like this:

#!/bin/bash

# Slurm directives
#SBATCH --job-name=myjob      # job name (just a simple label)
#SBATCH --time=0-01:00:00     # execution time limit
#SBATCH --ntasks=16           # total number of tasks
#SBATCH --constraint=scratch  # creates /scratch/$SLURM_JOB_ID directory

# Prepare environment
# Bash scripting starts here - no more #SBATCH directives will be processed
module purge                  # clean up application environment
module load mymodule          # load required application environment

# Run the executable
srun /path/to/my/exec > $SCRATCH_DIR/output.file

The batch script may contain one or more directives beginning with #SBATCH, each followed by any of the command-line options documented in the sbatch man page. Once the first non-comment, non-whitespace line has been reached in the script, no more #SBATCH directives are processed, as the example above shows.
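
The cut-off rule for directives can be checked on a script before submitting it. The following is a minimal sketch that only mimics the rule described above with awk and does not call Slurm; the function name list_directives and the throw-away demo script are assumptions made for illustration.

```shell
# List the #SBATCH directives that come before the first command in a script.
# This only mimics the rule described above; it does not call Slurm.
list_directives() {
    awk '
        /^#SBATCH/          { print; next }  # directive: report it
        /^[[:space:]]*$/    { next }         # blank line: keep scanning
        /^[[:space:]]*#/    { next }         # comment or shebang: keep scanning
                            { exit }         # first command: stop scanning
    ' "$1"
}

# Small throw-away script with one directive placed too late.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=16

module purge
#SBATCH --time=0-01:00:00
EOF

directives=$(list_directives "$demo")
rm -f "$demo"
printf '%s\n' "$directives"
```

Only the --job-name and --ntasks lines are listed; the --time line comes after module purge and is therefore skipped.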

Some useful directive options are listed in the table below.

Table. Command line options for job submission commands.

Option Description
--job-name=<name> Job name
--account=<accountID> Account to be charged for resources used
--begin=<yyyy-mm-dd> Initiate job after specified date/time
--time=<dd-hh:mm:ss> Wall clock duration limit for the job
--partition=<name> Partition/queue in which to run the job
--nodes=<minnodes[-maxnodes]> Minimum and maximum nodes required for the job
--ntasks=<count> Number of tasks to be launched
--ntasks-per-socket=<count> Number of tasks to be launched per socket
--ntasks-per-node=<count> Number of tasks to be launched per node
--distribution=block:block Distribute blocks of tasks over nodes:sockets
--mem=<MB> Memory required per node
--mem-per-cpu=<MB> Memory required per CPU allocated
--exclusive[=user] Allocated nodes cannot be shared with other jobs/users
--mail-user=<address> User e-mail address
--mail-type=<begin,end,...> E-mail notification type
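
Several options from the table can be combined in one script header. The sketch below writes such a script to a temporary file and only checks its syntax with bash -n; nothing is submitted. The account name, partition, e-mail address and executable path are placeholders and would have to be replaced by values valid on the target system.

```shell
# Write an example job script that combines options from the table above.
# "myaccount", "debug", the e-mail address and the executable path are
# placeholders, not values known to exist on any particular cluster.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --account=myaccount
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=2000
#SBATCH --time=0-02:00:00
#SBATCH --mail-user=user@example.org
#SBATCH --mail-type=END,FAIL

srun /path/to/my/exec
EOF

# Syntax check only; nothing is submitted.
bash -n "$script" && echo "syntax OK: $script"
```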

Interactive execution

Interactive jobs allow a user to interact with applications in real-time within an HPC environment. With interactive jobs, users request time and resources to work on a compute node directly. Users can then compile applications, execute scripts, or run other commands directly on a compute node.

You can run an interactive job by asking Slurm for a certain amount of resources and launching a bash shell on them.

The following command will reserve 4 cores so you can run 4 MPI/OMP tasks for the maximum time allowed by Slurm's configuration. Then we run the make command using those 4 cores:

user@hsn:~$ srun --ntasks=4 --pty bash -i
user@cn1:~$ make -j 4

Launching a bash shell like that will land you on a compute node. Note that MODULEPATH is automatically reconfigured so that the Genoa software stack becomes readily available.

The following command requests one node, and runs the make command using up to 96 cores:

user@hsn:~$ srun --nodes=1 --pty bash -i
user@cn1:~$ make -j 96 

You can specify the node hostname and secure it exclusively. In the following example I am running a threaded job:

user@hsn:~$ srun --nodelist=cn2 --exclusive --pty bash -i
user@cn2:~$ export OMP_NUM_THREADS=48
user@cn2:~$ ./prog.x

Other options, including those listed in the table above, can be added to the srun command.

Keeping interactive jobs alive

Interactive jobs die when you disconnect from mc2, either by choice or because of network connection problems. To keep a job alive you can use a terminal multiplexer like tmux. You should start tmux on the login node before you start an interactive Slurm session:

user@hsn:~$ tmux
user@hsn:~$ srun --nodes=1 --pty bash -i
user@cn1:~$

In case of a disconnect, simply reconnect to mc2 and attach to your tmux session again by typing:

user@hsn:~$ tmux attach
user@cn1:~$

In case you have multiple sessions:

user@hsn:~$ tmux list-sessions
0: 1 windows (created Mon Feb 10 15:19:54 2025) (attached)
1: 3 windows (created Mon Feb 10 15:21:50 2025) (attached)
user@hsn:~$ tmux attach -t 1

In the above I attached to session 1. You can get a full list of tmux commands and short-cuts by pressing Ctrl-B followed by ?. See also tmux home page at GitHub.

Finding information on the queue: the squeue command

The squeue command is a tool we use to pull up information about the jobs currently in the Slurm queue. Usually, you wouldn’t need information for all jobs queued in the system, so you can restrict the listing to your own jobs with the --me flag (equivalent to --user=$USER):

user@hsn:~$ squeue --me
JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
  2396     debug     job1    user1  PENDING       0:00 3-00:00:00      2 (Resources)
  2431     debug     job2    user2  RUNNING    1:13:35 3-00:00:00      1 cn2
  2432     debug     job3    user3  RUNNING    1:13:35 3-00:00:00      1 cn2
  ...
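
For a quick overview of a long queue listing, the rows can be tallied per state. The sketch below assumes the column layout shown above, where the state is the fifth column; the function name count_states is an illustrative choice, and the sample rows are copied from the listing so that the snippet runs without Slurm.

```shell
# Count jobs per state in squeue-style output.
# Assumes the column layout shown above, where STATE is the fifth column.
count_states() {
    awk 'NR > 1 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample rows copied from the listing above.
sample='JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
  2396     debug     job1    user1  PENDING       0:00 3-00:00:00      2 (Resources)
  2431     debug     job2    user2  RUNNING    1:13:35 3-00:00:00      1 cn2
  2432     debug     job3    user3  RUNNING    1:13:35 3-00:00:00      1 cn2'

summary=$(printf '%s\n' "$sample" | count_states)
printf '%s\n' "$summary"

# On the cluster: squeue --me | count_states
```

For the sample rows this reports one PENDING and two RUNNING jobs.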

The squeue command can also report a job’s estimated start time when the --start flag is added:

user@hsn:~$ squeue --start
JOBID PARTITION     NAME     USER ST          START_TIME ...
 2396     debug     job1    user1 PD 2037-02-07T11:11:04 ...

Note, however, that the start time provided can be inaccurate: the calculated time is based on jobs queued or running in the system, considering their time limits. If a job with a higher priority is queued after the command is run, your job may be delayed.

The squeue command details a variety of information on a job’s status with STATE and REASON codes. Job state codes describe a job’s current state in the queue (e.g. pending, completed).

The following table outlines the job STATE codes you may encounter when using squeue to check on your jobs.

STATE CODE Explanation
COMPLETED CD Completed successfully.
COMPLETING CG Finishing (still active).
FAILED F Terminated with a non-zero exit code.
PENDING PD Waiting for resource allocation.
PREEMPTED PR Terminated due to preemption by other job.
RUNNING R Allocated and running.
SUSPENDED S Paused with its cores released to other jobs.
STOPPED ST Stopped with its cores retained.
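
When squeue prints only the compact codes (as in the ST column of the --start output above), a small lookup can expand them. This is a sketch covering only the codes listed in the table; the function name state_name is an assumption, and any other code is reported as UNKNOWN.

```shell
# Translate a compact squeue state code into its long name.
# Only the codes from the table above are covered.
state_name() {
    case "$1" in
        CD) echo "COMPLETED"  ;;
        CG) echo "COMPLETING" ;;
        F)  echo "FAILED"     ;;
        PD) echo "PENDING"    ;;
        PR) echo "PREEMPTED"  ;;
        R)  echo "RUNNING"    ;;
        S)  echo "SUSPENDED"  ;;
        ST) echo "STOPPED"    ;;
        *)  echo "UNKNOWN"    ;;
    esac
}

state_name PD
state_name ST
```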

The following table outlines the job REASON codes you may encounter when using squeue to check on your jobs.

REASON Explanation
Priority Higher priority jobs are in the queue.
Dependency Waiting for a dependent job to complete.
Resources Waiting for the requested resources to become available.
TimeLimit The job exhausted its time limit.
PartitionTimeLimit Requested time exceeds partition's allowed limit.
Reservation Waiting for its advanced reservation to become available.
InvalidAccount Job’s account is invalid.
InvalidQOS Job’s QoS is invalid.
QOSGrpCpuLimit CPUs assigned to job's specified QoS are in use.
QOSGrpMaxJobsLimit Maximum number of jobs for the job's QoS has been met.
QOSGrpNodeLimit Nodes assigned to job's specified QoS are in use.
PartitionCpuLimit CPUs specified in partition are in use.
PartitionMaxJobsLimit Maximum number of jobs in partition has been met.
PartitionNodeLimit Nodes assigned to partition are in use.
AssociationCpuLimit CPUs assigned to association are in use.
AssociationMaxJobsLimit Maximum number of jobs in association was met.
AssociationNodeLimit Nodes assigned to association are in use.

Stopping running/queued jobs: the scancel command

Sometimes you may need to stop a job entirely while it’s running or when it is still pending in the queue. The best way to accomplish this is with the scancel command. The scancel command allows you to cancel jobs you are running on mc2 using the job’s ID:

user@hsn:~$ scancel <job_id>

where <job_id> is the job's ID number. To cancel multiple jobs, you can use a space-separated list of job IDs:

user@hsn:~$ scancel <job1_id> <job2_id> ...

Show priority of jobs: the sprio command

sprio is used to view the components of a job's scheduling priority. sprio is a read-only utility that extracts information from the multi-factor priority plugin. By default, sprio returns information for all pending jobs.

user@hsn:~$ sprio
JOBID PARTITION   PRIORITY       SITE
 2396 debug              1          0

Analyzing past jobs: the sacct command

The sacct command allows users to pull up status information about past jobs. By default, sacct will only pull up your jobs that ran on the current day:

user@hsn:~$ sacct --allocations
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2419               job1      debug    default         16    RUNNING      0:0
2420               job2      debug    default         16    RUNNING      0:0

We can use the --starttime flag to tell the command to look for past jobs:

user@hsn:~$ sacct --allocations --starttime=<yyyy-mm-dd>
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1659             oldjob      debug    default         16  COMPLETED      0:0
1665             oldjob      debug    default         16     FAILED      1:0
...
2419               job1      debug    default         16    RUNNING      0:0
2420               job2      debug    default         16    RUNNING      0:0

We can use the --format flag to choose what we want in our output. The syntax is --format=fld_1,fld_2, ... ,fld_N, where fld_i are Slurm fields, some of which are listed in the table below. For example:

user@hsn:~$ sacct --allocations --starttime=2026-02-03 --format=JobID,NNodes,NCPUS,Elapsed,WorkDir
JobID          NNodes      NCPUS    Elapsed              WorkDir
------------ -------- ---------- ---------- --------------------
2385                1          0   00:00:00 /home/user/work/+
2386                1          0   00:00:00 /home/user/work/+
...
2421                1         16   08:27:37 /home/user/work/+
2422                1         16   08:26:37 /home/user/work/+

When using the --format option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed:

user@hsn:~$ sacct --allocations --starttime=2026-02-03 --format=JobID,WorkDir%40
JobID                                           WorkDir
------------ ------------------------------------------
2385               /home/user/work/path/to/calculation1
2386               /home/user/work/path/to/calculation2
...               
2421               /home/user/work/path/to/calculation3
2422               /home/user/work/path/to/calculation4

You may also limit the info to one specific job:

user@hsn:~$ sacct --allocations --jobs=2421 --format=Elapsed,CPUTime,ExitCode
   Elapsed    CPUTime ExitCode
---------- ---------- --------
  08:37:05 5-17:53:20      0:0
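
The CPUTime value above is the elapsed time multiplied by the number of allocated CPUs (16 for job 2421 in the earlier listing). The sketch below reproduces that arithmetic; the helper names to_seconds and to_duration are illustrative, and the code assumes durations in the [days-]hh:mm:ss form used by sacct.

```shell
# Convert a Slurm duration ([days-]hh:mm:ss) to seconds.
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

# Convert seconds back to [days-]hh:mm:ss.
to_duration() {
    awk -v t="$1" 'BEGIN {
        d = int(t / 86400); r = t % 86400
        if (d > 0) printf "%d-", d
        printf "%02d:%02d:%02d\n", int(r / 3600), int(r % 3600 / 60), r % 60
    }'
}

elapsed=$(to_seconds 08:37:05)      # 31025 seconds
to_duration $(( elapsed * 16 ))     # 16 CPUs; prints 5-17:53:20
```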

Table. Some fields that can be reported by the sacct command.

Field Description
CPUTime Time used (elapsed time × CPU count) by a job or step
Elapsed The job's elapsed time
End Termination time of the job
ExitCode The exit code returned by the job script
FailedNode The name of the node that failed
JobID The identification number of the job
JobName The name of the job or job step
MaxDiskRead Maximum number of I/O bytes read
MaxDiskWrite Maximum number of I/O bytes written
NCPUS Total number of CPUs allocated
NNodes Number of nodes in a job or step
NodeList List of nodes in job/step
NTasks Total number of tasks in a job/step
QOS Name of Quality of Service
Start Initiation time of the job
StdOut Display the "filename pattern" for stdout
WorkDir The directory used by the job
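
For scripting, the fields above are easier to process in machine-readable form: sacct's --parsable2 option separates fields with '|' and --noheader drops the header lines. The sketch below filters such output for failed jobs; the function name failed_jobs is an assumption, and the sample rows are built from the job listing shown earlier so that the snippet runs without Slurm.

```shell
# Print job IDs whose state is FAILED from pipe-delimited sacct output.
# Expects three fields per line: JobID|State|ExitCode.
failed_jobs() {
    awk -F'|' '$2 == "FAILED" { print $1 }'
}

# Sample rows built from the job listing shown earlier in this section.
sample='1659|COMPLETED|0:0
1665|FAILED|1:0
2419|RUNNING|0:0'

failed=$(printf '%s\n' "$sample" | failed_jobs)
echo "$failed"

# On the cluster:
#   sacct --allocations --noheader --parsable2 \
#         --format=JobID,State,ExitCode | failed_jobs
```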

View and control jobs: the scontrol command

The scontrol command provides users extended control of their jobs run through Slurm. This includes actions like suspending a job, holding a job from running, or pulling extensive status information on jobs.

To suspend a job that is currently running on the system:

user@hsn:~$ scontrol suspend <job_id>

To resume a paused (suspended) job, we use scontrol with the resume sub-command:

user@hsn:~$ scontrol resume <job_id>

Slurm also provides a utility to hold jobs that are queued in the system. Holding a job will place the job in the lowest priority, effectively “holding” the job from being run. A job can only be held if it’s waiting on the system to be run.

user@hsn:~$ scontrol hold <job_id>

We can then release a held job using the release sub-command:

user@hsn:~$ scontrol release <job_id>

scontrol can also provide information on jobs using the show sub-command. The information provided by this command is quite extensive and detailed, so be sure to either clear your terminal window, grep certain information from the command, or pipe the output to a separate text file:

user@hsn:~$ scontrol show job <job_id>
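
Since the output consists of Key=Value pairs, a single field can be pulled out with standard text tools. The following is a rough sketch: the function name job_field is an assumption, the sample text only imitates the Key=Value layout with made-up values, and fields whose values contain spaces are not handled.

```shell
# Print the value of one Key=Value field from "scontrol show job" output.
job_field() {
    tr -s ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Illustrative excerpt in the Key=Value layout; the values are made up.
sample='JobId=2421 JobName=job1
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=08:27:37 TimeLimit=3-00:00:00 TimeMin=N/A'

printf '%s\n' "$sample" | job_field JobState
printf '%s\n' "$sample" | job_field TimeLimit

# On the cluster: scontrol show job <job_id> | job_field JobState
```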

Administration commands

Usage reports

The sreport command generates reports from the Slurm accounting data on Trackable Resources (TRES) stored in the Slurm database, slurmdbd.

Report cluster utilization:

user@hsn:~$ sreport cluster utilization Start=now-1weeks End=now
----------------------------------------------------------------
Cluster Utilization 2026-01-28T23:00:00 - 2026-02-04T22:59:59
Usage reported in CPU Minutes
----------------------------------------------------------------
  Cluster Allocate     Down PLND Dow     Idle  Planned  Reported
--------- -------- -------- -------- -------- -------- ---------
      mc2   798177    39322        0    74448  1011893   1923840
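
The raw CPU minutes are easier to judge as a fraction of the reported total. As a small worked example, the snippet below computes the allocated share from the mc2 row above; the variable names are illustrative.

```shell
# Share of the reported CPU minutes that were allocated to jobs.
# Both numbers are taken from the mc2 row in the report above.
allocated=798177
reported=1923840
pct=$(awk -v a="$allocated" -v r="$reported" 'BEGIN { printf "%.2f", 100 * a / r }')
echo "Allocated: ${pct}%"    # prints Allocated: 41.49%
```

sreport can also print percentages directly with the -t percent option, as used in the next report.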

Report top usage in percent for a specific account:

user@hsn:~$ sreport user topusage start=2021-05-01 -t percent account=default
--------------------------------------------------------------------------------
Top 10 Users 2021-05-01T00:00:00 - 2026-02-03T23:59:59 (150339600 secs)
Usage reported in Percentage of Total
------------------------------------------------------------------------------
Cluster     Login     Proper Name         Account      Used   Energy
------- --------- --------------- --------------- --------- --------
    mc2     user1     User Name 1         default    17.61%    0.00%
    mc2     user2     User Name 2         default     9.15%    0.00%
    mc2     user3     User Name 3         default     3.91%    0.00%
...

Configuration commands

Print all Slurm configuration details in slurm.conf (including defaults):

root@hsn:~# scontrol show config

Reconfiguration of services after changing configuration files:

root@hsn:~# scontrol reconfigure

Manage state of nodes:

root@hsn:~# scontrol update NodeName=cn1,cn2 State=DRAIN Reason="Maintenance"
root@hsn:~# scontrol update NodeName=cn1,cn2 State=RESUME Reason="Maintenance finished"

Reservation management

Reserve one node (create reservation) for two users:

root@hsn:~# scontrol create reservation reservationname="my-reservation" starttime=NOW \
            duration=UNLIMITED flags=IGNORE_JOBS users="user1,user2" nodes=cn1

or a similar reservation starting on a specified date/time:

root@hsn:~# scontrol create reservation reservationname="my-reservation" starttime=2025-09-24T14:00 \
            duration=UNLIMITED flags=IGNORE_JOBS users="user1,user2" nodes=cn1

or for a specified number of nodes:

root@hsn:~# scontrol create reservation ReservationName="dft-meeting" users="user1,user2" \
            StartTime=2025-12-30T08:00:00 Duration=04:00:00 Flags=IGNORE_JOBS TRES=node=1

Check existing reservations:

root@hsn:~# scontrol show reservations

Job execution on reserved nodes should be issued as:

user@hsn:~$ sbatch --reservation="my-reservation" slurm.sh

A reservation can be deleted with:

root@hsn:~# scontrol delete ReservationName="my-reservation"

Reserve the entire system for maintenance:

root@hsn:~# scontrol create reservation starttime=NOW \
            duration=UNLIMITED user=root flags=maint,ignore_jobs nodes=ALL

Account and user management

The synopsis for the sacctmgr command is:

root@hsn:~# sacctmgr <options> <command>

where notable options include:

  • --immediate, commits changes without asking for confirmation;
  • --parsable, prints output delimited by '|' (with a trailing '|'), which is convenient for scripts.

Table. Commands and options for the sacctmgr command.

Command Description
reconfigure Reconfigures the SlurmDBD
shutdown Shutdown the server
create <entity> <specs> Create an entity
remove <entity> where <specs> Delete entities
show <entity> [<specs>] Display info about entities
modify <entity> where <specs> set <specs> Modify entity

Table. List of entities for the sacctmgr command.

Entity Description
account Bank account
association Used to group information for list and show commands
coordinator Usually an account manager
event Events like downed or draining nodes
job Used to modify specific fields of a job
stats Used with list and show commands to view statistics
tres Used with list and show commands to list Trackable RESources
user Login user name

List runaway (ghost) jobs and fix them. Runaway jobs are jobs that don't exist in the controller but are still considered running or pending in the database.

root@hsn:~# sacctmgr show runawayjobs

Manage Slurm daemons and services:

root@hsn:~# slurmctld -Dvvvv               # Run Slurm control daemon in the foreground (Head Node)
root@hsn:~# slurmdbd -Dvvvv                # Run Slurm database daemon in the foreground (Head Node)
root@cn1:~# slurmd -Dvvvv                  # Run Slurm daemon in the foreground (Compute Nodes)
root@hsn:~# systemctl status slurmctld     # Check daemon status
root@hsn:~# systemctl stop slurmctld       # Stop daemon
root@hsn:~# systemctl restart slurmctld    # Restart daemon
root@hsn:~# ssh cn1 systemctl stop slurmd  # Stop slurmd daemon via SSH on a Compute Node
root@hsn:~# journalctl -xeu slurmctld      # Print control daemon activity log