Slurm
Slurm frequent commands
Table. List of common commands for Slurm job management
| Command | Description |
|---|---|
| sbatch | Submit a script file for execution |
| scancel | Signal (kill) a job ID for termination |
| squeue | View information about a user's jobs |
| srun | Allocate resources and execute jobs |
| sinfo | View information about nodes and partitions |
| sacct | Display accounting data for running/past jobs |
| scontrol | View/modify jobs, configurations, ... |
Running jobs
Below are the main commands that can be used for job execution:
- salloc, to allocate resources; allows for subsequent execution of an application;
- srun, to allocate resources and execute an application interactively;
- sbatch, to submit a script for batch (non-interactive) execution.
Batch execution
Batch jobs are, by far, the most common type of jobs in HPC systems. Batch jobs are resource provisions that run applications on compute nodes and do not require supervision or user interaction. Batch jobs are commonly used for applications that run for long periods of time.
Executing an application in batch mode requires construction of a job
script, say slurm-script.sh, which can later be submitted to Slurm using
the sbatch command:
user@hsn:~$ sbatch slurm-script.sh
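On submission, sbatch normally prints a confirmation of the form "Submitted batch job <id>". The sketch below shows one possible way to capture that ID for later use, for example with scancel or scontrol. The helper name job_id_from_sbatch and the sample job ID are illustrative assumptions, and the message is fed in as text so the snippet runs without a scheduler.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation message.
# The message is read on stdin, so the helper can be tried without Slurm.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Sample message with a hypothetical job ID. On the cluster one would run:
#   jobid=$(sbatch slurm-script.sh | job_id_from_sbatch)
jobid=$(echo "Submitted batch job 2396" | job_id_from_sbatch)
echo "Job ID: $jobid"
```

As an alternative, sbatch --parsable prints only the job ID (followed by a semicolon and the cluster name, if one is present), which avoids parsing the message.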
Making a Job Script
Although there can be a variety of different scripts users can utilize when running their own jobs, knowing how to draft a job script can be quite handy if you need to debug any errors in your jobs or you need to make substantial changes to a script.
A job script looks something like this:
#!/bin/bash
# Slurm directives
#SBATCH --job-name=myjob # job name (just a simple label)
#SBATCH --time=0-01:00:00 # execution time limit
#SBATCH --ntasks=16 # total number of tasks
#SBATCH --constraint=scratch # creates /scratch/$SLURM_JOB_ID directory
# Prepare environment
# Bash scripting starts here - no more #SBATCH directives will be processed
module purge # clean up application environment
module load mymodule # load required application environment
# Run the executable
srun /path/to/my/exec > $SCRATCH_DIR/output.file
The batch script may contain one or more directives beginning with #SBATCH
followed by any of the CLI options documented in the sbatch man page.
Once the first non-comment, non-whitespace line has been reached in the
script, no more #SBATCH directives will be processed. See example above.
Some useful directive options are listed in the table below.
Table. Command line options for job submission commands.
| Option | Description |
|---|---|
| --job-name=<name> | Job name |
| --account=<accountID> | Account to be charged for resources used |
| --begin=<yyyy-mm-dd> | Initiate job after specified date/time |
| --time=<dd-hh:mm:ss> | Wall clock duration limit for the job |
| --partition=<name> | Partition/queue in which to run the job |
| --nodes=<minnodes[-maxnodes]> | Minimum and maximum nodes required for the job |
| --ntasks=<count> | Number of tasks to be launched |
| --ntasks-per-socket=<count> | Number of tasks to be launched per socket |
| --ntasks-per-node=<count> | Number of tasks to be launched per node |
| --distribution=block:block | Distribute blocks of tasks over nodes:sockets |
| --mem=<MB> | Memory required per node |
| --mem-per-cpu=<MB> | Memory required per CPU allocated |
| --exclusive[=user] | Allocated nodes cannot be shared with other jobs/users |
| --mail-user=<address> | User e-mail address |
| --mail-type=<begin,end,...> | E-mail notification type |
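Several of the options in the table above can be combined in one job script. The following is a minimal sketch, not a recommended configuration: the job name, resource values and executable path are placeholders. It writes a two-node script to a temporary file and checks it with bash -n before anything is submitted.

```shell
#!/bin/bash
# Write a hypothetical two-node MPI job script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi-test        # placeholder job name
#SBATCH --time=0-02:00:00          # wall clock limit: 2 hours
#SBATCH --nodes=2                  # two nodes
#SBATCH --ntasks-per-node=8        # 8 tasks on each node (16 in total)
#SBATCH --mem=4000                 # memory per node, in MB
#SBATCH --mail-type=BEGIN,END      # notify at job start and end

module purge                       # clean up application environment
module load mymodule               # load required application environment
srun /path/to/my/exec              # placeholder executable
EOF

# Syntax check only; nothing is executed or submitted here.
bash -n "$script" && echo "syntax OK: $script"
echo "directives found: $(grep -c '^#SBATCH' "$script")"
# On the cluster:  sbatch "$script"
```

Checking the script with bash -n catches shell syntax errors early, but it does not validate the #SBATCH options themselves; those are only checked by sbatch at submission time.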
Interactive execution
Interactive jobs allow a user to interact with applications in real-time within an HPC environment. With interactive jobs, users request time and resources to work on a compute node directly. Users can then compile applications, execute scripts, or run other commands directly on a compute node.
You can run an interactive job, by asking Slurm for a certain amount of resources,
and with them, launch a bash shell.
The following command will reserve 4 cores so you can run 4 MPI/OMP tasks
for the maximum time allowed by Slurm's
configuration. Then we run the make command using those 4 cores:
user@hsn:~$ srun --ntasks=4 --pty bash -i
user@cn1:~$ make -j 4
Launching a bash shell like that will land you on a compute node. Note that
MODULEPATH is automatically reconfigured so that the Genoa
software stack becomes readily available.
The following command requests one node, and runs the
make command using up to 96 cores:
user@hsn:~$ srun --nodes=1 --pty bash -i
user@cn1:~$ make -j 96
You can specify the node hostname and secure it exclusively. In the following example I am running a threaded job:
user@hsn:~$ srun --nodelist=cn2 --exclusive --pty bash -i
user@cn2:~$ export OMP_NUM_THREADS=48
user@cn2:~$ ./prog.x
Other options, including those listed in the table above, can be
added to the srun command.
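Instead of hard-coding the thread count, it can be derived from the allocation. The sketch below assumes the job was started with the --cpus-per-task option (not used in the examples above), in which case Slurm sets SLURM_CPUS_PER_TASK in the job environment. The helper name threads_for_job is hypothetical.

```shell
#!/bin/bash
# Derive the OpenMP thread count from the Slurm allocation when available.
# SLURM_CPUS_PER_TASK is only set if --cpus-per-task was requested;
# fall back to a single thread otherwise.
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS=$(threads_for_job)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

For example, after `srun --nodes=1 --cpus-per-task=48 --pty bash -i` the snippet would be expected to set OMP_NUM_THREADS to 48; outside of a Slurm job it falls back to 1.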
Keeping interactive jobs alive
Interactive jobs die when you disconnect from mc2 either by choice or by
network connection problems. To keep a job alive you can use a terminal
multiplexer like tmux. You should start tmux on the login node before you
start an interactive Slurm session:
user@hsn:~$ tmux
user@hsn:~$ srun --nodes=1 --pty bash -i
user@cn1:~$
In case of a disconnect, simply reconnect to mc2 and attach to your tmux session again by typing:
user@hsn:~$ tmux attach
user@cn1:~$
In case you have multiple sessions:
user@hsn:~$ tmux list-sessions
0: 1 windows (created Mon Feb 10 15:19:54 2025) (attached)
1: 3 windows (created Mon Feb 10 15:21:50 2025) (attached)
user@hsn:~$ tmux attach -t 1
In the above I attached to session 1. You can get a full list of tmux
commands and shortcuts by pressing Ctrl-B followed by ?. See also the
tmux home page on GitHub.
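A named session can make reattaching less error-prone than remembering session numbers. The following sketch wraps tmux new-session -A, which attaches to the named session if it already exists and creates it otherwise. The function name keep_alive and the default session name "slurm" are assumptions made for illustration.

```shell
#!/bin/bash
# Attach to a named tmux session, creating it first if it does not exist.
# "slurm" is a hypothetical default session name.
keep_alive() {
    local name="${1:-slurm}"
    tmux new-session -A -s "$name"
}

# Usage on the login node, before starting the interactive Slurm session:
#   keep_alive mysession
```

Running the same command again after a disconnect returns you to the session in which srun was started.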
Finding information on the queue: the squeue command
The squeue command is a tool we use to pull up information about the jobs
currently in the Slurm queue. Usually, you wouldn’t need information for
all jobs queued in the system, so you can restrict the output to your own
jobs with the --me flag (equivalent to --user=$USER):
user@hsn:~$ squeue --me
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
2396 debug job1 user1 PENDING 0:00 3-00:00:00 2 (Resources)
2431 debug job2 user2 RUNNING 1:13:35 3-00:00:00 1 cn2
2432 debug job3 user3 RUNNING 1:13:35 3-00:00:00 1 cn2
...
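For a quick overview, the squeue listing can be summarized per job state. This is a minimal sketch that assumes the state is the fifth column, as in the listing above. The function name count_states is hypothetical, and the sample lines are copied from that listing so the snippet runs without Slurm.

```shell
#!/bin/bash
# Count jobs per state from squeue-style output read on stdin.
# Assumes a one-line header and the job state in the 5th column.
count_states() {
    awk 'NR > 1 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input taken from the listing above.
count_states <<'EOF'
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
2396 debug job1 user1 PENDING 0:00 3-00:00:00 2 (Resources)
2431 debug job2 user2 RUNNING 1:13:35 3-00:00:00 1 cn2
2432 debug job3 user3 RUNNING 1:13:35 3-00:00:00 1 cn2
EOF
```

On the cluster the same function could be fed directly, for example with `squeue --me | count_states`.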
The squeue command also reports a job’s estimated start time when the
--start flag is added:
user@hsn:~$ squeue --start
JOBID PARTITION NAME USER ST START_TIME ...
2396 debug job1 user1 PD 2037-02-07T11:11:04 ...
Note, however, that the start time provided can be inaccurate: the calculated time is based on the jobs queued or running in the system, considering their time limits. If a job with a higher priority is queued after the command is run, your job may be delayed.
The squeue command reports a job’s status through STATE and REASON codes.
Job state codes describe a job’s current state in
the queue (e.g. pending, completed).
The following table outlines the job STATE codes you may encounter
when using squeue to check on your jobs.
| STATE | CODE | Explanation |
|---|---|---|
| COMPLETED | CD | Completed successfully. |
| COMPLETING | CG | Finishing (still active). |
| FAILED | F | Terminated with a non-zero exit code. |
| PENDING | PD | Waiting for resource allocation. |
| PREEMPTED | PR | Terminated due to preemption by other job. |
| RUNNING | R | Allocated and running. |
| SUSPENDED | S | Paused with its cores released to other jobs. |
| STOPPED | ST | Stopped with its cores retained. |
The following table outlines the job REASON codes you may encounter when using squeue to check on your jobs.
| REASON | Explanation |
|---|---|
| Priority | Higher priority jobs are in the queue. |
| Dependency | Waiting for a dependent job to complete. |
| Resources | Waiting for the requested resources to become available. |
| TimeLimit | The job exhausted its time limit. |
| PartitionTimeLimit | Requested time exceeds partition's allowed limit. |
| Reservation | Waiting for an advanced reservation to become available. |
| InvalidAccount | Job’s account is invalid. |
| InvalidQOS | Job’s QoS is invalid. |
| QOSGrpCpuLimit | All CPUs assigned to the job's specified QoS are in use. |
| QOSGrpMaxJobsLimit | Maximum number of jobs for the job's QoS has been met. |
| QOSGrpNodeLimit | All nodes assigned to the job's specified QoS are in use. |
| PartitionCpuLimit | All CPUs assigned to the partition are in use. |
| PartitionMaxJobsLimit | Maximum number of jobs in partition has been met. |
| PartitionNodeLimit | All nodes assigned to the partition are in use. |
| AssociationCpuLimit | All CPUs assigned to the association are in use. |
| AssociationMaxJobsLimit | Maximum number of jobs in association has been met. |
| AssociationNodeLimit | All nodes assigned to the association are in use. |
Stopping running/queued jobs: the scancel command
Sometimes you may need to stop a job entirely while it’s running or
when it is still pending in the queue. The best
way to accomplish this is with the scancel command. The scancel command
allows you to cancel jobs you are running on mc2 using the job’s ID:
user@hsn:~$ scancel <job_id>
where <job_id> is the job serial identity number. To cancel multiple jobs,
you can use a space-separated list of job IDs:
user@hsn:~$ scancel <job1_id> <job2_id> ...
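Jobs can also be selected by owner and state rather than by ID, using the --user and --state options of scancel. The sketch below wraps this in a small function with a dry-run switch, so the command can be inspected before it is executed. The function name cancel_pending and the DRY_RUN variable are assumptions made for this example.

```shell
#!/bin/bash
# Cancel all pending jobs of a user (defaults to the current user).
# With DRY_RUN set to a non-empty value the command is printed, not run.
cancel_pending() {
    local user="${1:-${USER:-}}"
    ${DRY_RUN:+echo} scancel --user="$user" --state=PENDING
}

# Print the command that would be executed for the hypothetical user "user1".
DRY_RUN=1 cancel_pending user1
```

Calling `cancel_pending` without DRY_RUN on the cluster would cancel the pending jobs for real, so checking the dry-run output first is advisable.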
Show priority of jobs: the sprio command
sprio is used to view the components of a job's scheduling priority.
sprio is a read-only utility that extracts information from the multi-factor
priority plugin. By default, sprio returns information for all pending jobs.
user@hsn:~$ sprio
JOBID PARTITION PRIORITY SITE
2396 debug 1 0
Analyzing past jobs: the sacct command
The sacct command allows users to pull up status information about past jobs. By default, sacct will only pull up your jobs that ran on the current day:
user@hsn:~$ sacct --allocations
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2419 job1 debug default 16 RUNNING 0:0
2420 job2 debug default 16 RUNNING 0:0
We can use the --starttime flag to tell the command to look for past jobs:
user@hsn:~$ sacct --allocations --starttime=<yyyy-mm-dd>
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1659 oldjob debug default 16 COMPLETED 0:0
1665 oldjob debug default 16 FAILED 1:0
...
2419 job1 debug default 16 RUNNING 0:0
2420 job2 debug default 16 RUNNING 0:0
We can use the --format flag to choose what we want in our output. The syntax
is --format=fld_1,fld_2, ... ,fld_N, where fld_i are Slurm fields, some
of which are listed in the table below. For example:
user@hsn:~$ sacct --allocations --starttime=2026-02-3 --format=jobID,NNodes,NCPUS,Elapsed,WorkDir
JobID NNodes NCPUS Elapsed WorkDir
------------ -------- ---------- ---------- --------------------
2385 1 0 00:00:00 /home/user/work/+
2386 1 0 00:00:00 /home/user/work/+
...
2421 1 16 08:27:37 /home/user/work/+
2422 1 16 08:26:37 /home/user/work/+
When using the --format option for listing various fields you can put a
%NUMBER afterwards to specify how many characters should be printed:
user@hsn:~$ sacct --allocations --starttime=2026-02-3 --format=jobID,WorkDir%40
JobID WorkDir
------------ ------------------------------------------
2385 /home/user/work/path/to/calculation1
2386 /home/user/work/path/to/calculation2
...
2421 /home/user/work/path/to/calculation3
2422 /home/user/work/path/to/calculation4
You may also limit the info to one specific job:
user@hsn:~$ sacct --allocations --jobs=2421 --format=Elapsed,CPUTime,ExitCode
Elapsed CPUTime ExitCode
---------- ---------- --------
08:37:05 5-17:53:20 0:0
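The two time fields above are related: CPUTime is the elapsed time multiplied by the number of allocated CPUs. The sketch below converts Slurm's [dd-]hh:mm:ss notation to seconds and checks that relation for the values shown. The helper name to_seconds is hypothetical.

```shell
#!/bin/bash
# Convert a Slurm duration ([dd-]hh:mm:ss) to seconds.
to_seconds() {
    local t="$1" days=0 h m s
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"      # part before the dash: days
        t="${t#*-}"          # part after the dash: hh:mm:ss
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

elapsed=$(to_seconds "08:37:05")      # Elapsed from the output above
cputime=$(to_seconds "5-17:53:20")    # CPUTime from the output above
echo "Elapsed: $elapsed s, CPUTime: $cputime s, ratio: $(( cputime / elapsed ))"
```

The ratio evaluates to 16, which agrees with the 16 CPUs reported for job 2421 in the earlier NCPUS listing.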
Table. Some fields that can be reported by the sacct command.
| Field | Description |
|---|---|
| CPUTime | Time used by a job or step |
| Elapsed | The job's elapsed time |
| End | Termination time of the job |
| ExitCode | The exit code returned by the job script |
| FailedNode | The name of the node that failed |
| JobID | The identification number of the job |
| JobName | The name of the job or job step |
| MaxDiskRead | Maximum number of I/O bytes read |
| MaxDiskWrite | Maximum number of I/O bytes written |
| NCPUS | Total number of CPUs allocated |
| NNodes | Number of nodes in a job or step |
| NodeList | List of nodes in job/step |
| NTasks | Total number of tasks in a job/step |
| QOS | Name of Quality of Service |
| Start | Initiation time of the job |
| StdOut | Display the "filename pattern" for stdout |
| WorkDir | The directory used by the job |
View and control jobs: the scontrol command
The scontrol command provides users extended control of their jobs run
through Slurm. This includes actions like suspending a job, holding a job
from running, or pulling extensive status information on jobs.
To suspend a job that is currently running on the system:
user@hsn:~$ scontrol suspend <job_id>
To resume a paused (suspended) job, we use scontrol with the resume
sub-command:
user@hsn:~$ scontrol resume <job_id>
Slurm also provides a utility to hold jobs that are queued in the system. Holding a job will place the job in the lowest priority, effectively “holding” the job from being run. A job can only be held if it’s waiting on the system to be run.
user@hsn:~$ scontrol hold <job_id>
We can then release a held job using the release sub-command:
user@hsn:~$ scontrol release <job_id>
scontrol can also provide information on jobs using the show sub-command.
The information provided by this command is quite extensive and detailed, so
be sure to either clear your terminal window, grep certain information from
the command, or pipe the output to a separate text file:
user@hsn:~$ scontrol show job <job_id>
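Since the output consists of Key=Value pairs, single fields are easy to extract. The sketch below is one way to do it. The helper name job_field is hypothetical, and the sample text is a shortened, made-up excerpt in the style of scontrol show job output, so the snippet runs without Slurm.

```shell
#!/bin/bash
# Print the value of one Key=Value field from "scontrol show job" output
# read on stdin. Values that contain spaces are not handled.
job_field() {
    local key="$1"
    tr -s ' ' '\n' |
        awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# Hypothetical, shortened excerpt of "scontrol show job <job_id>" output.
sample='JobId=2421 JobName=job1
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=08:27:37 TimeLimit=3-00:00:00 TimeMin=N/A'

echo "$sample" | job_field JobState
echo "$sample" | job_field TimeLimit
```

On the cluster this would be used as `scontrol show job <job_id> | job_field JobState`.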
Administration commands
Usage reports
The sreport command generates reports from the Slurm accounting data for
Trackable Resources (TRES) stored in the Slurm database through slurmdbd.
Report cluster utilization:
user@hsn:~$ sreport cluster utilization Start=now-1weeks End=now
----------------------------------------------------------------
Cluster Utilization 2026-01-28T23:00:00 - 2026-02-04T22:59:59
Usage reported in CPU Minutes
----------------------------------------------------------------
Cluster Allocate Down Planned Idle Planned Reported
--------- -------- -------- -------- -------- -------- ---------
mc2 798177 39322 1011893 74448 1011893 1923840
Report top usage in percent for a specific account:
user@hsn:~$ sreport user topusage start=2021-05-01 -t percent account=default
--------------------------------------------------------------------------------
Top 10 Users 2021-05-01T00:00:00 - 2026-02-03T23:59:59 (150339600 secs)
Usage reported in Percentage of Total
------------------------------------------------------------------------------
Cluster Login Proper Name Account Used Energy
------- --------- --------------- --------------- --------- --------
mc2 user1 User Name 1 default 17.61% 0.00%
mc2 user2 User Name 2 default 9.15% 0.00%
mc2 user3 User Name 3 default 3.91% 0.00%
...
Configuration commands
Print all Slurm configuration details in slurm.conf (including defaults)
root@hsn:~# scontrol show config
Reconfiguration of services after changing configuration files:
root@hsn:~# scontrol reconfigure
Manage state of nodes:
root@hsn:~# scontrol update NodeName=cn1,cn2 State=DRAIN Reason="Maintenance"
root@hsn:~# scontrol update NodeName=cn1,cn2 State=RESUME Reason="Maintenance finished"
Reservation management
Reserve one node (create a reservation) for two users:
root@hsn:~# scontrol create reservation reservationname="my-reservation" starttime=NOW \
duration=UNLIMITED flags=IGNORE_JOBS users="user1,user2" nodes=cn1
or a similar reservation starting on a specified date/time:
root@hsn:~# scontrol create reservation reservationname="my-reservation" starttime=2025-09-24T14:00 \
duration=UNLIMITED flags=IGNORE_JOBS users="user1,user2" nodes=cn1
or for a specified number of nodes:
root@hsn:~# scontrol create reservation ReservationName="dft-meeting" users="user1,user2" \
StartTime=2025-12-30T08:00:00 Duration=04:00:00 Flags=IGNORE_JOBS TRES=node=1
Check existing reservations
root@hsn:~# scontrol show reservations
Job execution on reserved nodes should be issued as:
user@hsn:~$ sbatch --reservation="my-reservation" slurm.sh
A reservation can be deleted with:
root@hsn:~# sudo scontrol delete ReservationName="my-reservation"
Reserve entire system for maintenance
root@hsn:~# scontrol create reservation starttime=NOW \
duration=UNLIMITED user=root flags=maint,ignore_jobs nodes=ALL
Account and user management
The synopsis for the sacctmgr command is:
root@hsn:~# sacctmgr <options> <command>
where notable options are:
- --immediate, commits changes without asking for confirmation;
- --parsable, prints output delimited with '|' (suitable for parsing by scripts).
Table. Commands and options for the sacctmgr command.
| Command | Description |
|---|---|
| reconfigure | Reconfigures the SlurmDBD |
| shutdown | Shutdown the server |
| create <entity> <specs> | Create an entity |
| remove <entity> where <specs> | Delete entities |
| show <entity> [<specs>] | Display info about entities |
| modify <entity> where <specs> set <specs> | Modify entity |
Table. List of entities for the sacctmgr command.
| Entity | Description |
|---|---|
| account | Bank account |
| association | Used to group information for list and show commands |
| coordinator | Usually an account manager |
| event | Events like downed or draining nodes |
| job | Used to modify specific fields of a job |
| stats | Used with list and show commands to view statistics |
| tres | Used with list and show commands to list Trackable RESources |
| user | Login user name |
List runaway (ghost) jobs and fix them. Runaway jobs are jobs that don't exist in the controller but are still considered running or pending in the database.
root@hsn:~# sacctmgr show runaway jobs
Manage Slurm daemons and services:
# slurmctld -Dvvvv # Run Slurm control daemon in the foreground (Head Node)
# slurmdbd -Dvvvv # Run Slurm database daemon in the foreground (Head Node)
# slurmd -Dvvvv # Run Slurm daemon in the foreground (Compute Nodes)
# systemctl status slurmctld # Check daemon status
# systemctl stop slurmctld # Stop daemon
# systemctl restart slurmctld # Restart daemon
# ssh cn1 systemctl stop slurmd # Stop slurmd daemon via SSH on a Compute Node
root@hsn:~# journalctl -xeu slurmctld # Print control daemon activity log