
SLURM manual


Introduction

The CS department HPC is currently running a new environment based on the SLURM queue manager, which replaced the previous SGE queue manager in March 2019. This manual is intended as the entry point to help users switch to the new environment.


Access and configuration

The entry point is the server named logincluster.cs.upc.edu, which can only be accessed from the CS department wired network. Once logged in, you will have your home directory and software available, both local and shared through /home/soft.
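
For example, a session could be opened as follows; this assumes standard SSH access, which is not stated above, and <username> is a placeholder for your CS department account:

# Assumption: login is done over SSH; replace <username> with your account
ssh <username>@logincluster.cs.upc.edu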

To access the SLURM commands, you should update your PATH variable by adding the path /usr/local/slurm/bin to it.
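
For instance, from a bash shell (appending the line to ~/.bashrc is only one way of making the change permanent):

# Make the SLURM binaries available in the current session
export PATH=/usr/local/slurm/bin:$PATH
# Optionally, make the change permanent for future logins
echo 'export PATH=/usr/local/slurm/bin:$PATH' >> ~/.bashrc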


Queues

The same queues (called partitions in SLURM) that were available in SGE are also available in SLURM:

PARTITION        AVAIL  TIMELIMIT  NODES  STATE NODELIST
short*              up 1-00:00:00      4   idle node[208,316-317,408]
medium              up 7-00:00:00      2   idle node[110,318]
long                up   infinite      2   idle node[210,315]
gpu                 up   infinite      1   idle node800
backup-mysql        up   infinite      4   idle backup[1-4]
backup-nextcloud    up   infinite      4   idle backup[1-4]
backup-cluster      up   infinite      4   idle backup[1-4]
textserver          up   infinite      2   idle node[211,409]

All queues except the backup* ones, which are reserved for RDLab's internal use, are configured with the same privileges and quotas as in SGE.
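
The listing above corresponds to the output of the sinfo command. As a minimal sketch of selecting a partition at submission time (the script name job.sh is illustrative):

# Show the partitions and their limits (produces the listing above)
sinfo
# Submit a batch script to a specific partition, e.g. short
sbatch -p short job.sh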


Commands

Below you can find the SLURM equivalents of typical SGE commands:

User Commands         SGE                       SLURM
Interactive login     qlogin                    srun --pty <shellname>
Job submission        qsub [script_file]        sbatch [script_file]
Job deletion          qdel [job_id]             scancel [job_id]
Job status by job     qstat -u \* [-j job_id]   squeue [job_id]
Job status by user    qstat [-u user_name]      squeue -u [user_name]
Job hold              qhold [job_id]            scontrol hold [job_id]
Job release           qrls [job_id]             scontrol release [job_id]
Queue list            qconf -sql                sinfo
List nodes            qhost                     sinfo -N OR scontrol show nodes
Cluster status        qhost -q                  sinfo
GUI                   qmon                      sview
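
As a short sketch of a typical job lifecycle using these commands (the script name job.sh and the job id 12345 are placeholders):

sbatch job.sh              # submit; prints "Submitted batch job 12345"
squeue -u $USER            # check the state of your jobs
scontrol hold 12345        # keep the pending job from starting
scontrol release 12345     # allow it to be scheduled again
scancel 12345              # delete the job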

Below you can find the SLURM equivalents of the typical SGE job parameters:

Job Specification               SGE                              SLURM
Script directive                #$                               #SBATCH
Queue                           -q [queue]                       -p [queue]
Count of nodes                  N/A                              -N [min[-max]]
CPU count                       -pe [PE] [count]                 -n [count]
Wall clock limit                -l h_rt=[seconds]                -t [min] OR -t [days-hh:mm:ss]
Standard out file               -o [file_name]                   -o [file_name]
Standard error file             -e [file_name]                   -e [file_name]
Combine STDOUT & STDERR files   -j yes                           (use -o without -e)
Copy environment                -V                               --export=[ALL | NONE | variables]
Event notification              -m abe                           --mail-type=[events]
Send notification email         -M [address]                     --mail-user=[address]
Job name                        -N [name]                        --job-name=[name]
Restart job                     -r [yes|no]                      --requeue OR --no-requeue (NOTE: configurable default)
Set working directory           -wd [directory]                  --workdir=[dir_name]
Resource sharing                -l exclusive                     --exclusive OR --shared
Memory size                     -l mem_free=[memory][K|M|G]      --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Charge to an account            -A [account]                     --account=[account]
Tasks per node                  (Fixed allocation_rule in PE)    --tasks-per-node=[count]
                                                                 --cpus-per-task=[count]
Job dependency                  -hold_jid [job_id | job_name]    --depend=[state:job_id]
Job project                     -P [name]                        --wckey=[name]
Job host preference             -q [queue]@[node] OR             --nodelist=[nodes] AND/OR --exclude=[nodes]
                                -q [queue]@@[hostgroup]
Quality of service              N/A                              --qos=[name]
Job arrays                      -t [array_spec]                  --array=[array_spec] (Slurm version 2.6+)
Generic Resources               -l [resource]=[value]            --gres=[resource_spec]
Begin Time                      -a [YYMMDDhhmm]                  --begin=YYYY-MM-DD[THH:MM[:SS]]
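
As a minimal sketch combining a few of the parameters above (the script names job.sh and postprocess.sh and the job id 12345 are illustrative):

# A named array job with per-CPU memory and mail on completion
sbatch --job-name=myarray --array=1-10 --mem-per-cpu=2G --mail-type=END job.sh
# Start a job only after job 12345 has finished successfully
sbatch --depend=afterok:12345 postprocess.sh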

Examples

Some examples of the commands and parameters above:

SGE                      SLURM
qstat                    squeue
qstat -u username        squeue -u username
qstat -f                 squeue -al
qsub                     sbatch
qsub -N jobname          sbatch -J jobname
qsub -l h_rt=24:00:00    sbatch -t 24:00:00
qsub -pe make 8          sbatch -n 8
qsub -l mem=4G           sbatch --mem=4000
qsub -o filename         sbatch -o filename
qsub -e filename         sbatch -e filename
qsub -q gpu              sbatch -p gpu --gres=gpu:n (*)
qlogin                   srun --pty bash
qdel                     scancel
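
For the GPU case marked (*), a minimal submission sketch requesting one GPU on the gpu partition (the script name train.sh is illustrative):

# Ask for the gpu partition and one generic GPU resource
sbatch -p gpu --gres=gpu:1 train.sh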

Job Template

Below, the equivalent job templates. First, the SGE version:

#!/bin/bash
#
# Job name
#$ -N test
# Join stdout and stderr into a single file
#$ -j y
#$ -o test.output
# Run the job from the current working directory
#$ -cwd
# Mail notifications at begin, end and abort
#$ -M $USER@cs.upc.edu
#$ -m bea
# Request 5 hours run time
#$ -l h_rt=5:0:0
# Request 4 GB of memory
#$ -l mem=4G
#$ -q short

<call your app here>

And the SLURM equivalent:

#!/bin/bash -l
# NOTE the -l flag!
#
# Job name
#SBATCH -J test
# Standard output and error files (%j expands to the job id)
#SBATCH -o test.%j.out
#SBATCH -e test.%j.err
# Running from the submission directory is the default in SLURM, so no -cwd equivalent is needed
# Mail notifications for all events (begin, end, fail, ...)
#SBATCH --mail-user=$USER@cs.upc.edu
#SBATCH --mail-type=ALL
# Request 5 hours run time
#SBATCH -t 5:0:0
# Request 4 GB of memory
#SBATCH --mem=4000
#SBATCH -p short

<call your app here>
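
Assuming the SLURM template above is saved as, e.g., test.sub (the file name is illustrative), it can be submitted and followed like this:

sbatch test.sub        # submit the job; note the job id that is printed
squeue -u $USER        # follow the job while it is pending or running
# When it finishes, stdout is in test.<jobid>.out and stderr in test.<jobid>.err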