/rdlab HPC service manual
Introduction
The CS department High Performance Computing (HPC) system runs a queue manager environment that collects all user requests (jobs). The queue scheduler then sorts and prioritizes every user task using several defined criteria (user quota, estimated execution time, RAM, CPU cores…).
When a slot that fits the job becomes available in the HPC system, the user request is transferred to an execution node and controlled by the queue system. If the user process tries to use more resources than requested (RAM, time…), the queue system will kill the job to ensure system stability.
The /rdlab HPC system currently uses SLURM as its queue manager.
Warning: On a common server/laptop/desktop, users simply execute whatever they want directly. In an HPC environment, all user requests/processes/jobs must be queued and controlled through the queue system.
Our HPC system groups all the execution nodes into several specific queues based on time criteria or specific hardware.
The user available queues are:
Queue name (partition) | Purpose and limits |
---|---|
short | execution time < 1 day |
medium | execution time < 1 week |
long | execution time unlimited |
gpu | execution nodes with GPU |
Warning: Other queues are reserved for internal or specific services and are only available to HPC administrators.
Access and configuration
Our HPC environment is only accessible through the Secure SHell (ssh) protocol. For security reasons there are different access paths:
- Inside CS Department network:
- ssh <username>@logincluster.cs.upc.edu
- Worldwide access:
- ssh <username>@login1.cs.upc.edu
- ssh <username>@logincluster.cs.upc.edu
logincluster.cs.upc.edu is "just" a bridge between the user and the HPC system. It is not an execution node, so you should not run any program on it: it is the least powerful server in the HPC environment. Moreover, the system will kill all user processes on logincluster.cs.upc.edu after about 60 minutes of execution time.
Warning: To access the SLURM commands directly, you should update your PATH variable by adding "/usr/local/slurm/bin" to it.
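The PATH update above can be done once per session, or made permanent in your shell startup file. A minimal sketch:

```shell
# Append the SLURM binaries directory to your PATH
# (add this line to ~/.bashrc to make it permanent across logins)
export PATH="$PATH:/usr/local/slurm/bin"
```

After this, commands such as sbatch, squeue and srun can be invoked without their full path.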
You should be familiar with Linux environments. You can find plenty of easy tutorials and how-tos for Linux beginners.
Ex: https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners
Uploading/Downloading data to the HPC environment
You can upload and download files to/from the HPC environment using the SecureCopy (scp) or the Secure File Transfer Protocol (sftp) services.
- Inside the CS Department network you can connect directly in both directions:
  - Ex: from your computer to the HPC system:
    scp [filename] <username>@logincluster.cs.upc.edu:~/
  - Ex: from the HPC system to your computer:
    scp [filename] <username>@<your_ip_address>:~/
- Worldwide access: you can only initiate the connection from the HPC system:
  - Ex: from the HPC system to your computer:
    scp [filename] <username>@<your_ip_address>:~/
  - Ex: from your computer to the HPC system (your computer must be accessible from the Internet; run this on the HPC system to pull the file):
    scp <username>@<your_ip_address>:<file_path> .
Warning:
You can create a .tar file if multiple directories or files must be transferred between the systems.
https://alvinalexander.com/unix/edu/examples/tar.shtml
https://www.tecmint.com/18-tar-command-examples-in-linux/
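A minimal sketch of bundling a directory before transfer and unpacking it on the other side ("results" and the file names are placeholders):

```shell
# Create some sample data to transfer ("results" is a placeholder name)
mkdir -p results
echo "hello" > results/out.txt

# Bundle the directory into one compressed archive before copying it with scp
tar -czf results.tar.gz results/

# On the destination system, unpack the archive
mkdir -p dest
tar -xzf results.tar.gz -C dest
```

A single archive also transfers faster than many small files, since scp pays a per-file overhead.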
My first HPC job (Hello world!)
In the next example we will create a C program and run it on the HPC system. For educational purposes, this job will use 1 CPU core and 1024 MB of RAM on the short queue.
- Create a C source file named "helloworld.c" and copy:

```c
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("Hello, world!\n");
    sleep(60);
    printf("Finishing after 60 seconds waiting\n");
    return 0;
}
```

- Create a shell script file named "helloworld.sh" and copy:

```shell
#!/bin/bash -l
#
#SBATCH -J my-hello-world
#SBATCH -o my-hello-world.%j.out
#SBATCH -e my-hello-world.%j.err
#
#SBATCH --mail-user=$USER@cs.upc.edu
#SBATCH --mail-type=ALL
#
#SBATCH --mem=1024M
#SBATCH -c 1
#SBATCH -p short

# First we compile the .c program
gcc -o helloworld.exe helloworld.c
# Then we run the program
./helloworld.exe
```
- Send your job to the HPC queue system:
  sbatch ./helloworld.sh
- Check your queue status:
  squeue -u <your_username>
- Wait until the queue system finds a free execution node that fits your job needs.

You can monitor your job output "in real time" through the output file (my-hello-world.out) and the error file (my-hello-world.err):
  tail -f ./*.out ./*.err
Warning: If the execution nodes are "full" of user jobs or your execution takes a long time, you can log out; the HPC system will notify you by email when your job execution is completed.
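If you spot a mistake after submitting (wrong queue, wrong memory request…), the job can be removed from the queue with scancel, which takes the job ID printed by sbatch. A sketch; the SLURM calls are commented out because they only work on the cluster, and 12345 is a placeholder ID:

```shell
# --parsable makes sbatch print only the job id
# instead of the full "Submitted batch job <id>" message
# JOBID=$(sbatch --parsable ./helloworld.sh)
JOBID=12345   # placeholder job id for illustration

# Remove the job from the queue (or kill it if it is already running)
# scancel "$JOBID"
echo "would cancel job $JOBID"
```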
My first interactive job
If you need to compile or execute a program in interactive mode (never run programs on logincluster), you can request an interactive shell environment that fits your needs on an execution node.
- Request a shell with 3 CPU cores, 1024 MB RAM and less than 24h (short queue):
  srun -p short --mem=1024M -c 3 --pty bash
- Request a shell with 6 CPU cores, 16 GB RAM and less than 24h (short queue):
  srun -p short --mem=16G -c 6 --pty bash
Warning: This interactive shell will only be available if there is a free execution node that can fulfill your request; otherwise you will receive an error because all the execution resources are currently in use. You must also have enough quota available, or you will not be able to execute your interactive job.
SGE to Slurm Commands table
Below you can find the SLURM equivalents of typical SGE commands:
User Commands | SGE | SLURM |
---|---|---|
Interactive login | qlogin | srun --pty <shellname> |
Job submission | qsub [script_file] | sbatch [script_file] |
Job deletion | qdel [job_id] | scancel [job_id] |
Job status by job | qstat -u \* [-j job_id] | squeue [job_id] |
Job status by user | qstat [-u user_name] | squeue -u [user_name] |
Job hold | qhold [job_id] | scontrol hold [job_id] |
Job release | qrls [job_id] | scontrol release [job_id] |
Queue list | qconf -sql | squeue |
List nodes | qhost | sinfo -N OR scontrol show nodes |
Cluster status | qhost -q | sinfo |
GUI | qmon | sview |
Below you can find the SLURM equivalents of SGE job parameters:
Job Specification | SGE | SLURM |
---|---|---|
Script directive | #$ | #SBATCH |
Queue | -q [queue] | -p [queue] |
Count of nodes | N/A | -N [min[-max]] |
CPU count | -pe [PE] [count] | -c [count] |
Wall clock limit | -l h_rt=[seconds] | -t [min] OR -t [days-hh:mm:ss] |
Standard out file | -o [file_name] | -o [file_name] |
Standard error file | -e [file_name] | -e [file_name] |
Combine STDOUT & STDERR files | -j yes | (use -o without -e) |
Copy environment | -V | --export=[ALL \| NONE \| variables] |
Event notification | -m abe | --mail-type=[events] |
Send notification email | -M [address] | --mail-user=[address] |
Job name | -N [name] | --job-name=[name] |
Restart job | -r [yes\|no] | --requeue OR --no-requeue (NOTE: configurable default) |
Set working directory | -wd [directory] | --workdir=[dir_name] |
Resource sharing | -l exclusive | --exclusive OR --shared |
Memory size | -l mem_free=[memory][K\|M\|G] | --mem=[mem][M\|G\|T] OR --mem-per-cpu=[mem][M\|G\|T] |
Charge to an account | -A [account] | --account=[account] |
Tasks per node | (Fixed allocation_rule in PE) | --tasks-per-node=[count] |
CPUs per task | | --cpus-per-task=[count] |
Job dependency | -hold_jid [job_id \| job_name] | --depend=[state:job_id] |
Job project | -P [name] | --wckey=[name] |
Job host preference | -q [queue]@[node] OR -q | --nodelist=[nodes] AND/OR --exclude=[nodes] |
Quality of service | | --qos=[name] |
Job arrays | -t [array_spec] | --array=[array_spec] (Slurm version 2.6+) |
Generic Resources | -l [resource]=[value] | --gres=[resource_spec] |
Begin Time | -a [YYMMDDhhmm] | --begin=YYYY-MM-DD[THH:MM[:SS]] |
Examples
Some examples on the commands and parameters above:
SGE | SLURM |
---|---|
qstat | squeue |
qstat -u username | squeue -u username |
qstat -f | squeue -al |
qsub | sbatch |
qsub -N jobname | sbatch -J jobname |
qsub -l h_rt=24:00:00 | sbatch -t 24:00:00 |
qsub -pe make 8 | sbatch -c 8 |
qsub -l mem=4G | sbatch --mem=4000 |
qsub -o filename | sbatch -o filename |
qsub -e filename | sbatch -e filename |
qsub -q gpu | sbatch -p gpu --gres=gpu:n |
qlogin | srun --pty bash |
qdel | scancel |
Running GPU jobs
In order to run regular jobs on GPUs, you must request the specific gpu partition, the number of GPUs and, optionally, the type of GPU.
- List available GPU nodes with GPU names:
  sinfo -p gpu -o "%20N %10c %10m %25G %15E"
- Send a batch job:
  sbatch -p gpu --gres=gpu[:GpuName]:NumOfGpus my_job.sh
Examples
- Batch job that uses just one GPU:
  sbatch -p gpu --gres=gpu:1 my_job.sh
- Batch job that uses one specific GPU model:
  sbatch -p gpu --gres=gpu:k40c:1 my_job.sh
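For recurring GPU work, the partition and --gres request can live inside the job script itself instead of on the sbatch command line. A minimal sketch; my_gpu_job.sh, ./my_gpu_program and the resource values are placeholders:

```shell
# Write a minimal GPU batch script ("./my_gpu_program" is a placeholder)
cat > my_gpu_job.sh <<'EOF'
#!/bin/bash -l
#SBATCH -J my-gpu-job
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH -c 1

# SLURM exposes only the allocated GPU(s) to the job
./my_gpu_program
EOF

# Submit it (only works on the cluster):
# sbatch my_gpu_job.sh
```

With the directives embedded, a plain `sbatch my_gpu_job.sh` is enough, and the resource request is versioned together with the script.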
Accounting
Slurm provides useful information about your queued, running and finished jobs:
- Information about a queued or running job:
  scontrol show job <jobid>
- Customizable information about queued/running/finished jobs:
  sacct -j <jobid>
  You can customize the output of the sacct command with the "--format" option. A comprehensive list of fields can be found at https://slurm.schedmd.com/sacct.html
- A typical sacct query with execution time (Elapsed) and maximum memory usage (MaxRSS):
  sacct -j <jobid> --format=User,JobID,Jobname%15,partition,state%15,time,start,end,elapsed,MaxRss
- seff is an alternative to sacct that shows the most commonly requested information in a more readable way. Note the "Memory Efficiency" field, which shows how much memory your job used relative to the requested memory: interesting information for optimizing your memory quota.
  seff <jobid>
Debugging basics
- Why don't my jobs start running?
Your resources are limited by CPU core count and memory quota. The cluster is shared by several users, so at any given time there might not be enough resources to cover your jobs' needs. The NODELIST(REASON) field of the squeue command shows the node(s) where a job is running, or the reason why it is still queued.
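To narrow squeue's output to the relevant columns for your own jobs, a custom format string helps; %R prints the NODELIST(REASON) column. A sketch; the squeue call itself is commented out because it only works on the cluster:

```shell
# Format: job id, partition, state, and node list / pending reason
FMT="%.18i %.9P %.8T %R"
# squeue -u "$USER" -o "$FMT"
echo "$FMT"
```

A PENDING job with reason (Resources) is simply waiting for free nodes; (Priority) means other jobs are ahead of it in the queue.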
- Why did my job die abruptly?
Sometimes a job ends suddenly or sooner than expected. Slurm controls the execution time and the memory consumed by jobs. A job can't run for longer than the maximum time allowed by its queue (24 hours for short, 7 days for medium), and it can't use more memory than requested.
The sacct and seff commands report the ending state of every finished job. When a job is killed, the State field shows the reason: OUT_OF_MEMORY for exceeding the reserved memory, or TIMEOUT for exceeding the queue's time limit.