/rdlab HPC service manual


Introduction

The CS department's High Performance Computing (HPC) system runs a queue manager that collects all user requests (jobs). The queue scheduler then sorts and prioritizes every job according to several criteria (user quota, estimated execution time, RAM, CPU cores…).

When a slot that fits the job becomes available in the HPC system, the job is transferred to an execution node and supervised by the queue system. If the job tries to use more resources (RAM, time…) than it requested, the queue system kills it to preserve system stability.

The /rdlab HPC system currently uses the Slurm workload manager.

Warning: On an ordinary server, laptop or desktop computer, users run whatever they want directly. In an HPC environment, all user requests/processes/jobs must be queued and controlled through the queue system.

Our HPC system groups the execution nodes into several queues, based on time limits or specific hardware.
The queues available to users are:

Queue name (partition)    Purpose and limits
short                     execution time < 1 day
medium                    execution time < 1 week
long                      execution time unlimited
gpu                       execution nodes with GPU

Warning: Other queues are reserved for internal or specific services and are only available to HPC administrators.
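
As a rough illustration of how the time limits above map to a queue choice, the following sketch picks a partition from an estimated run time in hours. The function name pick_queue is hypothetical, and the thresholds (24 hours and 168 hours) are simply the "1 day" and "1 week" limits from the table.

```shell
# pick_queue HOURS: print the partition whose time limit fits the estimate
# (hypothetical helper; thresholds taken from the queue table above)
pick_queue() {
    local hours="$1"
    if [ "$hours" -lt 24 ]; then
        echo "short"
    elif [ "$hours" -lt 168 ]; then
        echo "medium"
    else
        echo "long"
    fi
}

pick_queue 6
```

The printed name can then be passed to the -p option of sbatch or srun.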


Access and configuration

Our HPC environment is only accessible through the Secure SHell (ssh) protocol. For security reasons, the access path depends on where you connect from:

  • Inside CS Department network:
    • ssh <username>@logincluster.cs.upc.edu

  • Worldwide access (two steps: connect to login1 first, then from login1 to logincluster):
    1. ssh <username>@login1.cs.upc.edu
    2. ssh <username>@logincluster.cs.upc.edu

logincluster.cs.upc.edu is “just” a bridge between the user and the HPC system, not an execution node. You shouldn’t run any program on it, because it is the least powerful server in the HPC environment. In addition, the system kills all user processes on logincluster.cs.upc.edu after about 60 minutes of execution time.

Warning: To run the SLURM commands directly, add the directory “/usr/local/slurm/bin” to your PATH variable.
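
One possible way to do this in bash is sketched below. It adds the directory only if it is not already present, so it can be run more than once. Placing these lines in a startup file such as ~/.bashrc is an assumption about your shell setup, not a requirement of the service.

```shell
# Directory named in the warning above
SLURM_BIN="/usr/local/slurm/bin"

# Prepend it to PATH only if it is not there yet
case ":$PATH:" in
    *":$SLURM_BIN:"*) ;;
    *) PATH="$SLURM_BIN:$PATH" ;;
esac
export PATH
```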

You should be familiar with Linux environments. Many easy tutorials and how-tos for Linux beginners are available.
Ex: https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners


Uploading/Downloading data to the HPC environment

You can upload and download files to/from the HPC environment using the Secure Copy (scp) or the Secure File Transfer Protocol (sftp) services.

  • Inside CS Department network you can connect directly:
    • Ex: from your computer to the HPC system:
      scp [filename] <username>@logincluster.cs.upc.edu:~/

    • Ex: from HPC system to your computer:
      scp [filename] <username>@<your_ip_address>:~/

  • Worldwide access: connections can only be opened from the HPC system to your computer, so run both commands on the HPC system (your computer must be accessible from the Internet):
    • Ex: from HPC system to your computer:
      scp [filename] <username>@<your_ip_address>:~/

    • Ex: from your computer to the HPC system:
      scp <username>@<your_ip_address>:<file_path> .

Warning: If several directories or files must be transferred between the systems, you can pack them into a single .tar file first. https://alvinalexander.com/unix/edu/examples/tar.shtml
https://www.tecmint.com/18-tar-command-examples-in-linux/
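
A minimal sketch of that pack, transfer and unpack cycle follows. The directory and file names (data, run1, run2, a.txt, b.txt) are made up for the example, and everything happens inside a temporary directory so no real files are touched.

```shell
# Work in a throw-away directory
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Hypothetical input: two directories with one file each
mkdir -p data/run1 data/run2
echo "sample A" > data/run1/a.txt
echo "sample B" > data/run2/b.txt

# Pack everything into a single compressed archive
tar -czf data.tar.gz data

# ... transfer data.tar.gz with scp as described above ...

# Unpack on the other side (here: into a separate directory)
mkdir unpacked
tar -xzf data.tar.gz -C unpacked
```

Transferring one archive is usually simpler than copying many small files one by one, and the directory layout is preserved when it is unpacked.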


My first HPC job (Hello world!)

In the next example we create a C program and run it on the HPC system. For educational purposes, this job uses 1 CPU core and 1024 MB of RAM on the short queue.

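  1. Create a C source file named “helloworld.c” and copy:
    
    #include <stdio.h>   /* printf, fflush */
    #include <unistd.h>  /* sleep */

    int main() {
        printf("Hello, world! \n");
        /* flush now so the line reaches the output file before the sleep */
        fflush(stdout);
        sleep(60);
        printf("Finishing after 60 seconds waiting \n");
        return 0;
    }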
                                        
  2. Create a Shell script file named "helloworld.sh" and copy:
    
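    #!/bin/bash -l
    #
    # Job name and output/error files (%j is replaced by the job ID)
    #SBATCH -J my-hello-world
    #SBATCH -o my-hello-world.%j.out
    #SBATCH -e my-hello-world.%j.err
    #
    # E-mail notifications. #SBATCH lines are not expanded by the shell,
    # so write your user name literally instead of $USER
    #SBATCH --mail-user=<username>@cs.upc.edu
    #SBATCH --mail-type=ALL
    #
    # Resources: 1024 MB of RAM, 1 CPU core, short queue
    #SBATCH --mem=1024M
    #SBATCH -c 1
    #SBATCH -p short
    
    # First, compile the C program
    gcc -o helloworld.exe helloworld.c
    
    # Then run it
    ./helloworld.exe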
                                        
  3. Send your job to the HPC queue system:
    sbatch ./helloworld.sh

  4. Check your queue status:
    squeue -u <your_username>

  5. Wait until the queue system finds a free execution node that fits your job's needs. You can follow your job's output as it is written through the output file (my-hello-world.<job_id>.out) and the error file (my-hello-world.<job_id>.err):
    tail -f ./*.out ./*.err

Warning: If the execution nodes are “full” of user jobs, or your job takes a long time to run, you can log out of the system. The HPC system will notify you by email when your job completes.
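
When sbatch accepts a job it prints a line of the form "Submitted batch job <id>". The sketch below shows one way to extract the job ID from that line and derive the names of the output and error files, assuming the -o and -e patterns my-hello-world.%j.out and my-hello-world.%j.err. The job ID 12345 is made up. In real use the message would come from running sbatch, for example submit_msg="$(sbatch ./helloworld.sh)".

```shell
# Text printed by sbatch on success (12345 is a made-up job ID)
submit_msg="Submitted batch job 12345"

# The job ID is the last word of that message
job_id="${submit_msg##* }"

# File names that result from replacing %j with the job ID
out_file="my-hello-world.${job_id}.out"
err_file="my-hello-world.${job_id}.err"

echo "$out_file $err_file"
```

With these names you can follow a single job (tail -f "$out_file" "$err_file") instead of every .out and .err file in the directory.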


My first interactive job

If you need to compile or run a program in interactive mode (never run programs on logincluster), you can request an interactive shell environment that fits your needs on an execution node.

  • Request a shell with 3 CPU cores, 1024MBytes RAM and less than 24h (short queue):
    srun -p short --mem=1024M -c 3 --pty bash

  • Request a shell with 6 CPU cores, 16GBytes RAM and less than 24h (short queue):
    srun -p short --mem=16G -c 6 --pty bash

Warning: The interactive shell is only available if a free execution node can fulfill your request. Otherwise you will receive an error, because all execution resources are currently in use and no interactive shell can be provided. You also need enough quota available, or you will not be able to run your interactive job.
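
Slurm sets the SLURM_JOB_ID environment variable inside every job, including interactive shells started with srun. A small check like the following sketch can therefore tell you whether the current shell is running inside a job or still on the login server. The function name in_slurm_job is hypothetical.

```shell
# in_slurm_job: succeed if the current shell runs inside a Slurm job
# (hypothetical helper; relies on SLURM_JOB_ID being set by Slurm)
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside Slurm job $SLURM_JOB_ID"
else
    echo "Not inside a Slurm job: request an interactive shell first"
fi
```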


SGE to Slurm Commands table

Below you can find the SLURM equivalents of typical SGE commands:

User Commands         SGE                        SLURM
Interactive login     qlogin                     srun --pty <shellname>
Job submission        qsub [script_file]         sbatch [script_file]
Job deletion          qdel [job_id]              scancel [job_id]
Job status by job     qstat -u \* [-j job_id]    squeue -j [job_id]
Job status by user    qstat [-u user_name]       squeue -u [user_name]
Job hold              qhold [job_id]             scontrol hold [job_id]
Job release           qrls [job_id]              scontrol release [job_id]
Queue list            qconf -sql                 squeue
List nodes            qhost                      sinfo -N OR scontrol show nodes
Cluster status        qhost -q                   sinfo
GUI                   qmon                       sview

Below you can find the SLURM equivalents of SGE job parameters:

Job Specification              SGE                            SLURM
Script directive               #$                             #SBATCH
Queue                          -q [queue]                     -p [queue]
Count of nodes                 N/A                            -N [min[-max]]
CPU count                      -pe [PE] [count]               -c [count]
Wall clock limit               -l h_rt=[seconds]              -t [min] OR -t [days-hh:mm:ss]
Standard out file              -o [file_name]                 -o [file_name]
Standard error file            -e [file_name]                 -e [file_name]
Combine STDOUT & STDERR files  -j yes                         (use -o without -e)
Copy environment               -V                             --export=[ALL | NONE | variables]
Event notification             -m abe                         --mail-type=[events]
Send notification email        -M [address]                   --mail-user=[address]
Job name                       -N [name]                      --job-name=[name]
Restart job                    -r [yes|no]                    --requeue OR --no-requeue (NOTE: configurable default)
Set working directory          -wd [directory]                --workdir=[dir_name]
Resource sharing               -l exclusive                   --exclusive OR --shared
Memory size                    -l mem_free=[memory][K|M|G]    --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Charge to an account           -A [account]                   --account=[account]
Tasks per node                 (Fixed allocation_rule in PE)  --tasks-per-node=[count]
                                                              --cpus-per-task=[count]
Job dependency                 -hold_jid [job_id | job_name]  --depend=[state:job_id]
Job project                    -P [name]                      --wckey=[name]
Job host preference            -q [queue]@[node] OR           --nodelist=[nodes] AND/OR --exclude=[nodes]
                               -q [queue]@@[hostgroup]
Quality of service                                            --qos=[name]
Job arrays                     -t [array_spec]                --array=[array_spec] (Slurm version 2.6+)
Generic Resources              -l [resource]=[value]          --gres=[resource_spec]
Begin Time                     -a [YYMMDDhhmm]                --begin=YYYY-MM-DD[THH:MM[:SS]]

Examples

Some examples on the commands and parameters above:

SGE                       SLURM
qstat                     squeue
qstat -u username         squeue -u username
qstat -f                  squeue -al
qsub                      sbatch
qsub -N jobname           sbatch -J jobname
qsub -l h_rt=24:00:00     sbatch -t 24:00:00
qsub -pe make 8           sbatch -c 8
qsub -l mem=4G            sbatch --mem=4000
qsub -o filename          sbatch -o filename
qsub -e filename          sbatch -e filename
qsub -q gpu               sbatch -p gpu --gres=gpu:n
qlogin                    srun --pty bash
qdel                      scancel
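
The same options can be written as directives inside a job script. The sketch below shows a Slurm script with the SGE equivalent noted above each directive. The values (job name, queue, 2 hour limit, 8 cores, 4000 MB, output file name) are illustrative only. The body reads two variables that Slurm sets inside a job and falls back to defaults so the script also runs outside Slurm.

```shell
#!/bin/bash
# SGE: #$ -N jobname
#SBATCH -J jobname
# SGE: #$ -q short
#SBATCH -p short
# SGE: #$ -l h_rt=02:00:00
#SBATCH -t 02:00:00
# SGE: #$ -pe make 8
#SBATCH -c 8
# SGE: #$ -l mem=4G
#SBATCH --mem=4000
# SGE: #$ -o filename
#SBATCH -o filename

# Set by Slurm inside a job; the defaults apply when run outside Slurm
CORES="${SLURM_CPUS_PER_TASK:-1}"
JOB="${SLURM_JOB_ID:-none}"
echo "job=$JOB cores=$CORES"
```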

Running GPU jobs

To run jobs on GPUs, you must request the gpu partition, the number of GPUs and, optionally, the type of GPU.

  • List the available GPU nodes and their GPU names
    sinfo -p gpu -o "%20N %10c %10m %25G %15E"

  • Sending a batch job
    sbatch -p gpu --gres=gpu[:GpuName]:NumOfGpus my_job.sh


Examples

  • Batch job that uses just one GPU
    sbatch -p gpu --gres=gpu:1 my_job.sh

  • Batch job that uses one GPU of a specific type
    sbatch -p gpu --gres=gpu:k40c:1 my_job.sh
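
The partition and GPU request can also be placed inside the job script instead of on the sbatch command line. The following is a minimal sketch of such a my_job.sh. The job name, core count and memory values are illustrative. It assumes the cluster exports CUDA_VISIBLE_DEVICES for GPU jobs, which Slurm typically does when GPUs are requested through --gres, and it only calls nvidia-smi if that tool is installed on the node.

```shell
#!/bin/bash
#SBATCH -J gpu-example
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH -c 1
#SBATCH --mem=1024M

# GPU(s) assigned to the job, or "none" when run outside a GPU job
GPU_IDS="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs assigned to this job: $GPU_IDS"

# Show GPU details only if the NVIDIA tool is available
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || true
fi
```

With the directives in the script, the job can be submitted with a plain sbatch my_job.sh.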


Accounting

Slurm provides useful information about your queued, running and finished jobs.

  • Information about a queued or running job
    scontrol show job <job_id>

  • Customizable information about queued/running/finished jobs
    sacct -j <job_id>

  • You can customize the output of the sacct command with the "--format" option. A comprehensive list of fields can be found at https://slurm.schedmd.com/sacct.html

  • A typical sacct query with execution time (elapsed) and maximum memory usage (MaxRss)
    sacct -j <job_id> --format=User,JobID,Jobname%15,partition,state%15,time,start,end,elapsed,MaxRss

  • seff is an alternative to sacct that shows the most commonly requested information in a more readable way. Note the Memory Efficiency field, which reports the memory your job used as a fraction of the memory it requested. This is useful for optimizing your memory quota.
    seff <job_id>
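
To make the memory efficiency figure concrete, the sketch below computes the same kind of ratio by hand: memory used divided by memory requested, as a percentage. The function name mem_efficiency and the sample numbers are hypothetical. In practice the used value would come from the MaxRss column of sacct and the requested value from your --mem setting, both converted to the same unit.

```shell
# mem_efficiency USED_MB REQUESTED_MB: print the used share as a percentage
# (hypothetical helper; both arguments must be in the same unit)
mem_efficiency() {
    awk -v used="$1" -v req="$2" 'BEGIN { printf "%.1f\n", 100 * used / req }'
}

# A job that used 256 MB out of 1024 MB requested
mem_efficiency 256 1024
```

A low percentage suggests that a smaller --mem request would leave more of your memory quota free for other jobs.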

Debugging basics

  • Why don't my jobs start running?
    Your resources are limited by a quota on CPU cores and memory. The cluster is shared by several users, so at any given time there might not be enough free resources to cover the needs of your jobs. The NODELIST(REASON) field of the squeue command shows the node(s) where a job is running or, if the job is still queued, the reason why it is not running.

  • Why did my job die abruptly?
    Sometimes a job ends suddenly or sooner than expected. Slurm controls the execution time and the memory consumed by jobs. A job can't run for longer than the maximum time allowed by its queue (24 hours for short and 7 days for medium). In addition, a job can't use more memory than it requested.
    The sacct and seff commands report the final state of every finished job. When a job is killed, the State field shows the reason: OUT_OF_MEMORY for exceeding the reserved memory and TIMEOUT for exceeding the time limit of the queue.
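
As a quick reference, the two states mentioned above can be mapped to a suggested next step. The sketch below does this with a hypothetical helper named explain_state. The advice strings are suggestions rather than official guidance, and any other state falls through to a generic hint.

```shell
# explain_state STATE: print a short hint for a job's final state
# (hypothetical helper; STATE is the value reported by sacct or seff)
explain_state() {
    case "$1" in
        OUT_OF_MEMORY) echo "exceeded the reserved memory: request more with --mem" ;;
        TIMEOUT)       echo "exceeded the time limit: use a longer queue" ;;
        COMPLETED)     echo "finished normally" ;;
        *)             echo "check sacct for details" ;;
    esac
}

explain_state TIMEOUT
```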