Cluster Quickstart

RDlab@lsi.upc.edu
February 2015

Welcome to LSI.

The aim of this document is to serve as an introduction to the Computing Cluster System of the Llenguatges i Sistemes Informàtics (LSI) Department at the Universitat Politècnica de Catalunya (UPC), managed by the Research and Development Lab (RDlab).

Introduction

Clustering, in a computing context, refers to a set of computers which, connected through high-speed networks, combine their hardware and software to work together on a single problem.

LSI's Computing Cluster System at the Universitat Politècnica de Catalunya is a powerful computing tool thanks to its high number of processors, its memory and its large disk space. Execution nodes are grouped into queues, which collect the jobs submitted by users and manage their assignment to nodes.

Indications

It is necessary to bear in mind some default parameters when submitting jobs to the cluster's queues.

The cluster is divided into 4 execution queues. Each queue is defined by how much execution time its processes may consume:

Short Queue for jobs lasting one day (24 hours) or less.
Default queue if no other is stated.
Medium Queue for jobs lasting one week at most.
Long Queue with unlimited job execution time.

If the job duration is not specified, the job will be sent to the Short queue. If a job exceeds the maximum time a queue allows, it will be automatically terminated.

IMPORTANT: Users are advised to specify the duration of the job (qsub -l h_rt=hours:minutes:seconds) in order to avoid the system killing the job prematurely.
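As an illustration, the sketch below converts a duration in seconds into the hours:minutes:seconds format expected by h_rt (the to_hrt helper is hypothetical, not part of the cluster tools):

```shell
#!/bin/sh
# Convert a duration in seconds to the HH:MM:SS format used by h_rt.
# (Illustrative helper; not part of the Grid Engine tools.)
to_hrt() {
    secs=$1
    printf '%02d:%02d:%02d' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

to_hrt 5400    # 90 minutes -> prints 01:30:00
# The resulting limit would then be passed at submission time:
#   qsub -l h_rt=01:30:00 simple.sh
```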

There is also a queue named Test, which may be used for immediate trials. This queue is made up of the least powerful nodes and should only be used to check that jobs run correctly, not for real executions.

The maximum memory for a job is 2 Gigabytes by default. This value can be modified by the user.

The maximum number of processes that can be executed simultaneously is determined by the amount of slots and memory, values which vary according to the kind of user. Should you need more information, please contact the RDlab.

The default configuration uses the core binding property, which attaches (binds) each job to a processor core. This guarantees that a job will never be migrated to another core or processor, avoiding the context switch cost. In case of a parallel job, core binding will attach the job to as many processors as reserved, with the same guarantees.

Finally, the cluster's configuration maps each job to exactly one processor core, which guarantees the job exclusive use of that core.

Connection to the Computing Cluster System in text mode

To connect in text mode (terminal or Command Line Interface) from UNIX systems, we need a Secure Shell (SSH) client.
From the terminal, type the following command, where <username> is your LSI department username:

ssh <username>@logincluster.lsi.upc.edu

The system will ask for our password and, once entered, we will gain access to the system.

In Windows environments, an SSH client such as PuTTY is needed:
http://en.wikipedia.org/wiki/Comparison_of_SSH_clients

Connection to the Computing Cluster System in graphical mode

To use a graphical environment on a UNIX system and redirect it, it is necessary to use the -X flag:

ssh -X <username>@logincluster.lsi.upc.edu

Alternatively, by executing the following command on our own computer:

xhost +logincluster.lsi.upc.edu

And then, with an open connection to the cluster, executing:

setenv DISPLAY <our_ip>:0.0

In Windows environments, graphical support requires redirecting the X server using a program such as X-Win32 or similar.

User environment configuration (path)

Certain applications of Open Grid Engine, the queue manager, are architecture dependent and require redefining the system path. To do so, it is necessary to modify the .tcshrc file located in our home directory, updating the PATH variable:

set ARCH=`/usr/local/sge/util/arch`
set path=( /usr/local/sge/bin/${ARCH} $path)

It is important to take into account that if the path contains an explicit reference to a concrete binary architecture, such as the following, that reference must be deleted from the path value:

setenv PATH /usr/local/sge/bin/lx24-x86

To apply the changes, it is necessary to log out and log back into the system.

Submitting a batch job

You can submit to the grid engine system any shell script that you can run by hand from your command prompt. Such scripts must not require a terminal connection or interactive user intervention. As an example, we will use the following script, which sleeps for 20 seconds:

simple.sh

You can find the following job in the file /sge-root/examples/jobs/simple.sh.
#!/bin/sh
#
#
# (c) 2004 Sun Microsystems, Inc. Use is subject to license terms.
# This is a simple example of a SGE batch script
# request Bourne shell as shell for job
#$ -S /bin/sh
#
# print date and time
date
# Sleep for 20 seconds
sleep 20
# print date and time again
date

To be able to execute the script, it is necessary to set the execution permission (755, or just +x):

chmod +x simple.sh

Submitting a job: qsub

We send the job to a queue using the following command:

qsub simple.sh

If the job has been submitted correctly, we will see the following on screen:

Your job 1 ("simple.sh") has been submitted

Some important flags

The qsub command, as well as the script body, allows the user to specify flags (properties) to be applied when the job is executed. To specify them when calling qsub, they should be added as parameters. For example:

qsub -m bea -o output.txt simple.sh

Alternatively, to make them permanent, they should be added to the script body of the job as shown:

#$ -m bea
#$ -o output.txt
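For instance, a complete job script embedding several flags might look like the following sketch (file names and the runtime limit are placeholder values):

```shell
#!/bin/sh
# Write an example batch script with embedded qsub flags to job.sh.
# (A sketch; adjust the flags to your own job.)
cat > job.sh <<'EOF'
#!/bin/sh
#$ -S /bin/sh          # shell used for the job
#$ -o output.txt       # standard output file
#$ -e error.txt        # standard error file
#$ -l h_rt=00:10:00    # request a 10-minute runtime limit
date
sleep 1
date
EOF
chmod +x job.sh
# Locally the script runs like any shell script; on the cluster it
# would be submitted with: qsub job.sh
./job.sh
```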

Some of the most important flags are:

-e: Specifies where to place the error output file.

qsub -e error.txt simple.sh

-l: Allows the user to set special requirements for a job (e.g. execution time, memory, etc.).
To specify an execution time different from the default value:

qsub -l h_rt=hours:minutes:seconds simple.sh

If we send a job whose execution time will be shorter than the default limit and specify it at submission time, the job's priority increases over jobs with the default value; in other words, it may be executed before other waiting jobs.

To specify a memory amount different from the default value (currently set to 4 GB):

qsub -l h_vmem=1G simple.sh

NOTE: If we know the maximum amount of memory our job is going to consume, it is recommended to specify it at submission time in order to speed up the assignment of the job to the nodes.

-m: It allows the user to specify when to receive mail: 'n' (none), 'a' (aborted), 'b' (begin), 'e' (end), 's' (suspended).

qsub -m bea simple.sh

In order to use this flag, the address where we want to receive the e-mail must be provided through the -M flag:

qsub -M <email-address>

-o : Specifies where to leave the standard output file.

qsub -o output.txt simple.sh

-q: Specifies the queue for the job.
To send the simple.sh job to the short queue:

qsub -q short simple.sh

It is also possible to send a job to one or more specific nodes:

qsub -q short@node112,short@node113 simple.sh

This specifies that we want the job executed in the short queue on one of the selected nodes: node112 or node113.

-S: Specifies which shell must be used on execution.

qsub -S /bin/sh simple.sh

For further information:

man qsub

Querying queued job's state: qstat

After sending a job to a queue, it will not be executed immediately. The scheduler analyzes the system status in order to execute the job under the best conditions. We can consult its state using the following command:

qstat

The program shows us this information:

job-ID  prior    name       user     state  submit/start at      queue          slots  ja-task-ID
--------------------------------------------------------------------------------------------------
000001  0.55000  simple.sh  gabriel  qw     10/14/2010 11:16:56  short@node112  1
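As a sketch, the job identifier and state columns can be extracted from such output with awk; here a captured sample line stands in for the live qstat output:

```shell
#!/bin/sh
# Extract the job-ID (column 1) and state (column 5) from a sample
# line of qstat output. On the cluster you would pipe qstat itself.
echo '000001 0.55000 simple.sh gabriel qw 10/14/2010 11:16:56 short@node112 1' |
awk '{print $1, $5}'    # prints: 000001 qw
```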

We can list only our own jobs with the -u flag followed by our username:

qstat -u <username>

We can get extra information on a job with the -j flag, indicating the job number:

qstat -j <#job>

We will get a similar output:

qstat -j 000001
==============================================================
job_number:         000001
exec_file:          job_scripts/000001
submission_time:    Fri Oct 15 13:42:05 2010
owner:              gabriel
uid:                64
group:              staff
gid:                50
sge_o_home:         /home/usuarios/gabriel
sge_o_log_name:     gabriel
sge_o_path:         /usr/local/sge/bin/lx24-amd64:/usr/local/bin:/usr/bin:/bin
sge_o_shell:        /bin/tcsh
sge_o_workdir:      /home/usuarios/gabriel/prueba
sge_o_host:         logincluster2
account:            sge
hard resource_list: h_rt=604800,h_vmem=4G
mail_list:          gabriel@logincluster2
notify:             FALSE
job_name:           simple.sh
jobshare:           0
shell_list:         NONE:/bin/sh
env_list:
script_file:        simple.sh
usage 1:            cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
scheduling info:    queue instance "short@node112" dropped because it is disabled
                    queue instance "short@node113" dropped because it is disabled
                    queue instance "short@node114" dropped because it is disabled

Moreover, if the job is still waiting and not being executed, the output also reports the reason why it remains queued. If qstat shows no output, the job's execution has probably finished.

Deleting jobs from the queue: qdel

If we want to delete a job we have sent to a queue, we can do it using the following command:

qdel <#job>
gabriel has deleted job 000001

NOTE: To obtain the job's identifier (#job), we can use the qstat command.

Suspend and restart a job: qmod

If we want to suspend a job until we decide to restart it, we have to execute:

qmod -sj <#job>

When we want to restart it, we have to run:

qmod -usj <#job>

Executed process output file

Since the jobs we submit are not executed instantly, the standard and error outputs are redirected to files. By default, once the job runs, the standard and error outputs are each left in a file in our home directory, identified by:

<job_name>.o<job_id>
<job_name>.e<job_id>

simple.sh.e000001  simple.sh.o000001

Furthermore, if we want to redirect one or both outputs somewhere else, we can do it, as seen before, with these flags:

-e : error
-o : output

How to debug a job

If we want to know if our job has finished correctly, we must have a look at the job's accounting information focusing on the exit status and failed fields. To obtain the accounting information it is necessary to execute:

qacct -j <job_id>

==============================================================
qname        short
hostname     node119
group        lsi
owner        fgalindo
project      NONE
department   sistemas
jobname      simple.sh
jobnumber    3221295
taskid       undefined
account      sge
priority     0
qsub_time    Fri Mar 14 12:13:07 2014
start_time   Fri Mar 14 12:13:22 2014
end_time     Fri Mar 14 12:13:22 2014
granted_pe   NONE
slots        1
failed       0
exit_status  1
ru_wallclock 0
ru_utime     0.001
ru_stime     0.008
ru_maxrss    4244
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    1860
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   16
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     15
ru_nivcsw    1
cpu          0.009
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
arid         undefined

The failed field indicates the problem that occurred if the job could not be started on the execution host. The exit_status field shows the job's exit code, following normal shell conventions.

If a job is terminated by a signal, the exit status is 128 plus the signal number. For instance, if a job dies through signal 9 (SIGKILL), the exit status becomes 128 + 9 = 137.
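This convention can be checked locally; the sketch below kills a child shell with SIGKILL and inspects the resulting exit status:

```shell
#!/bin/sh
# Demonstrate the 128 + signal convention: a process killed by
# SIGKILL (signal 9) yields exit status 128 + 9 = 137.
sh -c 'kill -9 $$'       # the child shell kills itself with signal 9
echo "exit status: $?"   # prints: exit status: 137
```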

If we tell the system to send us a mail when a job has finished or has been killed, we will also receive relevant information:

Job 3240728 (simple.sh) Aborted
Exit Status      = 137
Signal           = KILL
User             = fgalindo
Queue            = short@node112
Host             = node112
Start Time       = 03/19/2014 12:32:33
End Time         = 03/19/2014 12:32:35
CPU              = 00:00:00
Max vmem         = 14.855M
failed assumedly after job because:
job 3240728.1 died through signal KILL (9)
			

Qmon

Qmon is a graphical user interface that provides a Job Submission dialog box and a Job Control dialog box for submitting and monitoring jobs. We can start Qmon with the following command:

qmon

After the splash screen, we will access the dialog box which will allow us to select any of the program's options.

[Figure: Qmon main window]

Further information on Qmon is available through man qmon on the cluster.

External links

Node information and its properties: http://mastercluster1.lsi.upc.edu/info_nodes
Ganglia software monitoring program: http://mastercluster1.lsi.upc.edu/ganglia