CHPC

Center for High Performance Computing

Usage

Queuing system

The standard way of using cluster resources is to submit a job, which is then executed on a suitable host according to the job's requirements, the current load and the user's priority. The queuing system in use is Sun Grid Engine (SGE). For special use cases, we can optionally make one or more cluster nodes available outside the queue. In this case you will work in close contact with our operations team and be granted direct (ssh) access to a set of nodes.

Submitting jobs

Jobs are submitted to the queuing system with the qsub command. qsub accepts all necessary job information either as command-line options or from a job script.

Example scripts

Singlethreaded job


#!/bin/sh

## The next two lines are instructions to SGE: they tell SGE to email

## you at the given address when your job begins (b), aborts (a) or ends (e).

#$ -M max@mustermann.at

#$ -m bae

# Run the job in the current working directory (output files go there too)
# and pass your current environment variables to the job (-V)

#$ -cwd -V

# Set the name for the job

#$ -N pytest

/usr/bin/python my_python_program.py

## end of batch script

Multithreaded job


#!/bin/sh

## The next two lines are instructions to SGE: they tell SGE to email

## you at the given address when your job begins (b), aborts (a) or ends (e).

#$ -M max@mustermann.at

#$ -m bae

# Run the job in the current working directory (output files go there too)
# and pass your current environment variables to the job (-V)

#$ -cwd -V

# Set the name for the job

#$ -N smptest

# Request 32 slots (one per thread) in the smp parallel environment

#$ -pe smp 32

/absolute/path/to/my_parallel_application

## end of batch script
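SGE sets the environment variable NSLOTS to the number of slots granted to a job. Many multithreaded programs, OpenMP programs in particular, do not read NSLOTS themselves, so a common pattern is to pass it on explicitly. The following is a minimal sketch assuming your application honours the OpenMP variable OMP_NUM_THREADS; other programs may expect the thread count as a command-line option instead, so check their documentation.

```shell
#!/bin/sh
#$ -cwd -V
#$ -N smptest
#$ -pe smp 32

# SGE sets NSLOTS to the number of slots granted by "-pe smp 32".
# Fall back to 1 so the script also behaves sensibly when run outside SGE.
OMP_NUM_THREADS=${NSLOTS:-1}
export OMP_NUM_THREADS
echo "Using $OMP_NUM_THREADS threads"

# Assumption: the application is an OpenMP program that reads OMP_NUM_THREADS.
/absolute/path/to/my_parallel_application
```

Deriving the thread count from NSLOTS rather than hard-coding 32 keeps the script correct if you later change the -pe request.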

Distributed (MPI) job


#!/bin/sh

## The next two lines are instructions to SGE: they tell SGE to email

## you at the given address when your job begins (b), aborts (a) or ends (e).

#$ -M max@mustermann.at

#$ -m bae

# Run the job in the current working directory (output files go there too)
# and pass your current environment variables to the job (-V)

#$ -cwd -V

# Set the name for the job

#$ -N mpitest

# Request 128 slots (one per MPI process) in the orte parallel environment

#$ -pe orte 128

/opt/openmpi/bin/mpiexec -n $NSLOTS my_distributed_parallel_application

## end of batch script
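Then you can submit your script with the command
qsub -b n /absolute/path/to/script
where -b n (the default) tells qsub that the argument is a job script. To call a program directly without a script, e.g. Matlab, use -b y and pass all job options on the command line:
qsub -b y -N Matlab -pe smp 32 /home/apps/MATLAB/R2017a/bin/matlab <OPTIONS>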
Warning: Either avoid whitespace in the job name or enclose the whole name in double quotes, e.g. -N "My Job"
Suggestion: We recommend scripts, since they are easy to reuse and modify. Inside a script, either use absolute paths or change to your working directory first. Example:

cd /home/testuser/special_dir/example_dir || exit 1

my_application <OPTIONS> <INPUT_WITH_RELATIVE_PATHS>

Interactive jobs

If you just want a shell to quickly test or run an application (e.g. Java code), start an interactive job: SGE allocates a free slot for you on one of the cluster nodes and opens a shell on that machine. Interactive jobs are started with qlogin (without X11 support) or qrsh (with X11 support).

Example: Start Matlab for interactive use

qrsh xterm -iconic -e /home/apps/MATLAB/R2015a/bin/matlab
Remark: The above command is a workaround for a problem with X11 forwarding and Matlab. If we, the administrators, find a proper solution, we will post it here.

Multi-threaded interactive jobs

Please note that interactive jobs are expected to be single-threaded. If you intend to run multi-threaded binaries, please request the number of threads as well.

For example, if you need 8 threads in an interactive session:

qlogin -pe smp 8

or

qrsh -pe smp 8

Array Jobs

In some cases you may want to run the same application with different inputs. Instead of starting the jobs manually or from a shell script, you can use SGE's so-called array jobs. For example, the option
-t 10-100:10
tells the scheduler to start an array job with 10 tasks. Each task is assigned its own SGE_TASK_ID; in this example the IDs range from 10 to 100 in steps of 10. The value is available as $SGE_TASK_ID inside the job script or on the command line. The output files are named accordingly (.SGE_TASK_ID is appended to every file name).
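A minimal array job script might look like the following. It assumes, purely for illustration, that your inputs are named input_10.dat to input_100.dat in the submission directory; this naming scheme is hypothetical, and my_application stands for your own program.

```shell
#!/bin/sh
#$ -cwd -V
#$ -N arraytest
# Ten tasks with the IDs 10, 20, ..., 100
#$ -t 10-100:10

# SGE sets SGE_TASK_ID separately for every task.
# Default to 10 so the script can be tried out outside SGE.
TASK_ID=${SGE_TASK_ID:-10}

# Hypothetical naming scheme: input_10.dat, input_20.dat, ...
INPUT="input_${TASK_ID}.dat"
echo "Task $TASK_ID processes $INPUT"

# my_application is a placeholder for your own program
my_application "$INPUT"
```

You submit the script once with qsub; SGE then schedules the ten tasks independently. To preview which IDs -t 10-100:10 produces, run seq 10 10 100 in any shell; it prints the same ten values.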

Monitoring

qstat

This command shows information on your currently active as well as waiting jobs and their corresponding JOB_IDs.
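If you have many jobs, a summary by state can be handier than the full listing. The sketch below counts the jobs in each state (e.g. r for running, qw for queued and waiting) using only awk and sort. It assumes the default qstat layout, with two header lines and the state in the fifth column; the function name summarise_qstat is our own, and job names containing whitespace would shift the columns.

```shell
#!/bin/sh
# Count the jobs in the output of "qstat" per state (r = running, qw = waiting, ...).
# Assumption: default qstat layout with two header lines and the state in
# column 5 (job-ID, prior, name, user, state, ...).
summarise_qstat() {
    awk 'NR > 2 { count[$5]++ }
         END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}

# Usage on the cluster:
#   qstat | summarise_qstat
```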

qhost

This command shows the utilization of execution and login hosts.

qdel <JOB ID>

This command deletes a job: it removes a waiting job from the queue or aborts a running one.

More information on qsub, qstat, and qdel can be found in the man pages (e.g. "man qsub") or on the web (e.g. University of Innsbruck, AT).