This is a short introduction to using the BSWIFT cluster, based on the High Performance Computing documentation of the Division of Information Technology.

Requirements for authentication:

A TerpConnect/Glue account is required to access the unix environment on campus. The username and password for this account are the same as your campus Directory ID (username) and password, but the account might need to be activated separately. You can find detailed instructions on how to activate your TerpConnect account in the campus knowledge base. If you are not a member of the University of Maryland (faculty, staff, or registered student), you can get a TerpConnect/Glue account if you are working with a faculty member who is willing to sponsor you as an affiliate. You also need an account on the BSWIFT cluster, which provides you with a local home directory on BSWIFT and the right to run jobs on BSWIFT. Account requests are reviewed by a BSWIFT administrator; for student accounts the consent of the sponsoring faculty member is required.

Connecting to BSWIFT:

BSWIFT is a high performance computing cluster running Red Hat Linux. It is assumed that you are familiar with basic unix commands. You can connect to BSWIFT using the secure shell protocol (ssh) and transfer files to or from BSWIFT using the secure file transfer protocol (sftp). On a Microsoft Windows computer you need SSH client programs like PuTTY and WinSCP (type login.bswift.umd.edu in the ‘host name’ field and do not change the ‘Port Number’ (22)). On a Mac or Linux (unix) terminal you can simply type:

      ssh username@login.bswift.umd.edu    or:    ssh -l username login.bswift.umd.edu

‘username’ is your UMD Directory ID. You will be asked for your campus password. The first time you connect, a warning message will appear showing the RSA fingerprint of the host ‘login.bswift.umd.edu (128.8.204.140)’ and asking you: “Are you sure you want to continue connecting (yes/no)?” After answering ‘yes’ you will be asked for your campus password.

Now you are logged in to your home directory, using your default command shell: bash (or sh) or tcsh (or csh). You can check the command shell flavor by typing     ps -p $$.   When writing a job script, this might be important to know.
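The output will look something like this (the PID and terminal name are just placeholders and will differ; the last column shows your shell, here bash):

      login-1: ps -p $$
        PID TTY          TIME CMD
      12345 pts/0    00:00:00 bash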

Note: If you want to open a remote BSWIFT display on your local machine (e.g. an editor like gedit, or an image or pdf viewer), you have to enable X11 forwarding (option “-X”):

      ssh -X username@login.bswift.umd.edu

If connecting via PuTTY, you need to enable X11 forwarding (on the left panel “Category” under Connection > SSH > X11), and you have to install and start an X Window server like Xming to display the graphics.
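To transfer files from a Mac or Linux terminal, you can use the standard scp and sftp clients with the same host name; a minimal sketch (the file and directory names are only placeholders):

      scp mydata.dat username@login.bswift.umd.edu:~/        (copy a single file to your BSWIFT home directory)
      sftp username@login.bswift.umd.edu                     (interactive session; use ‘put’ and ‘get’ to transfer files)

On Windows, WinSCP provides the same functionality with a graphical interface.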

Accessing installed software:

GLUE software packages (open-source and proprietary) that are not included in your default environment must be loaded using the ‘module’ command (note: the ‘tab’ command is obsolete).

e.g.    module load matlab   or to load a specific version:   module load matlab/2016b

Note: if you want to load a different version, you have to unload this software package first.

e.g.     module unload matlab
            module load matlab/2016a

To list all available software packages and versions, type:     module avail
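For example, to narrow the listing down to one package and to check what is currently loaded in your session (both are standard ‘module’ sub-commands):

      module avail matlab        (list all available matlab versions)
      module list                (list the modules currently loaded)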

Job submission:

In order to run a program on an HPC cluster, resources must be allocated and the processing must be scheduled. Resources are the requested compute nodes (or CPU cores) and authentication/licenses for running the job. The scheduler queues your job request, on BSWIFT on a first-come basis; priority scheduling might be introduced in case we experience many pending jobs.

The GLUE clusters use the Slurm Workload Manager for resource management, scheduling, and accounting. On BSWIFT only standard queues (“partitions”) are defined.

      login-1: sinfo

PARTITION      AVAIL  TIMELIMIT    NODES  STATE  NODELIST
standard       up     14-00:00:00     10  mix    compute-4-[2-9,31-32]
standard       up     14-00:00:00     29  idle   compute-4-[10-30,33],compute-5-[1-7]
high-priority  up     14-00:00:00     10  mix    compute-4-[2-9,31-32]
high-priority  up     14-00:00:00     29  idle   compute-4-[10-30,33],compute-5-[1-7]
debug          up     15:00            1  idle   compute-4-1
scavenger      up     31-00:00:00     10  mix    compute-4-[2-9,31-32]
scavenger      up     31-00:00:00     30  idle   compute-4-[1,10-30,33],compute-5-[1-7]

Notes: If no partition is specified, the job runs on the standard partition. There is no difference between ‘standard’ and ‘high-priority’ since priority scheduling is not activated on BSWIFT.
Jobs on the ‘scavenger’ partition can run for a long time but will be killed whenever the resources (all compute nodes) are required for jobs in the other partitions.
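If you do want a partition other than ‘standard’, you can request it with the standard slurm option -p (or --partition), either on the sbatch command line or in the job script; for example (a sketch, ‘myscript’ is a placeholder):

      login-1: sbatch -p scavenger myscript
      or in the script header:      #SBATCH --partition=debug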

You have to run your jobs in batch mode; only for testing your code you can use   sinteractive [options].   This will take you to an interactive shell on a compute node, where you can do your work (run your code or script); at the end you have to exit the shell.

e.g.       login-1: sinteractive -p debug
                     salloc: Granted job allocation 202020
                     DISPLAY is login-1.bswift.umd.edu:10.0
                     Try re-authentication (KS). You have no Kerberos tickets
              compute-4-1:   ... run your script or code ...
              compute-4-1:   exit
              login-1:

Note: usually you do not have to re-authenticate your Kerberos credentials, but if you need a Kerberos ticket for your work on this shell, you can type:     renew.

In general, all jobs on BSWIFT must be submitted to the batch system, which allocates the requested resources as soon as they are available, using

      login-1: sbatch [options] myscript    or    sbatch [options] mycode
      (type:     man sbatch   for available options)
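As an illustration, a sketch of a submission with some commonly used (standard slurm) options; the values are only placeholders:

      login-1: sbatch -p standard -N 2 -t 2:00:00 --mem=4096 -J mytest myscript
            (2 nodes, 2 hours wall time, 4096 MB of memory per node, job name ‘mytest’)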

You can check the status of all queued (pending, running, failed, completed) jobs by typing:

      login-1: squeue

JOBID     PARTITION  USER     ST  TIME        NODES  NODELIST
202020    standard   whoever  R   1-00:01:59      2  compute-4-31,compute-5-2
202020_1  standard   whoever  C     13:13:13      1  compute-4-31
202020_2  standard   whoever  R     23:23:23      1  compute-5-2
....

Note: Every batch or interactive job is given a Job ID (accessible via $SLURM_JOBID), which gets a suffix (e.g. _1) for each ‘step’, i.e. for each task or process specified on the command line or in the job script.
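To restrict the listing to your own jobs, or to a single job, you can use the standard squeue options (a sketch, using the job id from the example above):

      login-1: squeue -u $USER        (only your own jobs)
      login-1: squeue -j 202020       (a specific job id)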

Detailed information about your job or the resources on the BSWIFT compute nodes is listed using    scontrol [command]     (terminal output) or    sview     (X Windows graphics).

      login-1: scontrol show job
            JobId=202020 ArrayJobId=202020 ArrayTaskId=1 Jobname=myjob
            UserId=whoever(123456) GroupId=bswift-bsos(100903) MCS_label=N/A
            ....
      login-1: scontrol show node compute-4-X   (or   compute-5-Y)     (where: X=1...33, Y=1...7)
      .....

Creating a job script:

The easiest way to specify directives for a batch job is to use a job submission script. This also makes it easier to submit several jobs (with similar directives) at the same time. Besides the batch directives, the script may include shell commands, like loading the required software or redirecting the output, for instance:

      login-1: cat myscript
            #!/bin/sh
            #SBATCH -t 16:59
            #SBATCH -N 4
            #SBATCH --mem=4096
            #SBATCH -J myjob
            . ~/.profile
            module load matlab/2016a
            mybindir=$PWD
            myworkdir=/tmp/$USER/
            myoutdir=/data/bswift-1/$USER/
            [ -d $myworkdir ] || mkdir -p $myworkdir
            [ -d $myoutdir ] || mkdir -p $myoutdir
            cd $myworkdir
            srun $mybindir/myjob
            cp * $myoutdir

Notes:
Line 1 defines the unix shell for the commands in this script (local variables, conditions, loops, etc. are coded differently for bash/sh and csh/tcsh shells).
Lines 2-5 are optional batch directives (they can be overridden by command line options to sbatch).
Line 5 specifies a job name, which makes it easier to identify this job.
Note: after the job has finished, the file ‘slurm-%j.out’ contains all (stdout) messages written during execution, where %j denotes the job id (variable   ${SLURM_JOBID} ). You can change the name on the command line or in the script header ( #SBATCH --output=myjob.out ):
            login-1: sbatch -o myjob.out myscript
Line 6 ( . ~/.profile ) is used to define your environment (only if your login shell is tcsh or csh).
Lines 8-13 are optional settings to define (and create if necessary) local work and output directories.
Line 14 uses the ‘srun’ command, which is only required if more than 1 CPU core is requested (see line 3).
Line 15 saves the output on the fileserver.
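The job id suffixes (_1, _2, ...) shown in the squeue and scontrol examples above come from array tasks. A minimal sketch of a job array script, assuming the same sh environment as above (‘mycode’ is only a placeholder):

      login-1: cat myarrayscript
            #!/bin/sh
            #SBATCH -t 30:00
            #SBATCH -N 1
            #SBATCH -J myarray
            #SBATCH --array=1-3
            . ~/.profile
            # each of the 3 array tasks runs with its own index in $SLURM_ARRAY_TASK_ID
            srun ./mycode $SLURM_ARRAY_TASK_ID

Each array task appears in squeue with its own suffix (e.g. 202020_1, 202020_2) and writes its own output file.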