Guide to the PBS Queuing System
on Hokule'a
Table of Contents
- 1. Introduction
- 2. Anatomy of a Batch Script
- 2.1. Specify Your Shell
- 2.2. Required PBS Directives
- 2.2.1. Number of Cores per Node
- 2.2.2. Number of Nodes and Processes per Node
- 2.2.3. How Nodes Should Be Allocated
- 2.2.4. How Long to Run
- 2.2.5. Which Queue to Run In
- 2.2.6. Your Project ID
- 2.3. The Execution Block
- 3. Submitting Your Job
- 4. Simple Batch Script Example
- 5. Job Management Commands
- 6. Optional PBS Directives
- 6.1. Job Identification Directives
- 6.1.1. Application Name
- 6.1.2. Job Name
- 6.2. Job Environment Directives
- 6.2.1. Interactive Batch Shell
- 6.2.2. Export All Variables
- 6.2.3. Export Specific Variables
- 6.3. Reporting Directives
- 6.3.1. Redirecting Stdout and Stderr
- 6.3.2. Setting up E-mail Alerts
- 6.4. Job Dependency Directives
- 6.5. Requesting Large-Memory Nodes
- 7. Environment Variables
- 7.1. PBS Environment Variables
- 7.2. Other Important Environment Variables
- 8. Example Scripts
- 8.1. PBSPro batch script REQUIRED stanzas
- 8.2. PBSPro batch script OPTIONAL stanzas
1. Introduction
On large-scale computers, many users must share available resources. Because of this, you cannot just log on to one of these systems, upload your programs, and start running them. Essentially, your programs (called batch jobs) have to "get in line" and wait their turn. And, there is more than one of these lines (called queues) from which to choose. Some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects that you are involved with.
The jobs in the queues are managed and controlled by a batch queuing system, without which users could overload systems, resulting in tremendous performance degradation. The queuing system will run your job as soon as it can while still honoring the following:
- Meeting your resource requests
- Not overloading systems
- Running higher priority jobs first
- Maximizing overall throughput
We use the PBS Professional queuing system. The PBS module should be loaded automatically for you at login, allowing you access to the PBS commands.
2. Anatomy of a Batch Script
A batch script is simply a small text file that can be created with a text editor such as vi or notepad. You may create your own from scratch, or start with one of the sample batch scripts available in $SAMPLES_HOME. Although the specifics of a batch script will differ slightly from system to system, a basic set of components is always required, and a few others are simply good practice. The basic components of a batch script must appear in the following order:
- Specify Your Shell
- Required PBS Directives
- The Execution Block
Note: Not all applications on Linux systems can read DOS-formatted text files. PBS does not handle ^M characters well, nor do some compilers. To avoid complications, please remember to convert all DOS-formatted ASCII text files with the dos2unix utility before use on any HPC system. Users are also cautioned against relying on ASCII transfer mode to strip these characters, as some file transfer tools do not perform this function.
2.1. Specify Your Shell
First of all, remember that your batch script is a script. It's a good idea to specify which shell your script is written in. Unless you specify otherwise, PBS will use your default login shell to run your script. To tell PBS which shell to use, start your script with a line similar to the following, where shell is either bash, sh, ksh, csh, or tcsh:
#!/bin/shell
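For example, to run your script under bash:
#!/bin/bash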
2.2. Required PBS Directives
The next block of your script will tell PBS about the resources that your job needs by including PBS directives. These directives are actually a special form of comment, beginning with "#PBS". As you might suspect, the # character tells the shell to ignore the line, but PBS reads these directives and uses them to set various values. IMPORTANT!! All PBS directives MUST come before the first line of executable code in your script; otherwise, they will be ignored.
Every script must include directives for the following:
- The number of cores per node
- The number of nodes and processes per node you are requesting
- How nodes should be allocated
- The maximum amount of time your job should run
- Which queue you want your job to run in
- Your Project ID
PBS also provides additional optional directives. These are discussed in Optional PBS Directives, below.
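Taken together, the required directives form a short preamble near the top of your script. A minimal sketch (the node count, walltime, queue, and Project_ID are placeholders to adjust for your own job) might look like this:
#PBS -l select=2:ncpus=160:mpiprocs=160
#PBS -l place=scatter:excl
#PBS -l walltime=01:00:00
#PBS -q standard
#PBS -A Project_ID
Each of these directives is explained in the subsections that follow.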
2.2.1. Number of Cores per Node
The Hokule'a architecture allows only one job per node. Because of this, no matter how many cores per node your job actually requires, your job will always be allocated all 160 cores on that node. Before PBS can schedule your job, however, it still needs to know how many cores per node are required. This is accomplished with the "ncpus" directive. This value should always be set to the total number of cores on the node (160) and is normally included on the same line as the "select" and "mpiprocs" directives, although it can, if desired, be specified on a separate line.
Example: Serial code using all of the node's memory.
#PBS -l select=1:ncpus=160
#PBS -l place=excl
2.2.2. Number of Nodes and Processes per Node
Before PBS can schedule your job, it needs to know how many nodes you want. Before your job can be run, it will also need to know how many processes you want to run on each of those nodes. In general, you would specify one process per core, but you might want more or fewer processes depending on the programming model you are using. See Example Scripts (below) for alternate use cases.
Both the number of nodes and processes per node are specified using the same directive as follows, where N1 is the number of nodes you are requesting and N2 is the number of processes per node:
#PBS -l select=N1:ncpus=160:mpiprocs=N2
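For example, to request 4 nodes running 160 MPI processes each (640 processes total):
#PBS -l select=4:ncpus=160:mpiprocs=160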
2.2.3. How Nodes Should Be Allocated
Because only one job per node can be scheduled on Hokule'a, the following PBS directive is required in all batch scripts:
#PBS -l place=scatter:excl
For an explanation of what this directive means, see the qsub man page.
2.2.4. How Long to Run
Next, PBS needs to know how long your job will run. For this, you will have to make an estimate. There are three things to keep in mind.
- Your estimate is a limit. If your job hasn't completed within your estimate, it will be terminated.
- Your estimate will affect how long your job waits in the queue. In general, shorter jobs will run before longer jobs.
- Each queue has a maximum time limit. You cannot request more time than the queue allows.
To specify how long your job will run, include the following directive:
#PBS -l walltime=HHH:MM:SS
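For example, to request a limit of four and a half hours:
#PBS -l walltime=04:30:00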
2.2.5. Which Queue to Run In
Now, PBS needs to know which queue you want your job to run in. Your options here are determined by your project. Most users only have access to the debug, standard, interactive, and background queues. Other queues exist, but access to these queues is restricted to projects that have been granted special privileges due to urgency or importance, and they will not be discussed here. As their names suggest, the standard and debug queues should be used for normal day-to-day and debugging jobs. The background queue, however, is a bit special because although it has the lowest priority, jobs that run in this queue are not charged against your project allocation. Users may choose to run in the background queue for several reasons:
- You don't care how long it takes for your job to begin running.
- You are trying to conserve your allocation.
- You have used up your allocation.
To see the list of queues available on the system, use the show_queues command. To specify the queue you want your job to run in, include the following directive:
#PBS -q queue_name
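For example, to run in the debug queue:
#PBS -q debug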
2.2.6. Your Project ID
PBS now needs to know which project ID to charge for your job. You can use the show_usage command to find the projects that are available to you and their associated project IDs. In the show_usage output, project IDs appear in the column labeled "Subproject." Note: Users with access to multiple projects should remember that the project they specify may limit their choice of queues.
To specify the Project ID for your job, include the following directive:
#PBS -A Project_ID
2.3. The Execution Block
Once the PBS directives have been supplied, the execution block may begin. This is the section of your script that contains the actual work to be done. A well-written execution block will generally contain the following stages, sketched in the example after this list:
- Environment Setup - This might include setting environment variables, loading modules, creating directories, copying files, initializing data, etc. As the last step in this stage, you will generally cd to the directory that you want your script to execute in. Otherwise, your script would execute by default in your home directory. Most users use "cd $PBS_O_WORKDIR" to run the batch script from the directory where they typed "qsub" to submit the job.
- Compilation - You may need to compile your application if you don't already have a pre-compiled executable available.
- Launching - Your application is launched using the aprun command for CRAY MPICH2 codes and ccmrun for any serial, shared-memory, or non-native MPI codes.
- Clean up - This usually includes archiving your results and removing temporary files and directories.
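Putting these stages together, a bare-bones execution block might look like the following sketch, where mycode.x is a hypothetical CRAY MPICH executable and the process count is illustrative:
# Environment Setup: run from the directory where qsub was typed
cd $PBS_O_WORKDIR
# Launching: start the parallel executable
aprun -n 320 ./mycode.x > out.dat
# Clean up: remove temporary files
rm -f *.temp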
3. Submitting Your Job
Once your batch script is complete, you will need to submit it to PBS for execution using the qsub command. For example, if you have saved your script into a text file named run.pbs, you would type "qsub run.pbs".
Occasionally you may want to supply one or more directives directly on the qsub command line. Directives supplied in this way override the same directives if they are already included in your script. The syntax to supply directives on the command line is the same as within a script except that #PBS is not used. For example:
qsub -l walltime=HHH:MM:SS run.pbs
4. Simple Batch Script Example
The batch script below contains all of the required directives and common script components discussed above.
#!/bin/bash -l
## Specify your shell

## Required PBS Directives --------------------------------------
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=4:ncpus=160:mpiprocs=160
#PBS -l place=scatter:excl
#PBS -l walltime=12:00:00
#PBS -j oe

## Execution Block ----------------------------------------------
# Environment Setup
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
mkdir -p ${JOBID}
cd ${JOBID}

## Launching -----------------------------------------------------
# copy executable from $HOME and submit it
cp ${HOME}/mpicode.x .

# The following line provides an example of running a CRAY MPICH
# parallel code built with the default compiler.
aprun -n 128 ./mpicode.x > out.dat

# The following two lines provide an example of setting up and running
# a CRAY MPICH parallel code built with the CRAY compiler.
module swap PrgEnv-intel PrgEnv-cray
aprun -n 128 ./mpicode.x > out.dat

# The following two lines provide an example of setting up and running
# a CRAY MPICH parallel code built with the non-default gcc compiler.
module swap PrgEnv-cray PrgEnv-gnu
aprun -n 128 ./mpicode.x > out.dat

# The following two lines provide an example of setting up and running
# a CRAY MPICH parallel code built with the PGI compiler.
module swap PrgEnv-gnu PrgEnv-pgi
aprun -n 128 ./mpicode.x > out.dat

## Clean up -----------------------------------------------------
# Remove temporary files
rm *.o *.temp
5. Job Management Commands
The table below contains commands for managing your jobs in PBS.
Command | Description |
---|---|
qsub | Submit a job. |
qstat | Check the status of a job. |
qview | A more user-friendly version of qstat. |
qstat -q | Display the status of all PBS queues. |
show_queues | A more user-friendly version of "qstat -q". |
qdel | Delete a job. |
qhold | Place a job on hold. |
qrls | Release a job from hold. |
tracejob | Display job accounting data from a completed job. |
pbsnodes | Display host status of all PBS batch nodes. |
apstat | Display attributes of and resources allocated to running jobs. |
qpeek | Lets you peek at the stdout and stderr of your running job. |
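For example, a typical sequence is to submit a script, check on it, and delete it if something looks wrong (the job ID shown is illustrative):
qsub run.pbs        # submit the job; PBS prints the new job ID
qstat -u $USER      # check the status of your jobs
qdel 1234           # delete job 1234 if it is no longer needed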
6. Optional PBS Directives
In addition to the required directives mentioned above, PBS has many other directives, but most users will only use a few of them. Some of the more useful optional directives are listed below.
6.1. Job Identification Directives
Job identification directives allow you to identify characteristics of your jobs. These directives are voluntary, but strongly encouraged. The following table contains some useful job identification directives.
Directive | Options | Description |
---|---|---|
-l application | application_name | Identify the application being used. |
-N | job_name | Name your job. |
6.1.1. Application Name
The "-l application" directive allows you to identify the application being used by your job. This helps the program to accurately assess application usage and to ensure that adequate software licenses and appropriate software are purchased. To use this directive, add a line in the following form to your batch script:
#PBS -l application=application_name
Or to your qsub command:
qsub -l application=application_name
A list of application names for use with this directive can be found in $SAMPLES_HOME/Application_Name/application_names on Hokule'a.
6.1.2. Job Name
The "-N" directive allows you to designate a name for your job. In addition to being easier to remember than a numeric job ID, the PBS environment variable, $PBS_JOBNAME, inherits this value and can be used instead of the job ID to create job-specific output directories. To use this directive, add a line in the following form to your batch script:
#PBS -N job_20
Or to your qsub command:
qsub -N job_20 ...
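Inside the job, $PBS_JOBNAME carries the value given to "-N", so a script can, for example, create and use a job-specific output directory (a minimal sketch, assuming $WORKDIR is set as usual):
mkdir -p ${WORKDIR}/${PBS_JOBNAME}
cd ${WORKDIR}/${PBS_JOBNAME}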
6.2. Job Environment Directives
Job environment directives allow you to control the environment in which your script will operate. The following table contains a few useful job environment directives.
Directive | Options | Description |
---|---|---|
-I | | Request an interactive batch shell. |
-V | | Export all environment variables to the job. |
-v | variable_list | Export specific environment variables to the job. |
6.2.1. Interactive Batch Shell
The "-I" directive allows you to request an interactive batch shell. Within that shell, you can perform normal Unix commands, including launching parallel jobs. To use "-I", append it to the end of your qsub request. You may also use the "-X" option to allow for X-Forwarding to run X-Windows-based Graphical interfaces on the compute node, such as the TotalView debugger. For example:
qsub -A Project_ID -q debug -l select=2:ncpus=160:mpiprocs=160 -l place=scatter:excl -l walltime=1:00:00 -X -I
6.2.2. Export All Variables
The "-V" directive tells PBS to export all of the environment variables from your login environment into your batch environment. To use this directive, add a line in the following form to your batch script:
#PBS -V
Or to your qsub command:
qsub -V ...
6.2.3. Export Specific Variables
The "-v" directive tells PBS to export specific environment variables from your login environment into your batch environment. To use this directive, add a line in one of the following forms to your batch script:
#PBS -v DISPLAY
Or to your qsub command:
qsub -v DISPLAY
Using either of these methods, multiple comma-separated variables can be included. It is also possible to set values for variables exported in this way, as follows:
qsub -v my_variable=my_value, ...
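For example, to export DISPLAY unchanged while also setting a hypothetical variable MY_STEPS:
qsub -v DISPLAY,MY_STEPS=10 run.pbs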
6.3. Reporting Directives
Reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to specify e-mail options to be executed at the beginning and end of your job.
6.3.1. Redirecting Stdout and Stderr
By default, messages written to stdout and stderr are captured for you in files named x.ojob_id and x.ejob_id, respectively, where x is either the name of the script or the name specified with the "-N" directive, and job_id is the ID of the job. If you want to change this behavior, the "-o" and "-e" directives allow you to redirect stdout and stderr messages to different named files. The "-j" directive allows you to combine stdout and stderr into the same file.
Directive | Options | Description |
---|---|---|
-e | file_name | Redirect standard error to the named file. |
-o | file_name | Redirect standard output to the named file. |
-j | oe | Merge stderr and stdout into stdout. |
-j | eo | Merge stderr and stdout into stderr. |
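For example, to merge stderr into stdout and write both streams to a single named file (the file name is illustrative):
#PBS -o my_job.out
#PBS -j oe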
6.3.2. Setting up E-mail Alerts
Many users want to be notified when their jobs begin and end. The "-m" directive makes this possible. If you use this directive, you will also need to supply the "-M" directive with one or more e-mail addresses to be used.
Directive | Options | Description |
---|---|---|
-m | b | Send e-mail when the job begins. |
-m | e | Send e-mail when the job ends. |
-M | e-mail_address(es) | Set the e-mail address(es) to be used. |
For example:
#PBS -m be
#PBS -M joesmith@gmail.com,joe.smith@us.army.mil
6.4. Job Dependency Directives
Job dependency directives allow you to specify dependencies that your job may have on other jobs. This allows users to control the order jobs run in. These directives will generally take the following form:
#PBS -W depend=dependency_expression
where dependency_expression is a comma-delimited list of one or more dependencies, and each dependency is of the form:
type:jobids
where type is one of the directives listed below, and jobids is a colon-delimited list of one or more job IDs that your job is dependent upon.
Directive | Description |
---|---|
after | Execute this job after listed jobs have begun. |
afterok | Execute this job after listed jobs have terminated without error. |
afternotok | Execute this job after listed jobs have terminated with an error. |
afterany | Execute this job after listed jobs have terminated for any reason. |
before | Listed jobs may be run after this job begins execution. |
beforeok | Listed jobs may be run after this job terminates without error. |
beforenotok | Listed jobs may be run after this job terminates with an error. |
beforeany | Listed jobs may be run after this job terminates for any reason. |
For example, run a job after completion (success or failure) of job ID 1234:
#PBS -W depend=afterany:1234
Or, run a job after successful completion of job ID 1234:
#PBS -W depend=afterok:1234
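Because qsub prints the new job's ID on stdout, dependencies are often chained directly from the shell. A sketch, where pre.pbs and post.pbs are hypothetical scripts:
JOBID=$(qsub pre.pbs)
qsub -W depend=afterok:${JOBID} post.pbs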
For more information about job dependencies, see the qsub man page.
7. Environment Variables
7.1. PBS Environment Variables
While there are many PBS environment variables, you only need to know a few important ones to get started using PBS. The table below lists the most important PBS environment variables and how you might generally use them.
PBS Variable | Description |
---|---|
$PBS_JOBID | Job identifier assigned to job or job array by the batch system. |
$PBS_O_WORKDIR | The absolute path of directory where qsub was executed. |
$PBS_JOBNAME | The job name supplied by the user. |
The following additional PBS variables may be useful to some users.
PBS Variable | Description |
---|---|
$PBS_ARRAY_INDEX | Index number of subjob in job array. |
$PBS_ENVIRONMENT | Indicates job type: PBS_BATCH or PBS_INTERACTIVE |
$PBS_NODEFILE | Filename containing a list of vnodes assigned to the job. |
$PBS_O_HOST | Host name on which the qsub command was executed. |
$PBS_O_PATH | Value of PATH from submission environment. |
$PBS_O_SHELL | Value of SHELL from submission environment. |
$PBS_QUEUE | The name of the queue from which the job is executed. |
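For example, a script can record where and how it is running by echoing a few of these variables (a minimal sketch):
echo "Job ${PBS_JOBID} (${PBS_JOBNAME}) started in queue ${PBS_QUEUE}"
echo "Submitted from ${PBS_O_HOST}:${PBS_O_WORKDIR}"
cat ${PBS_NODEFILE}   # list the vnodes assigned to this job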
7.2. Other Important Environment Variables
In addition to the PBS environment variables, the table below lists a few other variables which are not specifically associated with PBS. These variables are not generally required, but may be important depending on your job.
Variable | Description |
---|---|
$OMP_NUM_THREADS | The number of OpenMP threads per process. |
$MPI_DSM_DISTRIBUTE | Ensures that memory is assigned closest to the physical core where each MPI process is running. |
$MPI_GROUP_MAX | Maximum number of groups within a communicator. |
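For example, a hybrid MPI/OpenMP job might set the thread count before launching (a sketch; hybrid.x is a hypothetical executable, and aprun's -d option reserves that many cores per process):
export OMP_NUM_THREADS=20
aprun -n 8 -d ${OMP_NUM_THREADS} ./hybrid.x > out.dat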
8. Example Scripts
A well-constructed batch script should end with a cleanup stage that archives your data (for example, by handing the transfers off to the transfer queue) and cleans up your $WORKDIR after your job completes. This helps to avoid data loss and ensures that your allocation is not charged for idle cores while file transfer operations are performed.
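One common pattern is to end the compute job by submitting a second, small job that performs the archiving, so the compute nodes are released before any file transfers begin. A sketch, assuming a transfer queue named transfer and a hypothetical archiving script archive_job.pbs:
qsub -q transfer archive_job.pbs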
8.1. PBSPro batch script REQUIRED stanzas
-l walltime=HH:MM:SS
DESCRIPTION:
Specifies requested wallclock time for job in hours:minutes:seconds
SYNTAX:
#PBS -l walltime=HH:MM:SS
-A Project_ID
DESCRIPTION:
The project account number assigned by S/AAA
SYNTAX:
#PBS -A Project_ID
-l select=[N]:mpiprocs=20:ncpus=20
DESCRIPTION:
Number of nodes is N
Number of mpiprocs and ncpus MUST be the same number
Number of mpiprocs and ncpus MUST be an integer from 1 - 20
Choosing fewer than 20 allows memory-intensive or highly parallel codes to run successfully.
SYNTAX:
#PBS -l select=[N]:mpiprocs=[1-20]:ncpus=[1-20]
EXAMPLE:
#PBS -l select=3:mpiprocs=10:ncpus=10
8.2. PBSPro batch script OPTIONAL stanzas
-o filename
DESCRIPTION:
Specifies where you would like the standard output file to be created.
It is suggested to use a path in $WORKDIR.
SYNTAX:
#PBS -o /your_path/filename
-e filename
DESCRIPTION:
Specifies where you would like the standard error file to be created.
It is suggested to use a path in $WORKDIR.
SYNTAX:
#PBS -e /your_path/filename
-q queue_name
DESCRIPTION:
Specifies which queue you would like to run in.
The default queue for your project will be used if none is specified.
SYNTAX:
#PBS -q queue_name
EXAMPLE:
#PBS -q standard
-----------
EXAMPLE: PBSPro Batch Script
#!/bin/bash
## Walltime in hours:minutes:seconds
#PBS -l walltime=05:00:00
## -o specifies output file
#PBS -o <full path>.out
## -e specifies error file
#PBS -e <full path>.error
## Nodes, Processors, CPUs (processors and CPUs should always match)
#PBS -l select=2:mpiprocs=20:ncpus=20
## Enter the proper queue
#PBS -q standard
## MHPCC Account/Project number
#PBS -A <project number>
## Run the MPI executable
cd <full path which you want to run out of>
mpirun <full path of executable>