MHPCC Archive Maintenance July 23rd, 2014 0800-1200
22 July 2014
The MHPCC Archive system will be undergoing maintenance on Wednesday, July 23rd, 2014 0800-1200 (HST).

The archive system will not be accessible/available during this period.

Please plan accordingly.
MHPCC User Services
HPC Portal Maintenance - Tuesday, July 29th, 2014
21 July 2014
The MHPCC Portal will be down for planned maintenance and upgrades on Tuesday, July 29th, from 12 p.m. to 5 p.m. (HST).

During this time the Portal will be unavailable.
Please plan accordingly.

Thank You,
Portal Development Team
MHPCC Maintenance - Infrastructure Coupled with Scheduled Maintenance
14 July 2014
Due to required infrastructure modifications, the MHPCC DSRC will undergo a full power outage over the weekend of August 2nd - 4th, 2014.

MHPCC will forgo its July maintenance and couple it with the preparations for these infrastructure modifications.

MHPCC maintenance will begin on Friday, August 1st, 2014 0800 (HST). MHPCC systems are scheduled to be operational on Monday, August 4th, 2014 0000 (HST).

There will be no connectivity to any system at the MHPCC DSRC and all systems will be powered down during the infrastructure maintenance.

Please plan accordingly.

Thank you,
MHPCC User Services


IBM iDataPlex (Riptide)
User Guide


Introduction

Document Scope and Assumptions

This document provides an overview and introduction to the use of the IBM iDataPlex (Riptide) located at the MHPCC DSRC, along with a description of the specific computing environment on Riptide. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

Policies to Review

Users are expected to be aware of the following policies for working on Riptide.

Login Node Abuse Policy

Memory or CPU intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 2 GBytes of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.

$WORKDIR Purge Policy

The /scratch/ directory is subject to a 60-day purge policy. A system "scrubber" monitors scratch space utilization, and if available space becomes low, files not accessed within 60 days are subject to removal, although files may remain longer if space permits. There are no exceptions to this policy.
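
As an illustrative check (not an official MHPCC tool), a standard find command such as the following lists files under your scratch directory that have not been accessed in the last 60 days and are therefore candidates for removal (assuming your directory is /scratch/$USER):

% find /scratch/$USER -type f -atime +60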

Scheduled Maintenance Policy

The Maui High Performance Computing Center may reserve the entire system for regularly scheduled maintenance on the 4th Wednesday of every month, from 8:00 am to 10:00 pm (HST). The reservation is scheduled the preceding Friday. Every Monday afternoon, a committee convenes to determine whether maintenance will be performed.

Additionally, the system may be down periodically for software and hardware upgrades at other times. Users are usually notified of such times in advance via "What's New" and the login banner. Unscheduled downtimes are unusual but do occur, and in such cases notification to users may not be possible. If you cannot access the system during a non-scheduled downtime period, please contact MHPCC by email or call (808) 879-5077.

Archive Policy

MHPCC has provided information on its web site about best use of the Archive. Users who read or write thousands of files, or very large files, to the Archive adversely impact the performance of the Archive for all users. A user who is negatively impacting the performance of the Archive will be notified and advised of how to best use the Archive. If, after being notified, the user continues to adversely impact the Archive, the user's access to the Archive will be suspended until the user has agreed to follow best-use practices. Data stored on the Archive must be for legitimate projects or task orders. Users will be asked to remove data from the Archive that is not for a sanctioned project or task order. If the user does not remove the unacceptable data from the Archive, it will be removed by the MHPCC storage administrator.

Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account". If you do not yet have a pIE User Account, please visit the HPC Centers: Obtaining An Account page and follow the instructions there. Once you have an active pIE User Account, visit the MHPCC accounts page for instructions on how to request accounts on the MHPCC DSRC HPC systems. If you need assistance with any part of this process, please contact HPC Centers at accounts@ccac.hpc.mil.

Requesting Assistance

HPC Centers is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 11:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

System Configuration

System Summary

Riptide is an IBM iDataPlex. The login and compute nodes are populated with two Intel Sandy Bridge 8-core processors. Riptide uses the FDR 10 Infiniband interconnect and uses IBM's General Parallel File System (GPFS). Riptide has 756 compute nodes with Direct Water Cooling; memory is not shared across the nodes. Each diskless compute node has two 8-core processors (16 cores) with its own Red Hat Enterprise Linux OS, sharing 32 GBytes of memory. Riptide is rated at 251.6 peak TFLOPS.

Node Configuration
                     Login Nodes                          Compute Nodes
Total Nodes          4                                    756
Operating System     RedHat Linux                         RedHat Linux
Cores/Node           16                                   16
Core Type            Intel Sandy Bridge                   Intel Sandy Bridge
Core Speed           2.6 GHz                              2.6 GHz
Memory/Node          64 GBytes                            32 GBytes
Memory Model         Distributed                          Distributed
Interconnect Type    Mellanox SX6036 FDR/10 Infiniband    Mellanox SX6036 FDR/10 Infiniband


File Systems on Riptide
Path                  Capacity     Type
/scratch/<uid>        PBytes       GPFS
/gpfs/home/<uid>      TBytes       GPFS
/p/cwfs/<uid>         TBytes       GPFS
/mnt/archive/<uid>    32 TBytes    NFS

Processors

Riptide uses 2.6-GHz Intel Sandy Bridge processors on its login and compute nodes. There are 2 processors per node, each with 8 cores, for a total of 16 cores per node.

Memory

Riptide uses a distributed memory model across nodes; within a node, memory is shared among all of the cores.

Each login node contains 64 GBytes of main memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use more than 2 GBytes of memory at any one time.

Operating System

The operating system on Riptide is RedHat Linux. The operating system supports 64-bit software.

File Systems

Riptide has the following file systems available for user storage:

/gpfs/home/<uid>

This file system is locally mounted from Riptide's GPFS file system. All users have a home directory located on this file system, which can be referenced by the environment variable $HOME. Quotas are enforced at 10 GBytes by default.

/scratch/<uid>

These directories share Riptide's locally mounted GPFS file system. All users have a work directory located at /scratch/<uid>, which can be referenced by the environment variable $WORKDIR.

/mnt/archive/<uid>

This NFS mounted file system is accessible from the login nodes on Riptide. Files in this file system are subject to migration to tape and access may be slower due to the overhead of retrieving files from tape. All users have a directory located on this file system which can be referenced by the environment variable $ARCHIVE_HOME.

/p/cwfs/<uid>

This path is directed to the Center-Wide File System (CWFS) which is meant for short-term storage (no longer than 30 days), and is accessible from the login nodes on Riptide. All users have a directory defined in this file system. The environment variable for this is $CENTER.
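
These file systems are most conveniently referenced through their environment variables, for example (the file name is a placeholder):

% cd $WORKDIR
% cp results.tar $CENTER/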

Peak Performance

Riptide is rated at 251.6 peak TFLOPS.

Accessing the System

Kerberos

A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Riptide. More information about installing Kerberos clients on your desktop can be found at the HPC Centers: Customer Service page.

Logging In

Riptide may be accessed via Kerberized SSH, either through the rotating alias (see the note below) or directly through one of the following login nodes:

riptide01.mhpcc.hpc.mil (140.31.196.51)
riptide02.mhpcc.hpc.mil (140.31.196.52)
riptide03.mhpcc.hpc.mil (140.31.196.53)
riptide04.mhpcc.hpc.mil (140.31.196.54)

NOTE: riptide.mhpcc.hpc.mil serves as the rotating alias for the 4 login nodes.
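
For example, once a Kerberos ticket has been obtained, a Kerberized SSH login through the rotating alias might look like the following (the username shown is a placeholder):

% ssh user@riptide.mhpcc.hpc.mil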

Login nodes are shared access points for Riptide. Therefore, users should not run resource-intensive processes on these nodes. MHPCC reserves the right to kill, without notice, any user processes that may be affecting the primary access functionality of the login nodes.

File Transfers

File transfers to DSRC systems (except for those to the local archive server) must be performed using Kerberized versions of the following tools: scp, mpscp, sftp, ftp, and kftp. Before using any Kerberized tool, you must use a Kerberos client to obtain a Kerberos ticket. Information about installing and using a Kerberos client can be found at the HPC Centers: Customer Service page.

The command below uses secure copy (scp) to copy a single local file into a destination directory on a Riptide login node. The mpscp command is similar to the scp command but has a different underlying means of data transfer and may enable a greater transfer rate. The mpscp command has the same syntax as scp.

% scp local_file user@riptide0#.mhpcc.hpc.mil:/target_dir (# = 1 to 4)

Both scp and mpscp can be used to send multiple files. This command transfers all files with the .txt extension to the same destination directory. More information about mpscp can be found on the mpscp man page.

% scp *.txt user@riptide0#.mhpcc.hpc.mil:/target_dir (# = 1 to 4)

The example below uses the secure file transfer protocol (sftp) to connect to Riptide, then uses the sftp cd and put commands to change to the destination directory and copy a local file there. The sftp quit command ends the sftp session. Use the sftp help command to see a list of all sftp commands.

% sftp user@riptide0#.mhpcc.hpc.mil (# = 1 to 4)

sftp> cd target_dir
sftp> put local_file
sftp> quit

The Kerberized file transfer protocol (kftp) command differs from sftp in that your username is not specified on the command line, but given later when prompted. The kftp command may not be available in all environments.

% kftp riptide0#.mhpcc.hpc.mil (# = 1 to 4)

username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit

Filezilla

Windows users may use a graphical file transfer protocol (ftp) client such as Filezilla.

The latest release of Filezilla has some issues with firewalls. Here is a workaround option to try after you have obtained a valid Kerberos ticket:

Within Filezilla:

File > Site Manager
Create a NEW SITE with these settings:
Host: the system you want to connect to, e.g., riptide02.mhpcc.hpc.mil
Port: 22
Servertype: SFTP using SSH2
Logontype: Normal
User: your user_id at MHPCC
Password: press the spacebar once, then choose either Save & Exit or Connect. If you choose Connect, you will be connected to MHPCC immediately; if you choose Save & Exit, reopen the Site Manager, select the site, and connect.

User Environment

Environment Variables

A number of environment variables are provided by default on all HPCMP HPC systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems.

Login Environment Variables

The following environment variables can be accessed in your shell or scripts:

 
Environment Variable Description
$ARCHIVE_HOME This is an individual user's directory on the permanent file system that serves a given compute platform. ARCHIVE_HOME is intended to be used as permanent file storage area by a user. It is not intended to be used by executing programs as this can significantly slow the I/O portion of a program. This directory can be accessed from the login nodes of Riptide.
$ARCHIVE_HOST This is the hostname of the archival system serving a particular compute platform. In cases where the archival system is not network mounted to the compute platform, the combination of ARCHIVE_HOST and ARCHIVE_HOME will give the exact location of a user's permanent archival storage.
$BCI_HOME Variable points to location of HPCMO common set of open source utilities per HPCMO Baseline Configuration Program.
$CENTER This variable contains the path for a user’s directory on the Center Wide File System (CWFS).
$CSE_HOME Variable points to location of Computational Science Environment (CSE) Software
$CSI_HOME Variable points to location of HPCMO Consolidated Software installed at MHPCC in relation to HPCMO CSI Program.
$JAVA_HOME This variable contains the path to the base directory of the default installation of JAVA on a particular compute platform. If the platform does not have JAVA installed, this variable should not be defined.
$PET_HOME This variable contains the path to the system-wide accessible directory containing the tools installed by the PET CE staff.
$SAMPLES_HOME Variable points to a directory that has example codes and scripts for this system. The directory contains an index file named INDEX.txt. This file includes the name and a brief explanation of each sample.
$WORKDIR This is an individual user's directory on the local temporary file system (i.e., local high speed disk) that is available on all HPCMP high performance computing (HPC) systems. WORKDIR is intended to be used by executing programs to perform file I/O that is local to that system in order to avoid slower file I/O across a network mounted file system, such as a user home or archive directories. It is not intended to be used as a permanent file storage area by users as files and directories older than 14 days are automatically deleted. Accordingly, this file system is NOT backed up or exported to any other system. In the event of file or directory structure deletion or a catastrophic disk failure, such files and directory structures are lost. Thus, it is the user's responsibility to transfer files that need to be saved to a location that allows for permanent file storage, such as the user's archival or home directory locations. This file system is also purged regularly based upon HPCMP Baseline requirements.

Batch-Only Environment Variables

In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.

 
Environment Variable Description
$BC_CORES_PER_NODE Variable contains the number of cores per node for the compute node type to which a job is being submitted.
$BC_MEM_PER_NODE Variable contains the approximate maximum memory per node available to an end user program (in integer MBs) for the compute node type to which a job is being submitted.
$BC_MPI_TASKS_ALLOC This variable, intended to be referenced from inside a job script, shall contain the number of MPI tasks that are allocated for a particular job.
$BC_NODE_ALLOC This variable, intended to be referenced from inside a job script, shall contain the number of nodes allocated for a particular job.
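
As a minimal sketch of how these variables might be referenced inside a batch script (bash syntax assumed):

cd $WORKDIR
echo "Job allocated $BC_NODE_ALLOC nodes, $BC_MPI_TASKS_ALLOC MPI tasks, $BC_CORES_PER_NODE cores per node, $BC_MEM_PER_NODE MB per node"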

Modules

Software modules are a very convenient way to set needed environment variables and include necessary directories in your path so commands for particular applications can be found. We strongly encourage you to use modules. For more information on using modules, see the Modules User Guide.

Archive Usage

Archive storage is provided through the /mnt/archive/<uid> NFS-mounted file system. All users are automatically provided a directory under this file system. However, it is only accessible from the login nodes. Since space in a user's login home area is limited, all large data files requiring permanent storage should be placed in /mnt/archive/<uid>. Also, it is recommended that all important smaller files for which a user requires long-term access be copied to /mnt/archive/<uid> as well. For more information on using the archive system, see the Archive System User Guide.
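
For example, to place a large result file from your work area into your archive directory and confirm the copy (the file name is a placeholder):

% cp $WORKDIR/results_case1.tar $ARCHIVE_HOME/
% ls -l $ARCHIVE_HOME/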

Program Development

Programming Models

Riptide supports two programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). MPI is an example of a message-passing (or data-passing) model. OpenMP uses shared memory within a single node by spawning threads.

Message Passing Interface (MPI)

Riptide has four MPI-2.0 standard library suites: IntelMPI, OpenMPI, MPICH2, and IBM PE. The modules for these MPI libraries are mpi/intelmpi/x.x.x, mpi/openmpi/x.x.x, mpi/mpich2/x.x.x, and mpi/ibmpe/x.x.x.x.
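
For example, to build an MPI program with one of these suites, you would typically load the corresponding module and use an MPI compiler wrapper (the module version and file names are placeholders):

% module load mpi/openmpi/x.x.x
% mpicc -o mympijob.exe mympijob.c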

Open Multi-Processing (OpenMP)

OpenMP is available in Intel's Software Development suite for C, C++ and Fortran. Use the "-openmp" flag.
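
For example, an OpenMP code might be compiled with the Intel compilers and run on a single node with 16 threads as follows (bash syntax; file names are placeholders, and the run should occur inside a batch or interactive batch session):

% ifort -openmp -o myomp.exe myomp.f90
% export OMP_NUM_THREADS=16
% ./myomp.exe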

Available Compilers

Riptide has two compiler suites: GNU and Intel.

The paths for the compilers are already set up for users through the use of "modules". The default modules loaded can be viewed by executing the command module list.

To see what modules are available, execute the command module avail.

To change your environment to a different module/compiler, execute the command module purge and then load the desired modules, as in the example below.
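
For example, to switch to a different compiler/MPI combination (the module names shown are illustrative; use module avail to see the versions actually installed):

% module purge
% module load intel/x.x.x
% module load mpi/intelmpi/x.x.x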

GNU Compilers:

Compiler Description
gcc C compiler, found in path /usr/bin
g++ C++ compiler, found in path /usr/bin
g77 Fortran 77 compiler, found in path /usr/bin
mpicc Compiles and links MPI programs written in C
mpiCC Compiles and links MPI programs written in C++
mpif77 Compiles and links MPI programs written in Fortran 77

Intel Compilers:

Compiler Description
icc Intel C compiler
icpc Intel C++ compiler
ifort F77 and F90 compiler
mpicc Compiles and links MPI programs written in C
mpiCC Compiles and links MPI programs written in C++
mpif77 Compiles and links MPI programs written in Fortran 77
mpif90 Compiles and links MPI programs written in Fortran 90


NOTE: All MPI compilers are built for Infiniband interconnect communication. We do not support slower ethernet drivers.

Library paths:

/usr/lib
/usr/lib64

Batch Scheduling

Scheduler

The Portable Batch System (PBS) is currently running on Riptide. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single-processor and multiprocessor jobs. The PBS module is automatically loaded by the Master module on Riptide at login.

Queue Information

The following table describes the PBS queues available on Riptide:

Queue Name Description
urgent Approved by HPCMP Director only
debug LESS than 30 minutes and LESS than or EQUAL to 64 processors
high Must be approved by Service/Agency Principal
challenge Must be a Challenge project approved by HPCMP
standard Must be an approved HPCMP project
mhpcc Non-HPCMP projects
background Lowest priority, 8 hour limit, no allocation subtraction

Interactive Logins

When you log in to Riptide, you will be running in an interactive shell on a login node. The login nodes provide login access for Riptide and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse policy. The preferred method to run resource intensive executions is to use an interactive batch session.

Interactive Batch Sessions

An interactive session on a compute node is possible using a proper PBS command line syntax from a login node. Once PBS has scheduled your request on the compute pool, you will be directly logged into a compute node, and this session can last as long as your requested wall time.

To submit an interactive batch job, use the following submission format:

qsub -I -X -l walltime=HH:MM:SS -l select=#_of_nodes:ncpus=16:mpiprocs=16 -l place=scatter:excl -A proj_id -q your_queue -V

Your batch shell request will be placed in the interactive queue and scheduled for execution. This may take a few minutes or a long time depending on the system load. Once your shell starts, you will be logged into the first compute node of the compute nodes that were assigned to your interactive batch job. At this point, you can run or debug applications interactively, execute job scripts, or start executions on the compute nodes you were assigned. The "-X" option enables X-Windows access, so it may be omitted if that functionality is not required for the interactive job.
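
As a concrete illustration (the project ID is a placeholder, and the standard queue from the table above is used), a two-node, one-hour interactive session could be requested with:

qsub -I -l walltime=01:00:00 -l select=2:ncpus=16:mpiprocs=16 -l place=scatter:excl -A proj_id -q standard -V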

Batch Request Submission

PBS batch jobs are submitted via the qsub command. The format of this command is:

qsub [ options ] batch_script_file

qsub options may be specified on the command line or embedded in the batch script file by lines beginning with "#PBS".

For a more thorough discussion of PBS Batch Submission, see the Riptide PBS Guide.

Batch Resource Directives

A listing of the most common batch Resource Directives is available in the Riptide PBS Guide.

Launch Commands

There are different commands for launching MPI executables from within a batch job depending on which MPI implementation your script uses.

To launch an IntelMPI executable, use the mpirun command as follows:

mpirun ./mympijob.exe

To launch an OpenMPI executable, use the openmpirun.pbs command as follows:

openmpirun.pbs ./mympijob.exe

To launch an MPICH2 executable, use the mpiexec command as follows:

mpiexec -launcher ssh -f $PBS_NODEFILE -n #_of_MPI_tasks ./mympijob.exe

To launch an IBM PE MPI executable, use the mpiexec command as follows:

mpiexec ./mympijob.exe

For OpenMP executables, no launch command is needed.

Sample Script

Examples are available in the Riptide PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Riptide.
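
As a minimal sketch only (the project ID, queue, resource requests, and file names are placeholders, and a bash shell is assumed; consult the Riptide PBS Guide and $SAMPLES_HOME for authoritative examples), a simple IntelMPI batch script might look like:

#!/bin/bash
## Request 2 nodes (16 cores each) for 1 hour in the standard queue
#PBS -N myjob
#PBS -l walltime=01:00:00
#PBS -l select=2:ncpus=16:mpiprocs=16
#PBS -l place=scatter:excl
#PBS -A proj_id
#PBS -q standard
#PBS -j oe

## Run from the local work area
cd $WORKDIR
cp $HOME/inputs/case1.dat .

## Launch the IntelMPI executable (see Launch Commands above)
mpirun ./mympijob.exe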

PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script

qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs

qdel: Used to kill queued or running jobs.
qdel PBS_JOBID

A more complete list of PBS commands is available in the Riptide PBS Guide.

Advance Reservations

An Advance Reservation Service (ARS) is available on Riptide for reserving cores for use, starting at a specific date/time, and lasting for a specific number of hours. The specific number of reservable cores changes frequently, but is displayed on the reservation page for each system in the ARS. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. An ARS User's Guide is available online once you have logged in.

Software Resources

Application Software

All Commercial Off The Shelf (COTS) software packages can be found in the $CSI_HOME (/usr/cta) directory. A complete listing of software on Riptide with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.

Hadoop On Riptide

Hadoop is open-source software for distributed computing on large clusters. What separates it from other distributed computing software is that it allows the distributed processing of large data sets across clusters of commodity computers using simple programming models, and it can scale from a single server up to thousands of machines, each offering local computation and storage.

For more information, please review the Hadoop Implementation on Riptide PDF Document.

 

Useful Utilities

The following utilities are available on Riptide:

check_license: Checks the status of ten HPCMP shared applications grouped into two distinct categories: Software License Buffer (SLB) applications and non-SLB applications. Usage: check_license package
node_use: Displays memory-use and load-average information for all login nodes of the system on which it is executed. Usage: node_use -a
qpeek: Returns the standard output (STDOUT) and standard error (STDERR) messages for any submitted PBS job from the start of execution. Usage: qpeek PBS_JOB_ID
qview: Lists the status and current usage of all PBS queues on Riptide. Usage: qview ("qview -h" shows all available options)
showq: A user-friendly, highly descriptive representation of the PBS queue specific to Riptide. Usage: showq
show_queues: Lists the status and current usage of all PBS queues on Riptide. Usage: show_queues
showres: An informative command regarding reservations. Usage: showres
show_storage: Provides quota and usage information for the storage areas in which the user owns data on the current system. Usage: show_storage
show_usage: Lists the project ID and total hours allocated/used in the current FY for each project you have on Riptide. Usage: show_usage

Links to Vendor Documentation

IBM Links

IBM Home: http://www.ibm.com
IBM iDataPlex: http://www-03.ibm.com/systems/x/hardware/rack/dx360m4/index.html

RedHat Links

RedHat Home: http://www.redhat.com/

GNU Links

GNU Home: http://www.gnu.org
GNU Compiler: http://gcc.gnu.org

Intel Links

Intel Home: http://www.intel.com
Intel Sandy Bridge Processor:
http://www.intel.com/technology/architecture-silicon/next-gen/
Intel Software Documentation Library: http://software.intel.com/en-us/articles/intel-software-technical-documentation/

Other Useful Linux Links

Linux High Performance Technical Computing: http://www.linuxhpc.org/
UnixGuide: http://unixguide.net/
