MHPCC
License Server Outage - Saturday, 20th December 2014
18 December 2014
On Saturday, 20th December 2014, there will be a 20-minute license server and Advanced Reservation Service (ARS) outage due to DREN upgrades. The outage will occur sometime between 0900 and 1600 EST.

Software licenses: Production jobs running at any DSRC that require licenses may fail during the outage period.

ARS: All scheduled advance reservations will continue as requested. No new advance reservations may be made during the outage.

We regret any inconvenience this required outage may cause. All emergency, current, and upcoming outages can be viewed on the front page of centers.hpc.mil.

If you have any questions, please contact the Consolidated Customer Assistance Center (CCAC) by phone at 877-CCAC-039 (877-222-2039) or via email at help@ccac.hpc.mil.

Thank you,
MHPCC User Services
MHPCC Facilities/Infrastructure Maintenance - Saturday, December 6th, 2014
26 November 2014
Due to required infrastructure modifications, the MHPCC DSRC will undergo a full power outage on Saturday, December 6th, 2014 0800 - 1700 (HST).

There will be no connectivity to any system at the MHPCC DSRC and all systems will be powered down.

Please plan accordingly.

Thank You,
MHPCC User Services


Archive System User Guide

Table Of Contents

Introduction
System Configuration
Accessing The Archive System
Best Use Of The Archive
Environment Variables
Sample PBS Script
Transfer Queue Available
Limits
Transferring Archive Data to a Remote Site

Introduction

This document provides an overview of the archival storage capability at the MHPCC DSRC and how to use it.

Archive Policy

MHPCC provides information on its web site about best use of the Archive. Users who read or write thousands of files, or very large files, to the Archive adversely impact its performance for all users. A user who is negatively impacting the performance of the Archive will be notified and advised of how to best use it. If, after being notified, the user continues to adversely impact the Archive, the user's access to the Archive will be suspended until the user agrees to follow best-use practices. Data stored on the Archive must be for legitimate projects or task orders. Users will be asked to remove data from the Archive that is not for a sanctioned project or task order; if the user does not remove the unacceptable data, it will be removed by the MHPCC storage administrator.

System Configuration

Archive System

The Mass Storage Server system consists of a Sun Fire X4470 (with 32 TBytes of online disk storage for recently-accessed user data), a Sun StorageTek SL8500 tape library (with over 1 PByte of tape storage), and the Oracle SAM-QFS software. All data located in the archive file system ($ARCHIVE_HOME) is archived to tape and eventually to a remote disaster recovery center.

Accessing The Archive System

Users with accounts on mhpcc.hpc.mil systems can archive their data. The archive is mounted on the interactive/login nodes, typically as /mnt/archive/$LOGNAME.

The ARCHIVE_HOME environment variable points to the actual archive directory.

The archive file system is NOT a high performance file system. Data that is stored on the archive file system is automatically migrated to tape. Accessing data that has been migrated to tape requires a tape mount and a copy from tape to the disk cache. If data on the disk cache is not being accessed (file not open), it is a candidate to be released from the disk cache back to tape.

Data on the archive is purged when the user's account is deleted.

All users are encouraged to back up their data to the archive and to use the archive for staging large data sets to/from scratch directories on high performance file systems.

You may pull/push data from the archive via a script file run from any interactive node. Make sure that the script is executable, e.g., chmod +x archive.script. It is suggested that you nohup this script and place it in the background, e.g., nohup ./archive.script &
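
A minimal sketch of what such a script might contain (the directory and file names below are illustrative, not actual MHPCC paths):

==========
#!/bin/bash
# archive.script: pull a data set from the archive into the working
# directory on the high performance file system, then unpack it.
cp $ARCHIVE_HOME/project1/inputs.tar $WORKDIR/
cd $WORKDIR
tar -xf inputs.tar
==========

Make it executable with chmod +x archive.script, then run it with nohup ./archive.script & so it continues after you log off.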

Standard UNIX file/directory commands will function on your archive directory, e.g., ls, mkdir, cp, rmdir, rm.
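
For example (directory and file names are illustrative):

==========
ls $ARCHIVE_HOME
mkdir $ARCHIVE_HOME/project1
cp results.tar $ARCHIVE_HOME/project1/
rm $ARCHIVE_HOME/project1/old_results.tar
==========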

NOTE: The PS Toolkit (PST) archive commands are also supported for those familiar with that command set. See pstoolkit.org for more information. For man pages, run "man -M /mnt/cfs/pkgs/pstoolkit/man archive".

The pstoolkit archive command is available on the MHPCC Riptide system for users whose default login shell is bash or Bourne shell.

Using the pstoolkit to stage files from the archive, run a batch job, and write the resulting output back to the archive is a process that uses two files and one command: a script file that executes the archive "get" command and then submits the second file (your command/submit file), and the "nohup" command, which eliminates the need for manual intervention.

Best Use of The Archive

Instead of writing many individual files to the archive, bundle them into one tar file and write the tar file to the archive. This greatly speeds up retrieval, since it is faster to pull one tar file from the archive than many individual files. Please keep the size of your tar file under 500 GB. When retrieving files from the archive, do so file by file, in sequential order; do not pull files from the archive in parallel, because the files are on tape, which is a sequential-access medium.
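
For example, to bundle a results directory into one tar file, write it to the archive, and later retrieve it (directory and file names are illustrative):

==========
# Bundle many small files into one tar file and write it to the archive.
tar -cvf run42_results.tar run42_results/
cp run42_results.tar $ARCHIVE_HOME/

# Later, retrieve the single tar file (one file at a time, not in parallel).
cp $ARCHIVE_HOME/run42_results.tar $WORKDIR/
cd $WORKDIR && tar -xf run42_results.tar
==========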

Environment Variables

The following environment variables are automatically set in your login environment:

$ARCHIVE_HOME

This is an individual user's directory on the permanent file system that serves a given compute platform. ARCHIVE_HOME is intended to be used as a permanent file storage area by a user. It is not intended to be used by executing programs, as this can significantly slow the I/O portion of a program. This directory can be accessed from the login nodes of Riptide.

$ARCHIVE_HOST

This is the hostname of the archival system serving a particular compute platform. In cases where the archival system is not network mounted to the compute platform, the combination of ARCHIVE_HOST and ARCHIVE_HOME will give the exact location of a user's permanent archival storage.
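
For example, when the archive file system is not mounted on the system you are working from, the two variables can be combined to reach your archive directory over the network. A minimal sketch (the file name is illustrative; a valid Kerberos ticket from kinit may be required, as in the remote-transfer example later in this guide):

==========
echo $ARCHIVE_HOST $ARCHIVE_HOME
scp results.tar ${ARCHIVE_HOST}:${ARCHIVE_HOME}/
==========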

Sample PBS Script

A generic script file example:
==========
#!/bin/bash
# Change to the directory where you want to put the archive data for the run.
cd $WORKDIR/dir
archive get -C /mnt/archive/user_id/dir/ file_names
# Change to the directory where the command (submit) file resides.
cd $WORKDIR/dir
qsub submit_file.cmd

NOTE: Make sure the SCRIPT file is executable by you.

A generic submit_file.cmd example:
==========
#!/bin/sh
#PBS -l walltime=00:10:00
#PBS -l select=1:mpiprocs=8:ncpus=8
#PBS -l place=scatter
#PBS -A VWXYZ00000000
#PBS -o file.out
#PBS -e file.error
#PBS -q standard

# your setup commands here

mpirun your_executable

# your cleanup commands here

/mnt/cfs/pkgs/pstoolkit/bin/archive put -C /mnt/archive/user_id/dir/ file_names

The nohup command:
=========
nohup ./SCRIPT_NAME > out$$ 2>&1 &

Explanation of the above:
“nohup” allows SCRIPT_NAME to continue executing after you log off
“out$$” appends the shell's process ID to the “out” file name, making it unique
“2>&1” redirects standard error to standard out, so both go to the file out$$ (which will contain the PBS job ID)
“&” puts everything in the background so you can continue to use the terminal session

ps -p PID (to check on the progress of the SCRIPT_NAME)

For users who regularly perform this function, the commands are:

archive get -C archive_directory_structure file_name
archive put -C archive_directory_structure file_name

Concisely:
- create a script that gets your archive data and submits a command file
- add the archive put to your command file
- execute the nohup command

Transfer Queue Available

MHPCC has created a transfer queue on the Riptide system. This queue is dedicated to archiving data. Its purpose is to allow archiving of data without expending CPU hours or using batch nodes solely for data transfers. It is recommended to use this queue after your job has completed. Submit a standalone PBS transfer command file, or modify your job to submit the PBS transfer command file upon successful job completion, i.e., a "chaining" of jobs (a sketch of one chaining approach follows the sample below).

Here is a sample standalone command file that you may use as a template:

============
#!/bin/bash
#PBS -l walltime=00:##:00
#PBS -l select=1:mpiprocs=1:ncpus=1
#PBS -l place=scatter
#PBS -o transfer.o
#PBS -e transfer.e
#PBS -A ABCDE00000000
#PBS -q transfer

TRANSFER_DIR=$ARCHIVE_HOME/transfer/$PBS_JOBID

if [ ! -d "$TRANSFER_DIR" ]; then
    mkdir -p "$TRANSFER_DIR" || exit 1
fi

echo "starting transfer"

cd $WORKDIR

tar -pzcvf $TRANSFER_DIR/file_name.tgz file_name
============
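
One way to set up the "chaining" mentioned above is to submit the transfer command file from the end of your compute job script, or to submit it with a dependency on the compute job. The sketch below assumes the transfer command file is named transfer.cmd (an illustrative name) and uses the standard PBS -W depend option; adapt it to your own file names and workflow:

==========
# Option 1: at the end of your compute job script, submit the transfer job.
qsub transfer.cmd

# Option 2: from a login node, submit the transfer job with a dependency so
# it starts only after the compute job (JOBID) finishes successfully.
qsub -W depend=afterok:JOBID transfer.cmd
==========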

This is an efficient and effective tool for archiving data, and we encourage you to use the transfer queue routinely. Please contact HPC Centers Customer Service or the MHPCC Help Desk if you require assistance.

Limits

Since 500 GB is the size of one tape cartridge, it is not advisable, due to performance and an increased risk of data loss, to have files larger than 500 GB on the archive system; spanning tapes is not recommended. Large files take time to be written to tape, and consequently they reside on the disk cache until they are completely written to tape and then released from the disk cache. This may cause an issue with your transfer and with the archive system as a whole.

It is advised that all files be less than 500 GB to reduce the probability of any unforeseen issues.

EXAMPLE: tar -zcvf - <files|dirs> | split -b 5120m - $ARCHIVE_HOME/mylargefile.tgz

Transferring Archive Data to a Remote Site

If you have many small files on the archive (see Best Use of The Archive above), bundle them into a tar file no larger than 500 GB. It can take over 4 days to transfer a 500 GB file to a remote location.

nohup tar -pzcf $WORKDIR/MY_FILE.tar.tgz $ARCHIVE_HOME/MY_DIR/ &

To transfer a file from the archive to a remote location:

kinit
nohup scp $ARCHIVE_HOME/file_name user@host:remote_file &
