Maui High Performance Computing Center, Phone: (808) 879-5077
An Air Force Research Laboratory Center Managed by the University of Hawaii
Archiving Data

NOTE: MHPCC does not back up any user data. Users are strongly encouraged to back up their data to the archive system frequently.
Users with accounts on mhpcc.hpc.mil systems can archive their data. The archive is typically mounted on interactive/login nodes as /mnt/archive/$LOGNAME. The ARCHIVE_HOME environment variable points to the actual archive directory.

The archive file system is NOT a high performance file system. Data stored on it is automatically migrated to tape. Accessing data that has been migrated to tape requires a tape mount and a copy from tape to the disk cache. Data on the disk cache that is not being accessed (file not open) is a candidate to be migrated back to tape. Data on the archive is purged when the user's account is deleted.

All users are encouraged to back up their data to the archive and to use the archive for staging large data sets to/from scratch directories on high performance file systems.

Hints for best use of the archive:

- Instead of writing hundreds of individual files to the archive, bundle them into one tar file and write the tar file to the archive. This greatly speeds up retrieval: it is faster to pull one tar file from the archive than hundreds of individual files.
- When retrieving files from the archive, do it file by file in sequential order. Do not pull files from the archive in parallel. The files are on tape, which is a sequential access medium.
- You may pull/push data from the archive via a script file from any interactive node.

Standard UNIX file/directory commands will function on your archive directory, e.g., ls, mkdir, cp, rmdir, rm.

NOTE: The PS Toolkit (PST) archive commands are also supported for those familiar with that command set. See pstoolkit.org for more information. For man pages, run:

    man -M /mnt/cfs/pkgs/pstoolkit/man archive
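The hints above on bundling and sequential retrieval can be sketched with standard commands only. This is a minimal illustration, not an MHPCC-provided tool: the function names archive_bundle and archive_retrieve and the example file names are hypothetical, and it assumes ARCHIVE_HOME is set as described above.

```shell
#!/bin/bash
# Sketch only: function names and example paths are hypothetical.

# Bundle a directory into a single compressed tar file in the archive.
# $1 = directory to bundle, $2 = tar file name to create under $ARCHIVE_HOME
archive_bundle() {
    local src_dir="$1" tar_name="$2"
    # -C changes into the parent directory so the tar stores a relative path.
    tar -czf "$ARCHIVE_HOME/$tar_name" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Retrieve tar files from the archive one at a time, in the order given.
# $1 = destination directory, remaining args = file names under $ARCHIVE_HOME
archive_retrieve() {
    local dest="$1" name
    shift
    for name in "$@"; do
        # No '&' here: each copy finishes before the next one begins.
        cp "$ARCHIVE_HOME/$name" "$dest/" || return 1
    done
}
```

For example, `archive_bundle "$WORKDIR/run01" run01.tgz` would write one tar file to the archive, and `archive_retrieve "$WORKDIR" run01.tgz run02.tgz` would copy two tar files back strictly one after the other.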
The pstoolkit archive command is available on the MHPCC Mana system for users whose default login shell is bash or Bourne.

Using the pstoolkit to stage files from the archive, run a batch job, and write the resulting output back to the archive is a process that uses 2 files and 1 command:

- A script file that executes the archive “get” command and then submits the second file.
- Your command/submit file.
- The “nohup” command, used to eliminate manual intervention.

A generic script file example:

NOTE: Make sure the script file is executable by you.

A generic submit_file.cmd example:

    your setup
    mpirun your_executable
    your cleanup
    /mnt/cfs/pkgs/pstoolkit/bin/archive put -C /mnt/archive/user_id/dir/ file_names

The nohup command:

Explanation of the above:

    ps -p PID    (to check on the progress of SCRIPT_NAME)

For users who regularly perform this function, the command is:

    archive get -C archive_directory_structure file_name

Transfer Queue Available

MHPCC has created a transfer queue on the Mana system. This queue is dedicated to archiving data. Its purpose is to allow archiving of data without expending CPU hours or using batch nodes for the sole purpose of data transfers. It is recommended to use this queue after your job has completed. Submit a standalone PBS transfer command file, or modify your job to submit the PBS transfer command file upon successful job completion, i.e., a "chaining" of jobs.

Here is a sample standalone command file that you may use as a template:

This is an efficient and effective tool for archiving data, and we encourage you to use the archive queue routinely. Please contact the CCAC or MHPCC Help Desk if you require assistance.

Limits

One tape cartridge holds 500GB. Because spanning tapes is not recommended, files larger than 500GB are not advisable on the archive system: they reduce performance and increase the risk of data loss.
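One way to check for files that exceed this limit before writing them to the archive is sketched below. The helper find_oversized is hypothetical, and the default treats 500GB as 500,000,000,000 bytes, which is an assumption; the source does not say whether the figure is decimal or binary.

```shell
#!/bin/bash
# Sketch only: find_oversized is a hypothetical helper, not an MHPCC tool.

# List regular files under a directory that are larger than a byte limit.
# $1 = directory to scan, $2 = limit in bytes (default assumes decimal 500GB)
find_oversized() {
    local dir="$1" limit="${2:-500000000000}"
    # '-size +Nc' matches files whose size is greater than N bytes.
    find "$dir" -type f -size +"${limit}c" -print
}
```

Running `find_oversized "$WORKDIR/results"` (an example path) prints nothing when every file is within the limit.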
Large files take time to be written to tape; they reside on the disk cache until they are completely written to tape, and only then are they released from the disk cache. This may cause an issue with your transfer and with the archive system as a whole. It is advised that all files be smaller than 500GB to reduce the probability of unforeseen issues.

EXAMPLE:

    tar -zcvf - <files|dirs> | split -b 5120m - $ARCHIVE_HOME/mylargefile.tgz

For more information please visit our Mana documentation.

Hints for best use of the archive:

- Instead of writing many individual files to the archive, bundle them into one tar file and write the tar file to the archive. This greatly speeds up retrieval: it is faster to pull one tar file from the archive than many individual files. Please keep the size of your tar file under 500GB.
- When retrieving files from the archive, do it file by file in sequential order. Do not pull files from the archive in parallel. The files are on tape, which is a sequential access medium.

Transferring Archive Data to a Remote Site

If you have many small files on the archive (see Hints for best use of the archive), bundle them into a tar file no larger than 500GB. It can take over 4 days to transfer a 500GB file to a remote location.

    nohup tar -pzcf $WORKDIR/MY_FILE.tar.tgz $ARCHIVE_HOME/MY_DIR/ &

To transfer a file from the archive to a remote location:

    kinit

NOTE: Transfer files one-at-a-time (sequentially).
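The one-at-a-time rule can be enforced with a simple loop. The sketch below is illustrative only: transfer_sequentially is a hypothetical helper, and scp as the default copy tool is an assumption, since the transfer command itself is not given above. It expects a valid Kerberos ticket to have been obtained with kinit beforehand.

```shell
#!/bin/bash
# Sketch only: scp as the default is an assumption; the transfer tool is not
# named in the text. transfer_sequentially is a hypothetical helper.

# Copy files to a destination one at a time.
# $1 = destination (e.g. user@host:/path), remaining args = files to send
# Set TRANSFER_CMD to substitute another copy tool for scp.
transfer_sequentially() {
    local dest="$1" file
    shift
    for file in "$@"; do
        # Run in the foreground so only one transfer is active at a time;
        # stop at the first failure instead of continuing.
        ${TRANSFER_CMD:-scp} "$file" "$dest" || return 1
    done
}
```

A call might look like `transfer_sequentially user@remote.example.org:/data "$WORKDIR/MY_FILE.tar.tgz"`, where the remote host and path are placeholders.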