Guide to Using the Archive System
Table of Contents
- 1. Archival Basics
- 1.1. Why do I need to archive my data?
- 1.2. How does archival work?
- 1.2.1. Is there any way to estimate retrieval time?
- 1.3. What are the archival configurations?
- 1.3.1. NFS Mount (AFRL, ARL, MHPCC, and ORS)
- 1.4. What is data staging?
- 2. Important Guidelines
- 2.1. Do use compressed tar files.
- 2.2. Do not overwhelm the archive system.
- 2.3. Do not use files in the archive directly.
- 3. Archival from the Command Line (Manual Staging)
- 3.1. Why might I choose to manually stage my data?
- 3.2. Standardized Archive Command
- 3.2.1. Listing files
- 3.2.2. Archiving files
- 3.2.3. Retrieving files
- 3.2.4. Making directories
- 3.2.5. Checking server status
- 3.3. Non-standardized Archival Commands
- 3.3.1. Deleting a file
- 3.3.2. Deleting a directory
- 3.3.3. Moving or renaming a file or directory
- 3.3.4. Changing the permissions of a file or directory
- 4. Archival in Compute Jobs
- 5. Archival in Transfer Queue Jobs (Batch Staging)
- 5.1. When should I batch stage my data?
- 5.2. What is the transfer queue?
- 5.3. Archival Commands
- 5.4. Staging in via the transfer queue (Pre-staging)
- 5.5. Staging out via the transfer queue
- 5.6. Tying it all together
- 5.6.1. Script 1 of 3 (Pre-staging)
- 5.6.2. Script 2 of 3 (Computation)
- 5.6.3. Script 3 of 3 (Stage out to $ARCHIVE_HOME)
1. Archival Basics
1.1. Why do I need to archive my data?
The short answer is to free up system resources and to protect your data.
Your work directory, $WORKDIR, resides on a large temporary file system that is shared with other users. This file system is intended to temporarily hold data that is needed or generated by your jobs. Since there is no quota on these directories and since user jobs often generate a lot of data, the file system would fill up very quickly if everyone were allowed to leave their files there indefinitely. This would negatively impact everyone and make the system unusable. To protect the system, an automated purge cycle may run to free up disk space by deleting older or unused files. And, if file space becomes critically low, ALL FILES, regardless of age, are subject to deletion. To avoid this, we strongly encourage you to archive the data you want and keep your $WORKDIR clean by removing unnecessary files. Remember that your $WORKDIR is not backed up, so if your files are purged and you didn't archive them, they are gone forever!
1.2. How does archival work?
The archive system ($ARCHIVE_HOST) provides a long-term storage area for your important data. It is extremely large, and your personal archive directory ($ARCHIVE_HOME) has no quota. Even so, you probably don't want to archive everything you generate.
When you archive a file, it's copied to your $ARCHIVE_HOME directory on the archive server's disk cache, where it waits to be written to tape by the system. The disk cache is a temporary storage area for files moving to and from tape. A file in the cache is said to be "online," while a file on tape is "offline." Once your file is written to tape, it may remain "online" for a short time, but eventually it will be removed from the disk cache to make room for other files in transit. Both online and offline files will show up in a directory listing, but offline files need to be retrieved from tape before you can use them.
Retrieval from tape can take a while, so be patient; there's a lot going on in the background. First, the system must determine on which tape (or tapes) your file resides. These are then robotically pulled from the tape library, mounted in one of the limited number of tape drives (assuming not all of them are busy), and wound into position before retrieval can begin. Your wait time depends on how big your file is, how many tapes it is spread across, and how many other archival jobs are running. After a delay, your file will be retrieved from tape and will be available for use.
1.2.1. Is there any way to estimate retrieval time?
Transfer and wait times depend on many parameters, including the size and number of files, the number of tapes involved, and the network load (tape-to-cache and cache-to-HPC). Since most of these parameters vary constantly throughout the day, estimating transfer times can be very difficult. To assist you with estimating transfer times, the following table lists observed transfer times at MHPCC DSRC for files of various sizes:
File Size | Sample Size | Average Time | Median Time | 80% Finish Within | 90% Finish Within |
---|---|---|---|---|---|
10 GBytes | 179 | 3 Min. | 3.5 Min. | 10 Min. | 15 Min. |
100 GBytes | 6 | 17 Min. | 15.5 Min. | 31 Min. | 31 Min. |
200 GBytes | 6 | 20 Min. | 20 Min. | 26 Min. | 27 Min. |
500 GBytes | 141 | 87 Min. | 97 Min. | 4.5 hours | 10 hours |
1 TByte | 0 | N/A | N/A | N/A | N/A |
1.3. What are the archival configurations?
When using the standardized archival commands (see Section 3 below), the details of the archival configuration at a center are unimportant. Some archival functions, however, can't be done with the archive command, so it's helpful to understand the archival setup wherever you're working.
There are two main archival processes currently in use across the Program, but each center has minor variations that affect how you access the archive server. Some sites use NFS to mount their archive system on their HPC and Utility Server systems so that they appear as directories on the local machine. This is very convenient but slightly slower. Other sites provide access via remote commands, such as scp or rsh. It's a little less convenient but slightly faster. Other centers do both, and at those centers, you can choose which method to use. In addition, some centers allow direct login to their archive server, allowing you to easily manage archived files.
The table below shows the access methods in use at each center.
Center | NFS Mount | Remote Commands | Direct Login |
---|---|---|---|
AFRL | x | x | x |
ARL | x | | |
ERDC | | x | x |
MHPCC | x | | |
Navy | | x | x |
ORS | x | x | x |
1.3.1. NFS Mount (AFRL, ARL, MHPCC, and ORS)
An NFS-mounted archive file system provides perhaps the most familiar environment for interacting with archived data. Mounted file systems appear as local directories and are accessible via standard Linux commands, such as cd, mkdir, chmod, etc. Files can be archived/retrieved simply by copying them to/from $ARCHIVE_HOME. This approach is extremely convenient and has virtually no learning curve, but can result in slightly slower transfer speeds, which may be more evident with larger files. It's also not portable if used in job scripts. For portability, we recommend that you use the archive command discussed in Section 3.2. The $ARCHIVE_HOST environment variable is irrelevant for NFS-mounted file systems.
1.4. What is data staging?
Data staging is the process of making sure that your data is in the right place at the right time. Related terms are "staging in" or "pre-staging" and "staging out" or "post-job archival." Before a job can run, the input data needs to be "staged in" or "pre-staged." This simply means that the data is copied from the archive server (or some other source) into a directory that is accessible by the job script. Archiving your output data after the job completes is called "post-job archival" or "staging out." "Staging out" may also refer to moving your output data to another location, like the Center-Wide File System ($CENTER), for further processing.
Staging may be performed manually or via a batch script, but since retrieving a file (especially a large file) from tape may take a while, ensuring that your input data is in place before your job runs, and that it stays there until it runs, isn't always as simple as it sounds. To help with this, every HPC system and Utility Server has a transfer queue just for handling file transfers. For more about manual staging, see Section 3 (below). For more about batch staging with the transfer queue, see Section 5 (below).
2. Important Guidelines
These guidelines help safeguard the stability of the archive server and minimize the negative impact on all users. Failure to observe these guidelines may result in loss of archival privileges.
2.1. Do use compressed tar files.
There are two factors that make archival using compressed tar files a good idea: overhead and size.
First, let's look at overhead. Every time you archive or retrieve a file, a complex set of time-consuming actions occurs. Some of these actions are described in Section 1.2, but there are others as well. So, if you archive 100 individual files, those time-consuming actions must be performed 100 times. This can really add up. But if you combine those 100 files into a single tar file, those time-consuming actions happen only once. Also note that NOT using tar files can adversely impact the performance of the archive server for all users.
Now let's look at size. By compressing a tar file, you not only save space on the archive server (which benefits everyone), but you also increase the likelihood that your file will fit entirely on a single tape, eliminating the need to pull and mount multiple tapes and decreasing the chance of file corruption. Compression also reduces the transfer time when moving the file to or from the archive server. Note: always remember to tar/gzip your files before transferring them.
There is, however, one gotcha that you need to watch for when using tar files: do not make them too big. While the optimal tar file size may vary between sites, a maximum tar file size of about 200 GBytes is a good rule of thumb. At that size, the time required for file transfer and tape I/O is still reasonable. Files larger than 1 TByte are far more likely to span tapes, greatly increasing archival and retrieval times, as well as the chance that a portion of the file could become unusable. The following table shows the maximum recommended tar file sizes at each of the centers.
Center | Recommended Maximum Tar File Size |
---|---|
AFRL DSRC | 500 GBytes |
ARL DSRC | 200 GBytes |
ERDC DSRC | 500 GBytes |
Navy DSRC | 500 GBytes |
MHPCC DSRC | 200 GBytes |
ORS | 200 GBytes |
There is one final caveat to address. If your files are mostly binary data, compressing them will do little good and could possibly cost more time than would be saved. If this is true of your data, you should probably forgo compression, though we still recommend combining multiple files into a single tar file.
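The bundling advice above can be sketched as a small shell function. This is a minimal sketch, not a site-provided tool: the function name bundle_for_archive and the MAX_GBYTES variable are hypothetical, and the 200 GByte default is simply the rule-of-thumb value from this section, so adjust it to the value listed for your center.

```shell
#!/bin/sh
# Hypothetical helper (not part of the archive command): pack a directory
# into a compressed tar file and warn if the result is larger than the
# recommended maximum for your center.
MAX_GBYTES=${MAX_GBYTES:-200}   # assumption: rule-of-thumb value from Section 2.1

bundle_for_archive() {
    src_dir=$1
    tar_file="${src_dir}.tar.gz"

    # Combine and compress in one step.
    tar czf "$tar_file" "$src_dir" || return 1

    # Size in KBytes (du -k is POSIX), converted to whole GBytes.
    size_kb=$(du -k "$tar_file" | cut -f 1)
    size_gb=$((size_kb / 1024 / 1024))

    if [ "$size_gb" -gt "$MAX_GBYTES" ]; then
        echo "WARNING: $tar_file is about ${size_gb} GBytes; consider splitting it." >&2
        return 2
    fi
    echo "$tar_file"
}

# Usage sketch:
#   cd $WORKDIR
#   bundle_for_archive my_output_data && archive put my_output_data.tar.gz
```

For data that is mostly binary, the same approach works with "tar cf" and a ".tar" file name, which skips the compression step but still combines the files.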
2.2. Do not overwhelm the archive system.
Although the archive system provides enormous capacity, it is, in fact, limited in two important ways. The most significant limit is the number of tape drives, which determines the number of tapes that can be read from or written to at once. The second limit is the size of the disk cache, which determines how much data can be online at once.
Attempting to archive or retrieve too many files at once can fill up the disk cache on the archive server, halting archival and staging for all users. Even if the cache does not reach capacity, it could still tie up all available tape drives, impacting other users. To avoid this possibility, if you need to retrieve more than about 10 TBytes of data or more than about 30 files at once, please contact the HPC Help Desk for assistance.
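As a rough illustration of the thresholds above, the check below compares a planned retrieval against the "about 30 files" and "about 10 TBytes" guideline. It is only a sketch: the function name and variables are hypothetical, you supply the file count and total size yourself, and 1 TByte is assumed to be 1000 GBytes.

```shell
#!/bin/sh
# Hypothetical pre-check: returns 0 if a planned retrieval stays within the
# guideline in Section 2.2, and 1 otherwise.
MAX_FILES=30             # "about 30 files" from the guideline
MAX_TOTAL_GBYTES=10000   # "about 10 TBytes", assuming 1 TByte = 1000 GBytes

retrieval_within_limits() {
    file_count=$1
    total_gbytes=$2
    if [ "$file_count" -gt "$MAX_FILES" ] || [ "$total_gbytes" -gt "$MAX_TOTAL_GBYTES" ]; then
        echo "Request too large; contact the HPC Help Desk first." >&2
        return 1
    fi
    return 0
}

# Usage sketch: 12 files totaling roughly 2400 GBytes
#   retrieval_within_limits 12 2400 && echo "OK to retrieve"
```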
2.3. Do not use files in the archive directly.
This is a common mistake for users who are logged into the archive server directly, or who use an NFS-mounted archive partition. The important thing to realize is that although files "look" like they're on disk, they're actually on tape. Any attempt to use those files (for instance with commands like tar, vi, more, less, or grep) will begin the time-consuming process of retrieving the file from tape. Imagine the result of the following:
zcat *.tar.gz | tar -tv | grep search_term
The intent of this command would be to grep through the content listings of multiple compressed tar files for a search term. On a normal file system, this would be no big deal. But on an archive file system, this would require the retrieval of every one of the compressed tar files (possibly thousands of files), which could potentially overwhelm the disk cache on the archive server. This would be undesirable.
If you find that you have inadvertently done something like this, cancel the command immediately, and contact the HPC Help Desk.
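A safer pattern is to retrieve only the tar files you actually need into $WORKDIR and then search the local copies. The sketch below assumes the files have already been retrieved with "archive get"; the function name search_tar_listings is hypothetical and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: search the content listings of compressed tar files
# in a LOCAL directory (for example, a directory under $WORKDIR), never in
# the archive itself.
search_tar_listings() {
    search_dir=$1
    search_term=$2
    for tar_file in "$search_dir"/*.tar.gz; do
        [ -f "$tar_file" ] || continue   # skip if the pattern matched nothing
        # Print "tarfile: member" for every member name that matches.
        tar tzf "$tar_file" | grep -- "$search_term" |
            while IFS= read -r member; do
                printf '%s: %s\n' "$tar_file" "$member"
            done
    done
}

# Usage sketch:
#   cd $WORKDIR
#   archive get "results*.tar.gz"          # placeholder file names
#   search_tar_listings "$WORKDIR" search_term
```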
3. Archival from the Command Line (Manual Staging)
3.1. Why might I choose to manually stage my data?
Manual staging is simply staging from the command line without using the transfer queue. For many users, this is the simplest way to do staging because small data sets can usually be transferred while you wait. (Your mileage may vary based on system load.) There are, however, a few things to consider before deciding to stage data manually.
- Check the size of your data first - if your data exceeds 500 GBytes, you may want to consider staging via the transfer queue. See Section 5 for additional details.
- Start with a fresh Kerberos ticket - if your transfer time exceeds the lifetime of your Kerberos ticket, your transfer could fail. To help avoid this, get a new ticket before beginning your transfer.
- Start with a fresh login shell - due to security considerations, your login shell may be automatically terminated after 24 hours. If you start with a fresh shell, your transfer will have a full 24 hours to complete.
- Consider "backgrounding" your transfer - a transfer placed in the background will continue to run even if your shell doesn't.
For example:
nohup archive get myfile.tar.gz &
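When a transfer runs in the background, it helps to capture its output in a log file so you can check on it later. The wrapper below is a minimal sketch of that idea; the function name run_in_background and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a long command in the background, immune to
# hangups, with its output and errors captured in a log file.
run_in_background() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo $!     # PID of the background process, for later checks
}

# Usage sketch:
#   pid=$(run_in_background get.log archive get myfile.tar.gz)
#   kill -0 "$pid" 2>/dev/null && echo "transfer still running"
#   cat get.log
```

Without the redirection, nohup writes output to nohup.out when it is started from a terminal, which is easy to overlook.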
3.2. Standardized Archive Command
For Hokulea-specific archive information, see Section 3.3.
The archive command is available on most HPC systems and Utility Servers, allowing you to use the same commands to perform common archival tasks regardless of where you're running or how the local archive server is configured. The archive command can use wildcards when listing, archiving, or retrieving files, and works the same way in transfer queue job scripts as in an interactive login shell. The archive command uses $ARCHIVE_HOME as its default target directory on the archive server, unless an alternative path is specified with the "-C path" option. For operations within $ARCHIVE_HOME, "-C path" may be omitted. For complete information on the archive command, see the archive man page on the systems.
Functions covered by the archive command are demonstrated below.
3.2.1. Listing files
To list files on the archive server, use the following command:
archive ls -al [-C path]
3.2.2. Archiving files
To send one or more files to the archive server, use the following command:
archive put [-C path] file1.tar.gz file2.tar.gz ...
3.2.3. Retrieving files
To retrieve a single file from the archive server, use the following command:
archive get [-C path] file1.tar.gz
Multiple files can be retrieved by listing them in sequence or by using wildcards. However, wildcard strings must be enclosed in double quotes, as shown below.
archive get [-C path] "file*"
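The quoting requirement follows from how the shell handles wildcards: an unquoted pattern is expanded against your local directory before the command runs, so the command never sees the pattern itself. The short demonstration below uses count_args, a hypothetical stand-in for any command, and two made-up local file names.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for any command; it reports how many
# arguments it received and what they were.
count_args() {
    echo "$# argument(s): $*"
}

# Work in a scratch directory containing two local files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1.tar.gz file2.tar.gz

count_args file*      # prints "2 argument(s): file1.tar.gz file2.tar.gz"
count_args "file*"    # prints "1 argument(s): file*"
```

In the quoted case the pattern reaches the command unchanged, which is what "archive get" needs in order to match file names on the archive server rather than in your current directory.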
3.2.4. Making directories
To create a directory on the archive server, use the following command:
archive mkdir [-C path] [-m mode] [-p] dir1 dir2 ...
The "-m mode" option sets permissions on the newly created directory. It is equivalent to executing chmod on the directory using numeric mode specifiers, for instance, "-m 750".
The "-p" option creates necessary intermediate directories in a path if they don't already exist.
3.2.5. Checking server status
Before performing an archive operation, it's always a good idea to check that the archive server is actually up and available. To check the server status, use the following command:
archive stat
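In scripts, the status check can be wrapped in a small function. The sketch below is an assumption-laden example rather than a documented interface: it relies, as the job scripts in Section 5 do, on the output of "archive stat -retry 1" containing the string "on-line" when the server is available, and the function name archive_is_online is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around "archive stat". Assumes the output contains
# "on-line" when the server is available (the same test used by the job
# scripts in Section 5).
archive_is_online() {
    archive stat -retry 1 | grep -q 'on-line'
}

# Usage sketch:
#   if archive_is_online; then
#       archive put my_output_data.tar.gz
#   else
#       echo "Archive system not on-line; try again later." >&2
#   fi
```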
3.3. Non-standardized Archival Commands
There are, unfortunately, several functions not currently covered by the standardized archive command. If you need to chmod, rm, or mv a file or directory on the archive server, there's currently no standardized way to do it, so you'll have to rely on methods that may differ from site to site. For the MHPCC DSRC, the following commands are recommended:
3.3.1. Deleting a file
To delete a file on the archive server, use the following command:
rm $ARCHIVE_HOME/file
3.3.2. Deleting a directory
To delete an empty directory on the archive server, use the following command (rmdir will not remove a directory that still contains files):
rmdir $ARCHIVE_HOME/directory
3.3.3. Moving or renaming a file or directory
To move or rename a file or directory on the archive server, use the following command:
mv $ARCHIVE_HOME/file $ARCHIVE_HOME/file-new
3.3.4. Changing the permissions of a file or directory
To change the permissions of a file or directory on the archive server, use the following command:
chmod [-R] permission $ARCHIVE_HOME/file
The "-R" option will recursively change the permissions of all matching directories and files beneath the specified directory.
4. Archival in Compute Jobs
Archival and retrieval operations within a batch script running in a compute queue are generally a really bad idea and are strongly discouraged. While your data is being transferred, the cores reserved by your compute job sit idle and are unavailable to other jobs but continue to accrue time, wasting your allocation. In addition, archival access (and possibly even the archive command) is not available from compute queues at all centers, and compute job scripts attempting to perform archival operations may fail.
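To put the cost in perspective, the arithmetic can be sketched as follows. The example assumes that a job is charged for its reserved cores for the entire time it runs, and it combines a hypothetical 64-core job with the 97-minute median retrieval time reported for a 500 GByte file in Section 1.2.1; the function name is hypothetical as well.

```shell
#!/bin/sh
# Hypothetical helper: core-hours accrued while a compute job waits on a
# transfer (whole core-hours, rounded down).
idle_core_hours() {
    cores=$1
    wait_minutes=$2
    echo $(( cores * wait_minutes / 60 ))
}

idle_core_hours 64 97   # prints 103
```

In this example, roughly 103 core-hours would be spent without any computation taking place.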
5. Archival in Transfer Queue Jobs (Batch Staging)
5.1. When should I batch stage my data?
If any of the following apply to you, use batch staging:
- If you don't have time to wait for your data to stage
- If you want to submit a job as soon as the input data is staged
- If you want to archive your data as soon as a job completes
5.2. What is the transfer queue?
The transfer queue is a special-purpose queue for transferring or archiving files. It has access to $HOME, $ARCHIVE_HOME, $WORKDIR, and $CENTER. Jobs running in the transfer queue use non-computational cores and do not accrue time against your allocation.
5.3. Archival Commands
The archival functions listed in Section 3 work the same way in transfer queue jobs as in interactive login shells, so the command examples in Sections 3.2 and 3.3 apply to transfer queue jobs as well. For more information on specific commands, see the associated man pages on the systems. Additional transfer queue examples are also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.
5.4. Staging in via the transfer queue (Pre-staging)
By pre-staging your data in a transfer queue job, you don't have to sit around and wait for your data to be staged before submitting your computational job. The following standalone script demonstrates retrieval of archived data from the archive server, placing it in a newly created directory in your $WORKDIR, whose name is based on the JOBID. Let's call this a "pre-staging job."
#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# Create a directory for this job in $WORKDIR and cd into it.
cd $WORKDIR
JOBID=`echo $PBS_JOBID | cut -d . -f 1`
mkdir my_job.$JOBID
cd my_job.$JOBID

# If the archive server is available, get the data. Otherwise, exit.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ]; then
  echo "Archive system not on-line!!"
  echo "Exiting: `date`"
  exit
fi

echo "Archive system is on-line; retrieving job files."
archive get my_input_data.tar.gz
echo "Input data files retrieved: `date`"
echo "Unpacking input tar file"
tar xvzf my_input_data.tar.gz
echo "Directory contents:"
ls
An additional example of this script is also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.
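The job directory name in the script above comes from the numeric part of the PBS job ID. PBS job IDs have the form "sequence_number.server_name", and the cut command keeps only the part before the first dot. The sketch below isolates that step; the function name short_jobid and the sample ID are made up for illustration.

```shell
#!/bin/sh
# Strip the server suffix from a PBS job ID, as the pre-staging script does.
short_jobid() {
    echo "$1" | cut -d . -f 1
}

short_jobid "123456.pbs01"   # prints 123456 ("123456.pbs01" is a made-up ID)
```

Inside a job script, the same result can be obtained with the POSIX parameter expansion ${PBS_JOBID%%.*}, which avoids starting extra processes.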
5.5. Staging out via the transfer queue
The term "staging out" refers to the process of dealing with the data that's left in your $WORKDIR after your computational job completes. This generally entails deletion of unneeded files and archival or transfer of important data, which can be time-consuming. Because of this, users can benefit from using the transfer queue for these activities. (Remember that jobs in the transfer queue do not consume allocation.) The following standalone script demonstrates archival of output data to the archive server via the transfer queue. Let's call this a "stage out job."
#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# cd to wherever your data is located
cd $WORKDIR
echo "Packing data for archiving:"
tar cvzf my_output_data.tar.gz my_output_data
echo "Storing data from computation job:`date`"

# Check to see if archive server is on-line. If so, run archive task.
# If not, say so, and indicate where the output data is stored for later
# retrieval.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ]; then
  echo "Archive system not on-line!!"
  echo "Job data files cannot be stored."
  echo "Retrieve them in `pwd` in my_output_data.tar.gz"
  echo "Exiting"
  echo `date`
  exit 2
fi

JOBID=`echo $PBS_JOBID | cut -d. -f 1`
archive mkdir my_job.$JOBID
archive put -C my_job.$JOBID my_output_data.tar.gz
archive ls my_job.$JOBID
date
exit
An additional example of this script is also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.
5.6. Tying it all together
While the previous examples were standalone examples, the following technique creates a 3-step job chain that runs from stage-in to stage-out without any involvement from you. This can be advantageous if your workflow is already well-defined and proven, and does not require you to personally analyze your output prior to staging out.
If, however, your workflow does require an eyes-on analysis of the output data or if it requires post processing prior to analysis, you may want to use the stage out job instead to transfer your data to $CENTER. You may still submit a transfer queue job later on the Utility Server to archive data that you want to keep.
For the purposes of this demonstration, we'll assume that the following scripts are saved as "prestaging.pbs," "computation.pbs," and "outstaging.pbs," respectively. Additional examples of these scripts are also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.
Note the use of the $PBS_O_WORKDIR environment variable in scripts 2 and 3 (below). PBS automatically sets this variable to the directory from which qsub was executed. Script 1 submits script 2 from the job directory it creates, and script 2 submits script 3 from that same directory, so scripts 2 and 3 both cd to the job directory before doing their work.
5.6.1. Script 1 of 3 (Pre-staging)
This script contains the pre-staging job and launches the computation job.
#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# Create a directory for this job in $WORKDIR and cd into it.
cd $WORKDIR
JOBID=`echo $PBS_JOBID | cut -d . -f 1`
mkdir my_job.$JOBID
cd my_job.$JOBID

# If the archive server is available, get the data. Otherwise, exit.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ] ; then
  echo "Archive system not on-line!!"
  echo "Exiting: `date`"
  exit
fi

echo "Archive system is on-line; retrieving job files."
archive get my_input_data.tar.gz
echo "Input data files retrieved: `date`"
echo "Unpacking input tar file"
tar xvzf my_input_data.tar.gz
rm my_input_data.tar.gz
echo "Directory contents:"
ls

echo "Submitting computational job"
qsub -W depend=afterok:${JOBID} ${WORKDIR}/computation.pbs
exit
5.6.2. Script 2 of 3 (Computation)
This script contains the computational job and launches the stage-out job.
#!/bin/sh
#PBS -l walltime=00:30:00
#PBS -j oe
#PBS -q debug
## The following lines show the PBS select statements for the systems
## at this center. Uncomment the line for the system you're running on.
## For the IBM iDataPlex
##PBS -l select=4:ncpus=16:mpiprocs=16
## For the Cray XC40
##PBS -l select=2:ncpus=32:mpiprocs=32
#PBS -A Project_ID
#PBS -r n

cd $PBS_O_WORKDIR
echo "Executing computation"

## The following lines show launch commands for the systems at this center.
## Uncomment the line for the system you're running on.
## Cray XC40 launch command
# aprun -n 64 ./my_executable | tee my_output_data
## IBM iDataPlex launch command
# mpirun -n 64 ./my_executable | tee my_output_data

echo "Computation finished, submitting job to pack and archive data"
COMP_JOB=`echo $PBS_JOBID | cut -d. -f 1`
if [ -f ${WORKDIR}/outstaging.pbs ] ; then
  echo "Submitting archive job to transfer queue: `date`"
  qsub -W depend=afterok:${COMP_JOB} ${WORKDIR}/outstaging.pbs
else
  echo "Post archival script is missing!!!"
  echo "Archive step to store data cannot be performed."
  echo "Exiting."
  exit 1
fi
exit
5.6.3. Script 3 of 3 (Stage out to $ARCHIVE_HOME)
This script contains the out-staging script and is launched by the computation script.
#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID
#
cd $PBS_O_WORKDIR
echo "Packing data for archiving:"
tar cvzf my_output_data.tar.gz my_output_data
echo "Storing data from computation job:`date`"

# Check to see if archive server is on-line. If so, run archive task.
# If not, say so, and indicate where the output data is stored for later
# retrieval.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ] ; then
  echo "Archive system not on-line!!"
  echo "Job data files cannot be stored."
  echo "Retrieve them in `pwd` in my_output_data.tar.gz"
  echo "Exiting"
  echo `date`
  exit 2
fi

JOBID=`echo $PBS_JOBID | cut -d. -f 1`
archive mkdir my_job.$JOBID
archive put -C my_job.$JOBID my_output_data.tar.gz
archive ls my_job.$JOBID
date
exit
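Because scripts 1 and 2 submit the next script in the chain by its path under $WORKDIR, a missing file will break the chain partway through. The pre-flight check below is a minimal sketch of how you might confirm that all three scripts are in place before submitting the first one; the function name chain_scripts_present is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that all three chain scripts exist
# in the given directory before submitting the first one.
chain_scripts_present() {
    script_dir=$1
    missing=0
    for script in prestaging.pbs computation.pbs outstaging.pbs; do
        if [ ! -f "$script_dir/$script" ]; then
            echo "Missing: $script_dir/$script" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch:
#   chain_scripts_present "$WORKDIR" && qsub "$WORKDIR/prestaging.pbs"
```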