The (SC)² grid project, SABER, uses grid technology to expand computing capabilities for (SC)² members.
The goal of the project is deployment of a production-quality grid computing platform providing an environment for computational scientists to explore grid computing in their scientific disciplines.
How to Use this Quickstart Guide: This guide is intended to facilitate use of SABER machines and the resources they provide. This is not an exhaustive guide to all the technical aspects of SABER. It is, however, a good starting point. This guide should provide enough basic information to allow the average user to run jobs.
SABER, as a grid resource, is an inherently dynamic "system". The specific resources currently registered for use in SABER include an Alphaserver ES40 cluster at PSC, Intel-based clusters at WVU and NETL, and a Condor flock set up on the PCs in the CTC training facility at PSC.
As a SABER user, it is imperative that you stay informed of changes to the SABER environment. Important information is posted to PSC's general bboards, which can be read through various facilities on most PSC systems.
If you run into problems, contact saber-help@psc.edu.
There are two types of grants available: starter grants and production grants. Starter grants are appropriate as precursors to large requests. Production grants are large awards for users with extensive computational requirements.
See http://sc-2.psc.edu/grants/ for further information on submitting a proposal for a grant.
To connect to a SABER system, ssh to ben.psc.edu or energy.cluster.wvu.edu. You can use one of two formats: username@hostname, or hostname -l username.
% ssh joeuser@ben.psc.edu
% ssh joeuser@energy.cluster.wvu.edu
or
% ssh ben.psc.edu -l joeuser
% ssh energy.cluster.wvu.edu -l joeuser
Use the kpasswd command to change your password on ben.psc.edu or energy.cluster.wvu.edu. Changing your password on one machine does not change it on the other.
To use grid functions, you need globus in your path. Type: echo $PATH at the command line. If you see /usr/local/globus/globus-2.4.3/bin and /usr/local/globus/globus-2.4.3/sbin you should be fine.
If not:
% setenv GLOBUS_LOCATION /usr/local/globus/globus-2.4.3
% source $GLOBUS_LOCATION/etc/globus-user-env.csh
or, in a Bourne shell:
% GLOBUS_LOCATION=/usr/local/globus/globus-2.4.3; export GLOBUS_LOCATION
% . $GLOBUS_LOCATION/etc/globus-user-env.sh
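The PATH check described above can also be scripted. The following is a minimal sketch, assuming the install prefix /usr/local/globus/globus-2.4.3 quoted in this guide; the function name has_globus_path is hypothetical and is not part of the Globus toolkit.

```shell
#!/bin/sh
# Globus install prefix quoted in this guide (adjust if your site differs).
GLOBUS_PREFIX=/usr/local/globus/globus-2.4.3

# has_globus_path PATH_STRING   (hypothetical helper, not a Globus command)
# Succeeds only if both the bin and sbin directories appear as complete
# entries in the colon-separated PATH_STRING.
has_globus_path() {
    case ":$1:" in
        *":$GLOBUS_PREFIX/bin:"*) ;;
        *) return 1 ;;
    esac
    case ":$1:" in
        *":$GLOBUS_PREFIX/sbin:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_globus_path "$PATH"; then
    echo "globus is in your path"
else
    echo "globus is not in your path; source globus-user-env first"
fi
```

Wrapping the string in colons before matching keeps a directory such as /usr/local/globus/globus-2.4.3/bin-old from being mistaken for the real bin entry.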
To be authenticated, you need (1) a grid proxy created from an X.509 certificate, and (2) your Distinguished Name (DN) placed in the PSC grid mapfiles (a one-time step).
Both ben.psc.edu and energy.cluster.wvu.edu use a kx.509 system along with the existing Kerberos infrastructure to issue X.509 user certificates. After obtaining a Kerberos ticket using kinit, use the kx509 command to get a short-term X.509 certificate. This short-term certificate is linked to the Kerberos principal: the X.509 certificate lifetime is limited to the Kerberos ticket lifetime, so when your Kerberos ticket expires, so does your grid proxy.
% kinit joeuser@PSC.EDU
joeuser@PSC.EDU's Password:
% kx509
% klist
Credentials cache: FILE:/tmp/krb5cc_00000
Principal: joeuser@PSC.EDU
Issued Expires Principal
Jun 10 14:39:57 Jun 10 21:19:10 krbtgt/PSC.EDU@PSC.EDU
Jun 10 14:39:57 Jun 10 21:19:10 afs@PSC.EDU
Jun 10 14:40:12 Jun 10 21:19:10 kca_service/pscuxb.psc.edu@PSC.EDU
Jun 10 14:40:12 Jun 10 21:19:10 kca_service/gridinfo.psc.edu@PSC.EDU
Jun 10 14:40:13 Jun 10 21:19:10 kx509/certificate@PSC.EDU
To create your proxy from the X.509 certificate, issue the kxlist -p command. You can then examine or delete your grid proxy with grid-proxy-info and grid-proxy-destroy, respectively.
% kxlist -p
Service kx509/certificate
issuer= /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
subject= /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=joeuser/UID=joeuser/emailAddress=joeuser@PSC.EDU
serial=08AA
hash=67d8fd3f
You will need to put your DN in the PSC mapfile (only once) by visiting this website. It will ask for your DN, which you can find with the grid-proxy-info -subject command. The entire output of the grid-proxy-info -subject command is your DN.
% grid-proxy-info -subject
/C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=joeuser/USERID=joeuser/Email=joeuser@PSC.EDU
Use the grid-proxy-info command to view details about your proxy. Use the -help flag to view options which allow you to specify what information you want to retrieve when using the grid-proxy-info command.
% grid-proxy-info
subject : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=joeuser/USERID=joeuser/Email=joeuser@PSC.EDU
issuer : /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
identity : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=joeuser/USERID=joeuser/Email=joeuser@PSC.EDU
type : end entity credential
strength : 512 bits
path : /tmp/x509up_u20024
timeleft : 5:56:46
Use the grid-proxy-destroy command to kill your grid proxy. If you want to verify that it has been destroyed, type grid-proxy-info. If it has been killed, you will get an error stating that the proxy could not be found.
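Because the proxy expires with the Kerberos ticket, it can help to check the remaining lifetime before starting a long transfer or job. The sketch below converts a timeleft value in the H:MM:SS form shown in the grid-proxy-info output into seconds and compares it with a threshold. The helper names timeleft_to_seconds and proxy_expires_soon are hypothetical, and the value is passed in by hand here instead of being read from grid-proxy-info.

```shell
#!/bin/sh
# timeleft_to_seconds H:MM:SS   (hypothetical helper)
# Converts a timeleft value such as 5:56:46 into seconds.
# The body runs in a subshell so the IFS change stays local.
timeleft_to_seconds() (
    IFS=:
    set -- $1
    # Strip one leading zero so 08 and 09 are not parsed as octal.
    h=${1#0}; m=${2#0}; s=${3#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
)

# proxy_expires_soon TIMELEFT MIN_SECONDS   (hypothetical helper)
# Succeeds when fewer than MIN_SECONDS remain on the proxy.
proxy_expires_soon() {
    [ "$(timeleft_to_seconds "$1")" -lt "$2" ]
}

# Example with the timeleft value from the output above.
if proxy_expires_soon 5:56:46 3600; then
    echo "less than an hour left; renew with kinit, kx509 and kxlist -p"
else
    echo "proxy is valid for at least another hour"
fi
```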
We recommend transferring files using scp, gsiscp, or GridFTP's client program globus-url-copy.
scp is the easiest way to copy small, single files to the SABER platforms. It uses public key encryption to authenticate and encrypt communications. gsiscp is a version of scp based instead on the 'Grid Security Infrastructure' (GSI). This method relies on having a valid X.509 certificate.
While scp and gsiscp are easy to use, they provide very poor performance when transferring multiple files or large files. GridFTP-based programs should be used for most TeraGrid file transfers.
The syntax of gsiscp is the same as with scp. The remote file specification can be one of two forms: either user@system:/path-to-file or system:/path-to-file. If the user is not given as part of the remote file specification, the username on the local system is assumed.
To copy myprog.f to joeuser's src directory on energy.cluster:
% gsiscp myprog.f joeuser@energy.cluster.wvu.edu:~joeuser/src/myprog.f
To copy everything from joeuser's current local directory to his home directory on ben:
% gsiscp * ben.psc.edu:
scp can be used in place of gsiscp in the above example. scp and gsiscp will work well for the majority of small file transfers.
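The two remote file specification forms accepted by scp and gsiscp can be illustrated with a few lines of shell. This is only a sketch of how the specification breaks down, not how either program is implemented; the helper names remote_user, remote_host and remote_path are hypothetical.

```shell
#!/bin/sh
# remote_user SPEC   (hypothetical helper)
# Prints the username part of user@system:/path-to-file, falling back
# to the local username when SPEC has no "user@" prefix.
remote_user() {
    case "${1%%:*}" in
        *@*) echo "${1%%@*}" ;;
        *)   id -un ;;
    esac
}

# remote_host SPEC   (hypothetical helper)
# Prints the system name, with any "user@" prefix removed.
remote_host() {
    hostpart=${1%%:*}
    echo "${hostpart#*@}"
}

# remote_path SPEC   (hypothetical helper)
# Prints everything after the first colon; empty means the home directory.
remote_path() {
    echo "${1#*:}"
}

spec=joeuser@energy.cluster.wvu.edu:/home/joeuser/src/myprog.f
echo "user=$(remote_user "$spec") host=$(remote_host "$spec") path=$(remote_path "$spec")"
```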
GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth, wide-area networks. GridFTP is based on FTP, the popular internet file transfer protocol.
GridFTP provides the following protocol features:
The source and destination arguments of the globus-url-copy client, described below, may be URLs, local files, or standard input/output. The following table shows some acceptable values for these arguments:
| Source/Destination | Description |
|---|---|
| file:<fullpath> or file://<hostfullpath> | For local files -- relative pathnames not allowed |
| gsiftp://<hostpath> | For remote files -- relative paths allowed |
| http://<path to file> or https://<path to file> | For accessing web files -- relative paths allowed |
| - (dash) | Standard input and output |
The globus-url-copy client program is a GridFTP client that may be used to transfer files from the command line. The format is:
% globus-url-copy source_url destination_url
The format for the local filename is file:// followed by the absolute file name. For example, user joeuser would refer to file myfile.txt in his home directory (/home/joeuser) as:
file:///home/joeuser/myfile.txt
Note that three slashes are necessary. The first two are part of file:// and the third is the beginning of the full file specification /home/joeuser/myfile.txt.
The format for the remote file is gsiftp://machine-name followed by the absolute file name. For example, if user joeuser's home directory on energy.cluster.wvu.edu is /home/joeuser/, then he would specify:
gsiftp://energy.cluster.wvu.edu/home/joeuser/myfile.txt
for the file myfile.txt in /home/joeuser.
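The two URL forms above are easy to get wrong by one slash, so it can be convenient to build them from an absolute path. The following sketch does that and prints the resulting globus-url-copy command line without running it. The helper names file_url and gsiftp_url are hypothetical, and for simplicity both insist on an absolute path.

```shell
#!/bin/sh
# file_url ABSOLUTE_PATH   (hypothetical helper)
# Prints file:// followed by the absolute path, giving three slashes.
file_url() {
    case "$1" in
        /*) echo "file://$1" ;;
        *)  echo "file_url: not an absolute path: $1" >&2; return 1 ;;
    esac
}

# gsiftp_url HOST ABSOLUTE_PATH   (hypothetical helper)
# Prints gsiftp://HOST followed by the absolute path.
gsiftp_url() {
    case "$2" in
        /*) echo "gsiftp://$1$2" ;;
        *)  echo "gsiftp_url: not an absolute path: $2" >&2; return 1 ;;
    esac
}

# Build the command for copying myfile.txt to energy.cluster.wvu.edu.
src=$(file_url /home/joeuser/myfile.txt)
dst=$(gsiftp_url energy.cluster.wvu.edu /home/joeuser/myfile.txt)
echo "globus-url-copy $src $dst"
```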
The Globus toolkit provides commands for submitting and managing grid jobs.
globus-job-run executes one command on a remote system and returns. It is similar in function to rsh. The format for globus-job-run is:
% globus-job-run remote-host[/jobmanager] command
The command must be given with the full path, e.g., /bin/ls instead of just ls. For example, to run uname on ben, use:
% globus-job-run ben.psc.edu/jobmanager-pbs /bin/uname -a
You can stage in a file to execute with globus-job-run using the -s flag. If you have a script "myscript" which contains:
#!/bin/csh
/bin/chmod u+x /home/johndoe/a.out
/home/johndoe/a.out
then you can use
% globus-job-run ben.psc.edu/jobmanager-pbs -s myscript
to run the executable a.out on ben.psc.edu. Note that absolute paths to all commands are required (e.g., /bin/chmod, /home/johndoe/a.out). You can get more information about how to use globus-job-run by invoking it with the -help switch:
% globus-job-run -help
globus-job-submit submits a job to a remote machine and runs it in the background. Once the job has been submitted, the command returns a contact string (used by the commands that monitor the job), closes the connection to the remote host, and returns control to the user. This is useful for large batch jobs, because you can exit the system and return later to query the job status. Note that the executable must already be on the remote machine: unlike globus-job-run, globus-job-submit has no staging function.
The format is:
% globus-job-submit remote-machine executable-command
Again, you must supply the full path to the executable command. globus-job-submit returns a full contact string, which looks like:
https://energy.cluster.wvu.edu:nnnnn/nnnn/nnnnnnnnnn/
or
https://ben.psc.edu:nnnnn/nnnn/nnnnnnnnnn/
This contact string is used as the argument in the commands to query the status of the job (globus-job-status), to retrieve the job output (globus-job-get-output), and to stop a running job and delete the remote copy of the output (globus-job-clean).
globus-job-status returns the status of the job as PENDING, ACTIVE, DONE or FAILED.
% globus-job-status contact-string
globus-job-get-output retrieves the job output from the remote machine.
% globus-job-get-output contact-string
Job output is returned to stdout on the local machine. You can get the output multiple times, until it is deleted using globus-job-clean. Redirection can be used to save the output to a file.
globus-job-clean stops a job if it is still running, and deletes the remote copy of the output:
% globus-job-clean contact-string
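The submit, status, and get-output commands are typically combined into a polling loop. The following is a minimal sketch, assuming globus-job-status prints exactly one of the four status words listed above; the function name wait_for_job and the STATUS_CMD and POLL_SECONDS variables are hypothetical and exist only so that the loop can be tried without a grid connection.

```shell
#!/bin/sh
# Command used to query job status; override STATUS_CMD to test offline.
STATUS_CMD=${STATUS_CMD:-globus-job-status}
# Seconds to wait between status queries (an arbitrary default).
POLL_SECONDS=${POLL_SECONDS:-30}

# wait_for_job CONTACT_STRING   (hypothetical helper)
# Polls until the job leaves the PENDING and ACTIVE states.
# Returns 0 for DONE, 1 for FAILED, and 2 if the status query fails
# or prints anything other than the four expected status words.
wait_for_job() {
    while :; do
        status=$($STATUS_CMD "$1") || return 2
        case "$status" in
            DONE)           return 0 ;;
            FAILED)         return 1 ;;
            PENDING|ACTIVE) sleep "$POLL_SECONDS" ;;
            *)              return 2 ;;
        esac
    done
}

# Typical use, with contact holding the string returned by globus-job-submit:
#   wait_for_job "$contact" && globus-job-get-output "$contact" > job.out
#   globus-job-clean "$contact"
```

Treating any unrecognized status as an error keeps the loop from spinning indefinitely if the status command reports a state this guide does not list.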
The examples below use globus-job-run. globus-job-submit (primarily for batch jobs) can be used almost interchangeably, except that it cannot stage files.
File simple.sh is on remotehost. Job is run from localhost, and output is echoed to the screen.
remotehost% cat /home/joeuser/simple.sh
#!/bin/sh
echo "I'm a simple test script"
remotehost% exit
localhost% globus-job-run remotehost /home/joeuser/simple.sh
I'm a simple test script
localhost%
File local.sh exists on localhost. It is staged over to remotehost for execution, and the output is echoed to the screen.
localhost% cat /home/joeuser/local.sh
#!/bin/sh
echo "I'm a local script"
localhost% ssh remotehost
remotehost% cat /home/joeuser/local.sh
cat: /home/joeuser/local.sh: No such file or directory
remotehost% exit
localhost% globus-job-run remotehost -s /home/joeuser/local.sh
I'm a local script
File simple.sh exists on remotehost. The job is submitted from localhost, and the output is saved on remotehost as simple.out with the -stdout switch.
localhost% globus-job-run remotehost -stdout /home/joeuser/simple.out /home/joeuser/simple.sh
localhost% cat /home/joeuser/simple.out
cat: /home/joeuser/simple.out: No such file or directory
localhost% ssh remotehost
remotehost% cat /home/joeuser/simple.out
I'm a simple test script
File simple.sh is on remotehost. Job is submitted from localhost. The output is written to remote.out, then staged back to localhost with the -stdout -s switches. File remote.out is not saved on remotehost.
localhost% globus-job-run remotehost -stdout -s remote.out /home/joeuser/simple.sh
localhost% cat /home/joeuser/remote.out
I'm a simple test script
localhost% ssh remotehost
remotehost% cat /home/joeuser/remote.out
cat: /home/joeuser/remote.out: No such file or directory
File simple.c is on remotehost. It is compiled there, and then globus-job-run is submitted from localhost. Note the use of the -x '(jobtype=mpi)' switch to declare that this is an MPI job, and the use of -np to request the number of processors to use. The output is echoed back to the screen.
remotehost% cat /home/joeuser/simple.c
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("Hello world! I'm %d of %d\n", rank, size);
MPI_Finalize();
return 0;
}
remotehost% mpicc -o /home/joeuser/simple.x /home/joeuser/simple.c
remotehost% exit
localhost% globus-job-run remotehost -x '(jobtype=mpi)' -np 5 /home/joeuser/simple.x
Hello world! I'm 4 of 5
Hello world! I'm 3 of 5
Hello world! I'm 2 of 5
Hello world! I'm 1 of 5
Hello world! I'm 0 of 5
© Pittsburgh Supercomputing Center (PSC),
SuperComputing Science Consortium.
URL: http://sc-2.psc.edu/saber/index.html
Revised: Friday, 18-Jan-2008 12:02:36 EST