DOT User's Guide
Version 1.0alpha
Released 12/17/1998
Provided with DOT are a number of launch scripts that will initiate a DOT run. You will find these scripts in dot/bin.
The launch scripts all require that DOT_ROOT be set correctly in ~/.cshrc.
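Since every launch script depends on DOT_ROOT, a quick check before a run can save a confusing failure. In csh the variable is typically set with a line such as `setenv DOT_ROOT /usr/local/dot` in ~/.cshrc (that path is only an assumption for illustration). The sketch below is a hypothetical helper, not something shipped with DOT; the function name `check_dot_root` is made up here. It is written in POSIX sh, which sees the same environment variable regardless of your login shell.

```sh
#!/bin/sh
# Hypothetical helper (not part of DOT): confirm that DOT_ROOT is set
# and points at a directory containing bin/ before launching a run.
check_dot_root() {
    if [ -z "${DOT_ROOT:-}" ]; then
        echo "DOT_ROOT is not set; add a setenv line to ~/.cshrc" >&2
        return 1
    fi
    if [ ! -d "$DOT_ROOT/bin" ]; then
        echo "no bin directory found under DOT_ROOT ($DOT_ROOT)" >&2
        return 1
    fi
    return 0
}
```

Calling `check_dot_root || exit 1` at the top of your own wrapper script stops early when the environment is not ready.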
Everything you need to run DOT on SGI, DEC, or Sun workstations is supplied with DOT. To run on the other supported systems you will need to install a version of MPI appropriate for that system.
The example commands below all show how to run the example supplied with DOT in the dot/example1 subdirectory. The runs all use four processors and use example1.com as the command file. The first four examples are for multi-processor machines, and the final example explains how to run DOT on a network of workstations using MPICH.
    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/dotlaunch.dec.vendormpi example1.com 4

    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/dotlaunch.sun.vendormpi example1.com 4

    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/dotlaunch.t3e.vendormpi example1.com 4

    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/dotlaunch.sp2.vendormpi example1.com 4
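The four commands above share one pattern: dotlaunch.<system>.vendormpi, followed by the command file and the processor count. As a minimal sketch of that pattern, the hypothetical function below (`dot_vendor_launch_cmd` is not part of DOT) prints the command line for a given system. It accepts only the four system names that appear in the examples and assumes DOT_ROOT is already set.

```sh
#!/bin/sh
# Hypothetical helper (not part of DOT): print the vendor-MPI launch
# command for one of the systems shown in the examples above.
dot_vendor_launch_cmd() {
    system=$1
    comfile=$2
    nprocs=$3

    if [ -z "${DOT_ROOT:-}" ]; then
        echo "DOT_ROOT is not set" >&2
        return 1
    fi

    # Only the system names that appear in the guide's examples.
    case $system in
        dec|sun|t3e|sp2) ;;
        *) echo "unknown system: $system" >&2; return 1 ;;
    esac

    # The processor count must be a positive integer.
    case $nprocs in
        ''|*[!0-9]*) echo "processor count must be a number" >&2; return 1 ;;
    esac
    if [ "$nprocs" -le 0 ]; then
        echo "processor count must be at least 1" >&2
        return 1
    fi

    printf '%s/bin/dotlaunch.%s.vendormpi %s %s\n' \
        "$DOT_ROOT" "$system" "$comfile" "$nprocs"
}
```

For example, `dot_vendor_launch_cmd dec example1.com 4` prints the second line of the DEC example with $DOT_ROOT expanded; you would still run it from $DOT_ROOT/example1.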
To run under LoadLeveler, use the following in your LoadLeveler script:
    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/exec/dot.sp2.vendormpi example1.com
Launch scripts are provided for MPICH on most types of systems:
You may use computers of different types for a single run. However, with the current version of MPICH, DOT will not work when Sun computers are combined with systems of other types in a single run.
The example below shows how to run DOT on three SGI systems named sgi1, sgi2, and sgi3, and one DEC Alpha system named alpha1. MPICH requires that you be able to connect to all the hosts with rsh (see the appendix for information on how to use a ~/.rhosts file).
    rlogin sgi1
    cd $DOT_ROOT/example1
    $DOT_ROOT/bin/dotchoosesystems.mpich sgi1 sgi2 sgi3 alpha1 > my_hosts
    $DOT_ROOT/bin/dotlaunch.sgi.mpich example1.com my_hosts
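Because MPICH needs rsh access to every host, it can be useful to confirm connectivity before launching. The following is a hypothetical pre-flight check, not part of DOT or MPICH: the function name `check_rsh_hosts` and the `RSH` override variable are assumptions made for this sketch. It runs the no-op command `true` on each host and reports the hosts that cannot be reached.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of DOT or MPICH).
# Set RSH to use a different remote-shell command; the default is rsh.
check_rsh_hosts() {
    rsh_cmd=${RSH:-rsh}
    failed=0
    for host in "$@"; do
        # -n stops rsh from reading standard input;
        # "true" is a remote command that does nothing.
        if "$rsh_cmd" -n "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: cannot connect with $rsh_cmd" >&2
            failed=1
        fi
    done
    return $failed
}
```

For the run above you might call `check_rsh_hosts sgi1 sgi2 sgi3 alpha1` from sgi1; a non-zero return status means at least one host needs attention (often a missing ~/.rhosts entry).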
The hosts file, called my_hosts in the example above, lists the host names and the paths to the appropriate executables:
    sgi1 0 /usr/local/dot/bin/exec/dot.sgi.mpich
    sgi2 1 /usr/local/dot/bin/exec/dot.sgi.mpich
    sgi3 1 /usr/local/dot/bin/exec/dot.sgi.mpich
    alpha1 1 /usr/local/dot/bin/exec/dot.dec.mpich
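If you edit a hosts file by hand, a small sanity check can catch layout mistakes before a run. The sketch below is hypothetical and not part of DOT. It assumes the three-column layout shown above (host name, number, executable path) and assumes that Sun executables are named dot.sun.mpich, by analogy with dot.sgi.mpich and dot.dec.mpich. It also flags a mix of Sun and non-Sun executables, which the guide says does not work with the current MPICH.

```sh
#!/bin/sh
# Hypothetical sanity check (not part of DOT) for a hosts file with
# three columns per line: host name, number, executable path.
# Problems are reported on standard error; returns non-zero if any.
check_hosts_file() {
    awk '
        BEGIN { bad = 0; sun = 0; total = 0 }
        NF == 0 { next }    # ignore blank lines
        NF != 3 {
            printf "line %d: expected 3 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ {
            printf "line %d: second field is not a number\n", NR
            bad = 1
        }
        $3 ~ /dot\.sun\./ { sun++ }    # assumed Sun executable name
        { total++ }
        END {
            if (sun > 0 && sun < total) {
                print "Sun and non-Sun executables are mixed"
                bad = 1
            }
            exit bad
        }
    ' "$1" >&2
}
```

A typical use is `check_hosts_file my_hosts && $DOT_ROOT/bin/dotlaunch.sgi.mpich example1.com my_hosts`, so the launch only proceeds when the file looks well-formed.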
You must launch your DOT run from the first host listed in your dotchoosesystems.mpich command (sgi1 in the example above).
The number following the machine name is an ordering of machines for a multi-processor run. Processor 0 is your "main machine": the one you are logged into, and the one from which your dotlaunch command must be issued. In parallel computing terms, this is called the "master." Processors 1-3 are "slave" machines, which are assigned work by the master. Each processor generates an individual .log file. For this example, these files are named:

    ac10_hs_mache_fas_p2__2A_0bumps_tst.log0.000
    ac10_hs_mache_fas_p2__2A_0bumps_tst.log0.001
    ac10_hs_mache_fas_p2__2A_0bumps_tst.log0.002
    ac10_hs_mache_fas_p2__2A_0bumps_tst.log0.003

The number at the end of each name is the processor number as assigned in the my_hosts file.
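To check that every processor produced its log, it helps to generate the expected file names. The sketch below assumes the <base>.log0.NNN pattern inferred from the four names listed above, with NNN being the zero-padded processor number; the function name `dot_log_names` is hypothetical and not part of DOT.

```sh
#!/bin/sh
# Hypothetical helper (not part of DOT): print the per-processor log
# names for a run, assuming the <base>.log0.NNN pattern seen above.
dot_log_names() {
    base=$1
    nprocs=$2
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        # %03d zero-pads the processor number to three digits.
        printf '%s.log0.%03d\n' "$base" "$i"
        i=$((i + 1))
    done
}
```

For instance, `for f in $(dot_log_names ac10_hs_mache_fas_p2__2A_0bumps_tst 4); do [ -f "$f" ] || echo "missing $f"; done` reports any of the four logs that was not written.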