
Cluster Use


Using the OSU Physics Labs as a Beowulf Cluster for long running serial or parallel programs


The computers in Weniger 497 and 412 have now been set up to act as a Beowulf cluster.  These machines are currently 34 (eventually 35) Dell Optiplex GX620's with Intel Pentium D 830 (3.0 GHz) processors and 1 GB of RAM, running the 64-bit SUSE Linux 10.1 operating system.  These computers are also loaded with the Intel Compiler Suite, which comprises C, C++, and Fortran compilers, as well as the Math Kernel Library (cluster edition) and MPI libraries.  The MPI libraries are what allow the cluster to run parallel jobs.  The cluster also uses a program called Torque, which acts as a resource manager for the cluster.  As of 4/2/07, we use Maui as the scheduler for the system.  The scheduler takes the jobs that you submit and decides which computer(s) to run them on based on current load.  Torque is based on PBS and uses all the same commands, but is open source.

Please direct any questions or comments to elserj@physics.oregonstate.edu

Running serial jobs with Torque

A serial job is one that does not need to run on multiple machines.  This includes Java programs, as well as C, C++, and Fortran programs not compiled with an MPI compiler, and shell scripts.

The basic command to "submit" jobs to Torque is qsub, which is short for "queue submit".  In essence, what you are doing is submitting a job to the queue, where Torque will decide what to do with the program.

A very important thing to note: qsub will not accept binary (compiled) programs directly.  This means that this process will fail:
> icc program.c -o program   (icc is the Intel C compiler; -o tells the compiler to name the executable "program")
> qsub program               (try to submit the executable)
qsub:  file must be an ascii script          (result)

For security reasons, qsub will only accept shell scripts (note that the shell script does NOT need to be executable, although being so will not affect anything).
Here is a sample script for running serial jobs (copy and paste the lines below into a file); each line is explained afterwards:

#!/bin/bash
#PBS -l walltime=00:15:00
#PBS -l nice=19

cd $PBS_O_WORKDIR
./program

#!/bin/bash is required at the beginning of any shell script, or at least any shell script that runs in bash (you could also specify tcsh, ksh, etc.).

#PBS -l walltime=00:15:00 tells the scheduler to allow only 15 minutes for this job.  Note that if you require longer, you can change this.  Also note that the default is 2 days if it is not set here.  This means that your program will be killed if it takes longer than the walltime; this is to keep bad programs from running forever.  See the parallel section for more info.  If you would like to run longer, please contact Justin by email at elserj@physics.oregonstate.edu.

#PBS -l nice=19 ensures that your job will run in the background so that it won't interfere with any other programs being used.  This is needed since the computers we are using are public, i.e., anybody can sit down and use one of them.  We don't want to interfere with normal class use.  More information about "nice" can be found by running man nice.

cd $PBS_O_WORKDIR tells the script to switch to the directory you submitted the job from.  If this is not used, an absolute path to the program must be given.

./program tells the script to execute the file program in the current directory.  ../program would tell it to look one directory up from the current working directory.


You can run Java jobs by using the following command instead of ./program (the class must be compiled beforehand):

java java_program
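Putting the pieces together, a complete session for a serial job might look like this (run_serial.sh here stands for whatever you named the script above, and the JobID you get back will of course differ):

> icc program.c -o program         (compile the program)
> qsub run_serial.sh               (submit the script shown above)
171.physics-server                 (qsub replies with the JobID)
> qstat                            (check on the job)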




It is also possible to run Mathematica jobs on the cluster.  This is very useful for long running jobs.  The steps are as follows:

If you already have a notebook that you would like to run:

Open your notebook in Mathematica,

Select Kernel -> Delete All Output

Select Edit -> Select All

Select Cell -> Cell Properties -> Initialization Cell

Select File -> Save As Special -> Package Format

Save the file somewhere; in this example, I will call it math.m.

Add the following as the first line in your newly saved file:


AppendTo[$Echo, "stdout"]

(This will allow you to see your input commands in the output file) 

Change the ./program line to be

math < math.m > results.out

This will place the output from your notebook in the file results.out.  If your notebook exports to a file, this should work as normal (not tested yet).
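Putting this together, a complete submission script for the Mathematica job might look like the following (this assumes the package file math.m is in the directory you run qsub from):

#!/bin/bash
#PBS -l walltime=00:15:00
#PBS -l nice=19

cd $PBS_O_WORKDIR
math < math.m > results.out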


The following site has more details on PBS environment variables available, although PBS_O_WORKDIR should be the only one you need for single processor jobs:

As you can see from the above, serial scripts are very simple, although they can be made more complex if desired.

The nicest feature of running programs with this method is that while your job is running, you can log out and your job will still run.  This means that you can ssh to any one of the machines, compile your program, submit it to the queue, and then log out.  Your job will run until it finishes or is killed, and you do not have to lock a workstation or have anyone even know your program is running unless they check the queue.

This leads to the questions of how to check on your jobs and where the output goes.  You check on jobs by running the command qstat.
> qstat
Job id                      Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
170.physics-server  run_mpi.sh       justin                 0     R batch

This tells me my program has ID 170, I am running run_mpi.sh as the user justin, the program is in state R (running; Q means queued, waiting for computers, and E means exiting, either with or without errors), and it is running in the queue batch.  Note that batch is the default queue that all jobs will run in.

If your program is written to output to a file, it will still output to that file.  However, if your program outputs to stdout (the terminal, console, whatever you want to call it), the output will be redirected to a file named script.oJobID and any errors will be redirected to script.eJobID, where script is the script you used to submit the job and JobID is the ID given to it by Torque.  The JobID can be determined from qstat; also, when you submit a job, qsub will tell you the JobID.  Note that the JobID will be a number followed by physics-server.  This is simply because the queue is on the server; all that is important is the number.  The files will look like this:

mpitest@wngr412-pc01:~/mpi> ll
total 32
-rwxr-xr-x 1 mpitest users 17702 2006-11-15 16:38 mpipi
-rw-r--r-- 1 mpitest users  1400 2006-11-15 16:38 mpipi.c
-rw-r--r-- 1 mpitest users   585 2006-11-15 16:39 run_mpi.sh
-rw------- 1 mpitest users     0 2006-11-15 16:39 run_mpi.sh.e164
-rw------- 1 mpitest users   327 2006-11-15 16:39 run_mpi.sh.o164

mpipi is the program being run, mpipi.c is the source code, run_mpi.sh is the script used to submit the job, and run_mpi.sh.e164 and run_mpi.sh.o164 are the error and output files from job 164.

Note that the above is for an MPI (parallel) program, but the basic file structure is the same for serial jobs.

You can kill current jobs by using the qdel command.  Use this command if your program is taking way longer than it should or you need to run it with a different version of your program.
> qdel 170
would have killed the job shown above in qstat if it were still running.  Note that you can only kill your own jobs, not someone else's, although you will be able to see other people's jobs in the queue.

Running mpi (parallel) jobs with Torque:

This is more complicated in that the script used must have certain commands in it.  The first thing that must be done is to create an mpd secretword.  This secretword is like a password with its main function being used to discriminate jobs started by you from those started by someone else.  To do this follow the below commands:
> cd $HOME
> echo "MPD_SECRETWORD=secretword" >> .mpd.conf      (replace secretword with your own secretword, NOT your password.  You don't really need to remember this, it is used "behind the scenes".)
> chmod 600 .mpd.conf

These commands do the following:
make sure you are in your home directory,
place the text following the echo in quotes in a file named .mpd.conf,
change the permissions on the file .mpd.conf so that no one else can read the file but you.  See the chmod man page (man chmod) for more info on the chmod command.

It is beyond the scope of this document to describe programming practices for MPI, merely how to run MPI jobs here.  However, there is a fairly user-friendly "User's Guide to MPI" (PostScript) available that is recommended reading on the subject.

You can compile your MPI programs with one of the following compilers available on the OSU Physics cluster:
mpicc      MPI wrapper for the gcc 4.1.0 C compiler
mpiicc     MPI wrapper for the Intel 9.1 C/C++ compiler
mpif77     MPI wrapper for the gcc Fortran 77 3.3.5 compiler
mpif90     MPI wrapper for the gcc Fortran 90 4.1.0 compiler
mpiifort   MPI wrapper for the Intel Fortran 9.1 compiler (Fortran 90)

I have not tested performance differences between the various compilers.
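For example, to compile the mpipi.c program (available at the bottom of this page) with the Intel wrapper, you could use something like:

> mpiicc mpipi.c -o mpipi

Any of the other wrappers above are invoked the same way.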

Here is a sample script used for MPI jobs, with a description of each line following:

#!/bin/bash
# All lines starting with "#PBS" are PBS commands
# Request 2 nodes with 2 processors per node (equals 4 processors)
# ppn can either be 1 or 2
#PBS -l nodes=2:ppn=2
# Set wall clock time to 0 hours, 15 minutes and 0 seconds
#PBS -l walltime=00:15:00

# Set the nice value to 19 so that it doesn't interfere with locally running programs
#PBS -l nice=19

# cd to the working directory (the directory qsub was run from)
cd $PBS_O_WORKDIR

# name of the executable to run
myprog=./mpipi

# The following checks how many nodes were requested,
# and sets the NP variable to (nodes * ppn) from above
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')

# Number of processors is $NP

# Run $myprog with the appropriate mpirun script
mpirun -r ssh -n $NP $myprog

# make sure to exit the script, else job won't finish properly
exit 0

Here is the mpipi program code I used, courtesy of Rubin Landau; see the mpipi-c.txt attachment at the bottom of this page (right click, Save As to download).

Again, the script must start with the line #!/bin/bash.  Note that lines beginning with #PBS are commands to the Torque scheduler, not comments.

Also note that the script must still be submitted using the program qsub:
mpitest@wngr412-pc01:~/mpi> qsub run_mpi.sh
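(qsub replies with the JobID, a number followed by physics-server; the 164 below is simply the number from the example output files shown earlier)

164.physics-server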

Again, the following site has more details on #PBS commands available:

The line
#PBS -l nodes=2:ppn=2 tells Torque to use 2 computers with 2 processors per node, for a total of 4 processors.  All of our machines are dual-core, which means that they each have two processors.  If you want to use only one processor per machine, change this to ppn=1.  ppn stands for processors per node.

The line
#PBS -l walltime=00:15:00 tells Torque to kill the job if it takes longer than 15 minutes to run.  Note that this is actual run time, not total time in the queue, meaning that if for some reason your job doesn't start right away, this delay does not count against you. 
In general it is a good idea to use a walltime limit in case your program is poorly implemented or stuck in a loop.  You should set the walltime to be about twice the time you expect the job to run.  For short jobs, a limit of 15 minutes is fine.  However, if you expect your job to run for several days, you may remove this line.  Note that if you have such jobs, let me know so that I don't think they are runaway jobs to be killed.

#PBS -l nice=19 tells the computer to give your program a nice value of 19.  This makes sure that it runs in the background.  See serial description above for more info.

$PBS_O_WORKDIR is the directory that you are executing the qsub command from.  If the cd $PBS_O_WORKDIR line is left out, the line naming your program will have to use an absolute path (for example, /home/username/mpi/mpipi rather than ./mpipi).

The total number of processors is stored in the variable NP.  Note that this value cannot exceed nodes * ppn, although it can be smaller.  In the above script it is set to the maximum automatically by parsing the file $PBS_NODEFILE, which lists each processor slot set aside by Torque on its own line.  The command wc -l counts the number of lines, and this result is passed to awk, which strips the file name from wc's output, leaving just the number for use with the mpirun program.
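For example, with nodes=2:ppn=2 the $PBS_NODEFILE for a job might contain something like the following (the hostnames are only illustrative), so wc -l counts 4 lines and NP becomes 4:

wngr412-pc01
wngr412-pc01
wngr412-pc02
wngr412-pc02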

The next line is the one that actually runs your program in the MPI environment.  mpirun is the command used to start the MPI environment; -r ssh is required for the machines to communicate with each other using ssh and scp rather than rsh and rcp.  rsh and rcp are quite a bit less secure and so are not enabled on the cluster; all communication must be done via ssh or scp.  -n $NP passes the NP variable giving the number of processors to use, and $myprog is the program you compiled.

You must also give the script a command to exit with status 0, or the job might have problems writing the output file.

Attachment: mpipi-c.txt (1.38 KB)