NEES - Running OpenSees on OSG Computers
As of July 20th 2011, NEEShub offers the capability to run jobs on the Open Science Grid (OSG) computing infrastructure. The Open Science Grid is a consortium of about 80 universities and national laboratories that work together to form a national cyberinfrastructure. Resources belonging to the OSG institutions are shared across the consortium.
Using OSG computers allows many jobs to run simultaneously with little impact on other NEEShub users. The drawback is that when only a few jobs are submitted, they may start more slowly on OSG than on the dedicated NEEShub machines.
For jobs to run on OSG, they must currently be submitted to OSG explicitly. There are two ways to do this: with a command run from the workspace application, or through a Rappture graphical interface modified for this purpose. Section 1 below gives examples of using the commands in a workspace. Section 2 describes how a tool owner can modify a Rappture application to provide a field for choosing to run the simulation on OSG.
Both methods create new run directories in your home directory under $HOME/scratch. Data management details are discussed in Section 3.
The workspace tool requires access permission. If you do not already have access, please open a ticket and ask to be added to the workspace tool.
After starting a NEEShub workspace, a user can run the script osg_opensees, which submits a job to OSG.
Here is an example of a simple application that comes with opensees2:
    $ /apps/bin/osg_opensees /apps/opensees2/current/examples/sine_no_gui/sine_no_gui.tcl
    osg_opensees START: Transfer directory = /home/neeshub/mslyz/scratch/run00012
    WARNING: 2 files copied from /apps/opensees2/current/examples/sine_no_gui to /home/neeshub/mslyz/scratch/run00012 including sine_no_gui.tcl
    submit -v OSGFactory -i /home/neeshub/mslyz/scratch/run00012 /home/neeshub/mslyz/scratch/run00012/cmdfile
    (179.0) Job Submitted at OSGFactory Tue Jul 19 14:48:08 2011
    (179.0) Simulation Idle at OSGFactory Tue Jul 19 14:49:13 2011
    (179.0) Simulation Running at Fermigridosg1 Tue Jul 19 14:54:49 2011
    (179.0) Simulation Done at Fermigridosg1 Tue Jul 19 14:55:44 2011
    /apps/bin/osg_opensees: DONE! Results are in /home/neeshub/mslyz/scratch/run00012/@RUNDIR
When the script finishes, the output data is in a numbered directory under $HOME/scratch (run00012 in the example above).
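Because osg_opensees blocks until the job is finished and ends with a line of the form "DONE! Results are in <directory>", a script can capture that directory for post-processing. The following is a minimal sketch. It assumes that osg_opensees writes this final line to standard output in the format shown in the example. The function name osg_results_dir is hypothetical and not part of NEEShub. It prints the path exactly as osg_opensees reports it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a NEEShub command): read osg_opensees output on
# stdin and print the directory named on its "Results are in" line.
# Assumes the output format shown in the example above.
osg_results_dir() {
    local line dir=""
    # The "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *"Results are in "*)
                # Keep everything after the last "Results are in ".
                dir="${line##*"Results are in "}"
                ;;
        esac
    done
    # Fail if no such line was seen (for example, if the submission failed).
    [ -n "$dir" ] || return 1
    printf '%s\n' "$dir"
}
```

For example, resdir=$(/apps/bin/osg_opensees my_model.tcl | tee osg.log | osg_results_dir) keeps the full log in osg.log and stores the results path in resdir (my_model.tcl and osg.log are placeholder names).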
The first example uses osg_opensees, which does not return until the job is complete. It is often desirable to submit many jobs without waiting for each one to finish, and the alternative script osg_opensees_nowait does exactly this. It is also terse: it prints only the name of the directory where the results will go, which wrapper scripts can use to generate and process many runs. The command osg_opensees_nowait also differs from osg_opensees in two other ways:
Because osg_opensees_nowait returns before the job is complete, the only straightforward way to tell whether a job has finished is to look in its output directory. Here is an example of using osg_opensees_nowait:
    $ /apps/bin/osg_opensees_nowait /apps/opensees2/current/examples/sine_no_gui/sine_no_gui.tcl
    /home/neeshub/mslyz/scratch/run00013/sine_no_gui
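Because osg_opensees_nowait prints only the results directory, a wrapper can submit many scripts in a loop and record where each run's results will appear. The following is a minimal sketch. It assumes that each model script sits in a directory with its input files, and that osg_opensees_nowait prints the results path on standard output as in the example. The function submit_all and the OSG_SUBMIT variable are hypothetical. OSG_SUBMIT exists only so the submission command can be replaced, for instance when testing the wrapper.

```shell
#!/usr/bin/env bash
# Submission command; overridable via the hypothetical OSG_SUBMIT variable.
OSG_SUBMIT="${OSG_SUBMIT:-/apps/bin/osg_opensees_nowait}"

# Hypothetical wrapper: submit each Tcl script given as an argument and append
# the results directory printed by the submission command to LISTFILE.
# Usage: submit_all LISTFILE script1.tcl [script2.tcl ...]
submit_all() {
    local listfile="$1" script dir status=0
    shift
    for script in "$@"; do
        # Record the printed directory only if the command succeeded and
        # actually printed something.
        if dir=$("$OSG_SUBMIT" "$script") && [ -n "$dir" ]; then
            printf '%s\n' "$dir" >> "$listfile"
        else
            echo "submission failed: $script" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, submit_all runs.txt case1/model.tcl case2/model.tcl (placeholder names) leaves one results directory per line in runs.txt. The wrapper keeps going after a failed submission and reports the failure in its exit status, so a single bad script does not stop the rest of the batch.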
Although osg_opensees_nowait returns immediately, it is very important NOT to terminate the workspace session until the background jobs have completed. Terminating the workspace session also terminates the submit monitor, and the results of the jobs will then not be returned to $HOME/scratch.
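Since the only completion signal is the contents of the output directory, a script can poll the recorded directories before you close the session. The following is a minimal sketch. It assumes that you keep a list file with one results directory per line, and that your own OpenSees script writes a known output file that appears in the results directory only after the job's results have been returned. The function name wait_for_results and its parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until every results directory listed in LISTFILE
# contains EXPECTED (an output file written by your own OpenSees script).
# Assumes that file appears only after the job's results have been returned.
# Usage: wait_for_results LISTFILE EXPECTED [SECONDS_BETWEEN_CHECKS] [MAX_CHECKS]
# MAX_CHECKS of 0 (the default) means keep checking indefinitely.
wait_for_results() {
    local listfile="$1" expected="$2" interval="${3:-60}" max="${4:-0}"
    local dir pending checks=0
    while :; do
        # Count the directories that do not yet contain the expected file.
        pending=0
        while IFS= read -r dir || [ -n "$dir" ]; do
            [ -n "$dir" ] || continue
            [ -e "$dir/$expected" ] || pending=$((pending + 1))
        done < "$listfile"
        if [ "$pending" -eq 0 ]; then
            echo "all runs have returned results"
            return 0
        fi
        checks=$((checks + 1))
        if [ "$max" -gt 0 ] && [ "$checks" -ge "$max" ]; then
            echo "giving up: $pending run(s) still pending" >&2
            return 1
        fi
        echo "$pending run(s) still pending" >&2
        sleep "$interval"
    done
}
```

A job that fails never produces the expected file. The MAX_CHECKS limit therefore keeps the loop from waiting forever. The loop itself also needs the workspace session to stay open while it runs.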
Rappture tool owners can insert a wedge into their application that lets the user keep local execution as the default or switch to OSG. This involves two changes to your Rappture tool.xml file: adding a single field called runmethod to your Rappture UI, and changing the executed command to runmethod.tcl, whose arguments are the command you really want to execute followed by that command's arguments. Essentially, slide your command to the right and insert “tclsh /apps/neesutils/bin/runmethod.tcl” on the left. The routine runmethod.tcl reads the value of the new field and then either calls your command directly or calls osg_opensees as discussed in Section 1. We call this a wedge because it allows your application, with its Rappture interface, to run unmodified either in the foreground or in batch. This is possible because Rappture encapsulates all input and output in the driver and run files, respectively.
For example, after this change the command section of tool.xml looks like this:
    <command>
      tclsh /apps/neesutils/bin/runmethod.tcl /apps/openseesbuild/current/bin/OpenSees @tool/sine.tcl @driver 2>stderr.out
    </command>
Here is the definition of the runmethod field, which you must also add to your tool.xml at whatever location you think best:
    <choice id="runmethod">
      <about>
        <label>Run Method</label>
        <description>Choose what resources will be used to execute your simulation</description>
      </about>
      <option>
        <about><label>Local HUB</label></about>
        <value>HUB</value>
      </option>
      <option>
        <about><label>Open Science Grid</label></about>
        <value>OSG</value>
      </option>
      <default>Local HUB</default>
    </choice>
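The dispatch that runmethod.tcl performs can be illustrated with the shell sketch below. This is not the actual runmethod.tcl. That routine is a Tcl script and reads the field value from the Rappture driver file. Here, the selected value is passed in directly as the first argument. The function run_by_method and the OSG_RUN variable are hypothetical, and OSG_RUN must be set to whatever submission wrapper you use. The exact way runmethod.tcl builds its osg_opensees call is not described in this document, so the sketch does not guess it.

```shell
#!/usr/bin/env bash
# Illustration only, NOT the real /apps/neesutils/bin/runmethod.tcl.
# METHOD is the value of the runmethod field (HUB or OSG, matching the
# <value> elements above). The remaining arguments are the original command.
# OSG_RUN is a hypothetical variable naming a batch submission wrapper.
# Usage: run_by_method METHOD command [args ...]
run_by_method() {
    local method="$1"
    shift
    case "$method" in
        HUB)
            # Local execution: run the original command unchanged.
            "$@"
            ;;
        OSG)
            if [ -z "${OSG_RUN:-}" ]; then
                echo "run_by_method: OSG_RUN is not set" >&2
                return 2
            fi
            # Batch execution: hand the same command to the submission wrapper.
            "$OSG_RUN" "$@"
            ;;
        *)
            echo "run_by_method: unknown run method '$method'" >&2
            return 2
            ;;
    esac
}
```

The wedge idea is visible in the sketch: the original command and its arguments pass through untouched in both branches, so the application itself needs no changes. Only the front of the command line decides where the work runs.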
* There are several different ways to transfer data to and from a NEEShub account. If the default 1 GB per-account limit is too small, up to 10 GB can be made available on request.
* osg_opensees automatically transfers to the OSG computers all files in the same directory as the OpenSees Tcl command script (sine_no_gui.tcl in the example above), but it does not transfer subdirectories.
* In each job’s environment, the JOBCOUNT variable is set to a unique number, which can be used to select a different input data set for each OpenSees run.
* Some of the limitations of the current OSG setup are:
- MPI is not supported.
- There is usually no more than 10 GB of disk space available for each individual job.
- There is typically only 1 to 2 GB of RAM available per job.
- Keeping run times under 12 hours is recommended. Longer jobs, especially those running past 16 hours, are unlikely to finish successfully.
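Because osg_opensees transfers only the files that sit directly next to the Tcl script, any inputs kept in subdirectories have to be copied up to that level before submission. This also applies to the data sets chosen with JOBCOUNT. The following is a minimal sketch of that staging step. The function name stage_flat is hypothetical. The sketch refuses to overwrite files, because two inputs with the same name would collide once the directory structure is flattened.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the given input files directly into the directory
# that holds the OpenSees Tcl script, because osg_opensees transfers the files
# in that directory but not its subdirectories.
# Usage: stage_flat SCRIPT_DIR FILE [FILE ...]
stage_flat() {
    local dest="$1" src name
    shift
    [ -d "$dest" ] || { echo "stage_flat: $dest is not a directory" >&2; return 1; }
    for src in "$@"; do
        name=$(basename "$src")
        # Refuse to overwrite: two inputs with the same name would collide
        # once the directory structure is flattened.
        if [ -e "$dest/$name" ]; then
            echo "stage_flat: $dest/$name already exists" >&2
            return 1
        fi
        cp "$src" "$dest/$name" || return 1
    done
}
```

After staging, the Tcl script should refer to these files by their bare names rather than by subdirectory paths. Otherwise it will not find them on the OSG computer.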