Simulation Wiki Tutorial - Running Parameter Sweep Applications on Large Clusters
Parameter Sweep Examples
Following is a very simple example of running OpenSeesMP with batchsubmit:
- The problem:
You have 24 ground motions to simulate and a “for loop” that iterates over them. With OpenSeesMP you can use 24 processes to solve the problem in less time: each process handles one ground motion.
- Step 1: Please refer to this OpenSeesMP tutorial to understand how to modify a sequential OpenSees parameter sweep application into a parallel OpenSeesMP application.
- Step 2: Create the batchsubmit command. Suppose you want to submit an “OpenSeesMP” job to a “venue”, run the simulation with 24 processes, and copy the entire input directory recursively. The batchsubmit command for this parallel job is:
- /apps/bin/batchsubmit --venue venue_name --ncpus 24 --appdir /apps/openseesbuild/osg --rcopyindir OpenSeesMP /apps/demo/largemp/run.tcl
- Note that the options in this command begin with double dashes (--), not single dashes, and that there is a space between each option and its parameter (e.g., --venue venue_name).
In this example, the OpenSeesMP executable in the /apps/openseesbuild/osg/bin directory gets executed on the venue (e.g., Hansen, Kraken, Ranger, Steele) with 24 processes. This is a bare minimum example showing how each of the user specifications (e.g., venue, number of processes, executable name etc.) gets translated to the options of a batchsubmit command.
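To make the mapping from user specifications to options concrete, the following shell sketch assembles the command from its parts and prints it instead of submitting it. The function name build_cmd and its argument order are illustrative assumptions and are not part of batchsubmit; only the options used in the example above appear in the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of batchsubmit): assemble the command line
# from the user's choices and print it for inspection. Nothing is submitted.
build_cmd() {
    venue=$1    # e.g. steele, hansen, kraken
    ncpus=$2    # total number of processes
    appdir=$3   # directory that contains the OpenSeesMP build
    input=$4    # full path to the input .tcl file
    printf '%s\n' "/apps/bin/batchsubmit --venue $venue --ncpus $ncpus --appdir $appdir --rcopyindir OpenSeesMP $input"
}

# Print the command from the example above without running it.
build_cmd venue_name 24 /apps/openseesbuild/osg /apps/demo/largemp/run.tcl
```

Printing the command first is a convenient way to check option spelling and spacing before pasting it into a terminal.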
- The steps:
There are a few steps that a user may need to take to run a parallel simulation on the large clusters.
- Step 1: Select an appropriate venue: --venue steele or --venue hansen or --venue kraken
- Step 2: Select parallel processing by specifying the number of processes, using either --ncpus alone or any two of the following options:
- --ncpus 16 : use 16 processes to run an application (can be used alone)
- --nn 4 : schedule processes across 4 nodes (needs to be used with either the --ncpus or --ppn option)
- --ppn 4 : schedule 4 processes per node (needs to be used with either the --ncpus or --nn option)
- Any two of these flags, or the --ncpus flag alone, can be used to indicate the number of processes with which the application should run. Distributing the processes across multiple nodes generally makes better use of the resources.
- Step 3: Select the directory that contains the application executable (in this case, OpenSeesMP) using: --appdir /path/to/dir
- --appdir /apps/openseesbuild/osg for Purdue resources such as steele or hansen
- --appdir /apps/openseesbuild/kraken for XSEDE resource Kraken
- Step 4: Decide whether or not to copy your input directory. Among the available options, the most commonly used one is --rcopyindir. This option does not take any parameter and makes batchsubmit copy the input directory recursively, i.e., all of the files in the input directory are copied along with the simulation. Use this option if you have input files other than your .tcl files, such as recorded ground motions.
- Step 5: Application name: OpenSeesMP
- Step 6: Input file: the full path to the input .tcl file.
- (Optional) Select an appropriate wall time using: --walltime hh:mm:ss. If this flag is omitted, the wall time defaults to 24 hours (24:00:00). Although the option is not required, a shorter wall time helps the resource manager pick your application for backfilling. However, once your job starts running, it will be killed after the requested wall time has passed. Set this option only if you are sure the requested time will be long enough, or if you need more time than the default wall time allows.
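The process-count rule in Step 2 can be sketched as a small shell function. This is only an illustration of the arithmetic; the function name total_procs is hypothetical, and batchsubmit itself may validate these options differently.

```shell
#!/bin/sh
# Hypothetical helper: work out the total process count implied by the
# process-count options. Pass an empty string for an option that is not used.
total_procs() {
    ncpus=$1; nn=$2; ppn=$3
    if [ -n "$nn" ] && [ -n "$ppn" ]; then
        # --nn with --ppn: nodes times processes per node
        echo $(( $nn * $ppn ))
    elif [ -n "$ncpus" ]; then
        # --ncpus alone, or --ncpus paired with --nn or --ppn
        echo "$ncpus"
    else
        # --nn alone or --ppn alone does not determine the count
        echo "need --ncpus, or two of --ncpus/--nn/--ppn" >&2
        return 1
    fi
}

total_procs "" 4 6     # --nn 4 --ppn 6
total_procs 16 "" ""   # --ncpus 16
```

The two calls print 24 and 16, matching 4 nodes with 6 processes each and a plain --ncpus 16 request.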
- Detailed Examples:
The following examples are real examples of a demo application and should run as they are. By changing --ncpus and --walltime and specifying the complete path to the input .tcl file, users should be able to run their own parallel OpenSeesMP simulations.
- Example 1: Submitting a 24 process OpenSeesMP run on Steele, distributing 6 processes per machine.
- /apps/bin/batchsubmit --venue steele --nn 4 --ppn 6 --appdir /apps/openseesbuild/osg --walltime 04:00:00 --rcopyindir OpenSeesMP /apps/demo/largemp/run.tcl
- Example 2: Submitting a 24 process OpenSeesMP run with the --ncpus option on Hansen:
- /apps/bin/batchsubmit --venue hansen --ncpus 24 --appdir /apps/openseesbuild/osg --walltime 04:00:00 --rcopyindir OpenSeesMP /apps/demo/largemp/run.tcl
- Example 3: Submitting a 24 process OpenSeesMP run on Kraken (6 nodes with 4 processes per node):
- /apps/bin/batchsubmit --venue kraken --nn 6 --ppn 4 --appdir /apps/openseesbuild/kraken --walltime 04:00:00 --rcopyindir OpenSeesMP /apps/demo/largemp/run.tcl
In all three examples, the --rcopyindir option ensures that the entire /apps/demo/largemp directory is copied recursively (with sub-directories) to the offsite venue.
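Because the --appdir value depends on the venue, it can help to derive it automatically. The sketch below is one possible way to do that, using only the directories listed in Step 3. The function names appdir_for_venue and print_submit are hypothetical, and the script prints the command rather than submitting it.

```shell
#!/bin/sh
# Hypothetical helper: pick the --appdir value for a venue, following the
# directories listed in Step 3.
appdir_for_venue() {
    case $1 in
        steele|hansen) echo /apps/openseesbuild/osg ;;
        kraken)        echo /apps/openseesbuild/kraken ;;
        *)             echo "unknown venue: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) a submission that uses --nn and --ppn.
print_submit() {
    venue=$1; nn=$2; ppn=$3; walltime=$4; input=$5
    appdir=$(appdir_for_venue "$venue") || return 1
    printf '%s\n' "/apps/bin/batchsubmit --venue $venue --nn $nn --ppn $ppn --appdir $appdir --walltime $walltime --rcopyindir OpenSeesMP $input"
}

print_submit steele 4 6 04:00:00 /apps/demo/largemp/run.tcl
print_submit kraken 6 4 04:00:00 /apps/demo/largemp/run.tcl
```

The two calls reproduce the commands of Example 1 and Example 3. A venue that is not listed in Step 3 produces an error message instead of a command.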