It is currently configured to run on the Condor system at the CMSLPC CAF and on CERN's LSF batch system. It is very easy to configure for other Condor systems and fairly easy to modify for most other batch systems. Please email me at cplager+cmshelp@fnal.gov if you are interested.
Important: At the LPC at FNAL, the CAF is a very powerful and useful tool. It is also, unfortunately, relatively easy to do "bad" things that will affect all of the computers at the LPC. To avoid this:
At CERN, these are not a concern, and it is allowed to read from and write to the /afs disk space.
If you are running on cmslpc machines, I recommend using the scripts in my area (~cplager/bin/runManySections.py). If you are running elsewhere or prefer your own copy of the scripts, you can grab the two needed files with the commands below:
wget http://home.fnal.gov/~cplager/log/RunMany/runManySections.py
wget http://home.fnal.gov/~cplager/log/RunMany/runMany.bash

Note that:
The idea is to have a file that lists all jobs that you want run. To submit the jobs:
~cplager/bin/runManySections.py --submitCondor myJobs.cmd
~cplager/bin/runManySections.py --submitLsf --lsfOptions "-q 8nh" myJobs.cmd

where the myJobs.cmd file has a header that sets up everything and a body that lists commands (and "-q 8nh" tells LSF I want to use the 8 natural hour queue).
(I highly recommend Debugging Jobs Locally before submitting them to the batch system).
Below I show how to write one of these files "by hand." If you already have a list of jobs that you want run, you should consider Using runManySections.py to Create Command File below.
# -*- sh -*-   # for font lock mode

######################
## Setup Everything ##
######################

# How the environment will be set up
- env = cd /uscms/home/cplager/work/cmssw/CMSSW_3_5_7; . /uscmst1/prod/sw/cms/bashrc prod; eval `scramv1 runtime -sh`; cd -

##############
## Commands ##
##############

# logFileName   Command
out1.log        myFirstCommand  name1.output
out2.log        mySecondCommand name2.output

The above file:
All of the output files (e.g., out1.log, out2.log, name1.output, and name2.output) will be returned to the directory you were in when you called runManySections.py.
Important: At FNAL, it is ok to read a few small files from your home area on the batch system. Please:
At CERN, while the same things are still recommended, they are not required.
Although not required, I highly recommend setting up your jobs so that the output file names (i.e., the output of your job as well as the log files) contain information about which CAF job and section they were run in. The reasons are:
Doing this is quite simple, as the environment variable $JID contains the necessary information. In the following example, myFirstCommand takes the output filename as its only argument. So instead of the line from above:
# logFileName   Command
out1.log        myFirstCommand  name1.output

we should write:
# logFileName     Command
out1_$(JID).log   myFirstCommand  name1_$(JID).output

This will cause the log file name to be something like out1_6123_1.log and the output name to be name1_6123_1.output.
Note that you can access any environment variable this way: $(AnyEnvironmentVariable).
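For example, if you also wanted your user name in the log file name (assuming the $USER environment variable is available on the worker node, which you should verify for your site), a command line could look like:

out1_$(USER)_$(JID).log   myFirstCommand  name1_$(JID).output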
If you would like to include a tarball that will be automatically untarred when running your jobs, you can add a line like the following to your command file:
- tarfile = myTarBall.tgz
The tarball will be untarred by default into a directory called tardir/. The tarball is untarred before the environment is set up; this can be a nice way of setting up your environment. If you include a script setupMyEnvironment.bash, then your command file could just contain the line:
- env = . tardir/setupMyEnvironment.bash
where . tardir/setupMyEnvironment.bash is the bash equivalent to tcsh's source tardir/setupMyEnvironment.tcsh.
Note: Any files that are in the working directory on the condor system are copied back to your home area. Any files that are in a subdirectory (e.g., tardir) are not. This is why files are not untarred into the main directory.
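Putting the pieces together, a command file that ships its own setup script in a tarball might look like the sketch below (the setup script and command names here are placeholders, not part of the tool):

# -*- sh -*-   # for font lock mode
- tarfile = myTarBall.tgz
- env     = . tardir/setupMyEnvironment.bash

# logFileName     Command
out1_$(JID).log   myFirstCommand  name1_$(JID).output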
This assumes you have a gzipped tarball. E.g.,
tar czvf myTarBall.tgz firstFile.config secondFile.config thirdfile fourthfile
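If you want to double-check what will be shipped with your jobs, you can list the tarball contents before submitting (standard tar usage, not a feature of the scripts):

tar tzf myTarBall.tgz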
You can use the script itself to help generate the command file. Start with just a simple list of commands you wish to run:
cplager@cmslpc16> cat commands.listOfJobs
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 1)
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 2)
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 3)
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 4)
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 5)

Now we can use this file and have the script set up most or all of the other details:
~cplager/bin/development/runManySections.py --createCommandFile --cmssw --addLog --setTarball=tarball.tgz \
    commands.listOfJobs commands.cmd
Except for --createCommandFile, all of the other options above are optional. You can also use:
Here is an example of how to do this when you have a file macro.cc that contains a function void macro(TString outputname, int mode). Here is a silly example:
#include <iostream>
#include "TString.h"

void silly (TString name, int mode)
{
   std::cout << "Hi " << name << ", " << mode << std::endl;
}

Now compile this script inside of root:
cplager@cmslpc16> root -l
root [0] .L silly.C+
Info in <TUnixSystem::ACLiC>: creating shared library /uscms_data/d2/cplager/shabnam/./silly_C.so
root [1]

Now we need a macro that will load silly and run it:
void runSilly (TString name, int mode)
{
   gSystem->Load("tardir/silly_C.so");
   silly(name, mode);
}
Create a tarball containing the shared object library and the load script:
tar czvf tarball.tgz silly_C.so runSilly.C

The command used to run this looks like:
root -l -b -q -n tardir/runSilly.C("output_$(JID).root", 1)

See Using runManySections.py to Create Command File above for a detailed example of submitting this to the queue.
Important: I recommend logging out and back in to the LPC and NOT setting up any CMSSW or any other environment.
Before submitting jobs to the queue, it is a good idea to run (at least a short) one locally. To do this, we use the --testSection and --runTest flags.
cplager@cmslpc16> ~cplager/bin/runManySections.py --testSection=2 commands.cmd
/uscms/home/cplager/bin/runMany.bash /uscms/home/cplager/bin /uscms_data/d2/cplager/shabnam/commands.cmd 2 123

This prints the command that would be run for section 2 without actually running it. Adding --runTest runs that section locally:
cplager@cmslpc16> ~cplager/bin/runManySections.py --testSection=2 --runTest commands.cmd
This system has been designed with Condor in mind and then adapted to LSF. Here are some of the differences of which you should be aware.
If you do not want this feature, please use the --noLsfCopy option.
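For example, assuming the LSF submission shown earlier, turning this off would look something like the following (the queue and command file names are just the ones used above):

~cplager/bin/runManySections.py --submitLsf --lsfOptions "-q 8nh" --noLsfCopy myJobs.cmd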