
User Software and Computing

Getting Started: Guideline for users

USCMS computing facility at Fermilab


USCMS runs the largest dCache installation in the world. We provide access to over 6 PB of disk space to almost 10,000 processes (both batch and interactive) with remarkable performance and stability. At this scale, users need to follow some rules so that we can deliver the highest possible performance to all processes. Please help keep FNAL stable and performant so that everybody can produce physics results as quickly and reliably as possible.

1. Interactive nodes

- The basic purpose of the interactive nodes is to provide a platform for developing and debugging analysis code. Once the code is ready, running it at large scale must be done on the batch nodes. Jobs run interactively usually degrade the performance of the interactive nodes and affect many users at once, so please avoid this practice. Taking up multiple cores with interactive jobs slows down both you and everybody else. Jobs run this way may be removed by the administrators.

- No strict policing is foreseen for now, but that might change in the future depending on everyone's behavior.

- Forking jobs to run outside of your interactive shell is never a good idea on the interactive nodes. It affects everyone trying to work on these multi-user systems. Please don't do this.

2. Batch jobs:

- Use CRAB instead of writing your own data handling system: CRAB already solves the file handling and data access problems you would otherwise have to deal with yourself. In addition, CRAB gives you access to more CPU than you have on the FNAL batch system alone.
- Guidelines for how to structure a condor batch job at the CMS LPC are here, including how to avoid putting stress on the filesystems; a minimal wrapper-script sketch is shown below.
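
As a rough illustration of that guidance (a sketch, not the official LPC template), a batch job should do its work in the local scratch directory that condor provides on the worker node and only hand results back at the end, instead of reading and writing the shared user areas throughout the job. The script name, tarball name, and analysis command below are placeholders.

#!/bin/bash
# run_analysis.sh -- hypothetical wrapper that condor executes on the worker node
set -e
# Condor starts the job in a local scratch directory on the worker node;
# do the work there instead of reading and writing /uscms_data or your home area directly.
tar -xzf inputs.tar.gz                     # unpack the ROOT files that condor transferred with the job
./myAnalysis *.root -o histos.root         # placeholder for the real analysis command
# Leave histos.root in the scratch directory; with when_to_transfer_output = ON_EXIT,
# condor copies it back to the submit directory when the job finishes.

The matching submit-file directives (should_transfer_files, transfer_input_files, when_to_transfer_output) are shown in the condor job suggestions further down this page.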

3. File access in general:
- Please use the CMSSW software releases installed centrally on CVMFS and don't install CMSSW yourself; a minimal setup sketch is shown after this list.

- Follow the guidelines here for what not to do on the EOS Storage Element filesystem.

- Please help us keep the FNAL facility running efficiently and performantly at this scale. If in doubt about work you are trying to do outside the established workflows (for example CRAB) at FNAL, please contact LPC computing support to ask for guidance and help.

- The FNAL facility team is very motivated and works very hard to keep the facility running smoothly and without problems.
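
For illustration, setting up a work area from the central CVMFS installation looks roughly like this (the release name is just an example, matching the one used in the question below):

source /cvmfs/cms.cern.ch/cmsset_default.sh   # make the centrally installed CMS software available
cmsrel CMSSW_3_5_6                            # create a local work area for an example release
cd CMSSW_3_5_6/src
cmsenv                                        # set up the environment for this release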

Suggestions for high-input condor jobs:

I have a set of jobs which run over some 200 ROOT files, each ~50 MB in size. Usually running over them to get simple histograms is fast enough not to slow anyone down. But now I am adding some loops and smearing of various quantities, which makes the execution time-consuming, so I have been advised to use condor for this.

I have all these input ROOT files residing in /uscms_data/d1/seema/SusyAnalysis/QCD_Spring10START3X_V26S09v1/QCDPtXXX/*.root. I have 16 different XXX values, each with 200 files of ~50 MB each.

The CMSSW directory where I set up my ROOT environment and from which I submit jobs is in my home area: /uscms/home/seema/work/SUSY/CMSSW_3_5_6/src/JetMETCorrections/GammaJet/test/qcdAnalysis

Where should I keep my input ROOT files if I need to access them using condor, so as not to slow down or disturb the system for others? Could you please suggest how to proceed?

We do have some methods you can use to run such jobs while mitigating the problem; some are documented on the batch systems web page, and more are described here:

#1 - The recommended way to run a job like this is to let condor transfer the input files you need to the local worker node at the beginning of the job. From your description it seems that you have a separate directory for each job (QCDPtXXX) with a number of files in it. You can tar up all the ROOT files in that directory and have condor copy that one tar file, or, if the number of input files is small, you can have condor transfer them individually.

The condor directives you need to use are:
should_transfer_files = YES
transfer_input_files = file1, file2

Note that the transfer_input_files directive can't take a directory name, but it can take an absolute path. You should use the absolute path to your input files, since your .root files are not in the same area as the one you submit the job from.

Also, to make your jobs more efficient, you should always use this directive:
when_to_transfer_output = ON_EXIT
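
Putting these directives together, a submit file for one of the QCDPtXXX jobs might look roughly like the following sketch. The executable, tarball, and output file names are placeholders, and the transfer_input_files path just combines the directory from the question with a hypothetical tarball of its ROOT files.

# hypothetical submit file; executable and tarball names are placeholders
universe                = vanilla
executable              = run_analysis.sh
output                  = job_$(Cluster)_$(Process).out
error                   = job_$(Cluster)_$(Process).err
log                     = job_$(Cluster)_$(Process).log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# absolute path, since the tarball does not live in the submit directory
transfer_input_files    = /uscms_data/d1/seema/SusyAnalysis/QCD_Spring10START3X_V26S09v1/QCDPtXXX/inputs.tar.gz
queue

One such file per QCDPtXXX directory (or a script that generates them in a loop) can then be submitted with condor_submit.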

There is a good explanation of all of this on the condor manual site:

#2 - The second way for you to run is simply to run very few jobs with this footprint at any one time. By very few we mean 10 or fewer running at once. Even then, if we see performance issues related to these jobs we will need to stop them, and we will contact you and ask you to use the recommendation above.
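
If it helps to enforce such a cap yourself, one possibility (a suggestion on our part, not an official FNAL procedure) is to submit the jobs through DAGMan and throttle how many are in the queue at once; the DAG file and submit-file names below are hypothetical.

# qcd_jobs.dag -- hypothetical DAG file with one node per submit file
JOB ptbin1 submit_bin1.jdl
JOB ptbin2 submit_bin2.jdl

# keep at most 10 of the DAG's jobs in the queue at the same time
condor_submit_dag -maxjobs 10 qcd_jobs.dag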

LPC Physics Group accounts

The group accounts are set up in this way:
A user account with the same name as the group, "lpcjj" in this case, is created and owns everything.
By default, it gives no write permission to the group "us_cms" (everyone is in this group).
To use the group area, log in to the cluster as yourself, and then ksu lpcjj to become the LPC Physics group user.
Once you are user lpcjj, you are welcome to change the group permission using "chmod g+w .". We leave the permission choice to the discretion of the group.
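
For example, the sequence for a member of this group might look like the following; the group-area path is a placeholder.

# after logging in to the cluster under your own account:
ksu lpcjj                        # become the group user (requires the appropriate Kerberos authorization)
cd /path/to/lpcjj/group/area     # placeholder for the actual group area
chmod g+w .                      # optionally allow the us_cms group to write in this directory
exit                             # return to your own account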
