uscms.org  www 

User Software and Computing: Computing Environment Setup

GPU nodes at CMS LPC: connections & software

CMS LPC GPU nodes connection

  • Make sure your local /etc/krb5.conf and ~/.ssh/config are set up for Host cmslpc*.fnal.gov, per the Connect to the LPC CAF instructions
  • Authenticate to Kerberos as usual with your username:
    kinit username@FNAL.GOV
  • Pick ONE of the three nodes, numbered 1 to 3:
    • ssh -Y username@cmslpcgpu1.fnal.gov
    • ssh -Y username@cmslpcgpu2.fnal.gov
    • ssh -Y username@cmslpcgpu3.fnal.gov
  • Keep in mind that these nodes are interactive and have no resource limits: only allocate the GPU resources you think you need, and release them promptly. Nothing explicitly prevents one user's processes from interfering with another's.
  • The nodes run Scientific Linux 7 and have no access to the cmslpc condor batch queues.
  • For long-running and resource-intensive jobs, please use other GPU nodes available on the grid through CMS Connect.
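If you connect often, a small shell function can keep the node choice explicit. This is a minimal sketch for a bash shell, not an LPC-provided tool: the function names `lpcgpu_host` and `lpcgpu` are hypothetical, and it assumes the Kerberos and SSH configuration described above is already in place.

```shell
# Hypothetical helpers (not provided by the LPC) for reaching one of the
# three interactive GPU nodes, cmslpcgpu1..3.fnal.gov.

# Print the hostname for node number 1, 2, or 3; fail for anything else.
lpcgpu_host() {
  case "$1" in
    1|2|3) echo "cmslpcgpu$1.fnal.gov" ;;
    *) echo "usage: lpcgpu_host 1|2|3" >&2; return 1 ;;
  esac
}

# Open an X-forwarded SSH session (ssh -Y) to node $1 as user $2,
# defaulting to the local $USER. Run kinit username@FNAL.GOV first.
lpcgpu() {
  local host
  host="$(lpcgpu_host "$1")" || return 1
  ssh -Y "${2:-$USER}@$host"
}
```

For example, `kinit username@FNAL.GOV` followed by `lpcgpu 2 username` is equivalent to the second ssh command above; an invalid node number fails before ssh is ever called.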

GPU nodes local disk space

  • Important: GPU performance can be severely limited by reading and writing files over the NFS-mounted disk areas. Therefore, each CMS LPC GPU machine has a local disk for your use:
    • /storage/local/data1/gpuscratch
    • Make your own working directory in that area
    • Do NOT fill up the disk, or the machine will stop functioning (check usage with df -H)
    • Files in /storage/local/data1/gpuscratch whose ctime (last change time) is more than 30 days ago are deleted automatically on each node by the following command:
      • /usr/bin/find /storage/local/data1/gpuscratch -type f -ctime +30 -fprint /var/tmp/gpuscratch_`date +\%Y\%m\%d`.log -exec rm -f {} \;
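The steps above (make your own directory, keep an eye on free space) can be scripted. This is a minimal sketch for a bash shell; the function names, the `GPUSCRATCH_ROOT` variable, and the 90% threshold are illustrative assumptions, not site policy. Because the cleanup selects files with `-ctime`, writing to a file resets its age while merely reading it does not; the preview function uses the same test so you can see which of your files are due to be removed.

```shell
# Sketch: manage a personal directory in the node-local GPU scratch area.
# GPUSCRATCH_ROOT, the function names, and the 90% default are assumptions.
GPUSCRATCH_ROOT="${GPUSCRATCH_ROOT:-/storage/local/data1/gpuscratch}"

# Print the integer percentage of the filesystem holding "$1" that is in use
# (POSIX df -P output: 5th field of the 2nd line, e.g. "42%").
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Create $GPUSCRATCH_ROOT/$USER if needed and print its path, but refuse
# when the disk is already at or above the threshold (default 90%).
make_scratch_dir() {
  local threshold used
  threshold="${1:-90}"
  used="$(disk_used_pct "$GPUSCRATCH_ROOT")"
  if [ -z "$used" ]; then
    echo "cannot determine usage of $GPUSCRATCH_ROOT" >&2
    return 1
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "$GPUSCRATCH_ROOT is ${used}% full; clean up first" >&2
    return 1
  fi
  mkdir -p "$GPUSCRATCH_ROOT/$USER" && echo "$GPUSCRATCH_ROOT/$USER"
}

# List your files that the automatic cleanup would select: the same -ctime
# test, but printing instead of deleting. Optional argument: age in days.
preview_cleanup() {
  find "$GPUSCRATCH_ROOT/$USER" -type f -ctime +"${1:-30}" -print
}
```

Typical use on a GPU node is `cd "$(make_scratch_dir)"` at the start of a session and `preview_cleanup` now and then to see what is about to expire.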

GPU Software

Singularity images

  • Image location: /cvmfs/unpacked.cern.ch/registry.hub.docker.com/fnallpc/ (run ls in that area to find the image names and tags)
  • The containers, their contents, and the available tags are listed at https://hub.docker.com/r/fnallpc/fnallpc-docker
  • Example command to start a Singularity container (here using the Docker tag tensorflow-latest-devel-gpu-singularity):
  • singularity run --nv --bind `readlink -f $HOME` --bind `readlink -f ${HOME}/nobackup/` --bind /cvmfs /cvmfs/unpacked.cern.ch/registry.hub.docker.com/fnallpc/fnallpc-docker:tensorflow-latest-devel-gpu-singularity

    • Explanation of the command: the --nv option is needed to mount the NVIDIA libraries, executables, and drivers. Typically you only have access to the files in the directory from which you issue the run command (the overlay entrypoint). To get around that, the command binds the user's ${HOME} area, nobackup area, and CernVM-FS (/cvmfs). The last argument is the sandbox, i.e. the path to the unpacked image.
    • Note: Don't bind your current working directory (CWD), or you'll see a warning like: WARNING: Bind mount '/uscms/home/username => /uscms/home/username' overlaps container CWD /uscms/home/username, may not be available
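To avoid retyping the bind options, the command above can be wrapped in a shell function. This is a sketch for a bash shell, assuming the image directory and default tag shown above; the function name `lpc_gpu_shell` and the `DRY_RUN` variable are hypothetical. Following the note about the CWD, it leaves out any bind path that equals the directory you start from, and it skips paths that do not exist.

```shell
# Sketch: wrap the singularity command above. lpc_gpu_shell and DRY_RUN
# are hypothetical names; the image directory is the one documented above.
IMAGE_DIR=/cvmfs/unpacked.cern.ch/registry.hub.docker.com/fnallpc/fnallpc-docker

# Usage: lpc_gpu_shell [TAG]   (set DRY_RUN=1 to print the command instead)
lpc_gpu_shell() {
  local tag cwd p
  tag="${1:-tensorflow-latest-devel-gpu-singularity}"
  cwd="$(pwd -P)"
  # Rebuild the positional parameters as the command line to run.
  set -- singularity run --nv
  for p in "$(readlink -f "$HOME")" "$(readlink -f "$HOME/nobackup")" /cvmfs; do
    # Skip missing paths, and the current directory, since binding the
    # CWD triggers the overlap warning quoted above.
    [ -d "$p" ] || continue
    [ "$p" = "$cwd" ] && continue
    set -- "$@" --bind "$p"
  done
  set -- "$@" "$IMAGE_DIR:$tag"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 lpc_gpu_shell` first shows exactly which binds would be used; dropping `DRY_RUN` starts the container.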

Software images background details

The images are built with Docker because that is the format of the base images. They are then converted to Singularity sandboxes with unpacked.cern.ch (https://gitlab.cern.ch/unpacked/sync). Unlike Docker containers, Singularity containers act as an overlay rather than a complete sandbox, so users still have access to their own file areas.

See also the Docker/Singularity HATS@LPC 2020 for more on the regular usage of Docker and Singularity, including important security lessons should you choose to modify your own images. Note that GPUs were only briefly covered in that HATS.

Note: Images will be updated periodically to use the latest software. If you require a specific stable set of software, request it on the lpc-gpu listserv, giving the complete list of packages and version numbers required and how long the stable version is needed (e.g., for the duration of analysis EXO-XX-YYY).

Among the many benefits of Singularity containers are less work maintaining your software, less disk space used in your account, and the ability to use the containers outside the cmslpc nodes, at grid sites that mount /cvmfs/unpacked.cern.ch.

Installing User software

Note: we recommend using the containers as described above and requesting software when needed. We understand that users may also wish to work with their own software installations alongside the containers. Here are some best practices to be aware of:
  • Don't fill up your quota (2GB in home); you may find it convenient to symlink your ~/.local to ~/nobackup/local
  • Don't install software in places where it will break other programs (for instance, Python automatically puts packages installed under ~/.local on its import path, even when you don't want that to happen)
  • The software you want may already be available on /cvmfs/sft.cern.ch (check whether it was compiled with GPU support) or in the fnallpc containers
  • Useful tricks in pip:
    • Call pip as HOME=`pwd` pip install followed by the rest of the command; for a user install, this puts the packages in `pwd`/.local rather than ~/.local
    • Use -I (--ignore-installed) to keep pip from trying to uninstall packages from /cvmfs, which is read-only
    • Use --no-cache-dir to avoid filling up your home directory with pip cache files
    • Use venv to maintain virtual environments, which can reduce clashes with system tools
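Two of the tips above can be turned into shell functions, sketched here for a bash shell: `link_dot_local` moves an existing ~/.local into ~/nobackup/local and leaves a symlink behind, and `pip_here` runs pip with `HOME` set to the current directory plus the `-I` and `--no-cache-dir` options. Both function names and the `PIP` override variable are hypothetical. Note that changing `HOME` only redirects user installs (pip's `--user` mode, which recent pip versions also fall back to when site-packages is not writable).

```shell
# Sketch of two of the tips above. Function names and PIP are hypothetical.

# Move ~/.local to ~/nobackup/local (if present) and symlink it back, so
# user installs count against nobackup rather than the 2GB home quota.
link_dot_local() {
  local src dst
  src="$HOME/.local"
  dst="$HOME/nobackup/local"
  [ -L "$src" ] && return 0            # already a symlink: nothing to do
  if [ -e "$dst" ]; then
    echo "$dst already exists; merge it with $src by hand" >&2
    return 1
  fi
  mkdir -p "$HOME/nobackup" || return 1
  if [ -d "$src" ]; then
    mv "$src" "$dst" || return 1
  else
    mkdir -p "$dst" || return 1
  fi
  ln -s "$dst" "$src"
}

# Install into ./.local instead of ~/.local: HOME is overridden for this one
# pip process only. -I avoids uninstalling read-only /cvmfs copies, and
# --no-cache-dir keeps pip's cache out of your home directory.
pip_here() {
  HOME="$(pwd)" "${PIP:-pip}" install -I --no-cache-dir "$@"
}
```

For example, `cd ~/nobackup/myproject && pip_here --user somepackage` installs into ~/nobackup/myproject/.local; you then need to point `PYTHONPATH` (or `PYTHONUSERBASE`) at that area yourself.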

GPU monitoring
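Because nothing stops different users' processes from sharing the GPU, it is worth checking its memory use before you start. A minimal sketch for a bash shell, assuming `nvidia-smi` (installed with the NVIDIA driver) is on your PATH; the `summarize_gpu_mem` function is hypothetical and only reformats the CSV that `nvidia-smi --query-gpu` prints.

```shell
# Hypothetical helper: turn nvidia-smi CSV output into one line per GPU.
# Expected input (one line per GPU, memory in MiB), as produced by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
summarize_gpu_mem() {
  # Fields are separated by a comma plus optional spaces; skip lines whose
  # total memory is zero to avoid dividing by zero.
  awk -F', *' '$3 > 0 {
    printf "GPU %s: %s/%s MiB used (%d%%)\n", $1, $2, $3, 100 * $2 / $3
  }'
}
```

On a GPU node, pipe the `nvidia-smi` command from the comment into `summarize_gpu_mem`; `nvidia-smi --query-compute-apps=pid,used_memory --format=csv` lists the processes currently holding GPU memory.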

GPU User communications and help

CMS LPC GPU nodes technical specifics

The deviceQuery output below is a snapshot. The CUDA driver version may change with future kernel updates, so check it on the command line rather than relying on this page for the latest version.

[tonjes@cmslpcgpu1 ~]$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla P100-PCIE-12GB"
  CUDA Driver Version / Runtime Version          11.0 / 11.0
  CUDA Capability Major/Minor version number:    6.0
  Total amount of global memory:                 12198 MBytes (12790923264 bytes)
  (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1329 MHz (1.33 GHz)
  Memory Clock rate:                             715 Mhz
  Memory Bus Width:                              3072-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 101 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 11.0, NumDevs = 1, Device0 = Tesla P100-PCIE-12GB
Result = PASS
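Since the driver version can change, a quick way to pull just the version line out of deviceQuery on the node is sketched below for a bash shell. The `cuda_versions` function is hypothetical and assumes the output format shown above, where the driver and runtime versions are the last three whitespace-separated fields of the "CUDA Driver Version / Runtime Version" line.

```shell
# Hypothetical helper: extract the CUDA driver and runtime versions from
# deviceQuery output, e.g. from the line
#   "CUDA Driver Version / Runtime Version          11.0 / 11.0"
cuda_versions() {
  awk '/CUDA Driver Version \/ Runtime Version/ {
    print "driver=" $(NF - 2), "runtime=" $NF
    exit
  }'
}
```

Usage: `/usr/local/cuda/extras/demo_suite/deviceQuery | cuda_versions`.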
Physical machine specs

  • 1U rack chassis, 1400W Platinum-level power supply
  • 3 hot-swap 3.5" SAS/SATA drive bays
  • 2 PCIe 3.0 x16 slots (supporting double-width GPUs), 1 PCIe 3.0 x16 low-profile slot
  • Intel C621 chipset, single Socket P (LGA 3647), supports Intel Xeon Scalable processors with TDP 70-205W
  • 6 DIMM slots: up to 768GB 3DS LRDIMM or up to 192GB ECC RDIMM, DDR4-2666MHz
  • Onboard Intel X550 dual-port 10GBase-T; IPMI 2.0 with virtual media over LAN and KVM-over-LAN support; ASPEED AST2500 graphics; 6 SATA3 (6Gbps) ports
  • Dual M.2 Mini-PCIe support (2280)
  • Intel Xeon Silver 4140 8-core 2.1GHz processor, 11MB cache, 85W, DDR4-2400MHz
  • NVIDIA Tesla P100 12GB PCIe 3.0 passive GPU
  • HGST 2TB SATA 6Gb/s 7200RPM 3.5" HDD, 128MB cache

Other CMS GPU resources available

Versions of TensorFlow available for GPU usage, and other GPU sites, are described in the CMS Connect documentation: https://ci-connect.atlassian.net/wiki/spaces/CMS/pages/79953986/Using+TensorFlow
Webmaster | Last modified: Friday, 09-Oct-2020 15:14:52 CDT