User Software and Computing
System Status: SSI Metrics
- Not a monitor of the interactive nodes, but useful: Landscape LPC, which mainly covers condor; authenticate with your grid certificate.
- The SSI Metrics pages are meant mainly for sysadmins. Note the time shown in the upper right. A "green box" around a node name only means the node is online; it does not tell you whether the node is under high memory, CPU, or network load.
- Accessing the pages requires FNAL SSO: use your Services password, or authenticate with Kerberos after following the instructions for the one-time browser configuration.
- To access the SSI Metrics links from offsite, use the FNAL VPN client: log in with FNAL SSO to download the client. Operating the VPN requires your Services password and a Fermilab certificate.
- Interactive nodes:
  - CMSLPC Interactive nodes: number of users logged in and number of processes
  - CMSLPC Interactive nodes: note that the green plots only mean the node is online; they do not measure load
  - CMSLPC Interactive stats for each node individually; type an individual node name in the "Hostname" field
  - CMSLPC Interactive node I/O wait time
  - CMSLPC Interactive nodes: network usage for each el9 node (choose a different range in the regex for other systems)
  - Servers with more than 10 blocked I/O processes or high load (includes non-CMS servers)
- Condor worker nodes (T1 and T3):
  - LPC Worker Nodes network usage: both T3 and T1 nodes are reported here, so make sure you know the threshold for the T3/T1 line
- EOS storage nodes:
  - EOS on Landscape is the easiest way to see whether EOS is up: check for the proper total space, MGM errors, changes in replica imbalance, or overload on FUSE access
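The interactive-node network panel selects hosts with a regex, so "choosing a different range" means editing a character-class range in that pattern. The sketch below only illustrates how such a range picks out a block of hostnames. It is a minimal Python example, assuming full-match semantics; the hostnames, numbering, and patterns are hypothetical, and the dashboard's actual regex and node names may differ.

```python
import re

# Hypothetical hostnames for illustration only; real CMSLPC node names
# and numbering on the monitoring pages may differ.
HOSTS = [
    "cmslpc101.fnal.gov",
    "cmslpc150.fnal.gov",
    "cmslpc301.fnal.gov",
    "cmslpc325.fnal.gov",
    "cmslpc399.fnal.gov",
]


def select_hosts(pattern, hosts):
    """Return the hostnames that match the regex pattern in full."""
    regex = re.compile(pattern)
    return [h for h in hosts if regex.fullmatch(h)]


# Hypothetical range covering nodes 300-329 (e.g. one OS generation).
el9_like = select_hosts(r"cmslpc3[0-2][0-9]\.fnal\.gov", HOSTS)
# Changing the character-class range selects a different block of nodes.
other = select_hosts(r"cmslpc1[0-9][0-9]\.fnal\.gov", HOSTS)

print(el9_like)  # ['cmslpc301.fnal.gov', 'cmslpc325.fnal.gov']
print(other)     # ['cmslpc101.fnal.gov', 'cmslpc150.fnal.gov']
```

Note that `cmslpc399` is excluded from the first selection because its second digit falls outside `[0-2]`; widening or shifting that range is how a different set of nodes would be shown.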