WRITELOOP

A SIMPLE SYSTEM METRICS MONITORING TOOL FOR LINUX

In this post I present sysstat, a versatile set of CLI monitoring tools that you can use to inspect real-time and historical metrics on a Linux system, covering CPU, memory, swap usage and more.

2022 August 11

The sysstat package

The sysstat package contains utilities that collect system metrics in real time and can also keep a history of their values. It is written in C, so it is fast and has a low footprint, adding very little overhead to the system being measured.

It provides the following utilities:

  • mpstat: CPU usage metrics.

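  • vmstat: memory usage metrics. (Strictly speaking, vmstat ships in the procps package rather than in sysstat, but it is used in the same way and complements the other tools.)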

  • iostat: disk devices/partitions metrics.

  • pidstat: process/pid metrics.

  • sar: the system activity reporter. It can automatically persist the kinds of metrics the other utilities show and generate reports from them.

Useful utilities

All the commands above are useful in their own way. Some give a more general view and others are more specific.

Each one also has a man page that you can read to learn it in depth.

If you prefer a more practical approach and want to start using these commands, I have a cheatsheet on GitHub that you can use to start experimenting.

The ones I use most often are the following:

VMSTAT

Collect information about processes, CPU, memory usage and disk activity.

With no parameters, it returns averages since the last reboot. You can also pass parameters to watch the behavior of your system in real time during some activity, e.g., while your kernel is being compiled. To do that, run vmstat just before starting the activity, passing a sampling interval in seconds (and optionally a number of samples) so that the samples span the time the activity runs.

Example:

$ vmstat 5 10 -n -t
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st                 -03
 7  0 853480 172776  22380 817056    1    3    14    13   35   17  3  1 96  0  0 2017-07-28 19:03:24
 0  0 853480 170544  22392 816672    0    0    26   152 1382 9197 10  3 87  0  0 2017-07-28 19:03:29
 1  0 853480 167164  22400 816800    0    0    26    14 1147 8081  8  3 89  0  0 2017-07-28 19:03:34
 1  0 853480 177224  22412 808520    1    0    27    50 1586 9893 13  3 84  0  0 2017-07-28 19:03:39
 2  0 853480 174124  22420 811488    0    0     0   150 1261 8738 10  4 86  0  0 2017-07-28 19:03:44
 0  2 853480 171596  22420 811608    6    0    32   148 1655 9030 11  3 87  0  0 2017-07-28 19:03:49

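In the example above, I capture the metrics every 5 seconds, 10 times in a row (so, for roughly 50 seconds). The first row shows the averages since the last reboot; each following row covers the preceding 5 seconds.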

The second parameter (10) is optional. If it is not passed, vmstat keeps printing metrics until you kill it. This logic applies to the other utilities as well (pidstat, iostat, sar, etc…)
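To illustrate, the same interval/count pair can be handed to the other tools. This is only a sketch: run_if_present is a hypothetical helper of mine (not part of sysstat) that skips any tool that is not installed, and the 1-second interval is chosen just to keep the run short.

```shell
# run_if_present: run a command with its arguments only if it exists on PATH.
# (Hypothetical helper for this post; not part of sysstat.)
run_if_present() {
    tool=$1
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" "$@"
    else
        echo "skipping $tool (not installed)"
    fi
}

# Interval of 1 second, 2 samples each.
# Drop the count (the second number) to keep sampling until interrupted.
for name in mpstat iostat pidstat sar; do
    run_if_present "$name" 1 2
done
```

Each tool prints its own columns, but the meaning of the two numbers stays the same.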

In the vmstat example, -n prints the header only once instead of repeating it periodically, and -t appends a timestamp to the end of each line.
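Because -n keeps the header to two lines, the output is easy to post-process. The sketch below averages the idle CPU column over a run. It assumes the default vmstat column layout shown in the example (where "id" is the 15th field) and skips the first data row, which holds the averages since boot. The name avg_idle is my own.

```shell
# avg_idle: read `vmstat -n` output on stdin and print the mean of the "id"
# (idle CPU %) column, ignoring the first data row (averages since boot).
# Assumes the default column layout, where "id" is field 15.
avg_idle() {
    awk '
        NR <= 2 { next }        # the two header lines (printed once thanks to -n)
        NR == 3 { next }        # first sample: averages since the last reboot
        { sum += $15; n++ }     # accumulate the idle percentage
        END { if (n) printf "%.1f\n", sum / n }
    '
}

# Live run, only if vmstat is available: 3 samples, 1 second apart.
if command -v vmstat >/dev/null 2>&1; then
    vmstat -n 1 3 | avg_idle
fi
```

Fed with the six rows of the example above, it averages the last five idle values (87, 89, 84, 86, 87).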

SAR

Generates reports from metrics that are collected automatically at regular intervals.

On Debian derivatives (I can confirm this on Pop!_OS 22.04), you must activate a systemd timer to collect those metrics. To see the command that the corresponding systemd service runs, type:

$ sudo systemctl cat sysstat-collect.service

# /lib/systemd/system/sysstat-collect.service
# /lib/systemd/system/sysstat-collect.service
# (C) 2014 Tomasz Torcz <tomek@pipebreaker.pl>
#
# sysstat-12.5.2 systemd unit file:
#        Collects system activity data
#        Activated by sysstat-collect.timer unit

[Unit]
Description=system activity accounting tool
Documentation=man:sa1(8)
After=sysstat.service

[Service]
Type=oneshot
User=root
ExecStart=/usr/lib/sysstat/sa1 1 1

As you can see, the command executed is /usr/lib/sysstat/sa1 1 1. The default timer runs it every 10 minutes.

In other words, each run of sa1 (a wrapper around sadc) collects a single sample. The interval parameter (the first “1” given to sa1) is meaningless here, since you need at least two samples to define an interval; the spacing between samples comes from the timer instead.

Most counters collected by sadc are cumulative values since boot time. If you take one snapshot at time t and another one 10 minutes later, the values displayed cover the whole 10-minute interval, but those statistics (CPU utilization, network traffic, context switches, etc.) are averages over the period, so dips and spikes become less visible. Values such as memory utilization (displayed by “sar -r”), on the other hand, are instantaneous: they show your system at the very moment they are collected.
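The arithmetic behind such an averaged value can be sketched in a few lines of shell. The helper name counter_rate and the numbers are made up for illustration; this is not how sar is implemented, only the same idea applied to two snapshots of a cumulative counter.

```shell
# counter_rate PREV CUR SECONDS
# Average per-second rate of a cumulative counter between two snapshots.
# (Hypothetical helper; the inputs below are invented numbers.)
counter_rate() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (cur - prev) / secs }'
}

# A counter read 1000 at one sample and 7000 at the next, 600 seconds later.
counter_rate 1000 7000 600   # prints 10.00
```

Whether those 6000 events were spread evenly or all happened within a single second, the report shows the same 10 per second, which is why short spikes fade in 10-minute samples.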

The data that feeds the reports is stored in a binary format, in the files /var/log/sysstat/sa* (where * is the day of the month).

The tool that reads this data is called sar. It is also what produces the daily report files (sar*, where * is the day of the month), through the sa2 script.
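As a rough sketch, reading today's data file could look like the following. It assumes the Debian path and the default two-digit saDD file naming described above; sa_file is a hypothetical helper name of mine. The sar options used are -f (read from a file), -u (CPU utilization) and -r (memory utilization).

```shell
# sa_file DAY: path of the binary data file for a given day of the month.
# Assumes the Debian location and the default saDD naming (hypothetical helper).
sa_file() {
    day=${1#0}    # drop a leading zero so printf never sees octal-looking input such as "08"
    printf '/var/log/sysstat/sa%02d\n' "$day"
}

today=$(sa_file "$(date +%d)")
if command -v sar >/dev/null 2>&1 && [ -r "$today" ]; then
    sar -u -f "$today"    # CPU utilization recorded so far today
    sar -r -f "$today"    # memory utilization recorded so far today
else
    echo "no readable data at $today (is collection enabled?)"
fi
```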

You can also use a tool called sadf to convert sar data into more convenient formats, like JSON or CSV.

E.g. (captured on a RHEL-family system, where the data files live under /var/log/sa instead):

$ sadf -T -dh /var/log/sa/sa27 -- | sed 's/;/,/g' > system_stats_20170727.csv
$ more system_stats_20170727.csv
# hostname,interval,timestamp,CPU,%user,%nice,%system,%iowait,%steal,%idle[...]
localhost.localdomain,599,2017-07-27 23:00:03,-1,10.52,0.00,7.73,1.49,0.00,80.26
localhost.localdomain,-1,2017-07-27 23:02:38,LINUX-RESTART
# hostname,interval,timestamp,CPU,%user,%nice,%system,%iowait,%steal,%idle[...]
localhost.localdomain,600,2017-07-27 23:20:01,-1,0.07,0.00,0.10,0.01,0.00,99.82
localhost.localdomain,600,2017-07-27 23:30:01,-1,0.04,0.00,0.07,0.01,0.00,99.88
localhost.localdomain,600,2017-07-27 23:40:01,-1,0.14,0.00,0.13,0.01,0.00,99.72
localhost.localdomain,600,2017-07-27 23:50:01,-1,0.16,0.00,0.16,0.01,0.00,99.67
$ sadf -T -jh /var/log/sa/sa27 -- | sed 's/;/,/g' > system_stats_20170727.json
$ more system_stats_20170727.json
{
   "sysstat":{
      "sysdata-version":2.15,
      "hosts":[
         {
            "nodename":"localhost.localdomain",
            "sysname":"Linux",
            "release":"3.10.0-514.26.2.el7.x86_64",
            "machine":"x86_64",
            "number-of-cpus":2,
            "file-date":"2017-07-27",
            "statistics":[
               {
                  "timestamp":{
                     "date":"2017-07-27",
                     "time":"23:00:03",
                     "utc":0,
                     "interval":599
                  },
                  "cpu-load":[
                     {
                        "cpu":"all",
                        "user":10.52,
                        "nice":0.00,
                        "system":7.73,
                        "iowait":1.49,
                        "steal":0.00,
                        "idle":80.26
                     }
                  ]
               },
               {
                  "timestamp":{
                     "date":"2017-07-27",
                     "time":"23:20:01",
                     "utc":0,
                     "interval":600
                  },
                  "cpu-load":[
                     {
                        "cpu":"all",
                        "user":0.07,
                        "nice":0.00,
                        "system":0.10,
                        "iowait":0.01,
                        "steal":0.00,
                        "idle":99.82
                     }
                  ]
               }
            ],
            "restarts":[
               {
                  "boot":{
                     "date":"2017-07-27",
                     "time":"23:02:38",
                     "utc":0
                  }
               }
            ]
         }
      ]
   }
}
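Once the data is in CSV form, ordinary text tools can query it. The sketch below lists the samples whose %idle fell below a threshold. It assumes the column order shown in the CSV header above (timestamp in field 3, %idle in field 10); busy_samples is a hypothetical name of mine.

```shell
# busy_samples THRESHOLD: read the CSV on stdin and print "timestamp idle"
# for every sample whose %idle is below THRESHOLD.
# Assumes the column order of the header shown above (hypothetical helper).
busy_samples() {
    awk -F, -v limit="$1" '
        /^#/    { next }                    # repeated header lines
        NF < 10 { next }                    # LINUX-RESTART markers have fewer fields
        $10 + 0 < limit { print $3, $10 }   # timestamp and %idle
    '
}

# Usage: busy_samples 90 < system_stats_20170727.csv
```

With the sample file above and a threshold of 90, only the 23:00:03 row (80.26% idle) would be printed.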

Relevant configuration files

(Debian-derived distributions, like Ubuntu and Pop!_OS)

  • /etc/default/sysstat (enable the periodic persistence of the metrics)

  • /etc/sysstat/sysstat (fine-tune the sysstat configuration, like retention, etc.)
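As an illustration of the first file, the sketch below flips the collection switch. It assumes your version of /etc/default/sysstat uses the ENABLED="false" / ENABLED="true" variable, as Debian's packaging has done; check the file first. enable_collection is a hypothetical helper of mine, and the demo runs against a scratch file rather than the real one.

```shell
# enable_collection FILE: change ENABLED="false" to ENABLED="true" in FILE.
# (Hypothetical helper; assumes the ENABLED variable used by Debian's packaging.)
enable_collection() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write through a temporary copy, then overwrite in place so the
    # original file keeps its owner and permissions.
    sed 's/^ENABLED="false"/ENABLED="true"/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file with stand-in contents.
scratch=$(mktemp)
printf '%s\n' 'ENABLED="false"' > "$scratch"
enable_collection "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On a real system you would run it as root against /etc/default/sysstat, or simply edit the file by hand.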

Start and enable systemd service and timers

This is needed to start collecting data.

$ sudo systemctl start sysstat sysstat-collect.timer sysstat-summary.timer
$ sudo systemctl enable sysstat sysstat-collect.timer sysstat-summary.timer

NOTE: The original content that inspired this post can be found at:

  • https://blog.2ndquadrant.com/in-the-defense-of-sar/

  • https://www.tecmint.com/linux-performance-monitoring-and-file-system-statistics-reports/

  • https://blog.2ndquadrant.com/visualizing-sar-data/

All copyright and intellectual property of each one belongs to its original author.