How to set up Nvidia GPGPU computing using just the official Debian 11 repos

If you don’t know if or how your Nvidia GPU is supported by Debian, try running nvidia-detect from the nvidia-detect package.

For supported chips, setting up Nvidia GPGPU computing is as simple as installing the nvidia-opencl-icd package (from the non-free component). This will pull in a large swathe of recommended packages like the non-free, proprietary Nvidia graphics driver, libcuda1, nvidia-smi, etc. See:

apt install nvidia-opencl-icd

The following additional packages will be installed:
  glx-alternative-mesa glx-alternative-nvidia glx-diversions libcuda1 libnvidia-cfg1 libnvidia-compiler libnvidia-ml1 libnvidia-ptxjitcompiler1 libpci3 nvidia-alternative
  nvidia-installer-cleanup nvidia-kernel-common nvidia-kernel-dkms nvidia-kernel-support nvidia-legacy-check nvidia-modprobe nvidia-opencl-common nvidia-persistenced nvidia-smi
  nvidia-support ocl-icd-libopencl1 pci.ids pciutils update-glx
Suggested packages:
  libgl1-mesa-glx | libgl1 nvidia-driver | nvidia-driver-any nvidia-cuda-mps wget | curl | lynx-cur
Recommended packages:
  libcuda1:i386
The following NEW packages will be installed:
  glx-alternative-mesa glx-alternative-nvidia glx-diversions libcuda1 libnvidia-cfg1 libnvidia-compiler libnvidia-ml1 libnvidia-ptxjitcompiler1 libpci3 nvidia-alternative
  nvidia-installer-cleanup nvidia-kernel-common nvidia-kernel-dkms nvidia-kernel-support nvidia-legacy-check nvidia-modprobe nvidia-opencl-common nvidia-opencl-icd
  nvidia-persistenced nvidia-smi nvidia-support ocl-icd-libopencl1 pci.ids pciutils update-glx
0 upgraded, 25 newly installed, 0 to remove and 0 not upgraded.
Need to get 50.1 MB of archives.
After this operation, 157 MB of additional disk space will be used.

Now you need to reboot so that the proprietary Nvidia driver will be loaded.

Now you can list general information about your card by running nvidia-smi from the nvidia-smi package, or OpenCL capabilities by running clinfo from the clinfo package. This also works on headless machines.

Also see my related code repository: https://github.com/michaelfranzl/image_debian-gpgpu

Performance comparison of three different implementations of dynamic_cast in C++

I intended to center my new software design around dynamic_cast when I found repeated mentions of its allegedly poor performance, as well as outright claims that a software design turns ‘poor’ the moment dynamic_cast is introduced.

I took the warnings seriously at first but wanted to know more. Starting to dig deeper, I found only assertions which are either not backed up at all, or assertions based on invalid comparisons, for example comparing the run-time feature dynamic_cast with the compile-time feature reinterpret_cast. Well sure, compared to a zero-cost operation, everything is infinitely slower!

But these circumstances are well-known:

The field of performance is littered with myth and bogus folklore.

C++ Core Guidelines, Per.6

In trying to decide if my software design is indeed ‘allowed’ to center around dynamic casting, I was in need of good measurements. So, I followed C++ Core Guideline Per.6 (“Don’t make claims about performance without measurements”) and wrote a benchmark program to compare 3 different implementations of dynamic casting: the built-in dynamic_cast, priori_cast, and kcl_dynamic_cast.

Method

Writing a reasonable benchmark program was indeed harder than I anticipated, as predicted:

Getting good performance measurements can be hard and require specialized tools.

Per.6 of the C++ Core Guidelines

One of the hard parts was to choose a realistic unit of work which can serve as a comparable base-line. The emphasis here is on “realistic”; for example, I don’t think that repeatedly doing the same type of cast on the same object is very realistic.

This unit of work should also be as small as possible to maximize the performance impact of dynamic casting:

  1. Do a dynamic_cast<Base*> on an object inheriting from Base. This has zero runtime cost because the cast will always succeed; the compiler can optimize this away, and it is equivalent to a static_cast<Base*>.
  2. If the cast is successful, increment a running total; if not, increment another running total. It is important to spend the same number of cycles in both branches, so as not to skew the measured latency of the cast.
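The unit of work described above can be sketched roughly as follows. This is a minimal illustration, not the code of the benchmark program: the types Derived and Other, the struct Totals, and the function count_casts are hypothetical names, and the timing itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the benchmark's classes (assumption).
struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

// The two running totals from step 2.
struct Totals {
    std::size_t succeeded = 0;
    std::size_t failed = 0;
};

// One pass over all objects: cast each one to Target and count the outcome.
// Both branches do the same amount of work (a single increment).
template <typename Target>
Totals count_casts(const std::vector<std::shared_ptr<Base>>& objects) {
    Totals totals;
    for (const auto& object : objects) {
        if (dynamic_cast<Target*>(object.get()) != nullptr) {
            ++totals.succeeded;
        } else {
            ++totals.failed;
        }
    }
    // Every object lands in exactly one of the two totals.
    assert(totals.succeeded + totals.failed == objects.size());
    return totals;
}
```

Calling count_casts<Derived>(objects) on a vector holding Derived and Other instances counts the Derived ones as successes and the rest as failures; count_casts<Base>(objects) succeeds for every object, which corresponds to the base-line case of step 1.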

This unit of work is done repeatedly by iterating over 2 million individual objects (instances of class Base or of its sub-types) stored in a std::vector<std::shared_ptr<Base>>. The objects are in contiguous memory and can be either aligned (ordered) or shuffled; on modern hardware architectures, this makes a big difference due to cache prefetching.
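One way to obtain the two memory layouts is sketched below, under the assumption that "shuffled" means the iteration order no longer follows the allocation order. The helper names make_objects and shuffle_objects are hypothetical, and the benchmark program may build its containers differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Hypothetical stand-ins for the benchmark's classes (assumption).
struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Allocate `count` objects one after another. Consecutive allocations usually,
// but not necessarily, end up at increasing addresses.
std::vector<std::shared_ptr<Base>> make_objects(std::size_t count) {
    std::vector<std::shared_ptr<Base>> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        objects.push_back(std::make_shared<Derived>());
    }
    assert(objects.size() == count);
    return objects;
}

// Randomize the order in which the objects are visited. The objects themselves
// stay where they are in memory; only the pointers in the vector are permuted.
void shuffle_objects(std::vector<std::shared_ptr<Base>>& objects, unsigned seed) {
    std::mt19937 generator(seed);
    std::shuffle(objects.begin(), objects.end(), generator);
}
```

Iterating over the result of make_objects touches memory in roughly ascending order, which favors the prefetcher; after shuffle_objects the same objects are visited in a random order.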

This base-line is then compared to cases where only one parameter is varied: the target class of dynamic_cast.

Benchmark program

The program uses 3 different class hierarchies to cover the most typical casting scenarios: deep, shallow, and balanced.

Deep hierarchy:

A ← B ← C ← D ← E ← F ← G ← H

Shallow hierarchy:

A ← B
A ← C
A ← D
A ← E
A ← F
A ← G
A ← H

Balanced hierarchy:

A ← B
A ← E

B ← C
B ← D

E ← F
E ← G
E ← H
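As an illustration, the balanced hierarchy could be declared as below, reading "A ← B" as "B inherits from A" and assuming plain single, public inheritance with a virtual destructor in the root class. The namespace name and the helper cast_succeeds are hypothetical and only serve this sketch.

```cpp
#include <cassert>
#include <memory>

namespace balanced {
struct A { virtual ~A() = default; };  // root class; polymorphic so dynamic_cast works
struct B : A {};
struct E : A {};
struct C : B {};
struct D : B {};
struct F : E {};
struct G : E {};
struct H : E {};
}  // namespace balanced

// True if the object behind `object` is a Target or derives from Target.
template <typename Target>
bool cast_succeeds(const std::shared_ptr<balanced::A>& object) {
    assert(object != nullptr);
    return std::dynamic_pointer_cast<Target>(object) != nullptr;
}
```

For an object of class C, casts to A, B, and C succeed, while casts to D or E fail because those classes are on a different branch of the tree.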

I made the source code of the benchmark program available. See https://github.com/michaelfranzl/dynamic_cast_benchmark

There may be mistakes in the program. If you discover one, please leave a comment below or contribute a merge request.

The output of an example run of the program can be found below.

Findings

If anything is proven here, then this:

Modern hardware and optimizers defy naive assumptions; even experts are regularly surprised.

Per.6 of the C++ Core Guidelines

I tried to extract general observations from the program output below; this is somewhat possible, but there are exceptions and a few odd items.

  1. It is clear that in all 3 implementations, dynamic casting of an object inheriting from Base to the target class Base is as fast as the base-line.
  2. kcl_dynamic_cast performs best in all scenarios.
  3. The ordering of the objects in memory has a significant influence on performance. The worst case for dynamic_cast with ordered objects performs about the same as the best case for kcl_dynamic_cast with shuffled objects.
  4. In priori_cast, the latency grows with the distance between the target class and the base class. However, this characteristic is less pronounced when the objects are randomly shuffled in memory, or when the class hierarchy is not deep. The latency is independent of the operand class (independent of the cast being successful or not), which makes it a constant-time operation.
  5. In dynamic_cast, for successful casts, the latency grows with the distance between the operand class and the target class (casts to the operand’s own class reach about 15% of base-line speed when objects are aligned, and 2% when objects are shuffled). Non-successful casts have a latency independent of the target class.

If you think that I’ve made a mistake in interpreting the results, please leave a comment below.

Program output

The results were generated on an AMD Ryzen 5 3600 CPU, with the frequency fixed at 3600 MHz.

The unit of performance is given in MHz, i.e. million “units of work” per second. Percentages are given relative to the base-line. The horizontal bars illustrate the given percentages.
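The arithmetic behind these two figures can be sketched as follows. The function names are hypothetical, and rounding to a whole percent is an assumption that happens to agree with the printed values; the benchmark program may compute them differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Million units of work per second, given a count and the elapsed time.
double throughput_mhz(std::uint64_t units, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(units) / seconds / 1e6;
}

// Throughput relative to the base-line, rounded to a whole percent.
int percent_of_baseline(double mhz, double baseline_mhz) {
    assert(baseline_mhz > 0.0);
    return static_cast<int>(std::lround(100.0 * mhz / baseline_mhz));
}
```

For example, percent_of_baseline(152.0, 987.2) gives 15, which matches row G of the first dynamic_cast table below. The base-line rows of later sections (such as 864.3 MHz shown as 88%) suggest that percentages refer to the first base-line measurement of each run.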

Run 1 (objects aligned)

Class hierarchy: deep

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 987.2 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 969.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  27.6 MHz (  3%) [2000000] |------|
  C:  32.9 MHz (  3%) [2000000] |-------|
  D:  40.7 MHz (  4%) [2000000] |---------|
  E:  54.1 MHz (  5%) [2000000] |-------------|
  F:  79.4 MHz (  8%) [2000000] |-------------------|
  G: 152.0 MHz ( 15%) [2000000] |------------------------------------|
  H:  22.8 MHz (  2%) [      0] |-----|
  Z:  22.1 MHz (  2%) [      0] |-----|
------------
AVG:  48.0 MHz                  |===========|

Implementation: priori_cast

  A: 912.0 MHz ( 92%) [2000000] |------------------------------------------------------------...
  B: 120.8 MHz ( 12%) [2000000] |-----------------------------|
  C:  79.8 MHz (  8%) [2000000] |-------------------|
  D:  58.0 MHz (  6%) [2000000] |--------------|
  E:  42.6 MHz (  4%) [2000000] |----------|
  F:  35.6 MHz (  4%) [2000000] |--------|
  G:  32.4 MHz (  3%) [2000000] |-------|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   9.0 MHz (  1%) [      0] |--|
------------
AVG:  43.0 MHz                  |==========|

Implementation: kcl_dynamic_cast

  A: 963.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 190.0 MHz ( 19%) [2000000] |----------------------------------------------|
  C: 188.4 MHz ( 19%) [2000000] |---------------------------------------------|
  D: 196.3 MHz ( 20%) [2000000] |-----------------------------------------------|
  E: 199.5 MHz ( 20%) [2000000] |------------------------------------------------|
  F: 224.9 MHz ( 23%) [2000000] |------------------------------------------------------|
  G: 220.6 MHz ( 22%) [2000000] |-----------------------------------------------------|
  H: 146.7 MHz ( 15%) [      0] |-----------------------------------|
  Z: 154.1 MHz ( 16%) [      0] |-------------------------------------|
------------
AVG: 168.9 MHz                  |=========================================|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 989.6 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 973.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 149.9 MHz ( 15%) [2000000] |------------------------------------|
  C:  70.5 MHz (  7%) [      0] |-----------------|
  D:  69.0 MHz (  7%) [      0] |----------------|
  E:  70.0 MHz (  7%) [      0] |-----------------|
  F:  68.7 MHz (  7%) [      0] |----------------|
  G:  69.5 MHz (  7%) [      0] |----------------|
  H:  68.6 MHz (  7%) [      0] |----------------|
  Z:  68.3 MHz (  7%) [      0] |----------------|
------------
AVG:  70.5 MHz                  |=================|

Implementation: priori_cast

  A: 970.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 120.7 MHz ( 12%) [2000000] |-----------------------------|
  C:  77.4 MHz (  8%) [      0] |------------------|
  D:  54.7 MHz (  6%) [      0] |-------------|
  E:  45.3 MHz (  5%) [      0] |-----------|
  F:  37.8 MHz (  4%) [      0] |---------|
  G:  33.0 MHz (  3%) [      0] |--------|
  H:   9.2 MHz (  1%) [      0] |--|
  Z:   9.1 MHz (  1%) [      0] |--|
------------
AVG:  43.0 MHz                  |==========|

Implementation: kcl_dynamic_cast

  A: 974.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 239.5 MHz ( 24%) [2000000] |----------------------------------------------------------|
  C: 192.0 MHz ( 19%) [      0] |----------------------------------------------|
  D: 190.4 MHz ( 19%) [      0] |----------------------------------------------|
  E: 194.9 MHz ( 20%) [      0] |-----------------------------------------------|
  F: 198.2 MHz ( 20%) [      0] |------------------------------------------------|
  G: 194.8 MHz ( 20%) [      0] |-----------------------------------------------|
  H: 199.0 MHz ( 20%) [      0] |------------------------------------------------|
  Z: 187.9 MHz ( 19%) [      0] |---------------------------------------------|
------------
AVG: 177.4 MHz                  |===========================================|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 997.0 MHz (101%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 943.8 MHz ( 96%) [2000000] |------------------------------------------------------------...
  B:  39.2 MHz (  4%) [1714818] |---------|
  C:  44.4 MHz (  4%) [1428952] |----------|
  D:  47.2 MHz (  5%) [1144295] |-----------|
  E:  46.0 MHz (  5%) [ 858212] |-----------|
  F:  42.9 MHz (  4%) [ 572465] |----------|
  G:  37.3 MHz (  4%) [ 286584] |---------|
  H:  31.0 MHz (  3%) [      0] |-------|
  Z:  29.2 MHz (  3%) [      0] |-------|
------------
AVG:  35.3 MHz                  |========|

Implementation: priori_cast

  A: 980.9 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 105.5 MHz ( 11%) [1714818] |-------------------------|
  C:  67.7 MHz (  7%) [1428952] |----------------|
  D:  45.7 MHz (  5%) [1144295] |-----------|
  E:  37.2 MHz (  4%) [ 858212] |---------|
  F:  33.1 MHz (  3%) [ 572465] |--------|
  G:  31.1 MHz (  3%) [ 286584] |-------|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   8.9 MHz (  1%) [      0] |--|
------------
AVG:  37.6 MHz                  |=========|

Implementation: kcl_dynamic_cast

  A: 827.8 MHz ( 84%) [2000000] |------------------------------------------------------------...
  B:  86.2 MHz (  9%) [1714818] |--------------------|
  C:  90.0 MHz (  9%) [1428952] |---------------------|
  D:  93.2 MHz (  9%) [1144295] |----------------------|
  E:  92.9 MHz (  9%) [ 858212] |----------------------|
  F:  86.2 MHz (  9%) [ 572465] |--------------------|
  G:  81.3 MHz (  8%) [ 286584] |-------------------|
  H:  78.4 MHz (  8%) [      0] |-------------------|
  Z:  78.7 MHz (  8%) [      0] |-------------------|
------------
AVG:  76.3 MHz                  |==================|

Class hierarchy: shallow

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 864.3 MHz ( 88%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 916.6 MHz ( 93%) [2000000] |------------------------------------------------------------...
  B:  69.8 MHz (  7%) [      0] |----------------|
  C:  69.0 MHz (  7%) [      0] |----------------|
  D:  62.6 MHz (  6%) [      0] |---------------|
  E:  62.0 MHz (  6%) [      0] |---------------|
  F:  69.8 MHz (  7%) [      0] |----------------|
  G: 147.5 MHz ( 15%) [2000000] |-----------------------------------|
  H:  69.9 MHz (  7%) [      0] |----------------|
  Z:  68.9 MHz (  7%) [      0] |----------------|
------------
AVG:  68.8 MHz                  |================|

Implementation: priori_cast

  A: 961.1 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B:  26.0 MHz (  3%) [      0] |------|
  C:  23.6 MHz (  2%) [      0] |-----|
  D:  20.5 MHz (  2%) [      0] |----|
  E:  19.5 MHz (  2%) [      0] |----|
  F:  15.6 MHz (  2%) [      0] |---|
  G:  14.0 MHz (  1%) [2000000] |---|
  H:   8.6 MHz (  1%) [      0] |--|
  Z:   9.5 MHz (  1%) [      0] |--|
------------
AVG:  15.3 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 984.7 MHz (100%) [2000000] |------------------------------------------------------------...
  B: 180.4 MHz ( 18%) [2000000] |-------------------------------------------|
  C: 188.6 MHz ( 19%) [2000000] |---------------------------------------------|
  D: 203.0 MHz ( 21%) [2000000] |-------------------------------------------------|
  E: 210.0 MHz ( 21%) [2000000] |---------------------------------------------------|
  F: 214.8 MHz ( 22%) [2000000] |----------------------------------------------------|
  G: 225.9 MHz ( 23%) [2000000] |------------------------------------------------------|
  H: 152.4 MHz ( 15%) [      0] |-------------------------------------|
  Z: 162.5 MHz ( 16%) [      0] |---------------------------------------|
------------
AVG: 170.8 MHz                  |=========================================|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 950.1 MHz ( 96%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 962.0 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B: 150.8 MHz ( 15%) [2000000] |------------------------------------|
  C:  68.0 MHz (  7%) [      0] |----------------|
  D:  68.8 MHz (  7%) [      0] |----------------|
  E:  67.0 MHz (  7%) [      0] |----------------|
  F:  68.6 MHz (  7%) [      0] |----------------|
  G:  67.2 MHz (  7%) [      0] |----------------|
  H:  68.8 MHz (  7%) [      0] |----------------|
  Z:  67.0 MHz (  7%) [      0] |----------------|
------------
AVG:  69.6 MHz                  |================|

Implementation: priori_cast

  A: 937.2 MHz ( 95%) [2000000] |------------------------------------------------------------...
  B:  25.3 MHz (  3%) [2000000] |------|
  C:  22.3 MHz (  2%) [      0] |-----|
  D:  20.2 MHz (  2%) [      0] |----|
  E:  18.9 MHz (  2%) [      0] |----|
  F:  15.6 MHz (  2%) [      0] |---|
  G:  14.4 MHz (  1%) [      0] |---|
  H:   8.9 MHz (  1%) [      0] |--|
  Z:   9.9 MHz (  1%) [      0] |--|
------------
AVG:  15.1 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 969.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 231.2 MHz ( 23%) [2000000] |--------------------------------------------------------|
  C: 193.5 MHz ( 20%) [      0] |-----------------------------------------------|
  D: 192.5 MHz ( 20%) [      0] |----------------------------------------------|
  E: 186.5 MHz ( 19%) [      0] |---------------------------------------------|
  F: 183.6 MHz ( 19%) [      0] |--------------------------------------------|
  G: 184.4 MHz ( 19%) [      0] |--------------------------------------------|
  H: 184.4 MHz ( 19%) [      0] |--------------------------------------------|
  Z: 192.2 MHz ( 19%) [      0] |----------------------------------------------|
------------
AVG: 172.1 MHz                  |=========================================|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 1004.5 MHz (102%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 966.2 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  69.4 MHz (  7%) [ 286159] |----------------|
  C:  69.1 MHz (  7%) [ 285844] |----------------|
  D:  63.3 MHz (  6%) [ 285454] |---------------|
  E:  57.7 MHz (  6%) [ 285702] |--------------|
  F:  63.2 MHz (  6%) [ 285421] |---------------|
  G:  58.1 MHz (  6%) [ 286243] |--------------|
  H:  58.0 MHz (  6%) [      0] |--------------|
  Z:  68.2 MHz (  7%) [      0] |----------------|
------------
AVG:  56.3 MHz                  |=============|

Implementation: priori_cast

  A: 940.7 MHz ( 95%) [2000000] |------------------------------------------------------------...
  B:  25.7 MHz (  3%) [ 286159] |------|
  C:  23.2 MHz (  2%) [ 285844] |-----|
  D:  19.8 MHz (  2%) [ 285454] |----|
  E:  18.4 MHz (  2%) [ 285702] |----|
  F:  15.3 MHz (  2%) [ 285421] |---|
  G:  14.4 MHz (  1%) [ 286243] |---|
  H:   8.5 MHz (  1%) [      0] |--|
  Z:   9.4 MHz (  1%) [      0] |--|
------------
AVG:  15.0 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 889.3 MHz ( 90%) [2000000] |------------------------------------------------------------...
  B:  94.5 MHz ( 10%) [1714823] |----------------------|
  C:  96.8 MHz ( 10%) [1428664] |-----------------------|
  D:  94.0 MHz ( 10%) [1142820] |----------------------|
  E:  92.8 MHz (  9%) [ 857366] |----------------------|
  F:  89.0 MHz (  9%) [ 571664] |---------------------|
  G:  86.9 MHz (  9%) [ 286243] |---------------------|
  H:  82.9 MHz (  8%) [      0] |--------------------|
  Z:  83.9 MHz (  9%) [      0] |--------------------|
------------
AVG:  80.1 MHz                  |===================|

Class hierarchy: balanced

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 980.9 MHz ( 99%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 984.7 MHz (100%) [2000000] |------------------------------------------------------------...
  B:  48.8 MHz (  5%) [ 858197] |-----------|
  C:  43.5 MHz (  4%) [ 285844] |----------|
  D:  42.4 MHz (  4%) [ 286062] |----------|
  E:  44.3 MHz (  4%) [ 856053] |----------|
  F:  46.3 MHz (  5%) [ 285411] |-----------|
  G:  47.8 MHz (  5%) [ 285525] |-----------|
  H:  39.8 MHz (  4%) [      0] |---------|
  Z:  42.0 MHz (  4%) [      0] |----------|
------------
AVG:  39.4 MHz                  |=========|

Implementation: priori_cast

  A: 982.3 MHz (100%) [2000000] |------------------------------------------------------------...
  B:  11.8 MHz (  1%) [ 858197] |--|
  C:  12.3 MHz (  1%) [ 285844] |--|
  D:  10.8 MHz (  1%) [ 286062] |--|
  E:  10.2 MHz (  1%) [ 856053] |--|
  F:   9.4 MHz (  1%) [ 285411] |--|
  G:   9.9 MHz (  1%) [ 285525] |--|
  H:   8.8 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   9.2 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 922.1 MHz ( 93%) [2000000] |------------------------------------------------------------...
  B:  95.4 MHz ( 10%) [ 858197] |-----------------------|
  C:  90.9 MHz (  9%) [ 285844] |----------------------|
  D:  90.9 MHz (  9%) [ 286062] |----------------------|
  E:  92.4 MHz (  9%) [ 856053] |----------------------|
  F:  88.0 MHz (  9%) [ 285411] |---------------------|
  G:  89.0 MHz (  9%) [ 285525] |---------------------|
  H:  88.3 MHz (  9%) [      0] |---------------------|
  Z:  88.0 MHz (  9%) [      0] |---------------------|
------------
AVG:  80.3 MHz                  |===================|

Run 2 (objects shuffled)

Class hierarchy: deep

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 956.5 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 945.6 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:   6.4 MHz (  1%) [2000000] |-|
  C:   6.7 MHz (  1%) [2000000] |-|
  D:   7.0 MHz (  1%) [2000000] |-|
  E:   7.4 MHz (  1%) [2000000] |-|
  F:  13.3 MHz (  1%) [2000000] |---|
  G:  20.3 MHz (  2%) [2000000] |-----|
  H:   6.2 MHz (  1%) [      0] |-|
  Z:   6.1 MHz (  1%) [      0] |-|
------------
AVG:   8.1 MHz                  |==|

Implementation: priori_cast

  A: 971.3 MHz (102%) [2000000] |------------------------------------------------------------...
  B:  14.4 MHz (  2%) [2000000] |---|
  C:  13.8 MHz (  1%) [2000000] |---|
  D:   7.7 MHz (  1%) [2000000] |-|
  E:   7.4 MHz (  1%) [2000000] |-|
  F:   7.3 MHz (  1%) [2000000] |-|
  G:   7.1 MHz (  1%) [2000000] |-|
  H:   9.6 MHz (  1%) [      0] |--|
  Z:   9.8 MHz (  1%) [      0] |--|
------------
AVG:   8.6 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 951.5 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:  27.3 MHz (  3%) [2000000] |------|
  C:  27.6 MHz (  3%) [2000000] |------|
  D:  28.0 MHz (  3%) [2000000] |-------|
  E:  28.7 MHz (  3%) [2000000] |-------|
  F:  34.2 MHz (  4%) [2000000] |--------|
  G:  35.0 MHz (  4%) [2000000] |--------|
  H:  21.1 MHz (  2%) [      0] |-----|
  Z:  21.2 MHz (  2%) [      0] |-----|
------------
AVG:  24.8 MHz                  |======|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 932.0 MHz ( 97%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 924.6 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B:  20.4 MHz (  2%) [2000000] |-----|
  C:   7.9 MHz (  1%) [      0] |-|
  D:   7.9 MHz (  1%) [      0] |-|
  E:   7.9 MHz (  1%) [      0] |-|
  F:   7.9 MHz (  1%) [      0] |-|
  G:   7.8 MHz (  1%) [      0] |-|
  H:   7.9 MHz (  1%) [      0] |-|
  Z:   7.9 MHz (  1%) [      0] |-|
------------
AVG:   8.4 MHz                  |==|

Implementation: priori_cast

  A: 981.8 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  14.7 MHz (  2%) [2000000] |---|
  C:  14.1 MHz (  1%) [      0] |---|
  D:   7.8 MHz (  1%) [      0] |-|
  E:   7.6 MHz (  1%) [      0] |-|
  F:   7.4 MHz (  1%) [      0] |-|
  G:   7.3 MHz (  1%) [      0] |-|
  H:   9.4 MHz (  1%) [      0] |--|
  Z:   9.6 MHz (  1%) [      0] |--|
------------
AVG:   8.7 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 966.2 MHz (101%) [2000000] |------------------------------------------------------------...
  B:  34.5 MHz (  4%) [2000000] |--------|
  C:  27.9 MHz (  3%) [      0] |-------|
  D:  27.9 MHz (  3%) [      0] |-------|
  E:  28.0 MHz (  3%) [      0] |-------|
  F:  28.0 MHz (  3%) [      0] |-------|
  G:  28.0 MHz (  3%) [      0] |-------|
  H:  27.9 MHz (  3%) [      0] |-------|
  Z:  27.5 MHz (  3%) [      0] |------|
------------
AVG:  25.5 MHz                  |======|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 985.2 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 962.5 MHz (101%) [2000000] |------------------------------------------------------------...
  B:   7.0 MHz (  1%) [1714818] |-|
  C:   7.3 MHz (  1%) [1428952] |-|
  D:   7.4 MHz (  1%) [1144295] |-|
  E:   7.4 MHz (  1%) [ 858212] |-|
  F:   7.3 MHz (  1%) [ 572465] |-|
  G:   7.0 MHz (  1%) [ 286584] |-|
  H:   6.7 MHz (  1%) [      0] |-|
  Z:   6.5 MHz (  1%) [      0] |-|
------------
AVG:   6.3 MHz                  |=|

Implementation: priori_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  13.8 MHz (  1%) [1714818] |---|
  C:  12.7 MHz (  1%) [1428952] |---|
  D:   6.8 MHz (  1%) [1144295] |-|
  E:   6.7 MHz (  1%) [ 858212] |-|
  F:   7.0 MHz (  1%) [ 572465] |-|
  G:   7.0 MHz (  1%) [ 286584] |-|
  H:   9.1 MHz (  1%) [      0] |--|
  Z:  10.0 MHz (  1%) [      0] |--|
------------
AVG:   8.1 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 945.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:  22.6 MHz (  2%) [1714818] |-----|
  C:  23.8 MHz (  2%) [1428952] |-----|
  D:  24.0 MHz (  3%) [1144295] |------|
  E:  23.6 MHz (  2%) [ 858212] |-----|
  F:  22.7 MHz (  2%) [ 572465] |-----|
  G:  21.4 MHz (  2%) [ 286584] |-----|
  H:  20.2 MHz (  2%) [      0] |-----|
  Z:  20.3 MHz (  2%) [      0] |-----|
------------
AVG:  19.8 MHz                  |====|

Class hierarchy: shallow

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 980.4 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 999.5 MHz (104%) [2000000] |------------------------------------------------------------...
  B:   8.0 MHz (  1%) [      0] |--|
  C:   8.0 MHz (  1%) [      0] |-|
  D:   7.7 MHz (  1%) [      0] |-|
  E:   7.8 MHz (  1%) [      0] |-|
  F:   8.0 MHz (  1%) [      0] |--|
  G:  20.5 MHz (  2%) [2000000] |-----|
  H:   8.0 MHz (  1%) [      0] |--|
  Z:   7.9 MHz (  1%) [      0] |-|
------------
AVG:   8.4 MHz                  |==|

Implementation: priori_cast

  A: 953.3 MHz (100%) [2000000] |------------------------------------------------------------...
  B:   7.2 MHz (  1%) [      0] |-|
  C:   6.8 MHz (  1%) [      0] |-|
  D:   6.7 MHz (  1%) [      0] |-|
  E:   6.5 MHz (  1%) [      0] |-|
  F:   6.0 MHz (  1%) [      0] |-|
  G:   5.8 MHz (  1%) [2000000] |-|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   9.8 MHz (  1%) [      0] |--|
------------
AVG:   6.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  27.8 MHz (  3%) [2000000] |------|
  C:  27.7 MHz (  3%) [2000000] |------|
  D:  28.4 MHz (  3%) [2000000] |-------|
  E:  34.4 MHz (  4%) [2000000] |--------|
  F:  34.8 MHz (  4%) [2000000] |--------|
  G:  35.5 MHz (  4%) [2000000] |--------|
  H:  21.5 MHz (  2%) [      0] |-----|
  Z:  21.5 MHz (  2%) [      0] |-----|
------------
AVG:  25.7 MHz                  |======|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 994.5 MHz (104%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  20.2 MHz (  2%) [2000000] |-----|
  C:   7.9 MHz (  1%) [      0] |-|
  D:   7.9 MHz (  1%) [      0] |-|
  E:   7.7 MHz (  1%) [      0] |-|
  F:   7.7 MHz (  1%) [      0] |-|
  G:   7.8 MHz (  1%) [      0] |-|
  H:   7.8 MHz (  1%) [      0] |-|
  Z:   7.8 MHz (  1%) [      0] |-|
------------
AVG:   8.3 MHz                  |==|

Implementation: priori_cast

  A: 920.8 MHz ( 96%) [2000000] |------------------------------------------------------------...
  B:   6.9 MHz (  1%) [2000000] |-|
  C:   6.8 MHz (  1%) [      0] |-|
  D:   6.7 MHz (  1%) [      0] |-|
  E:   6.5 MHz (  1%) [      0] |-|
  F:   6.1 MHz (  1%) [      0] |-|
  G:   5.8 MHz (  1%) [      0] |-|
  H:   8.5 MHz (  1%) [      0] |--|
  Z:  10.2 MHz (  1%) [      0] |--|
------------
AVG:   6.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 984.3 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  35.0 MHz (  4%) [2000000] |--------|
  C:  28.3 MHz (  3%) [      0] |-------|
  D:  28.3 MHz (  3%) [      0] |-------|
  E:  28.2 MHz (  3%) [      0] |-------|
  F:  28.2 MHz (  3%) [      0] |-------|
  G:  28.1 MHz (  3%) [      0] |-------|
  H:  28.3 MHz (  3%) [      0] |-------|
  Z:  28.3 MHz (  3%) [      0] |-------|
------------
AVG:  25.9 MHz                  |======|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 996.5 MHz (104%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 1003.5 MHz (105%) [2000000] |------------------------------------------------------------...
  B:   7.8 MHz (  1%) [ 286159] |-|
  C:   7.8 MHz (  1%) [ 285844] |-|
  D:   7.8 MHz (  1%) [ 285454] |-|
  E:   7.7 MHz (  1%) [ 285702] |-|
  F:   7.8 MHz (  1%) [ 285421] |-|
  G:   7.8 MHz (  1%) [ 286243] |-|
  H:   7.7 MHz (  1%) [      0] |-|
  Z:   8.0 MHz (  1%) [      0] |--|
------------
AVG:   6.9 MHz                  |=|

Implementation: priori_cast

  A: 966.2 MHz (101%) [2000000] |------------------------------------------------------------...
  B:   7.0 MHz (  1%) [ 286159] |-|
  C:   6.7 MHz (  1%) [ 285844] |-|
  D:   6.5 MHz (  1%) [ 285454] |-|
  E:   6.4 MHz (  1%) [ 285702] |-|
  F:   5.9 MHz (  1%) [ 285421] |-|
  G:   5.7 MHz (  1%) [ 286243] |-|
  H:   8.6 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   6.3 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 769.8 MHz ( 80%) [2000000] |------------------------------------------------------------...
  B:  22.7 MHz (  2%) [1714823] |-----|
  C:  24.2 MHz (  3%) [1428664] |------|
  D:  24.3 MHz (  3%) [1142820] |------|
  E:  23.9 MHz (  3%) [ 857366] |------|
  F:  23.0 MHz (  2%) [ 571664] |-----|
  G:  22.0 MHz (  2%) [ 286243] |-----|
  H:  20.4 MHz (  2%) [      0] |-----|
  Z:  20.5 MHz (  2%) [      0] |-----|
------------
AVG:  20.1 MHz                  |=====|

Class hierarchy: balanced

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 985.7 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 988.1 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  11.4 MHz (  1%) [ 858197] |--|
  C:   7.4 MHz (  1%) [ 285844] |-|
  D:   7.3 MHz (  1%) [ 286062] |-|
  E:   9.4 MHz (  1%) [ 856053] |--|
  F:   7.3 MHz (  1%) [ 285411] |-|
  G:   7.4 MHz (  1%) [ 285525] |-|
  H:   6.9 MHz (  1%) [      0] |-|
  Z:   7.1 MHz (  1%) [      0] |-|
------------
AVG:   7.1 MHz                  |=|

Implementation: priori_cast

  A: 978.5 MHz (102%) [2000000] |------------------------------------------------------------...
  B:   5.1 MHz (  1%) [ 858197] |-|
  C:   5.4 MHz (  1%) [ 285844] |-|
  D:   5.1 MHz (  1%) [ 286062] |-|
  E:   4.8 MHz (  0%) [ 856053] |-|
  F:   4.7 MHz (  0%) [ 285411] |-|
  G:   5.0 MHz (  1%) [ 285525] |-|
  H:   8.8 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   5.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 934.1 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  24.4 MHz (  3%) [ 858197] |------|
  C:  23.0 MHz (  2%) [ 285844] |-----|
  D:  23.1 MHz (  2%) [ 286062] |-----|
  E:  24.1 MHz (  3%) [ 856053] |------|
  F:  22.0 MHz (  2%) [ 285411] |-----|
  G:  23.1 MHz (  2%) [ 285525] |-----|
  H:  22.2 MHz (  2%) [      0] |-----|
  Z:  22.3 MHz (  2%) [      0] |-----|
------------
AVG:  20.5 MHz                  |=====|

Debian 10: System monitoring using e-mail (Exim as a smarthost)

Recently, IT infrastructure monitoring tools have been springing up like mushrooms after a rain. But let’s take a step back and look at a traditional and very basic way to monitor a system – using e-mail. Yes, you heard right; that internet app invented in the 1960s!

For some events and incidents on a Debian system, the superuser of the system is still, by default, informed via e-mail. For example, mdadm shoots an e-mail to root when there is a software RAID array degradation. But by default, these e-mails are just dumped to a mail spool file, never inspected. This is really bad when your hard disk just died, and when you were never informed that the other one in the RAID array died too just a couple of months ago!

It has been 8 years since I last wrote about how to configure my favorite Mail Transfer Agent, Exim. Back then, as an e-mail service provider, I wanted to run Exim as a production MTA: receiving, sending, and relaying e-mail on the internet. It was a daunting task, to say the least.

This time, however, I just want to run Exim locally, behind a firewall. It should receive locally submitted e-mail and, rather than dump it to a local mail spool file, forward it to a remote e-mail address where I can read it on my phone. Debian’s Exim configuration calls this mode “mail sent by smarthost”.

In this mode, the local MTA accepts SMTP messages for any e-mail address and forwards them as an authenticated client to the next MTA, the smarthost of your e-mail provider (so the local MTA must hold the right credentials). The submission must be authenticated (with a username and password) because the IP address assigned by your ISP is very likely on some e-mail blacklist, and internet-facing MTAs refuse to accept e-mail from such IP addresses.

It was not so straightforward to accomplish this, as always. But it is totally doable, and in a short time, too. This blog post documents the steps for Debian 10. They should continue to work in later Debian versions.

You will need:

  • A working e-mail account at an e-mail provider of your choice (SMTP server FQDN, port number, username and password). It is advisable to create a dedicated username and password. Do not use your primary username and password! If the credentials leak or are stolen (they will be stored on the local hard drive, readable only by root), you can simply disable the affected e-mail account without further impact. Furthermore, for security, we are going to require that the SMTP server offers a port featuring TLS-on-connect, as recommended by RFC 8314 (“Cleartext Considered Obsolete”).
  • Debian 10 (newer should work too, because the fundamentals rarely change)
  • Internet connection
  • 30 minutes of time
  • root access. Do all the following steps as root.

First, make sure that the FQDN of the SMTP server of your e-mail provider is not a DNS alias, otherwise domain matching inside of Exim will not work (Exim often uses reverse-DNS lookups to find actual FQDNs). Check this by entering:

host smtp.example.com

If this prints "smtp.example.com" is an alias for "mail.example.com", then resolve all aliases and choose the final FQDN. Another way to get the final FQDN is by doing a reverse DNS lookup on the server’s IP address, e.g. by running:

dig -x <ip address>

Use the printed FQDN to the right of “PTR”. In our example, this is mail.example.com.
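If you want to script the alias check, the following is a minimal sketch rather than a definitive tool. final_fqdn is a hypothetical helper name, and the sketch assumes the usual “X is an alias for Y.” wording printed by host:

```shell
#!/bin/sh
# final_fqdn (hypothetical helper): read the output of `host` on stdin and
# print the target of the last "is an alias for" line.
# Prints nothing when the name is not an alias.
final_fqdn() {
    sed -n 's/^.* is an alias for \(.*\)\.$/\1/p' | tail -n 1
}

# Intended use on the real system (needs network access):
#   host smtp.example.com | final_fqdn
```

An empty result means the name you queried is already the final FQDN.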

Now, let’s actually start by installing Exim (exim4-daemon-light is enough):

apt install exim4

Next, configure exim4:

dpkg-reconfigure exim4-config

At the prompts, do the following:

  1. Select the option “mail sent by smarthost; no local mail”
  2. For “System mail name” leave the pre-filled hostname or PQDN
  3. “IP-addresses to listen on”: leave the default “127.0.0.1;::1”.
  4. “Other destinations for which mail is accepted”: leave the pre-filled hostname or PQDN.
  5. “Visible domain name for local users”: leave the pre-filled hostname or PQDN.
  6. For “IP address or host name of the outgoing smarthost” enter the final FQDN which you have found previously, plus a double colon ::, plus the port number. For example: mail.example.com::465
  7. “Keep number of DNS-queries minimal”: leave the default “No”.
  8. “Split configuration into small files”: leave the default “No”.
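For reference, the answers from this dialog are stored in /etc/exim4/update-exim4.conf.conf, a file in shell variable syntax. After the dialog, the relevant entries should look roughly like the following sketch. The values simply mirror the example answers above (mail.example.com::465 is the example smarthost, not a real server), and your file will contain further dc_* variables that are not shown here:

```shell
# Excerpt of /etc/exim4/update-exim4.conf.conf (sketch, not the complete file)

# "mail sent by smarthost; no local mail"
dc_eximconfig_type='satellite'

# Outgoing smarthost: final FQDN, double colon, port number
dc_smarthost='mail.example.com::465'

# "Keep number of DNS-queries minimal": No
dc_minimaldns='false'

# "Split configuration into small files": No
dc_use_split_config='false'
```

Looking at this file is a quick way to verify what the dialog actually saved before you continue.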

Next, generate certificates for Exim by running the command below. These will be used for the TLS connections. To test, you can leave defaults for all the questions which the following command asks you. Later, you could upgrade to more professional certificates:

/usr/share/doc/exim4-base/examples/exim-gencert

Next, define the recipient e-mail address as an alias of the system user root. Add the following line to /etc/aliases:

root: recipient@example.com

The remote SMTP server may reject mail without a proper “From:” address. Usually, the server expects the address in the “From:” field to have the same domain name as the MTA itself. For this reason, add the following line to /etc/email-addresses.

root: sender@example.com
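Both files may already contain an entry for root, in which case appending a second line would leave two entries for the same key. If you script these two steps, a small helper can replace an existing entry instead. The following is a minimal sketch: set_key is a hypothetical helper name, and it assumes simple keys such as root that contain no regular-expression metacharacters:

```shell
#!/bin/sh
# set_key FILE KEY VALUE (hypothetical helper):
# make sure FILE contains exactly one "KEY: VALUE" line.
set_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except existing entries for this key ...
    if [ -f "$file" ]; then
        grep -v "^${key}:" "$file" > "$tmp" || true
    fi
    # ... then append the new entry.
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Intended use on the real system (as root):
#   set_key /etc/aliases root recipient@example.com
#   set_key /etc/email-addresses root sender@example.com
```

Note that the sketch moves the entry to the end of the file, which is harmless for simple key lookups like these.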

Next, add the credentials for the SMTP submission to the file /etc/exim4/passwd.client in the format <FQDN>:<username>:<password>. For example:

mail.example.com:username:hackme

The default configuration of Exim shipped in the Debian package is excellent and very flexible. But TLS is not enabled by default, and so we still need to make a few adaptations to the default config file.

Add the following lines to /etc/exim4/exim4.conf.localmacros. Create the file if it doesn’t exist:

# Enable TLS
MAIN_TLS_ENABLE = 1

# Require TLS for all remote hosts (STARTTLS or TLS-on-connect)
REMOTE_SMTP_SMARTHOST_HOSTS_REQUIRE_TLS = *

# Require TLS-on-connect for RFC-8314
REMOTE_SMTP_SMARTHOST_REQUIRE_PROTOCOL = smtps

Next, make the following adaptation to the file /etc/exim4/exim4.conf.template. After the block that starts with .ifdef REMOTE_SMTP_SMARTHOST_HOSTS_REQUIRE_TLS and ends with .endif, add the following:

.ifdef REMOTE_SMTP_SMARTHOST_REQUIRE_PROTOCOL
  protocol = REMOTE_SMTP_SMARTHOST_REQUIRE_PROTOCOL
.endif

Run update-exim4.conf and systemctl restart exim4. The final configuration file is written to /var/lib/exim4/config.autogenerated.
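To double-check that the new macro actually reached the transport, you can ask Exim to print the effective transport options with exim4 -bP and look for the protocol line. This is a minimal sketch under two assumptions: the smarthost transport is named remote_smtp_smarthost (the name used in Debian’s template; check yours if it differs), and uses_tls_on_connect is a hypothetical helper name:

```shell
#!/bin/sh
# uses_tls_on_connect (hypothetical helper): read the output of
# `exim4 -bP transport <name>` on stdin and succeed only if the
# transport is set to the smtps (TLS-on-connect) protocol.
uses_tls_on_connect() {
    grep -Eq '^protocol = smtps$'
}

# Intended use on the real system (as root):
#   exim4 -bP transport remote_smtp_smarthost | uses_tls_on_connect \
#       && echo "TLS-on-connect is active"
```

If the check fails, inspect the full output of the exim4 -bP command to see which value the transport really uses.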

Testing with an Exim test instance

To test your setup, you can start a test instance of Exim in the local root console, listening on port 26. It will use the same configuration file, but runs in parallel and independently from the already running Exim daemon. Run as root:

exim -bd -d -oX 26

Then, you can use swaks (from the swaks Debian package) to send a test e-mail to the test instance. The output will be very verbose, so you can easily debug:

swaks --from root@localhost --to root@localhost --port 26

This will send an e-mail to the address looked up under the key “root” in the file /etc/aliases (in our case, recipient@example.com). The “From:” header address will be looked up under the key “root” in the file /etc/email-addresses (in our case, sender@example.com).

See if you got the e-mail in the target inbox. If yes, then testing with the production Exim daemon should work too.

Testing with the Exim daemon

Observe the output of …

tail -f /var/log/exim4/mainlog

… and then again send a test e-mail using swaks, but this time to the default port 25:

swaks --from root@localhost --to root@localhost

If you got the e-mail, then congratulations! You will now receive e-mails directed at the local root user. You can easily extend this for other, unprivileged users of the system.
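While watching the main log during such a test, most lines are bookkeeping. Exim marks the interesting events with short flags: => and -> for successful deliveries, == for deferred deliveries, and ** for failed ones. A minimal sketch of a filter that shows only those lines (delivery_events is a hypothetical helper name):

```shell
#!/bin/sh
# delivery_events (hypothetical helper): read Exim main log lines on stdin
# and keep only delivery results:
#   =>  delivered      ->  delivered to an additional address
#   ==  deferred       **  failed
delivery_events() {
    grep -E ' (=>|->|\*\*|==) '
}

# Intended use on the real system (as root):
#   tail -f /var/log/exim4/mainlog | delivery_events
```

A => line that names your remote recipient address means the smarthost accepted the message.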

To test the entire chain and make sure that you will always be informed of important events, you could send a test e-mail at regular intervals. But this is a topic for a future blog post!

Scripting

You could wrap the above swaks command in a shell script to send e-mail from other scripts. For example:

#!/bin/sh
# $1: subject text, $2: body text
swaks --from root@localhost --to root@localhost --header "Subject: [$(hostname)] $1" --body "Body: $2"

Warning: You could also use the older tools sendmail or mail to write a similar script, but both call exim4 directly under the hood. That is fine as long as you don’t call such a script from a systemd unit (e.g. a service or a timer). When Exim is called from the command line to submit e-mail, it seemingly forks and detaches a short-lived process to deliver the e-mail to the target MTA, but systemd kills all sub-processes as soon as the main process exits. The process lives just long enough to put the e-mail into Exim’s queue, but the delivery is never carried out. swaks, on the other hand, hands the mail to Exim via SMTP, so the delivery process is not subject to being killed by systemd.

You should also rate-limit Exim to protect against DoS attacks, but this is also a topic for a future blog post!