Performance comparison of three different implementations of dynamic_cast in C++

I intended to center my new software design around dynamic_cast when I found repeated mentions of its allegedly poor performance, as well as outright claims that a software design turns ‘poor’ the moment dynamic_cast is introduced.

I took the warnings seriously at first but wanted to know more. Digging deeper, I found only assertions that were either not backed up at all or based on invalid comparisons, for example comparing the run-time feature dynamic_cast with the compile-time feature reinterpret_cast. Well sure, compared to a zero-cost operation, everything is infinitely slower!

But these circumstances are well-known:

The field of performance is littered with myth and bogus folklore.

C++ Core Guidelines, Per.6

In trying to decide whether my software design is indeed ‘allowed’ to center around dynamic casting, I needed good measurements. So I followed C++ Core Guideline Per.6 (“Don’t make claims about performance without measurements”) and wrote a benchmark program to compare 3 different implementations of dynamic casting: dynamic_cast, priori_cast, and kcl_dynamic_cast.

Method

Writing a reasonable benchmark program was indeed harder than I anticipated, as predicted:

Getting good performance measurements can be hard and require specialized tools.

Per.6 of the C++ Core Guidelines

One of the hard parts was to choose a realistic unit of work which can serve as a comparable base-line. The emphasis here is on “realistic”; for example, I don’t think that repeatedly doing the same type of cast on the same object is very realistic.

This unit of work should also be as small as possible to maximize the performance impact of dynamic casting:

  1. Do a dynamic_cast<Base*> on an object inheriting from Base. This has zero runtime cost because the cast will always succeed; the compiler can optimize this away, and it is equivalent to a static_cast<Base*>.
  2. If the cast is successful, increment a running total; if not, increment another running total. It is important to spend the same number of cycles in both branches, so as not to skew the measured latency of the cast.
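The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the benchmark repository: the class names Base, Derived and Other, the Totals struct, and the function unit_of_work are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the benchmark's classes (names are assumptions).
struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

// Two running totals, one per branch, so that both branches do similar work.
struct Totals {
    std::size_t succeeded = 0;
    std::size_t failed = 0;
};

// One unit of work: cast the object to Target and count the outcome.
template <typename Target>
void unit_of_work(Base* object, Totals& totals) {
    assert(object != nullptr);
    if (dynamic_cast<Target*>(object) != nullptr) {
        ++totals.succeeded;  // the cast succeeded
    } else {
        ++totals.failed;     // the cast failed
    }
}
```

Calling unit_of_work<Derived> on a Derived object increments the first total, and calling it on an Other object increments the second.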

This unit of work is done repeatedly by iterating over 2 million individual objects (which are instances of class Base or of its sub-types) stored in a std::vector<std::shared_ptr<Base>>. The objects are in contiguous memory, and can be either aligned (in order) or shuffled; on modern hardware architectures, this makes a big difference due to cache prefetching.
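A rough sketch of such a measurement loop is shown here. It makes several assumptions that are not taken from the benchmark program: the helper names make_objects and measure_mhz are hypothetical, only two classes are used, and "shuffled" is approximated by shuffling the order of the pointers in the vector. Allocating objects one after another tends to place them at nearby addresses, but the allocator does not guarantee this.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Allocate `count` objects one after another (alternating Base and Derived),
// then optionally shuffle the visiting order. All names here are assumptions.
std::vector<std::shared_ptr<Base>> make_objects(std::size_t count, bool shuffled,
                                                unsigned seed = 42) {
    std::vector<std::shared_ptr<Base>> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (i % 2 == 0) {
            objects.push_back(std::make_shared<Base>());
        } else {
            objects.push_back(std::make_shared<Derived>());
        }
    }
    if (shuffled) {
        std::mt19937 rng(seed);
        std::shuffle(objects.begin(), objects.end(), rng);
    }
    return objects;
}

// Time one pass over all objects; returns million units of work per second
// and reports the number of successful casts through `successes`.
template <typename Target>
double measure_mhz(const std::vector<std::shared_ptr<Base>>& objects,
                   std::size_t& successes) {
    assert(!objects.empty());
    std::size_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (const auto& object : objects) {
        if (dynamic_cast<Target*>(object.get()) != nullptr) {
            ++hits;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    successes = hits;
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(objects.size()) / seconds / 1e6;
}
```

A single timed pass like this is noisy; a real benchmark would repeat the measurement and take care that the optimizer does not remove the loop.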

This base-line is then compared to cases where only one parameter is varied: the target class of dynamic_cast.

Benchmark program

The program uses 3 different class hierarchies to cover the most typical casting scenarios: Deep, shallow, and balanced.

Deep hierarchy:

A ← B ← C ← D ← E ← F ← G ← H

Shallow hierarchy:

A ← B
A ← C
A ← D
A ← E
A ← F
A ← G
A ← H

Balanced hierarchy:

A ← B
A ← E

B ← C
B ← D

E ← F
E ← G
E ← H
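The three diagrams translate directly into C++ class definitions, reading "A ← B" as "B inherits from A". The sketch below is an illustration only: the namespaces and the helper is_a are assumptions made for this example and are not taken from the benchmark program.

```cpp
#include <cassert>

namespace deep {  // A <- B <- C <- D <- E <- F <- G <- H
struct A { virtual ~A() = default; };
struct B : A {};
struct C : B {};
struct D : C {};
struct E : D {};
struct F : E {};
struct G : F {};
struct H : G {};
}  // namespace deep

namespace shallow {  // every class inherits directly from A
struct A { virtual ~A() = default; };
struct B : A {};
struct C : A {};
struct D : A {};
struct E : A {};
struct F : A {};
struct G : A {};
struct H : A {};
}  // namespace shallow

namespace balanced {  // two sub-trees below A, rooted at B and E
struct A { virtual ~A() = default; };
struct B : A {};
struct E : A {};
struct C : B {};
struct D : B {};
struct F : E {};
struct G : E {};
struct H : E {};
}  // namespace balanced

// True if the object behind `object` is a Target or derives from Target.
template <typename Target, typename Source>
bool is_a(Source* object) {
    assert(object != nullptr);
    return dynamic_cast<Target*>(object) != nullptr;
}
```

With these definitions, a deep::G object viewed through a deep::A pointer casts successfully to deep::B but not to deep::H, while a shallow::G object casts successfully only to shallow::A and shallow::G.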

I made the source code of the benchmark program available. See https://github.com/michaelfranzl/dynamic_cast_benchmark

There may be mistakes in the program. If you discover one, please leave a comment below or contribute a merge request.

The output of an example run of the program can be found below.

Findings

If anything is proven here, then this:

Modern hardware and optimizers defy naive assumptions; even experts are regularly surprised.

Per.6 of the C++ Core Guidelines

I tried to extract general observations from the program output below; this is somewhat possible, but there are exceptions and a few odd items.

  1. It is clear that in all 3 implementations, dynamic casting of an object inheriting from Base to the target class Base is as fast as the base-line.
  2. kcl_dynamic_cast performs best in all scenarios.
  3. The ordering of the objects in memory has a significant influence on performance. The worst case for dynamic_cast with ordered objects performs about the same as the best case for kcl_dynamic_cast with shuffled objects.
  4. In priori_cast, the latency grows with the distance between the target class and the base class. However, this characteristic is less pronounced when the objects are randomly shuffled in memory, or when the class hierarchy is not deep. The latency is independent of the operand class (independent of the cast being successful or not), which makes it a constant-time operation.
  5. In dynamic_cast, for successful casts, the latency grows with the distance between the operand class and the target class (about 15% of base-line speed for casts to the same class when objects are aligned, and 2% when objects are shuffled). Non-successful casts have a latency independent of the target class.

If you think that I’ve made a mistake in interpreting the results, please leave a comment below.

Program output

The results were generated on an AMD Ryzen 5 3600 CPU, with the frequency fixed at 3600 MHz.

The unit of performance is given in MHz, i.e. million “units of work” per second. Percentages are given relative to the base-line. The number in square brackets is the count of successful casts out of the 2 million objects. The horizontal bars illustrate the given percentages.
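The arithmetic behind these two columns can be sketched as follows. The function names to_mhz and percent_of_baseline are hypothetical, and rounding to the nearest whole percent is an assumption about how the program formats its output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Million units of work per second (labelled "MHz" in the output).
double to_mhz(std::size_t units, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(units) / seconds / 1e6;
}

// Throughput as a whole-number percentage of the base-line throughput.
long percent_of_baseline(double mhz, double baseline_mhz) {
    assert(baseline_mhz > 0.0);
    return std::lround(100.0 * mhz / baseline_mhz);
}
```

For instance, 152.0 MHz against a base-line of 987.2 MHz comes out at 15%, which agrees with the figure listed for target class G in the first table.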

Run 1 (objects aligned)

Class hierarchy: deep

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 987.2 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 969.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  27.6 MHz (  3%) [2000000] |------|
  C:  32.9 MHz (  3%) [2000000] |-------|
  D:  40.7 MHz (  4%) [2000000] |---------|
  E:  54.1 MHz (  5%) [2000000] |-------------|
  F:  79.4 MHz (  8%) [2000000] |-------------------|
  G: 152.0 MHz ( 15%) [2000000] |------------------------------------|
  H:  22.8 MHz (  2%) [      0] |-----|
  Z:  22.1 MHz (  2%) [      0] |-----|
------------
AVG:  48.0 MHz                  |===========|

Implementation: priori_cast

  A: 912.0 MHz ( 92%) [2000000] |------------------------------------------------------------...
  B: 120.8 MHz ( 12%) [2000000] |-----------------------------|
  C:  79.8 MHz (  8%) [2000000] |-------------------|
  D:  58.0 MHz (  6%) [2000000] |--------------|
  E:  42.6 MHz (  4%) [2000000] |----------|
  F:  35.6 MHz (  4%) [2000000] |--------|
  G:  32.4 MHz (  3%) [2000000] |-------|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   9.0 MHz (  1%) [      0] |--|
------------
AVG:  43.0 MHz                  |==========|

Implementation: kcl_dynamic_cast

  A: 963.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 190.0 MHz ( 19%) [2000000] |----------------------------------------------|
  C: 188.4 MHz ( 19%) [2000000] |---------------------------------------------|
  D: 196.3 MHz ( 20%) [2000000] |-----------------------------------------------|
  E: 199.5 MHz ( 20%) [2000000] |------------------------------------------------|
  F: 224.9 MHz ( 23%) [2000000] |------------------------------------------------------|
  G: 220.6 MHz ( 22%) [2000000] |-----------------------------------------------------|
  H: 146.7 MHz ( 15%) [      0] |-----------------------------------|
  Z: 154.1 MHz ( 16%) [      0] |-------------------------------------|
------------
AVG: 168.9 MHz                  |=========================================|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 989.6 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 973.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 149.9 MHz ( 15%) [2000000] |------------------------------------|
  C:  70.5 MHz (  7%) [      0] |-----------------|
  D:  69.0 MHz (  7%) [      0] |----------------|
  E:  70.0 MHz (  7%) [      0] |-----------------|
  F:  68.7 MHz (  7%) [      0] |----------------|
  G:  69.5 MHz (  7%) [      0] |----------------|
  H:  68.6 MHz (  7%) [      0] |----------------|
  Z:  68.3 MHz (  7%) [      0] |----------------|
------------
AVG:  70.5 MHz                  |=================|

Implementation: priori_cast

  A: 970.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 120.7 MHz ( 12%) [2000000] |-----------------------------|
  C:  77.4 MHz (  8%) [      0] |------------------|
  D:  54.7 MHz (  6%) [      0] |-------------|
  E:  45.3 MHz (  5%) [      0] |-----------|
  F:  37.8 MHz (  4%) [      0] |---------|
  G:  33.0 MHz (  3%) [      0] |--------|
  H:   9.2 MHz (  1%) [      0] |--|
  Z:   9.1 MHz (  1%) [      0] |--|
------------
AVG:  43.0 MHz                  |==========|

Implementation: kcl_dynamic_cast

  A: 974.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 239.5 MHz ( 24%) [2000000] |----------------------------------------------------------|
  C: 192.0 MHz ( 19%) [      0] |----------------------------------------------|
  D: 190.4 MHz ( 19%) [      0] |----------------------------------------------|
  E: 194.9 MHz ( 20%) [      0] |-----------------------------------------------|
  F: 198.2 MHz ( 20%) [      0] |------------------------------------------------|
  G: 194.8 MHz ( 20%) [      0] |-----------------------------------------------|
  H: 199.0 MHz ( 20%) [      0] |------------------------------------------------|
  Z: 187.9 MHz ( 19%) [      0] |---------------------------------------------|
------------
AVG: 177.4 MHz                  |===========================================|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 997.0 MHz (101%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 943.8 MHz ( 96%) [2000000] |------------------------------------------------------------...
  B:  39.2 MHz (  4%) [1714818] |---------|
  C:  44.4 MHz (  4%) [1428952] |----------|
  D:  47.2 MHz (  5%) [1144295] |-----------|
  E:  46.0 MHz (  5%) [ 858212] |-----------|
  F:  42.9 MHz (  4%) [ 572465] |----------|
  G:  37.3 MHz (  4%) [ 286584] |---------|
  H:  31.0 MHz (  3%) [      0] |-------|
  Z:  29.2 MHz (  3%) [      0] |-------|
------------
AVG:  35.3 MHz                  |========|

Implementation: priori_cast

  A: 980.9 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B: 105.5 MHz ( 11%) [1714818] |-------------------------|
  C:  67.7 MHz (  7%) [1428952] |----------------|
  D:  45.7 MHz (  5%) [1144295] |-----------|
  E:  37.2 MHz (  4%) [ 858212] |---------|
  F:  33.1 MHz (  3%) [ 572465] |--------|
  G:  31.1 MHz (  3%) [ 286584] |-------|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   8.9 MHz (  1%) [      0] |--|
------------
AVG:  37.6 MHz                  |=========|

Implementation: kcl_dynamic_cast

  A: 827.8 MHz ( 84%) [2000000] |------------------------------------------------------------...
  B:  86.2 MHz (  9%) [1714818] |--------------------|
  C:  90.0 MHz (  9%) [1428952] |---------------------|
  D:  93.2 MHz (  9%) [1144295] |----------------------|
  E:  92.9 MHz (  9%) [ 858212] |----------------------|
  F:  86.2 MHz (  9%) [ 572465] |--------------------|
  G:  81.3 MHz (  8%) [ 286584] |-------------------|
  H:  78.4 MHz (  8%) [      0] |-------------------|
  Z:  78.7 MHz (  8%) [      0] |-------------------|
------------
AVG:  76.3 MHz                  |==================|

Class hierarchy: shallow

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 864.3 MHz ( 88%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 916.6 MHz ( 93%) [2000000] |------------------------------------------------------------...
  B:  69.8 MHz (  7%) [      0] |----------------|
  C:  69.0 MHz (  7%) [      0] |----------------|
  D:  62.6 MHz (  6%) [      0] |---------------|
  E:  62.0 MHz (  6%) [      0] |---------------|
  F:  69.8 MHz (  7%) [      0] |----------------|
  G: 147.5 MHz ( 15%) [2000000] |-----------------------------------|
  H:  69.9 MHz (  7%) [      0] |----------------|
  Z:  68.9 MHz (  7%) [      0] |----------------|
------------
AVG:  68.8 MHz                  |================|

Implementation: priori_cast

  A: 961.1 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B:  26.0 MHz (  3%) [      0] |------|
  C:  23.6 MHz (  2%) [      0] |-----|
  D:  20.5 MHz (  2%) [      0] |----|
  E:  19.5 MHz (  2%) [      0] |----|
  F:  15.6 MHz (  2%) [      0] |---|
  G:  14.0 MHz (  1%) [2000000] |---|
  H:   8.6 MHz (  1%) [      0] |--|
  Z:   9.5 MHz (  1%) [      0] |--|
------------
AVG:  15.3 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 984.7 MHz (100%) [2000000] |------------------------------------------------------------...
  B: 180.4 MHz ( 18%) [2000000] |-------------------------------------------|
  C: 188.6 MHz ( 19%) [2000000] |---------------------------------------------|
  D: 203.0 MHz ( 21%) [2000000] |-------------------------------------------------|
  E: 210.0 MHz ( 21%) [2000000] |---------------------------------------------------|
  F: 214.8 MHz ( 22%) [2000000] |----------------------------------------------------|
  G: 225.9 MHz ( 23%) [2000000] |------------------------------------------------------|
  H: 152.4 MHz ( 15%) [      0] |-------------------------------------|
  Z: 162.5 MHz ( 16%) [      0] |---------------------------------------|
------------
AVG: 170.8 MHz                  |=========================================|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 950.1 MHz ( 96%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 962.0 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B: 150.8 MHz ( 15%) [2000000] |------------------------------------|
  C:  68.0 MHz (  7%) [      0] |----------------|
  D:  68.8 MHz (  7%) [      0] |----------------|
  E:  67.0 MHz (  7%) [      0] |----------------|
  F:  68.6 MHz (  7%) [      0] |----------------|
  G:  67.2 MHz (  7%) [      0] |----------------|
  H:  68.8 MHz (  7%) [      0] |----------------|
  Z:  67.0 MHz (  7%) [      0] |----------------|
------------
AVG:  69.6 MHz                  |================|

Implementation: priori_cast

  A: 937.2 MHz ( 95%) [2000000] |------------------------------------------------------------...
  B:  25.3 MHz (  3%) [2000000] |------|
  C:  22.3 MHz (  2%) [      0] |-----|
  D:  20.2 MHz (  2%) [      0] |----|
  E:  18.9 MHz (  2%) [      0] |----|
  F:  15.6 MHz (  2%) [      0] |---|
  G:  14.4 MHz (  1%) [      0] |---|
  H:   8.9 MHz (  1%) [      0] |--|
  Z:   9.9 MHz (  1%) [      0] |--|
------------
AVG:  15.1 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 969.9 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B: 231.2 MHz ( 23%) [2000000] |--------------------------------------------------------|
  C: 193.5 MHz ( 20%) [      0] |-----------------------------------------------|
  D: 192.5 MHz ( 20%) [      0] |----------------------------------------------|
  E: 186.5 MHz ( 19%) [      0] |---------------------------------------------|
  F: 183.6 MHz ( 19%) [      0] |--------------------------------------------|
  G: 184.4 MHz ( 19%) [      0] |--------------------------------------------|
  H: 184.4 MHz ( 19%) [      0] |--------------------------------------------|
  Z: 192.2 MHz ( 19%) [      0] |----------------------------------------------|
------------
AVG: 172.1 MHz                  |=========================================|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 1004.5 MHz (102%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 966.2 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  69.4 MHz (  7%) [ 286159] |----------------|
  C:  69.1 MHz (  7%) [ 285844] |----------------|
  D:  63.3 MHz (  6%) [ 285454] |---------------|
  E:  57.7 MHz (  6%) [ 285702] |--------------|
  F:  63.2 MHz (  6%) [ 285421] |---------------|
  G:  58.1 MHz (  6%) [ 286243] |--------------|
  H:  58.0 MHz (  6%) [      0] |--------------|
  Z:  68.2 MHz (  7%) [      0] |----------------|
------------
AVG:  56.3 MHz                  |=============|

Implementation: priori_cast

  A: 940.7 MHz ( 95%) [2000000] |------------------------------------------------------------...
  B:  25.7 MHz (  3%) [ 286159] |------|
  C:  23.2 MHz (  2%) [ 285844] |-----|
  D:  19.8 MHz (  2%) [ 285454] |----|
  E:  18.4 MHz (  2%) [ 285702] |----|
  F:  15.3 MHz (  2%) [ 285421] |---|
  G:  14.4 MHz (  1%) [ 286243] |---|
  H:   8.5 MHz (  1%) [      0] |--|
  Z:   9.4 MHz (  1%) [      0] |--|
------------
AVG:  15.0 MHz                  |===|

Implementation: kcl_dynamic_cast

  A: 889.3 MHz ( 90%) [2000000] |------------------------------------------------------------...
  B:  94.5 MHz ( 10%) [1714823] |----------------------|
  C:  96.8 MHz ( 10%) [1428664] |-----------------------|
  D:  94.0 MHz ( 10%) [1142820] |----------------------|
  E:  92.8 MHz (  9%) [ 857366] |----------------------|
  F:  89.0 MHz (  9%) [ 571664] |---------------------|
  G:  86.9 MHz (  9%) [ 286243] |---------------------|
  H:  82.9 MHz (  8%) [      0] |--------------------|
  Z:  83.9 MHz (  9%) [      0] |--------------------|
------------
AVG:  80.1 MHz                  |===================|

Class hierarchy: balanced

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 980.9 MHz ( 99%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 984.7 MHz (100%) [2000000] |------------------------------------------------------------...
  B:  48.8 MHz (  5%) [ 858197] |-----------|
  C:  43.5 MHz (  4%) [ 285844] |----------|
  D:  42.4 MHz (  4%) [ 286062] |----------|
  E:  44.3 MHz (  4%) [ 856053] |----------|
  F:  46.3 MHz (  5%) [ 285411] |-----------|
  G:  47.8 MHz (  5%) [ 285525] |-----------|
  H:  39.8 MHz (  4%) [      0] |---------|
  Z:  42.0 MHz (  4%) [      0] |----------|
------------
AVG:  39.4 MHz                  |=========|

Implementation: priori_cast

  A: 982.3 MHz (100%) [2000000] |------------------------------------------------------------...
  B:  11.8 MHz (  1%) [ 858197] |--|
  C:  12.3 MHz (  1%) [ 285844] |--|
  D:  10.8 MHz (  1%) [ 286062] |--|
  E:  10.2 MHz (  1%) [ 856053] |--|
  F:   9.4 MHz (  1%) [ 285411] |--|
  G:   9.9 MHz (  1%) [ 285525] |--|
  H:   8.8 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   9.2 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 922.1 MHz ( 93%) [2000000] |------------------------------------------------------------...
  B:  95.4 MHz ( 10%) [ 858197] |-----------------------|
  C:  90.9 MHz (  9%) [ 285844] |----------------------|
  D:  90.9 MHz (  9%) [ 286062] |----------------------|
  E:  92.4 MHz (  9%) [ 856053] |----------------------|
  F:  88.0 MHz (  9%) [ 285411] |---------------------|
  G:  89.0 MHz (  9%) [ 285525] |---------------------|
  H:  88.3 MHz (  9%) [      0] |---------------------|
  Z:  88.0 MHz (  9%) [      0] |---------------------|
------------
AVG:  80.3 MHz                  |===================|

Run 2 (objects shuffled)

Class hierarchy: deep

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 956.5 MHz (100%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 945.6 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:   6.4 MHz (  1%) [2000000] |-|
  C:   6.7 MHz (  1%) [2000000] |-|
  D:   7.0 MHz (  1%) [2000000] |-|
  E:   7.4 MHz (  1%) [2000000] |-|
  F:  13.3 MHz (  1%) [2000000] |---|
  G:  20.3 MHz (  2%) [2000000] |-----|
  H:   6.2 MHz (  1%) [      0] |-|
  Z:   6.1 MHz (  1%) [      0] |-|
------------
AVG:   8.1 MHz                  |==|

Implementation: priori_cast

  A: 971.3 MHz (102%) [2000000] |------------------------------------------------------------...
  B:  14.4 MHz (  2%) [2000000] |---|
  C:  13.8 MHz (  1%) [2000000] |---|
  D:   7.7 MHz (  1%) [2000000] |-|
  E:   7.4 MHz (  1%) [2000000] |-|
  F:   7.3 MHz (  1%) [2000000] |-|
  G:   7.1 MHz (  1%) [2000000] |-|
  H:   9.6 MHz (  1%) [      0] |--|
  Z:   9.8 MHz (  1%) [      0] |--|
------------
AVG:   8.6 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 951.5 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:  27.3 MHz (  3%) [2000000] |------|
  C:  27.6 MHz (  3%) [2000000] |------|
  D:  28.0 MHz (  3%) [2000000] |-------|
  E:  28.7 MHz (  3%) [2000000] |-------|
  F:  34.2 MHz (  4%) [2000000] |--------|
  G:  35.0 MHz (  4%) [2000000] |--------|
  H:  21.1 MHz (  2%) [      0] |-----|
  Z:  21.2 MHz (  2%) [      0] |-----|
------------
AVG:  24.8 MHz                  |======|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 932.0 MHz ( 97%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 924.6 MHz ( 97%) [2000000] |------------------------------------------------------------...
  B:  20.4 MHz (  2%) [2000000] |-----|
  C:   7.9 MHz (  1%) [      0] |-|
  D:   7.9 MHz (  1%) [      0] |-|
  E:   7.9 MHz (  1%) [      0] |-|
  F:   7.9 MHz (  1%) [      0] |-|
  G:   7.8 MHz (  1%) [      0] |-|
  H:   7.9 MHz (  1%) [      0] |-|
  Z:   7.9 MHz (  1%) [      0] |-|
------------
AVG:   8.4 MHz                  |==|

Implementation: priori_cast

  A: 981.8 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  14.7 MHz (  2%) [2000000] |---|
  C:  14.1 MHz (  1%) [      0] |---|
  D:   7.8 MHz (  1%) [      0] |-|
  E:   7.6 MHz (  1%) [      0] |-|
  F:   7.4 MHz (  1%) [      0] |-|
  G:   7.3 MHz (  1%) [      0] |-|
  H:   9.4 MHz (  1%) [      0] |--|
  Z:   9.6 MHz (  1%) [      0] |--|
------------
AVG:   8.7 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 966.2 MHz (101%) [2000000] |------------------------------------------------------------...
  B:  34.5 MHz (  4%) [2000000] |--------|
  C:  27.9 MHz (  3%) [      0] |-------|
  D:  27.9 MHz (  3%) [      0] |-------|
  E:  28.0 MHz (  3%) [      0] |-------|
  F:  28.0 MHz (  3%) [      0] |-------|
  G:  28.0 MHz (  3%) [      0] |-------|
  H:  27.9 MHz (  3%) [      0] |-------|
  Z:  27.5 MHz (  3%) [      0] |------|
------------
AVG:  25.5 MHz                  |======|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 985.2 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 962.5 MHz (101%) [2000000] |------------------------------------------------------------...
  B:   7.0 MHz (  1%) [1714818] |-|
  C:   7.3 MHz (  1%) [1428952] |-|
  D:   7.4 MHz (  1%) [1144295] |-|
  E:   7.4 MHz (  1%) [ 858212] |-|
  F:   7.3 MHz (  1%) [ 572465] |-|
  G:   7.0 MHz (  1%) [ 286584] |-|
  H:   6.7 MHz (  1%) [      0] |-|
  Z:   6.5 MHz (  1%) [      0] |-|
------------
AVG:   6.3 MHz                  |=|

Implementation: priori_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  13.8 MHz (  1%) [1714818] |---|
  C:  12.7 MHz (  1%) [1428952] |---|
  D:   6.8 MHz (  1%) [1144295] |-|
  E:   6.7 MHz (  1%) [ 858212] |-|
  F:   7.0 MHz (  1%) [ 572465] |-|
  G:   7.0 MHz (  1%) [ 286584] |-|
  H:   9.1 MHz (  1%) [      0] |--|
  Z:  10.0 MHz (  1%) [      0] |--|
------------
AVG:   8.1 MHz                  |==|

Implementation: kcl_dynamic_cast

  A: 945.2 MHz ( 99%) [2000000] |------------------------------------------------------------...
  B:  22.6 MHz (  2%) [1714818] |-----|
  C:  23.8 MHz (  2%) [1428952] |-----|
  D:  24.0 MHz (  3%) [1144295] |------|
  E:  23.6 MHz (  2%) [ 858212] |-----|
  F:  22.7 MHz (  2%) [ 572465] |-----|
  G:  21.4 MHz (  2%) [ 286584] |-----|
  H:  20.2 MHz (  2%) [      0] |-----|
  Z:  20.3 MHz (  2%) [      0] |-----|
------------
AVG:  19.8 MHz                  |====|

Class hierarchy: shallow

Cast type: Mostly successful (cast from class G)

Base-line: static_cast

  -: 980.4 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 999.5 MHz (104%) [2000000] |------------------------------------------------------------...
  B:   8.0 MHz (  1%) [      0] |--|
  C:   8.0 MHz (  1%) [      0] |-|
  D:   7.7 MHz (  1%) [      0] |-|
  E:   7.8 MHz (  1%) [      0] |-|
  F:   8.0 MHz (  1%) [      0] |--|
  G:  20.5 MHz (  2%) [2000000] |-----|
  H:   8.0 MHz (  1%) [      0] |--|
  Z:   7.9 MHz (  1%) [      0] |-|
------------
AVG:   8.4 MHz                  |==|

Implementation: priori_cast

  A: 953.3 MHz (100%) [2000000] |------------------------------------------------------------...
  B:   7.2 MHz (  1%) [      0] |-|
  C:   6.8 MHz (  1%) [      0] |-|
  D:   6.7 MHz (  1%) [      0] |-|
  E:   6.5 MHz (  1%) [      0] |-|
  F:   6.0 MHz (  1%) [      0] |-|
  G:   5.8 MHz (  1%) [2000000] |-|
  H:   9.0 MHz (  1%) [      0] |--|
  Z:   9.8 MHz (  1%) [      0] |--|
------------
AVG:   6.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  27.8 MHz (  3%) [2000000] |------|
  C:  27.7 MHz (  3%) [2000000] |------|
  D:  28.4 MHz (  3%) [2000000] |-------|
  E:  34.4 MHz (  4%) [2000000] |--------|
  F:  34.8 MHz (  4%) [2000000] |--------|
  G:  35.5 MHz (  4%) [2000000] |--------|
  H:  21.5 MHz (  2%) [      0] |-----|
  Z:  21.5 MHz (  2%) [      0] |-----|
------------
AVG:  25.7 MHz                  |======|

Cast type: Mostly failed (cast from class B)

Base-line: static_cast

  -: 994.5 MHz (104%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 981.4 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  20.2 MHz (  2%) [2000000] |-----|
  C:   7.9 MHz (  1%) [      0] |-|
  D:   7.9 MHz (  1%) [      0] |-|
  E:   7.7 MHz (  1%) [      0] |-|
  F:   7.7 MHz (  1%) [      0] |-|
  G:   7.8 MHz (  1%) [      0] |-|
  H:   7.8 MHz (  1%) [      0] |-|
  Z:   7.8 MHz (  1%) [      0] |-|
------------
AVG:   8.3 MHz                  |==|

Implementation: priori_cast

  A: 920.8 MHz ( 96%) [2000000] |------------------------------------------------------------...
  B:   6.9 MHz (  1%) [2000000] |-|
  C:   6.8 MHz (  1%) [      0] |-|
  D:   6.7 MHz (  1%) [      0] |-|
  E:   6.5 MHz (  1%) [      0] |-|
  F:   6.1 MHz (  1%) [      0] |-|
  G:   5.8 MHz (  1%) [      0] |-|
  H:   8.5 MHz (  1%) [      0] |--|
  Z:  10.2 MHz (  1%) [      0] |--|
------------
AVG:   6.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 984.3 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  35.0 MHz (  4%) [2000000] |--------|
  C:  28.3 MHz (  3%) [      0] |-------|
  D:  28.3 MHz (  3%) [      0] |-------|
  E:  28.2 MHz (  3%) [      0] |-------|
  F:  28.2 MHz (  3%) [      0] |-------|
  G:  28.1 MHz (  3%) [      0] |-------|
  H:  28.3 MHz (  3%) [      0] |-------|
  Z:  28.3 MHz (  3%) [      0] |-------|
------------
AVG:  25.9 MHz                  |======|

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 996.5 MHz (104%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 1003.5 MHz (105%) [2000000] |------------------------------------------------------------...
  B:   7.8 MHz (  1%) [ 286159] |-|
  C:   7.8 MHz (  1%) [ 285844] |-|
  D:   7.8 MHz (  1%) [ 285454] |-|
  E:   7.7 MHz (  1%) [ 285702] |-|
  F:   7.8 MHz (  1%) [ 285421] |-|
  G:   7.8 MHz (  1%) [ 286243] |-|
  H:   7.7 MHz (  1%) [      0] |-|
  Z:   8.0 MHz (  1%) [      0] |--|
------------
AVG:   6.9 MHz                  |=|

Implementation: priori_cast

  A: 966.2 MHz (101%) [2000000] |------------------------------------------------------------...
  B:   7.0 MHz (  1%) [ 286159] |-|
  C:   6.7 MHz (  1%) [ 285844] |-|
  D:   6.5 MHz (  1%) [ 285454] |-|
  E:   6.4 MHz (  1%) [ 285702] |-|
  F:   5.9 MHz (  1%) [ 285421] |-|
  G:   5.7 MHz (  1%) [ 286243] |-|
  H:   8.6 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   6.3 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 769.8 MHz ( 80%) [2000000] |------------------------------------------------------------...
  B:  22.7 MHz (  2%) [1714823] |-----|
  C:  24.2 MHz (  3%) [1428664] |------|
  D:  24.3 MHz (  3%) [1142820] |------|
  E:  23.9 MHz (  3%) [ 857366] |------|
  F:  23.0 MHz (  2%) [ 571664] |-----|
  G:  22.0 MHz (  2%) [ 286243] |-----|
  H:  20.4 MHz (  2%) [      0] |-----|
  Z:  20.5 MHz (  2%) [      0] |-----|
------------
AVG:  20.1 MHz                  |=====|

Class hierarchy: balanced

Cast type: Mixed (cast from random classes)

Base-line: static_cast

  -: 985.7 MHz (103%) [2000000] |------------------------------------------------------------...

Implementation: dynamic_cast

  A: 988.1 MHz (103%) [2000000] |------------------------------------------------------------...
  B:  11.4 MHz (  1%) [ 858197] |--|
  C:   7.4 MHz (  1%) [ 285844] |-|
  D:   7.3 MHz (  1%) [ 286062] |-|
  E:   9.4 MHz (  1%) [ 856053] |--|
  F:   7.3 MHz (  1%) [ 285411] |-|
  G:   7.4 MHz (  1%) [ 285525] |-|
  H:   6.9 MHz (  1%) [      0] |-|
  Z:   7.1 MHz (  1%) [      0] |-|
------------
AVG:   7.1 MHz                  |=|

Implementation: priori_cast

  A: 978.5 MHz (102%) [2000000] |------------------------------------------------------------...
  B:   5.1 MHz (  1%) [ 858197] |-|
  C:   5.4 MHz (  1%) [ 285844] |-|
  D:   5.1 MHz (  1%) [ 286062] |-|
  E:   4.8 MHz (  0%) [ 856053] |-|
  F:   4.7 MHz (  0%) [ 285411] |-|
  G:   5.0 MHz (  1%) [ 285525] |-|
  H:   8.8 MHz (  1%) [      0] |--|
  Z:   9.7 MHz (  1%) [      0] |--|
------------
AVG:   5.4 MHz                  |=|

Implementation: kcl_dynamic_cast

  A: 934.1 MHz ( 98%) [2000000] |------------------------------------------------------------...
  B:  24.4 MHz (  3%) [ 858197] |------|
  C:  23.0 MHz (  2%) [ 285844] |-----|
  D:  23.1 MHz (  2%) [ 286062] |-----|
  E:  24.1 MHz (  3%) [ 856053] |------|
  F:  22.0 MHz (  2%) [ 285411] |-----|
  G:  23.1 MHz (  2%) [ 285525] |-----|
  H:  22.2 MHz (  2%) [      0] |-----|
  Z:  22.3 MHz (  2%) [      0] |-----|
------------
AVG:  20.5 MHz                  |=====|

Save the (DOM) trees! Why direct DOM manipulation is not a bad idea

A generic, and so non-binary, unsorted, some labels duplicated, arbitrary diagram of a tree. CC BY-SA 4.0

Direct DOM manipulation has gotten a bad reputation in the last decade of web development. From Ruby on Rails to React, the DOM was seen as something to gloriously destroy and re-render from the server or even from the browser. Never mind that the browser had already spent a lot of effort parsing HTML and constructing this tree! Mind-numbingly complex regular-expression tests and string manipulations had to deal with low-level details of the HTML syntax to insert, delete and change elements, sometimes on every keystroke! In contrast, functions like createElement, remove and insertBefore from the DOM world were largely unknown and unused, except perhaps in jQuery.

Processing of HTML is destructive: The original DOM is destroyed and garbage collected after a certain delay. Attached event handlers are detached and garbage collected. A completely new DOM is created by parsing the new HTML set via .innerHTML =. Event listeners have to be re-attached from user-land code (this is not an issue when using on* HTML attributes, but those have disadvantages as well).

It doesn’t have to be this way. Do not eliminate, but manipulate!

Save the (DOM) trees!

sanitize-dom crawls a DOM sub-tree (beginning from a given node, all the way down to its descendant leaves) and filters and manipulates it non-destructively. This is very efficient: The browser doesn’t have to re-render everything; it only re-renders what has been changed (sound familiar from React?).

The benefits of direct DOM manipulation:

  • Nodes stay alive.
  • References to nodes (e.g. stored in a Map or WeakMap) stay alive.
  • Already attached event handlers stay alive.
  • The browser doesn’t have to re-render entire sections of a page; thus no flickering, no scroll jumping, no big CPU spikes.
  • CPU cycles for repeatedly parsing and dumping of HTML are eliminated.

sanitize-dom

I just released version 4 of my sanitize-dom library, which is my tool of choice for sanitizing user-generated content. As mentioned in this post, my project takes a different approach: It doesn’t work with HTML; it operates only on DOM nodes.

sanitize-dom’s further advantages:

  • No dependencies.
  • Small footprint (only about 7 kB minified).
  • Faster than other HTML sanitizers because there is no HTML parsing and serialization.

Check out sanitize-dom here on GitHub!

HTML5 + JavaScript + CSS3 RGBA video overlays on top of live GStreamer video pipelines

GStreamer comes with a number of plugins that allow rendering of text and/or graphics overlays on top of video: rsvgoverlay, subtitleoverlay, textoverlay, cairooverlay, gdkpixbufoverlay, opencvtextoverlay, etc. However, some of these plugins allow only static graphics and text, and they often do not approach the flexibility and power of dedicated video post-processing software products.

“noweffects” (a play on the name of a popular video post-processing software) is a proof-of-concept of leveraging the power of a modern HTML5 + JavaScript + CSS3 web browser engine to render high-quality, programmable, alpha-aware, animated, vector- and bitmap-based content. This content is rendered into a raw RGBA video stream, which can then be transferred via some kind of IPC method to separate GStreamer processes, where it can be composited with other content via GStreamer’s regular compositor or videomixer plugins.

Qt was chosen for its ease of integration of modern WebKit (QtWebKit) and GStreamer (qt-gstreamer), and its ability to render widgets to RGBA images. The QMainWindow widget is rendered at regular intervals to QImages in RGBA format, which are then inserted into a GStreamer pipeline via the appsrc plugin. This pipeline simply uses udpsink to multicast the raw video RTP packets on localhost to allow for multiple ‘subscribers’. A second GStreamer pipeline can then use udpsrc and apply the overlay.

Proof-of-concept code is available at: https://github.com/michaelfranzl/noweffects

The following demonstration video was generated with “noweffects”: A website (showing CSS3 animations), rendered to an RGBA video via QtWebKit, then overlaid on top of a video test pattern in a separate GStreamer process.