Download: switch_benchmark.tar.gz
In this post I benchmark and compare virtual function calls to functionally equivalent if-else and switch statements. In object oriented design switch statements are substituted by polymorphism. The reason is to make the code more compact, readable and maintainable. However, the benchmark indicates that there is a case for doing the opposite -- replacing polymorphism with switch statements, if only for very specific and limited applications where performance considerations are the highest priority. Of course, switch statements can be made more compact, readable and maintainable using other C++ techniques such as metaprogramming. The source code for these techniques and the benchmark is available for download. Tested with gcc version 4.3.2.
Friday, September 02, 2011
A case for replacing polymorphism with switch-statements
The polymorphic benchmark uses a container (an array)
that holds pointers to a base class from which several classes are derived. The
time taken to make a virtual function call is measured for arrays of between 2
and 10 objects. An array of 2 objects takes 14 ns on average and 18 ns for 10
objects.
A functionally equivalent if-else function varies between
10 ns and nearly 18 ns.
The switch statement was the best performing at a
constant 4 ns irrespective of number of objects. The switch statement is
obviously being optimized by the compiler with the construction of a jump
table.
Should I care about a 14 ns virtual function call
overhead? For most applications a few nanoseconds here or there
make no difference. However, there are probably some software applications
that might need to make every nanosecond count. For example, a network device
may be described as having a performance of 10 gigabits per second
(Gb/s). A 10-Gb/s Ethernet interface can deliver packets at between
812,740 and 14,880,960 p/s depending on the size of the packet, or on average
it delivers packets every 67 ns - 1.2 us (see this article). In this context, a
14 ns overhead is significant. Therefore, if you are building a software
application on top of 10 gigabits Ethernet interfaces, if-else and especially switch-statements
might be the way to go. You can not make too many virtual function calls in 67
ns. Granted, my hardware may not be cutting-edge but I think a case can be made
for switch-statements over polymorphism. Function call overhead can be improved by a factor of at least 4.
Getting back to compactness, readability and maintainability
of the code; we can improve the if-else and switch statements using Boost’s mpl,
preprocessor and tuple libraries. You can see below a typical switch-statement
and how it is used in place of an array of pointers.
struct SwitchBase
{
void
function(long* x, int type)
{
switch(type) {
case 0:
obj1.function(x);
break;
case 1:
obj2.function(x);
break;
case 2:
obj3.function(x);
break;
case 3:
obj4.function(x);
break;
case 4:
obj5.function(x);
break;
case 5:
obj6.function(x);
break;
case 6:
obj7.function(x);
break;
case 7:
obj8.function(x);
break;
case 8:
obj9.function(x);
break;
case 9:
obj10.function(x);
break;
}
}
ImpOfStatic1
obj1;
ImpOfStatic2 obj2;
ImpOfStatic3
obj3;
ImpOfStatic4
obj4;
ImpOfStatic5
obj5;
ImpOfStatic6
obj6;
ImpOfStatic7
obj7;
ImpOfStatic8
obj8;
ImpOfStatic9
obj9;
ImpOfStatic10
obj10;
};
long sum=0;
SwitchBase sw;
sw.function(&sum, 2); // calls object 3 with a 4 ns
overhead
sw.function(&sum, 0); // calls object 1 with a 4 ns
overhead
Writing switch-statements like this by hand is tedious
and error prone. Instead we can automate its construction. I provide a class
called “switch_case”. The compiler automatically constructs all the case
statements required and the performance (Switch-Boost) is exactly the same as the
hand written version.
Usage: Create instances of each class and add them to a
container (a Boost tuple).
// Instantiate the class instances
ImpOfStatic1 s1;
ImpOfStatic2 s2;
ImpOfStatic3 s3;
ImpOfStatic4 s4;
ImpOfStatic5 s5;
ImpOfStatic6 s6;
ImpOfStatic7 s7;
ImpOfStatic8 s8;
ImpOfStatic9 s9;
ImpOfStatic10 s10;
// Create a tuple and store the instances
typedef boost::tuple<
ImpOfStatic1*,
ImpOfStatic2*,
ImpOfStatic3*,
ImpOfStatic4*,
ImpOfStatic5*,
ImpOfStatic6*,
ImpOfStatic7*,
ImpOfStatic8*,
ImpOfStatic9*,
ImpOfStatic10*>
StatTuple;
// Store the instances in the tuple
StatTuple
tup(&s1,&s2,&s3,&s4,&s5,&s6,&s7,&s8,&s9,&s10);
// Crreate a switch-statement for all types in the tuple.
switch_case<StatTuple> call(tup);
Func f;
call(f, 2, &sum);
// calls object s3 with a 4 ns overhead
call(f, 0, &sum); // calls object s1 with a 4 ns overhead
In this case, you do not have to write a switch-case function by hand. The compiler does it automatically. It is almost as elegant as the polymorphic solution and comes with a couple of real advantages:
1) Function call
overhead is improved by a factor of four.
2) Functions can be in-lined -- virtual functions can
not.
3) No loss of type identity specific to derived classes.
4) No derived classes are needed. Classes can have
different member function names.
Subscribe to:
Post Comments (Atom)
This is a subject I like very much. I can't resist to make some comments. So here they are:
ReplyDelete- Assuming gcc 4.3.5 and -O2, only dynamic benchmark is doing what expected. The other ones are "over-optimized". Just try to change first static method from pre-increment to pre-decrement to see the difference. Both switch cases will degrade performance from 2 to 3 times.
- In dynamic (virtual) benchmark, methods are called with an object context (this). To be fair in the comparison, static methods should be provided with an additional parameter (function dependent).
- With these two modifications dynamic and switch cases are comparable.