I have the following two files.

single.cpp:
#include <iostream>
using namespace std;

unsigned long a = 0;

class A {
public:
    virtual int f() __attribute__ ((noinline)) { return a; }
};

class B : public A {
public:
    virtual int f() __attribute__ ((noinline)) { return a; }
    void g() __attribute__ ((noinline)) { return; }
};

int main() {
    cin >> a;
    A* obj;
    if (a > 3)
        obj = new B();
    else
        obj = new A();
    unsigned long result = 0;
    for (int i = 0; i < 65535; i++) {
        for (int j = 0; j < 65535; j++) {
            result += obj->f();
        }
    }
    cout << result << endl;
}

And multiple.cpp:
#include <iostream>
using namespace std;

unsigned long a = 0;

class A {
public:
    virtual int f() __attribute__ ((noinline)) { return a; }
};

class dummy {
public:
    virtual void g() __attribute__ ((noinline)) { return; }
};

class B : public A, public dummy {
public:
    virtual int f() __attribute__ ((noinline)) { return a; }
    virtual void g() __attribute__ ((noinline)) { return; }
};

int main() {
    cin >> a;
    A* obj;
    if (a > 3)
        obj = new B();
    else
        obj = new A();
    unsigned long result = 0;
    for (int i = 0; i < 65535; i++) {
        for (int j = 0; j < 65535; j++) {
            result += obj->f();
        }
    }
    cout << result << endl;
}

I am using gcc version 3.4.6 with the -O2 flag, and these are the timings I get.

multiple:
real    0m8.635s
user    0m8.608s
sys     0m0.003s

single:
real    0m10.072s
user    0m10.045s
sys     0m0.001s

On the other hand, if I invert the order of class derivation in multiple.cpp, thus:

class B : public dummy, public A {

then I get the following timings, which are slightly slower than those for single inheritance, as one might expect given the 'thunk' adjustments to the this pointer that the code would need to make:

real    0m11.516s
user    0m11.479s
sys     0m0.002s
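
To make the 'thunk' remark concrete, here is a minimal sketch of why the base order matters for the this pointer. The names AFirst, DummyFirst, Dummy and offset_of_A are hypothetical and not part of the files above, and the layout it reports depends on the ABI rather than on the C++ standard, so treat the numbers as an observation about one compiler, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual int f() { return 1; } virtual ~A() {} };
struct Dummy { virtual void g() {} virtual ~Dummy() {} };

// Hypothetical stand-ins for the two variants of B: same bases, opposite order.
struct AFirst : A, Dummy { virtual int f() { return 2; } virtual void g() {} };
struct DummyFirst : Dummy, A { virtual int f() { return 3; } virtual void g() {} };

// Byte offset of the A subobject inside a Derived object.
template <class Derived>
std::ptrdiff_t offset_of_A() {
    Derived d;
    A* as_base = &d;  // derived-to-base conversion; may shift the address
    assert(as_base != 0);
    return reinterpret_cast<char*>(as_base) - reinterpret_cast<char*>(&d);
}
```

With compilers that follow the Itanium C++ ABI (GCC among them), I would expect offset_of_A<AFirst>() to be zero and offset_of_A<DummyFirst>() to be non-zero. A non-zero offset is what forces a call to f() through an A* to go via a thunk that moves this back to the start of the full object.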

Any idea why this may be happening? There doesn't seem to be any difference in the assembly generated for the loop in any of the three cases. Is there some other place that I need to look at?

Also, I have bound the process to a specific CPU core, and I am running it at real-time priority with SCHED_RR.
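
For anyone trying to reproduce the measurement, the pinning and the scheduling policy can also be requested from inside the program. This is only a Linux/glibc sketch under that assumption; the post does not say how the binding was actually done, and the helper names first_allowed_cpu, pin_to_cpu and use_round_robin are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <sched.h>  // Linux/glibc: sched_setaffinity, sched_setscheduler, CPU_* macros

// Lowest-numbered CPU the calling thread is currently allowed to run on, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}

// Restrict the calling thread (pid 0 means "self") to a single CPU.
bool pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Ask for SCHED_RR at the lowest real-time priority.
// This usually fails without root or CAP_SYS_NICE, in which case it returns false.
bool use_round_robin() {
    sched_param param;
    std::memset(&param, 0, sizeof(param));
    param.sched_priority = sched_get_priority_min(SCHED_RR);
    return sched_setscheduler(0, SCHED_RR, &param) == 0;
}
```

Calling pin_to_cpu(first_allowed_cpu()) and then use_round_robin() at the top of main() would approximate the setup described above. Both return values are worth checking, because a silently failed call leaves the process on the default scheduler.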

EDIT: This was noticed by Mysticial and reproduced by me. Doing a

cout << "vtable: " << *(void**)obj << endl;

just before the loop in single.cpp makes single just as fast as multiple, clocking in at 8.4 s, just like the public A, public dummy version.
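
For readers unsure what *(void**)obj prints: it reads the first pointer-sized word of the object, which on common ABIs such as the Itanium C++ ABI used by GCC is the vtable pointer. The standard does not promise this layout, so the following is a sketch under that assumption, and vtable_pointer is a hypothetical helper, not something from the files above.

```cpp
#include <cassert>
#include <cstring>

struct A { virtual int f() { return 0; } virtual ~A() {} };
struct B : A { virtual int f() { return 1; } };

// Copies out the first pointer-sized word of the object.
// Assumption: the ABI stores the vtable pointer there (true for GCC's ABI).
const void* vtable_pointer(const A* obj) {
    assert(obj != nullptr);
    const void* vptr = nullptr;
    std::memcpy(&vptr, obj, sizeof(vptr));
    return vptr;
}
```

Under that assumption, two A objects report the same value while an A and a B report different ones, because each dynamic type has its own vtable. Printing the value before the loop therefore touches the same memory that every obj->f() call in the loop has to load first.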