Tuesday, 24 October 2017

performance - Huge overhead due to C function call

I have a simple function that multiplies two matrices.


void mmul1(float A[ni][nk], float B[nk][nj], float C[ni][nj])
{
    int i, j, k;
    for (i=0; i<ni; i++) {
        for (j=0; j<nj; j++) {
            C[i][j] = 0;
            for (k=0; k<nk; k++) {
                C[i][j] += A[i][k]*B[k][j];
            }
        }
    }
}

I have a main function that looks like this:


int main(int argc, char** argv) {
    // timer structs
    struct timeval ts, te, td;
    float tser, tpar, diff;
    int i, j, k;

    printf("matrix size : %d x %d x %d\n", ni, nj, nk);
    srand(0);

    // initialization
    for (i=0; i<ni; i++) {
        for (k=0; k<nk; k++) {
            A[i][k] = (float)rand()/RAND_MAX;
        }
    }
    for (k=0; k<nk; k++) {
        for (j=0; j<nj; j++) {
            B[k][j] = (float)rand()/RAND_MAX;
        }
    }

    gettimeofday(&ts, NULL);
    for (i=0; i<ni; i++) {
        for (j=0; j<nj; j++) {
            Cans[i][j] = 0;
            for (k=0; k<nk; k++) {
                Cans[i][j] += A[i][k]*B[k][j];
            }
        }
    }
    gettimeofday(&te, NULL);
    timersub(&ts, &te, &td);
    tser = fabs(td.tv_sec+(float)td.tv_usec/1000000.0);

    gettimeofday(&ts, NULL);
    mmul1(A, B, C);
    gettimeofday(&te, NULL);
    timersub(&ts, &te, &td);
    tpar = fabs(td.tv_sec+(float)td.tv_usec/1000000.0);

    // compare results
    diff = compute_diff(C, Cans);
    printf("Performance : %.2f GFlop/s (%.1fX)\n",
           2.0*ni*nj*nk/tpar/1000000000, tser/tpar );
    printf("Result Diff : %.3f\n", diff );
    return 0;
}

I am compiling with gcc's -O3 flag.


When testing, I found that if I add static inline to mmul1's signature, I get a 5X speedup on 512x512 matrices. The overhead of a function call should be negligible compared to the multiplication itself. Why is this performance penalty occurring (is the compiler generating different machine code?), and how can I fix it without inlining mmul1?
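
For reference, the faster variant described above might look like the following minimal sketch. The question does not show how ni, nj, nk or the matrices are declared, so the macros and global arrays here are assumptions, the name mmul1_inline and the helper check_small_case are hypothetical, and the 2x2 size is only there to keep the check readable (the timings above were taken on 512x512 matrices).

```c
#include <assert.h>

/* Assumed sizes, for illustration only; the question uses 512x512. */
#define ni 2
#define nj 2
#define nk 2

/* Assumed to be globals, since main() in the question uses them undeclared. */
float A[ni][nk], B[nk][nj], C[ni][nj];

/* Same body as mmul1; the only change is `static inline` on the signature. */
static inline void mmul1_inline(float A[ni][nk], float B[nk][nj], float C[ni][nj])
{
    int i, j, k;
    for (i = 0; i < ni; i++) {
        for (j = 0; j < nj; j++) {
            C[i][j] = 0;
            for (k = 0; k < nk; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

/* Hypothetical helper: multiplies two fixed 2x2 matrices and returns C[0][0]. */
float check_small_case(void)
{
    A[0][0] = 1; A[0][1] = 2; A[1][0] = 3; A[1][1] = 4;
    B[0][0] = 5; B[0][1] = 6; B[1][0] = 7; B[1][1] = 8;
    mmul1_inline(A, B, C);
    assert(C[0][1] == 22.0f);   /* 1*6 + 2*8 */
    assert(C[1][0] == 43.0f);   /* 3*5 + 4*7 */
    return C[0][0];             /* 1*5 + 2*7 */
}
```

To compare the two versions, one approach is to build each with gcc -O3 -S and diff the assembly emitted for the inner loop.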
