Friday 22 December 2017

c++ - Why is my program slow when looping over exactly 8192 elements?

itemprop="text">

Here is the extract from the program
in question. The matrix img[][] has the size SIZE×SIZE, and is
initialized at:



img[j][i] = 2 * j +
i




Then, you make a
matrix res[][], and each field in here is made to be the
average of the 9 fields around it in the img matrix. The border is left at 0 for
simplicity.



for(i=1;i            
for(j=1;j res[j][i]=0;

for(k=-1;k<2;k++)
for(l=-1;l<2;l++)
res[j][i] +=
img[j+l][i+k];
res[j][i] /=
9;
}



That's
all there's to the program. For completeness' sake, here is what comes before. No code
comes after. As you can see, it's just
initialization.



#define SIZE
8192
float img[SIZE][SIZE]; // input image
float res[SIZE][SIZE];
//result of mean filter
int i,j,k,l;
for(i=0;i
for(j=0;j
img[j][i] =
(2*j+i)%8196;


Basically,
this program is slow when SIZE is a multiple of 2048, e.g. the execution
times:



SIZE = 8191: 3.44
secs
SIZE = 8192: 7.20 secs
SIZE = 8193: 3.18
secs



The
compiler is GCC.
From what I know, this is because of memory management, but I
don't really know too much about that subject, which is why I'm asking
here.



Also how to fix this would be nice, but if
someone could explain these execution times I'd already be happy
enough.



I already know of malloc/free, but the
problem is not amount of memory used, it's merely execution time, so I don't know how
that would help.


itemprop="text">
class="normal">Answer



The
difference is caused by the same super-alignment issue from the following related
questions:






But
that's only because there's one other problem with the
code.



Starting from the original
loop:



for(i=1;i            
for(j=1;j res[j][i]=0;

for(k=-1;k<2;k++)
for(l=-1;l<2;l++)

res[j][i]
+= img[j+l][i+k];
res[j][i] /=
9;
}


First
notice that the two inner loops are trivial. They can be unrolled as
follows:



for(i=1;i            {
for(j=1;j res[j][i]=0;


res[j][i] += img[j-1][i-1];
res[j][i] += img[j ][i-1];
res[j][i]
+= img[j+1][i-1];
res[j][i] += img[j-1][i ];
res[j][i] += img[j
][i ];
res[j][i] += img[j+1][i ];
res[j][i] +=
img[j-1][i+1];
res[j][i] += img[j ][i+1];
res[j][i] +=
img[j+1][i+1];
res[j][i] /= 9;


}
}


So that
leaves the two outer-loops that we're interested
in.



Now we can see the problem is the same in
this question: Why does the
order of the loops affect performance when iterating over a 2D
array?



You are iterating the matrix
column-wise instead of row-wise.




/>

To solve this problem, you should interchange the
two
loops.



for(j=1;j            {
for(i=1;i res[j][i]=0;
res[j][i] +=
img[j-1][i-1];
res[j][i] += img[j ][i-1];
res[j][i] +=
img[j+1][i-1];

res[j][i] += img[j-1][i ];
res[j][i] +=
img[j ][i ];
res[j][i] += img[j+1][i ];
res[j][i] +=
img[j-1][i+1];
res[j][i] += img[j ][i+1];
res[j][i] +=
img[j+1][i+1];
res[j][i] /= 9;

}
}



This
eliminates all the non-sequential access completely so you no longer get random
slow-downs on large powers-of-two.



/>

Core i7 920 @ 3.5
GHz



Original
code:



8191: 1.499
seconds

8192: 2.122 seconds
8193: 1.582
seconds


Interchanged
Outer-Loops:



8191: 0.376
seconds
8192: 0.357 seconds
8193: 0.351
seconds



No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...