I went on to test memcpy behavior on my system after seeing this question: Why does the speed of memcpy() drop dramatically every 4KB? (https://stackoverflow.com/questions/21038965/why-speed-of-memcpy-drops-dramatically-every-4kb)
Details of my system:
arun@arun-OptiPlex-9010:~/mem_copy_test$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               1600.000
BogoMIPS:              6784.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
arun@arun-OptiPlex-9010:~/mem_copy_test$ cat /proc/cpuinfo | grep 'model name' | head -1
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
arun@arun-OptiPlex-9010:~/mem_copy_test$ uname -a
Linux arun-OptiPlex-9010 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Test program:
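#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Copy buf_size bytes iters times and print the throughput in MiB/s. */
void memcpy_speed(unsigned long buf_size, unsigned long iters)
{
    struct timeval start, end;
    unsigned char *pbuff_1;
    unsigned char *pbuff_2;
    unsigned long i;
    long elapsed_us;

    pbuff_1 = malloc(buf_size);
    pbuff_2 = malloc(buf_size);
    if (pbuff_1 == NULL || pbuff_2 == NULL) {
        fprintf(stderr, "malloc failed for %lu bytes\n", buf_size);
        free(pbuff_1);
        free(pbuff_2);
        return;
    }

    /* The buffers are not initialized before timing, so the first
       iteration may also pay the page-fault cost of touching fresh pages. */
    gettimeofday(&start, NULL);
    for (i = 0; i < iters; ++i) {
        memcpy(pbuff_2, pbuff_1, buf_size);
    }
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
               + (end.tv_usec - start.tv_usec);
    /* bytes per microsecond divided by 1.048576 = MiB per second */
    printf("%5.3f\n", ((double)buf_size * iters / (1.024 * 1.024)) / elapsed_us);

    free(pbuff_1);
    free(pbuff_2);
}

int main(void)
{
    unsigned long buf_size;
    unsigned int i;

    for (i = 1; i < 16385; i++) {
        printf("bufsize in kb=%u speed=", i);
        buf_size = i * 1024UL;
        memcpy_speed(buf_size, 10000);
        printf("\n");
    }
    return 0;
}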
I am sharing the output via Google Drive because Stack Overflow does not let me post images yet (it says 10 reputation is needed for that).
Output for 1 to 256 KB: https://drive.google.com/file/d/0B3mnbsS6F4tpY2dhRWJLaEY1RWc/view?usp=sharing
Output for 1 to 16384 KB: https://drive.google.com/file/d/0B3mnbsS6F4tpeC1Dd2R1VnJOV2c/view?usp=sharing
1) Why does the graph have a peak at 11-13 KB?
2) Why is the behavior from 20 to 129 KB (range 1) different from that from 130 to 256 KB (range 2)? In range 1 the maximum speed is not at multiples of 4 KB, but in range 2 it is, and with large peaks. Range 2 is also faster than range 1 at multiples of 4 KB.
3) Why does the speed drop dramatically close to 3000 KB?
--Arun