Thursday 23 November 2017

c++ - Why are std::fstreams so slow?

I was working on a simple parser and when profiling I
observed the bottleneck is in... file read! I extracted very simple test to compare the
performance of fstreams and FILE* when
reading a big blob of
data:



#include

#include
#include

#include
#include



void measure(const std::string&
test, std::function function)
{
auto start_time =
std::chrono::high_resolution_clock::now();


function();

auto duration =
std::chrono::duration_cast(std::chrono::high_resolution_clock::now()
- start_time);
std::cout< "<(duration.count()) * 0.000001<<"
ms"<}


#define BUFFER_SIZE
(1024 * 1024 * 1024)

int main(int argc, const char *
argv[])
{
auto buffer = new char[BUFFER_SIZE];

memset(buffer, 123, BUFFER_SIZE);

measure("FILE* write",
[buffer]()
{
FILE* file = fopen("test_file_write",
"wb");

fwrite(buffer, 1, BUFFER_SIZE, file);

fclose(file);
});
measure("FILE* read", [buffer]()

{
FILE* file = fopen("test_file_read", "rb");
fread(buffer, 1,
BUFFER_SIZE, file);
fclose(file);
});
measure("fstream
write", [buffer]()

{
std::ofstream
stream("test_stream_write", std::ios::binary);
stream.write(buffer,
BUFFER_SIZE);
});
measure("fstream read", [buffer]()

{
std::ifstream stream("test_stream_read", std::ios::binary);

stream.read(buffer, BUFFER_SIZE);
});



delete[]
buffer;
}


The
results of running this code on my machine are:



FILE* write 1388.59
ms
FILE* read 1292.51 ms
fstream write 3105.38 ms
fstream
read 3319.82
ms



fstream
write/read are about 2 times slower than FILE* write/read! And
this while reading a big blob of data, without any parsing or other features of
fstreams. I'm running the code on Mac OS, Intel I7 2.6GHz, 16GB
1600 MHz Ram, SSD drive. Please note that running again same code the time for
FILE* read is very low (about 200 ms) probably because the file
gets cached... This is why the files opened for reading are not created using the
code.



Why when reading just a blob of binary
data using fstream is so slow compared to
FILE*?



EDIT
1:
I updated the code and the times. Sorry for the
delay!



EDIT 2: I
added command line and new results (very similar to previous
ones!)




$ clang++
main.cpp -std=c++11 -stdlib=libc++ -O3
$ ./a.out
FILE* write 1417.9
ms
FILE* read 1292.59 ms
fstream write 3214.02 ms
fstream
read 3052.56
ms


Following the
results for the second
run:




$
./a.out
FILE* write 1428.98 ms
FILE* read 196.902
ms
fstream write 3343.69 ms
fstream read 2285.93
ms


It looks like the
file gets cached when reading for both FILE* and
stream as the time reduces with the same amount for both of
them.



EDIT 3: I
reduced the code to
this:




FILE* file =
fopen("test_file_write", "wb");
fwrite(buffer, 1, BUFFER_SIZE,
file);
fclose(file);

std::ofstream
stream("test_stream_write", std::ios::binary);
stream.write(buffer,
BUFFER_SIZE);


And
started the profiler. It seems like stream spends lots of time
in xsputn function, and the actual
write calls have the same duration (as it should be, it's the
same
function...)




Running
Time Self Symbol Name
3266.0ms 66.9% 0,0 std::__1::basic_ostream std::__1::char_traits >::write(char const*, long)
3265.0ms
66.9% 2145,0 std::__1::basic_streambuf
>::xsputn(char const*, long)
1120.0ms 22.9% 7,0
std::__1::basic_filebuf
>::overflow(int)
1112.0ms 22.7% 2,0 fwrite
1127.0ms 23.0% 0,0
fwrite


EDIT
4
For some reason this question is marked as duplicate. I wanted to point
out that I don't use printf at all, I use only
std::cout to write the time. The files used in the
read part are the output from the
write part, copied with different name to avoid
caching

No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...