Thursday 8 November 2018

Reading from a text file in Python is faster than in C++

I am writing a simple program in Python that reads input from a text file and then does some processing. I heard that reading from files is slow in Python, and since my program does it periodically, I thought I could use C++ for the file reading (embedded inside the Python program). Someone told me to benchmark file reading in both languages to see if it's worth it, so I did, and the results were confusing.



In Python I wrote this code:




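from timeit import default_timer as timer

start = timer()

# Read the file one character at a time until read() returns an empty string (EOF).
file = open("test.txt", "r")
while True:
    file_contents = file.read(1)
    if not file_contents:
        break
file.close()

end = timer()
print((end - start) * 1000)  # elapsed time in milliseconds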


I use a while loop instead of just reading it all in one go using file.read() to try to make it a fair comparison.
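
To make the two reading strategies mentioned above concrete, here is a minimal sketch that times both the character-by-character loop and a single file.read() call on the same file. The function names and the temporary sample file are hypothetical and exist only for illustration. Actual timings depend on the machine and the file, so none are claimed here.

```python
import os
import tempfile
from timeit import default_timer as timer


def read_char_by_char(path):
    """Read the file one character at a time; return the number of characters read."""
    count = 0
    with open(path, "r") as f:
        while True:
            chunk = f.read(1)
            if not chunk:  # an empty string means end of file
                break
            count += 1
    return count


def read_all_at_once(path):
    """Read the whole file with a single read() call; return its length."""
    with open(path, "r") as f:
        return len(f.read())


def time_ms(func, path):
    """Return (result, elapsed milliseconds) for one call of func(path)."""
    start = timer()
    result = func(path)
    return result, (timer() - start) * 1000


if __name__ == "__main__":
    # Hypothetical sample data standing in for test.txt.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("6 12 18 24\n" * 1000)
    try:
        for func in (read_char_by_char, read_all_at_once):
            n, ms = time_ms(func, tmp.name)
            print(f"{func.__name__}: {n} characters in {ms:.3f} ms")
    finally:
        os.remove(tmp.name)
```

Both functions should report the same character count, so any difference in the printed times comes from how the file is read rather than from how much is read.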



In C++ I wrote this:



#include <iostream>
#include <fstream>

#include <chrono>

using namespace std;
using Clock = std::chrono::high_resolution_clock;
using msDuration = std::chrono::duration<double, std::milli>;

int main() {
    ifstream reader;
    int n;

    reader.open("test.txt");
    auto start = Clock::now();
    // Extract whitespace-separated integers until the stream reports end of file.
    while (!reader.eof())
        reader >> n;
    auto end = Clock::now();
    reader.close();

    msDuration delay = end - start;  // elapsed time in milliseconds
}



I would have needed to overload an operator to output "delay", so I just ran the program in debug mode and watched the value by setting a breakpoint after it is initialized.



I ran both tests several times on the same text file, which has a few lines. The results were as follows:
- An average of 0.4 ms in Python, occasionally jumping up to 1 ms.
- An average of 2.5 ms in C++, occasionally jumping up to 6 ms.



Can someone explain why this is happening, and how Python can be faster than C++ at reading a text file?



P.S. I tried removing the !reader.eof() condition and the while loop from the C++ program and reading the values consecutively like this: >>x>>y>>z>>...

Still no noticeable difference.



The contents of the text file (if it matters):




6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24



6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24




6 12 18 24 6 12 18 24 6 12 18 246 12 18 24



6 12 18 24 6 12 18 246 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18
24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6
12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24 6 12 18 24 6 12 18 24




Edit: For a larger version of this file (30 MB), it takes 10 seconds in Python vs. almost a minute in C++. The gap is real.
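
For anyone who wants to reproduce the larger test, the following is a rough sketch of how a file of about that size could be generated by repeating a line of numbers. The helper name make_test_file, the output file name, and the exact line content are assumptions for illustration; this is not the file actually used for the measurements above.

```python
import os
import tempfile


def make_test_file(path, target_bytes, line="6 12 18 24 6 12 18 24\n"):
    """Write `line` repeatedly until the file holds at least target_bytes bytes."""
    data = line.encode("ascii")
    written = 0
    with open(path, "wb") as f:  # binary mode so the byte count is exact
        while written < target_bytes:
            f.write(data)
            written += len(data)
    return written


if __name__ == "__main__":
    # Hypothetical location and name; 30 MB is taken here as 30 * 1024 * 1024 bytes.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "big_test.txt")
        size = make_test_file(path, 30 * 1024 * 1024)
        print(f"wrote {size} bytes, size on disk: {os.path.getsize(path)}")
```

The file is written into a temporary directory and removed afterwards; point path at a permanent location to keep it for benchmarking.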




Edit 2: I read the input into a string variable instead of an int in C++, and the time dropped to 35~45 seconds. Python still performs better so far.



Edit 3: Reading line by line into a string in C++ reduces the time further to 25~30 seconds. However, reading line by line in Python takes about 1-2 seconds.
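
The line-by-line variant in Python could look roughly like the sketch below. It iterates over the file object, which yields one line at a time, and converts each token to an integer so that the work is closer to what reader >> n does in the C++ version. The function name and the in-memory sample data are hypothetical; this is not necessarily the exact code that produced the 1-2 second figure.

```python
import io
from timeit import default_timer as timer


def read_ints_line_by_line(stream):
    """Iterate over a text stream line by line and return all integers found."""
    numbers = []
    for line in stream:  # a file object yields one line per iteration
        # split() with no arguments skips blank lines and repeated whitespace
        numbers.extend(int(token) for token in line.split())
    return numbers


if __name__ == "__main__":
    # In-memory stand-in for open("test.txt", "r").
    sample = io.StringIO("6 12 18 24\n\n6 12\n18 24\n")
    start = timer()
    values = read_ints_line_by_line(sample)
    elapsed_ms = (timer() - start) * 1000
    print(len(values), "integers read in", round(elapsed_ms, 3), "ms")
```

To time a real file, pass an opened file object instead of the StringIO, for example inside a with open("test.txt", "r") block.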
