I am writing a simple Python program that reads input from a text file and then does some processing. I heard that reading files is slow in Python, and since my program does it periodically, I thought I could do the file reading in C++ (embedded in the Python program). Someone told me to benchmark file reading in both languages first to see whether it's worth it, so I did, and the results were confusing.
In Python I wrote this code:
from timeit import default_timer as timer
start = timer()
file = open("test.txt", "r")
while True:
    file_contents = file.read(1)
    if not file_contents:
        break
file.close()
end = timer()
print((end - start) * 1000)
I use a while loop instead of just reading it all in one go using file.read() to try to make it a fair comparison.
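To make the two reading styles mentioned above easy to compare, here is a minimal sketch that times the read(1) loop and a single file.read() on the same file. The helper names (read_char_by_char, read_all_at_once, time_ms) and the generated sample file are assumptions for illustration only; they stand in for test.txt and are not part of my actual program.

```python
import os
import tempfile
from timeit import default_timer as timer


def read_char_by_char(path):
    """Read one character at a time, as in the benchmark above; return the count."""
    count = 0
    with open(path, "r") as f:
        while True:
            chunk = f.read(1)
            if not chunk:
                break
            count += 1
    return count


def read_all_at_once(path):
    """Read the whole file with a single read() call; return the character count."""
    with open(path, "r") as f:
        return len(f.read())


def time_ms(func, path):
    """Return (result, elapsed milliseconds) for one call of func(path)."""
    start = timer()
    result = func(path)
    return result, (timer() - start) * 1000


# Hypothetical sample data standing in for test.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("6 12 18 24 " * 1000)
    sample_path = tmp.name

chars_loop, ms_loop = time_ms(read_char_by_char, sample_path)
chars_bulk, ms_bulk = time_ms(read_all_at_once, sample_path)
print(f"read(1) loop: {chars_loop} chars in {ms_loop:.3f} ms")
print(f"read():       {chars_bulk} chars in {ms_bulk:.3f} ms")

os.remove(sample_path)
```

Both functions should report the same number of characters; only the timings differ, and those will vary from machine to machine and from run to run.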
In C++ I wrote this:
#include <chrono>
#include <fstream>
#include <iostream>
using namespace std;
using Clock = std::chrono::high_resolution_clock;
using msDuration = std::chrono::duration<double, std::milli>;
int main() {
    ifstream reader;
    int n;
    reader.open("test.txt");
    auto start = Clock::now();
    while (!reader.eof())
        reader >> n;
    auto end = Clock::now();
    reader.close();
    msDuration delay = end - start;
}
I would have needed to overload an operator to output "delay", so instead I ran the program in debug mode and inspected the value by setting a breakpoint right after it is initialized.
I ran both tests several times on the same text file, which has only a few lines. The results were as follows:
- An average of 0.4 ms in Python, occasionally spiking to 1 ms.
- An average of 2.5 ms in C++, occasionally spiking to 6 ms.
Can someone explain why this is happening, and how Python can be faster than C++ at reading a text file?
P.S. I tried removing the while loop and its !reader.eof() condition from the C++ program and reading the values consecutively instead, like this: >>x>>y>>z>>...
There was still no noticeable difference.
The contents of the text file (in case it matters):
6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24
6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24
6 12 18 24 6 12 18 24 6 12 18 246 12 18 24
6 12 18 24 6 12 18 246 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18
24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6
12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12 18 24 6 12
18 24 6 12 18 24 6 12 18 24
Edit: For a larger version of this file (30 MB), Python takes 10 seconds versus almost a minute for C++. The gap is real.
Edit 2: Reading the input into a string variable instead of an int in C++ reduced the time to 35~45 seconds. Python still performs better so far.
Edit 3: Reading line by line into a string in C++ reduces the time further, to 25~30 seconds. However, reading line by line in Python takes about 1-2 seconds.
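One thing worth noting about the comparison: my Python timings only read text and never convert it to numbers, whereas reader >> n in the original C++ version parses every token into an int. Below is a rough sketch of line-by-line reading in Python that also does that conversion, assuming the input is whitespace-separated integers. The helper parse_ints_by_line is hypothetical, and an in-memory io.StringIO stands in for the real file so the example is self-contained.

```python
import io
from timeit import default_timer as timer


def parse_ints_by_line(stream):
    """Iterate over the stream line by line and convert every token to int."""
    values = []
    for line in stream:
        # split() with no arguments handles spaces, tabs, and the trailing newline.
        values.extend(int(token) for token in line.split())
    return values


# Hypothetical in-memory stand-in for test.txt (any text stream works here).
sample = io.StringIO("6 12 18 24\n6 12\n18 24\n")

start = timer()
numbers = parse_ints_by_line(sample)
elapsed_ms = (timer() - start) * 1000

print(len(numbers), "integers parsed in", elapsed_ms, "ms")
```

To run it against a real file, pass an open file object instead, for example with open("test.txt", "r") as f: numbers = parse_ints_by_line(f).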