Monday, 25 March 2019

python - How to store a dataframe using Pandas

NumPy file formats are pretty fast for numerical data


I prefer to use NumPy files since they're fast and easy to work with.
Here's a simple benchmark for saving and loading a dataframe with one column of 1 million points.


import numpy as np
import pandas as pd
num_dict = {'voltage': np.random.rand(1000000)}
num_df = pd.DataFrame(num_dict)

Using IPython's %%timeit magic function:


%%timeit
with open('num.npy', 'wb') as np_file:
    np.save(np_file, num_df)

the output is


100 loops, best of 3: 5.97 ms per loop
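
Outside IPython, roughly the same measurement can be taken with the standard library's timeit module. This is a minimal sketch, not the original benchmark: the temporary directory, the save_frame helper, and the loop counts (3 rounds of 10 saves) are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000000)})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical location; the original writes 'num.npy' to the working directory.
    npy_path = os.path.join(tmp_dir, 'num.npy')

    def save_frame():
        # Same save step as in the %%timeit cell.
        with open(npy_path, 'wb') as np_file:
            np.save(np_file, num_df)

    # Three rounds of 10 saves each; keep the best average time per save.
    timings = timeit.repeat(save_frame, number=10, repeat=3)
    best_ms = min(timings) / 10 * 1000

print('best of 3: %.2f ms per save' % best_ms)
```

Taking the minimum over the rounds mirrors the "best of 3" figure that %%timeit reports.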

To load the data back into a dataframe:


%%timeit
with open('num.npy', 'rb') as np_file:
    data = np.load(np_file)
data_df = pd.DataFrame(data)

the output is


100 loops, best of 3: 5.12 ms per loop
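
One thing to keep in mind: np.save stores only the underlying array, so the column label 'voltage' does not survive the round trip and the reloaded dataframe gets a default integer column name. A minimal sketch of one way to restore it, assuming the column names are still available in the loading code (the temporary directory and the smaller array size are just for illustration):

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000)})

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'num.npy')
    # Only the numeric values are written; the column labels are not.
    np.save(npy_path, num_df.to_numpy())
    data = np.load(npy_path)

# Re-attach the column labels by hand when rebuilding the dataframe.
data_df = pd.DataFrame(data, columns=num_df.columns)
```

If the labels are not known at load time, they would have to be stored somewhere else alongside the .npy file.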

NOT BAD!


CONS


There's a problem if you save the NumPy file using Python 2 and then try to open it using Python 3 (or vice versa).
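
The exact failure isn't spelled out here. One plausible cause (an assumption, not something confirmed above) is that arrays with object dtype are written using pickle, and pickles don't always transfer cleanly between Python 2 and 3. A minimal sketch of a defensive approach under that assumption: convert to a plain float64 array and pass allow_pickle=False so nothing gets pickled without you noticing.

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000)})

# Convert to a plain float64 array so nothing needs to be pickled.
values = num_df.to_numpy(dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, 'num.npy')
    # With allow_pickle=False, np.save raises an error for object arrays
    # instead of falling back to pickle.
    np.save(npy_path, values, allow_pickle=False)
    restored = np.load(npy_path, allow_pickle=False)
```

This only helps for purely numerical data; a dataframe with string or mixed-type columns would need a different format.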
