Thursday 28 November 2019

python - How to load large data into pandas efficiently?




I am working with a very wide dataset (1,005 rows × 590,718 columns, about 1.2 GB). Loading such a large dataset into a pandas DataFrame makes the code fail entirely due to insufficient memory.




I am aware that Spark is probably a good alternative to pandas for dealing with large datasets, but is there any workable way in pandas to reduce memory usage while loading large data?


Answer



You could use



pandas.read_csv(filename, chunksize=chunksize)
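As a minimal sketch of how chunked reading keeps memory bounded (the file name, chunk size, and the per-chunk aggregation are placeholders, not from the original question):

import pandas as pd

# Hypothetical file name and chunk size; adjust to the real data.
filename = "wide_dataset.csv"
chunksize = 100  # rows read per chunk

totals = None
n_rows = 0
# With chunksize set, read_csv returns an iterator of DataFrames,
# so only `chunksize` rows are held in memory at a time.
for chunk in pd.read_csv(filename, chunksize=chunksize):
    n_rows += len(chunk)
    chunk_sums = chunk.sum(numeric_only=True)
    totals = chunk_sums if totals is None else totals.add(chunk_sums, fill_value=0)

# Column means computed without ever loading the full file into memory.
column_means = totals / n_rows

Each chunk can be filtered, aggregated, or written back out before the next one is read, so the full DataFrame never has to exist in memory at once.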

