Others

https://en.wikipedia.org/wiki/Math_Kernel_Library

pandas_profiling

import pandas as pd
import pandas_profiling  # adds DataFrame.profile_report(); the package was later renamed ydata-profiling

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/planets.csv').profile_report()

https://towardsdatascience.com/exploring-your-data-with-just-1-line-of-python-4b35ce21a82d

Reading sql

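import pandas as pd
import pymysql

# hosts, user, password, dbname, query and file_path must be defined beforehand
mydb1 = pymysql.connect(host=hosts,
                        user=user,
                        password=password,
                        database=dbname)

# Stream the result set in chunks of 50,000 rows instead of loading it all at once
chunks = pd.read_sql(query, mydb1, chunksize=50000)

# Write the first chunk with the header, then append the remaining chunks without it
next(chunks).to_csv(file_path, index=False)
for chunk in chunks:
    chunk.to_csv(file_path, index=False, header=False, mode='a')

mydb1.close()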
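
The same chunked export pattern can be tried without a MySQL server. The sketch below is only an illustration: it assumes an in-memory SQLite database, and the `readings` table, its columns, and the temporary output path are all hypothetical stand-ins for the real query and file.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical in-memory database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, i * 0.5) for i in range(10)],
)
conn.commit()

# Hypothetical output location in a fresh temporary directory.
file_path = os.path.join(tempfile.mkdtemp(), "readings.csv")

# With chunksize set, read_sql returns an iterator of DataFrames.
chunks = pd.read_sql(
    "SELECT id, value FROM readings ORDER BY id", conn, chunksize=4
)

# The first chunk creates the file with a header; later chunks append rows only.
for i, chunk in enumerate(chunks):
    chunk.to_csv(
        file_path,
        index=False,
        header=(i == 0),
        mode="w" if i == 0 else "a",
    )

conn.close()

# Read the file back to confirm that every row was exported.
exported = pd.read_csv(file_path)
print(exported.shape)
```

Using `enumerate` instead of `next(chunks)` keeps the header logic inside one loop, which is a matter of taste rather than a requirement.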

bamboolib

https://towardsdatascience.com/introducing-bamboolib-a-gui-for-pandas-4f6c091089e3

Python Version

Pandas 2.0 : Everything You Need to Know - YouTube

Faster Pandas

Dask
- Parallel Computation
- Task Graph
- https://www.kdnuggets.com/2020/04/dask-big-data.html
RAPIDS
- https://rapids.ai
- https://github.com/rapidsai
Vaex: A Fast DataFrame for Python 🚀
Polars (GitHub: pola-rs/polars): Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
Ibis: Querying 1TB on a laptop with Python dataframes
- GitHub: ibis-project/ibis: the portable Python dataframe library
DuckDB
Modin (GitHub: modin-project/modin): Scale your Pandas workflows by changing a single line of code
- import modin.pandas as pd

Tricks

Chunked Dataset Loading

import pandas as pd

def process(chunk):
    """Placeholder function that you may replace with your actual code for cleaning and processing each data chunk."""
    print(f"Processing chunk of shape: {chunk.shape}")

chunk_iter = pd.read_csv("https://raw.githubusercontent.com/frictionlessdata/datasets/main/files/csv/10mb.csv", chunksize=100000)

for chunk in chunk_iter:
    process(chunk)
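
In practice `process` usually returns something that is combined once all chunks have been read. A minimal sketch of that idea follows, assuming a small in-memory CSV with hypothetical `city` and `sales` columns so that it runs offline.

```python
import io

import pandas as pd

# Hypothetical CSV held in memory; a real file path or URL would work the same way.
rows = [("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)]
csv_text = "city,sales\n" + "\n".join(f"{city},{amount}" for city, amount in rows)


def process(chunk):
    """Return the per-city partial sums for a single chunk."""
    return chunk.groupby("city")["sales"].sum()


# Only `chunksize` rows are parsed into memory at any one time.
partials = [
    process(chunk)
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)
]

# The partial results share index labels, so summing by label gives the overall totals.
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)
```

This works because a sum can be computed from partial sums. Statistics such as the median cannot be combined this way and need a different approach.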

Others

Downcasting Data Types for Memory Efficiency Optimization
Using Categorical Data for Frequently Occurring Strings
Saving Data in Efficient Format: Parquet
GroupBy Aggregation
query() and eval() for Efficient Filtering and Computation
Vectorized String Operations for Efficient Column Transformations
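
The first two items in the list above (downcasting and categorical data) can be sketched as follows. The frame, its column names, and its size are hypothetical, and the actual savings depend on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: small integers stored as int64, floats as float64,
# and a string column with only three distinct values.
n = 9_000
df = pd.DataFrame({
    "count": np.arange(n, dtype="int64") % 100,
    "price": np.linspace(0.0, 1.0, n),
    "status": ["open", "closed", "pending"] * (n // 3),
})

before = df.memory_usage(deep=True).sum()

slim = df.copy()
# downcast="integer" picks the smallest signed integer type that holds the values.
slim["count"] = pd.to_numeric(slim["count"], downcast="integer")
# downcast="float" tries float32; this reduces precision, so check it is acceptable.
slim["price"] = pd.to_numeric(slim["price"], downcast="float")
# A categorical stores each distinct string once plus compact integer codes.
slim["status"] = slim["status"].astype("category")

after = slim.memory_usage(deep=True).sum()

print(slim.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```

Categoricals pay off when a column has few distinct values relative to its length. On a column of mostly unique strings they can use more memory than the original.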

7 Pandas Tricks to Handle Large Datasets - MachineLearningMastery.com
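
The string, `eval()`, `query()` and GroupBy items from the list can be illustrated together. This is a sketch on a hypothetical four-row frame and is not taken from the linked article. On data this small there is no speed benefit, so it only shows the syntax.

```python
import pandas as pd

# Hypothetical order data with untidy names.
df = pd.DataFrame({
    "name": [" alice ", "BOB", "Carol ", "dave"],
    "team": ["x", "y", "x", "y"],
    "price": [10.0, 20.0, 30.0, 40.0],
    "qty": [1, 2, 3, 4],
})

# Vectorized string methods act on the whole column without a Python-level loop.
df["name"] = df["name"].str.strip().str.title()

# eval() computes a new column from an expression string and returns a new frame.
df = df.eval("revenue = price * qty")

# query() filters rows using a boolean expression string.
big = df.query("revenue > 30 and team == 'x'")

# Named aggregation: one output column per (input column, function) pair.
summary = df.groupby("team").agg(
    total_revenue=("revenue", "sum"),
    orders=("qty", "count"),
)

print(big)
print(summary)
```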
