Others
https://en.wikipedia.org/wiki/Math_Kernel_Library
pandas_profiling
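import pandas as pd
# Importing pandas_profiling registers the DataFrame.profile_report() accessor.
# Note: the package has since been renamed to ydata-profiling (import ydata_profiling).
import pandas_profiling

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/planets.csv').profile_report()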
https://towardsdatascience.com/exploring-your-data-with-just-1-line-of-python-4b35ce21a82d
Reading sql
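import pandas as pd
import pymysql

# hosts, user, password, dbname, query and file_path are assumed to be defined elsewhere.
mydb1 = pymysql.connect(host=hosts,
                        user=user,
                        password=password,
                        database=dbname)
# Stream the result set 50,000 rows at a time instead of loading it all at once.
chunks = pd.read_sql(query, mydb1, chunksize=50000)
# The first chunk creates the file and writes the header; later chunks are appended without it.
next(chunks).to_csv(file_path, index=False)
for chunk in chunks:
    chunk.to_csv(file_path, index=False, header=False, mode='a')
mydb1.close()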
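The header-then-append pattern above can be tried without a MySQL server. A minimal sketch, assuming the standard library's sqlite3 as a stand-in for pymysql; the helper `export_query_to_csv`, the `planets` table, and the sample rows are hypothetical and only there for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def export_query_to_csv(conn, query, file_path, chunksize=50000):
    """Stream a query result to CSV chunk by chunk; return the number of rows written."""
    rows_written = 0
    chunks = pd.read_sql(query, conn, chunksize=chunksize)
    for i, chunk in enumerate(chunks):
        # Only the first chunk creates the file and writes the header row.
        chunk.to_csv(
            file_path,
            index=False,
            header=(i == 0),
            mode="w" if i == 0 else "a",
        )
        rows_written += len(chunk)
    return rows_written


# Hypothetical in-memory database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planets (name TEXT, mass REAL)")
conn.executemany(
    "INSERT INTO planets VALUES (?, ?)",
    [(f"planet_{i}", i * 0.5) for i in range(10)],
)
conn.commit()

out_path = os.path.join(tempfile.mkdtemp(), "planets.csv")
n_rows = export_query_to_csv(conn, "SELECT * FROM planets", out_path, chunksize=4)
conn.close()

# Read the file back to check that all chunks landed under a single header.
exported = pd.read_csv(out_path)
```

Folding the first chunk into the loop avoids the separate next(chunks) call, so the write logic lives in one place.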
bamboolib
https://towardsdatascience.com/introducing-bamboolib-a-gui-for-pandas-4f6c091089e3
Python Version
Pandas 2.0: Everything You Need to Know (YouTube)
Faster Pandas
- Dask
  - Parallel Computation
  - Task Graph
  - https://www.kdnuggets.com/2020/04/dask-big-data.html
- RAPIDS: https://rapids.ai, https://github.com/rapidsai
- Vaex: A Fast DataFrame for Python 🚀
- Polars (pola-rs/polars): Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
- Querying 1TB on a laptop with Python dataframes – Ibis
- DuckDB
- Modin (modin-project/modin): Scale your Pandas workflows by changing a single line of code:
  import modin.pandas as pd
Tricks
Chunked Dataset Loading
import pandas as pd

def process(chunk):
    """Placeholder function that you may replace with your actual code for cleaning and processing each data chunk."""
    print(f"Processing chunk of shape: {chunk.shape}")

# Read the CSV 100,000 rows at a time so the whole file never has to fit in memory.
chunk_iter = pd.read_csv("https://raw.githubusercontent.com/frictionlessdata/datasets/main/files/csv/10mb.csv", chunksize=100000)
for chunk in chunk_iter:
    process(chunk)
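The process placeholder above only prints. In practice each chunk is usually reduced to a small partial result, and the partials are combined at the end. A rough sketch of that idea, assuming a made-up in-memory CSV with hypothetical `city` and `sales` columns in place of a large file on disk:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "city,sales\n" + "\n".join(
    f"{'A' if i % 2 == 0 else 'B'},{i}" for i in range(10)
)


def process(chunk):
    """Reduce one chunk to a small partial result: total sales per city."""
    return chunk.groupby("city")["sales"].sum()


partials = [
    process(chunk)
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3)
]

# Combine the partial sums; only the small per-chunk results are kept in memory.
totals = pd.concat(partials).groupby(level=0).sum()
```

This works directly for sums and counts; a mean needs the per-chunk sum and count carried separately and divided at the end.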
Others
- Downcasting Data Types for Memory Efficiency Optimization
- Using Categorical Data for Frequently Occurring Strings
- Saving Data in Efficient Format: Parquet
- GroupBy Aggregation
- query() and eval() for Efficient Filtering and Computation
- Vectorized String Operations for Efficient Column Transformations
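The first two memory tricks in the list can be sketched as follows. The frame and its column names (`visits`, `price`, `country`) are invented for illustration, and the actual savings depend on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the column names are made up for illustration.
n = 10_000
df = pd.DataFrame({
    "visits": np.arange(n, dtype="int64") % 100,   # small integers stored as int64
    "price": np.linspace(0.0, 1.0, n),             # float64
    "country": np.array(["US", "DE", "JP", "BR"])[np.arange(n) % 4],  # few distinct strings
})
before = df.memory_usage(deep=True).sum()

small = df.copy()
# Downcast numeric columns to the smallest dtype that can hold the values.
small["visits"] = pd.to_numeric(small["visits"], downcast="integer")
# Note: going from float64 to float32 reduces precision.
small["price"] = pd.to_numeric(small["price"], downcast="float")
# Store repeated strings once and keep only integer codes per row.
small["country"] = small["country"].astype("category")
after = small.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Categoricals pay off when a column has few distinct values relative to its length; for mostly unique strings they can cost more memory than they save.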
7 Pandas Tricks to Handle Large Datasets - MachineLearningMastery.com
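A small sketch of the string, eval()/query(), and GroupBy items from the list, assuming a tiny made-up frame (all column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["  alice ", "BOB", "Carol ", "dave"],
    "team": ["x", "y", "x", "y"],
    "price": [10.0, 20.0, 30.0, 40.0],
    "qty": [1, 2, 3, 4],
})

# Vectorized string operations via the .str accessor instead of a Python loop.
df["name"] = df["name"].str.strip().str.title()

# eval() computes a new column from an expression string and returns a new frame.
df = df.eval("revenue = price * qty")

# query() filters rows with an expression string.
big = df.query("revenue >= 40 and team == 'y'")

# Named aggregation computes several statistics in one groupby pass.
summary = df.groupby("team").agg(
    total_revenue=("revenue", "sum"),
    mean_price=("price", "mean"),
)
```

On a frame this small there is no speed benefit; the point is only the shape of the calls.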