Introduction
- Learning Pandas? Start Here. • Excellent whirlwind summary of the most useful methods and what they’re for • Rob Mulla 📺
- Python Pandas Tutorial: A Complete Introduction for Beginners • “Learn some of the most important pandas features for exploring, cleaning, transforming, visualizing, and learning from data” • LearnDataSci 📖
- Learn Data Science from SCRATCH (with GitHub CoPilot) • Visual Studio Code 📺
General
- DataFrame • Pandas docs 📚
- Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby) • Keith Galli 📺
- Pandas code snippets • InterviewQs 📖
- Exploratory Data Analysis with Pandas • Rob Mulla 📺
- 25 Nooby Pandas Coding Mistakes You Should NEVER make • Rob Mulla 📺
- Filtering Pandas • Waylon Walker 📖
- Pandas: How to Select Rows Based on Column Values • Statology 📖
- Select rows from a Pandas DataFrame based on values in a column • InterviewQs 📖
- How do I select rows from a DataFrame based on column values? • Stack Overflow 👩‍💻
- How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL • Stack Overflow 👩‍💻
- How to Use “NOT IN” Filter in Pandas (With Examples) • Statology 📖
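The row-selection patterns these links cover can be sketched roughly as follows. The DataFrame and its column names (`city`, `sales`) are made up for illustration and do not come from any of the linked resources.

```python
import pandas as pd

# Hypothetical data, purely for illustration
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune", "Oslo"],
        "sales": [10, 25, 7, 40],
    }
)

# Boolean mask: rows where a column equals a value
oslo = df[df["city"] == "Oslo"]

# SQL-style IN: rows whose value appears in a list
in_list = df[df["city"].isin(["Oslo", "Pune"])]

# SQL-style NOT IN: negate the mask with ~
not_in_list = df[~df["city"].isin(["Oslo", "Pune"])]

# Combine conditions with & or |, wrapping each condition in parentheses
big_oslo = df[(df["city"] == "Oslo") & (df["sales"] > 20)]

# The same filter written with .query
big_oslo_q = df.query("city == 'Oslo' and sales > 20")
```

The boolean-mask form and the `.query` form select the same rows here; which one reads better is mostly a matter of taste.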
DataFrames
- Merge Pandas Dataframes • Using `pd.merge` and `df.merge` • Rob Mulla 📺
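A minimal sketch of the two merge spellings, assuming two small made-up tables (`customers` and `orders` are hypothetical names, not taken from the video):

```python
import pandas as pd

# Hypothetical tables, purely for illustration
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "total": [20, 35, 15, 50]})

# Function form: both frames are passed as arguments
inner_fn = pd.merge(customers, orders, on="customer_id", how="inner")

# Method form: the left frame is the object the method is called on
inner_method = customers.merge(orders, on="customer_id", how="inner")

# A left join keeps every customer; customers without orders get NaN in "total"
left = customers.merge(orders, on="customer_id", how="left")
```

Both forms produce the same result for the same arguments, so the choice is mostly about readability in a method chain.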
Series
- …
Memory usage
- Stop wasting memory in your Pandas DataFrame! • When reading in a CSV with `read_csv`, specify which cols you care about with `usecols` and specify the data type of those columns with `dtype`, preferring efficient types like Category over broader types like strings. Profile before/after each change with `memory_usage='deep'` • Visual Studio Code 📺
- Memory Optimisation – Python DataFrames vs Lists and Dictionaries (JSON-like) • DataFrame smaller • Joel Tok 📖
- Performance Benchmarking: Pandas DataFrame vs Python List of Dictionaries • DataFrame faster • Joel Tok 📖
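The `usecols` / `dtype` advice above could look something like this. The CSV contents and column names are invented for the sketch, and an in-memory string stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a path to a file
csv_text = "id,country,score,notes\n" + "".join(
    f"{i},{'NO' if i % 2 else 'PE'},{i * 0.5},some free text\n" for i in range(1000)
)

# Baseline: load every column and let pandas infer the dtypes
baseline = pd.read_csv(io.StringIO(csv_text))

# Leaner: load only the needed columns, with explicit compact dtypes
lean = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "country", "score"],
    dtype={"id": "int32", "country": "category", "score": "float32"},
)

# deep=True counts the bytes actually held by string/object columns
baseline_bytes = baseline.memory_usage(deep=True).sum()
lean_bytes = lean.memory_usage(deep=True).sum()

# info() accepts memory_usage="deep" and prints a summary for the frame
lean.info(memory_usage="deep")
print(f"baseline: {baseline_bytes} bytes, lean: {lean_bytes} bytes")
```

The narrower numeric types are an assumption about the value ranges in the data; check that `int32` / `float32` are wide enough before applying them to real columns.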
Testing
- Testing • Pandas docs 📚
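A small sketch of a DataFrame comparison in a pytest-style test, using `pandas.testing.assert_frame_equal` and printing the actual frame on a mismatch. The function under test (`keep_positive`) and its columns are hypothetical.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def keep_positive(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical function under test: keep rows with a positive amount."""
    return df[df["amount"] > 0].reset_index(drop=True)


def test_keep_positive():
    raw = pd.DataFrame({"name": ["a", "b", "c"], "amount": [5, -1, 3]})
    expected = pd.DataFrame({"name": ["a", "c"], "amount": [5, 3]})

    actual = keep_positive(raw)

    try:
        assert_frame_equal(actual, expected)
    except AssertionError:
        # Print the actual frame before re-raising, to make the failure easier to inspect
        print("DEBUG actual:\n", actual)
        raise
```

pytest shows captured `print` output for failing tests by default, so the debug line only appears when the comparison fails.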
Editing a *.parquet file
- Read the parquet file using `pandas.read_parquet('some/path.parquet')`, which returns a DataFrame
- Modify the `df` however you need
- Write the `df` to a parquet file again with `df.to_parquet('some/path.parquet')`
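The three steps above can be wrapped in a small helper. This is only a sketch: `edit_parquet` and its `transform` argument are names invented here, and pandas needs a parquet engine such as `pyarrow` or `fastparquet` installed for the read and write calls to work.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def edit_parquet(
    path: Path, transform: Callable[[pd.DataFrame], pd.DataFrame]
) -> pd.DataFrame:
    """Read a parquet file, apply `transform`, and write the result back to the same path."""
    df = pd.read_parquet(path)  # step 1: returns a DataFrame
    df = transform(df)          # step 2: modify the df however you need
    df.to_parquet(path)         # step 3: overwrite the original file
    return df
```

For example, `edit_parquet(Path("some/path.parquet"), lambda df: df.assign(checked=True))` would add a (hypothetical) `checked` column and save the file in place. Since the original file is overwritten, keeping a backup copy first is a reasonable precaution.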
Pandas 2
- Pandas 2.0 : Everything You Need to Know • Focuses on the switch from NumPy to PyArrow under the hood • Rob Mulla 📺
Plots
- Bar Plots - Simple & Effective • Rob Mulla 📺
Polars
- Newer, faster alternative to Pandas, written in Rust
- Will Polars replace Pandas for Data Science? • Rob Mulla 📺
- You might never need Pandas again… • Isaac Harris-Holt 📺
Inbox
- GitHub - jvns/pandas-cookbook: Recipes for using Python’s pandas library
- SQL Databases with Pandas and Python - A Complete Guide - How to use SQLAlchemy to connect to a SQL db, then use pd.read_sql / df.to_sql to query the database directly into a pandas df or write a df back to it - Rob Mulla
- Stop Wasting Time on Simple Excel Tasks, Use Python - Nice intro to working with CSVs in Pandas (including combining and summarizing results from multiple CSVs) - John Watson Rooney
- When concatenating strings, if any value is `NaN`, the whole resulting string will be `NaN` (the `NaN` “propagates”); so, if a Pandas column you expect to contain strings contains surprising `NaN` values instead, check every concatenated element
- Difference between .query , .loc and .filter function in pandas python | by Junaid Amin | Medium - Junaid Amin
- Pandas query method saves double handling of variables | by Alexis Lucattini | Towards Data Science - Alexis Lucattini
- pandera documentation
- Pandas & Polars: is it time to migrate? definitely maybe 🤔 - Marvik - Demonstrates Polars is a faster alternative for almost everything - Arturo Collazo
- Comparison with spreadsheets — pandas 2.2.2 documentation - Pandas docs
- Comparison with SQL — pandas 2.2.2 documentation - Pandas docs
- pytest: if expected df doesn’t match one produced in test run, print out the actual result to take a look:
print("🚨 DEBUG: filtered_df", filtered_df[["column_1", "column_2"]])
- Coming from Pandas - Polars user guide • Pandas -> Polars migration guide • Polars 📚
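The `NaN` propagation note in this list can be reproduced in a few lines. The column names are made up, and the `fillna("")` workaround at the end is just one possible fix, not necessarily the right one for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, purely for illustration
df = pd.DataFrame(
    {
        "first": ["Ada", "Alan", "Grace"],
        "last": ["Lovelace", np.nan, "Hopper"],
    }
)

# A single NaN operand makes the whole concatenated value NaN
df["full"] = df["first"] + " " + df["last"]

# Check every concatenated element to find which input introduced the NaN
missing_inputs = df[["first", "last"]].isna().sum()

# One possible fix: replace missing pieces with "" before concatenating
df["full_filled"] = (df["first"].fillna("") + " " + df["last"].fillna("")).str.strip()

print(df)
print(missing_inputs)
```

Counting missing values per input column, as `missing_inputs` does, points at the element that caused the surprise without having to inspect rows by hand.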