Pandas • Michael Uloth

Introduction

Learning Pandas? Start Here. • Excellent whirlwind summary of the most useful methods and what they’re for • Rob Mulla 📺
Python Pandas Tutorial: A Complete Introduction for Beginners • “Learn some of the most important pandas features for exploring, cleaning, transforming, visualizing, and learning from data” • LearnDataSci 📖
Learn Data Science from SCRATCH (with GitHub CoPilot) • Visual Studio Code 📺

General

DataFrame • Pandas docs 📚
Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby) • Keith Galli 📺
Pandas code snippets • InterviewQs 📖
Exploratory Data Analysis with Pandas • Rob Mulla 📺
25 Nooby Pandas Coding Mistakes You Should NEVER make • Rob Mulla 📺
Filtering Pandas • Waylon Walker 📖
Pandas: How to Select Rows Based on Column Values • Statology 📖
Select rows from a Pandas DataFrame based on values in a column • InterviewQs 📖
How do I select rows from a DataFrame based on column values? • Stack Overflow 👩‍💻
How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL • Stack Overflow 👩‍💻
How to Use “NOT IN” Filter in Pandas (With Examples) • Statology 📖
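A minimal sketch of the row-selection patterns the links above cover: a boolean mask, an SQL-style "IN" with isin, and "NOT IN" by negating isin with ~. The DataFrame, its column names, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
df = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "cherry", "date"],
        "qty": [5, 12, 7, 20],
    }
)

# Boolean mask: keep rows where a column satisfies a condition
large = df[df["qty"] > 10]

# SQL-style IN: keep rows whose value appears in a list
wanted = ["apple", "date"]
in_rows = df[df["fruit"].isin(wanted)]

# SQL-style NOT IN: negate the isin mask with ~
not_in_rows = df[~df["fruit"].isin(wanted)]

# The same condition as the boolean mask, written with .query
large_q = df.query("qty > 10")

print(large, in_rows, not_in_rows, sep="\n\n")
```

The mask and .query forms select the same rows here; which one reads better is mostly a matter of taste.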

DataFrames

Merge Pandas Dataframes • Using pd.merge and df.merge • Rob Mulla 📺
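A minimal sketch of the two spellings mentioned above, pd.merge and df.merge, joining on a shared key. The two tables and their column names are invented for illustration.

```python
import pandas as pd

# Hypothetical tables, invented for this sketch
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 30]})
customers = pd.DataFrame(
    {"customer_id": [10, 20, 40], "name": ["Ada", "Grace", "Linus"]}
)

# Function form: inner join by default, so only keys present in both survive
inner = pd.merge(orders, customers, on="customer_id")

# Method form: a left join keeps every order, filling missing names with NaN
left = orders.merge(customers, on="customer_id", how="left")

print(inner)
print(left)
```

The function and method forms take the same arguments; the method form just uses the calling DataFrame as the left table.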

Series

Memory usage

Stop wasting memory in your Pandas DataFrame! • When reading a CSV with read_csv, load only the columns you need with usecols and set the data type of those columns with dtype, preferring compact types like category over broader types like strings. Profile before and after each change with memory_usage='deep' (e.g. df.info(memory_usage='deep')) • Visual Studio Code 📺
Memory Optimisation – Python DataFrames vs Lists and Dictionaries (JSON-like) • Finds the DataFrame is smaller in memory • Joel Tok 📖
Performance Benchmarking: Pandas DataFrame vs Python List of Dictionaries • Finds the DataFrame is faster • Joel Tok 📖
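A rough sketch of the read_csv advice above (usecols plus dtype, then a deep memory measurement before and after). The CSV contents and column names are made up and built in memory so the example is self-contained; df.memory_usage(deep=True) is used here as the programmatic counterpart of df.info(memory_usage='deep').

```python
import io

import pandas as pd

# Hypothetical CSV contents, generated in memory for this sketch
csv_text = "id,color,note\n" + "\n".join(
    f"{i},{'red' if i % 2 else 'blue'},free text {i}" for i in range(1000)
)

# Baseline: read every column and let pandas infer the types
baseline = pd.read_csv(io.StringIO(csv_text))
baseline_bytes = baseline.memory_usage(deep=True).sum()

# Tuned: read only the needed columns and choose compact dtypes up front
tuned = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "color"],
    dtype={"id": "int32", "color": "category"},
)
tuned_bytes = tuned.memory_usage(deep=True).sum()

print(f"baseline: {baseline_bytes} bytes, tuned: {tuned_bytes} bytes")
```

The category dtype pays off when a column has few distinct values relative to its length, as the two-value color column does here.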

Testing

Testing • Pandas docs 📚
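One helper from pandas.testing is assert_frame_equal, which raises an AssertionError describing the difference when two DataFrames don't match. A minimal sketch of using it in a pytest-style test; keep_positive and its data are hypothetical, invented for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def keep_positive(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical function under test: keep rows with value > 0."""
    return df[df["value"] > 0].reset_index(drop=True)


def test_keep_positive() -> None:
    input_df = pd.DataFrame({"value": [-1, 2, 3]})
    expected = pd.DataFrame({"value": [2, 3]})
    # Compares values, dtypes, index, and column labels
    assert_frame_equal(keep_positive(input_df), expected)


# pytest would collect test_keep_positive automatically;
# it is called directly here so the sketch also runs as a script
test_keep_positive()
```

Resetting the index in keep_positive matters: without it the filtered frame would keep index labels 1 and 2 and fail the comparison against the expected frame's 0 and 1.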

Editing a *.parquet file

Read the parquet file using pandas.read_parquet('some/path.parquet'), which returns a DataFrame
Modify the df however you need
Write the df to a parquet file again with df.to_parquet('some/path.parquet')
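The three steps above can be sketched as a small function. This assumes a parquet engine such as pyarrow or fastparquet is installed, which pandas needs for read_parquet and to_parquet; the function name and the "reviewed" column are hypothetical stand-ins for whatever edit you need.

```python
import pandas as pd


def edit_parquet(path: str) -> None:
    """Read a parquet file, modify it, and write it back to the same path."""
    # 1. Read the file into a DataFrame
    df = pd.read_parquet(path)

    # 2. Modify the DataFrame (hypothetical edit: add a flag column)
    df["reviewed"] = True

    # 3. Write the DataFrame back, overwriting the original file
    df.to_parquet(path)
```

Call it as edit_parquet('some/path.parquet'). Because the file is overwritten in place, it may be worth keeping a copy of the original until the result has been checked.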

Pandas 2

Pandas 2.0 : Everything You Need to Know • Focuses on the new PyArrow backend as an alternative to NumPy under the hood • Rob Mulla 📺

Plots

Bar Plots - Simple & Effective • Rob Mulla 📺

Polars

Newer, faster alternative to Pandas, written in Rust
Will Polars replace Pandas for Data Science? • Rob Mulla 📺
You might never need Pandas again… • Isaac Harris-Holt 📺

Inbox

GitHub - jvns/pandas-cookbook: Recipes for using Python’s pandas library
SQL Databases with Pandas and Python - A Complete Guide - How to use SQLAlchemy to connect to a SQL database, then use pd.read_sql to query it directly into a pandas df and df.to_sql to write a df back to the database - Rob Mulla
Stop Wasting Time on Simple Excel Tasks, Use Python - Nice intro to working with CSVs in Pandas (including combining and summarizing results from multiple CSVs) - John Watson Rooney
When concatenating strings, if any value is NaN, the whole resulting string will be NaN (the NaN “propagates”); so, if a Pandas column you expect to contain strings contains surprising NaN values instead, check every concatenated element
Difference between .query , .loc and .filter function in pandas python | by Junaid Amin | Medium - Junaid Amin
Pandas query method saves double handling of variables | by Alexis Lucattini | Towards Data Science - Alexis Lucattini
pandera documentation
Pandas & Polars: is it time to migrate? definitely maybe 🤔 - Marvik - Demonstrates Polars is a faster alternative for almost everything - Arturo Collazo
Comparison with spreadsheets — pandas 2.2.2 documentation - Pandas docs
Comparison with SQL — pandas 2.2.2 documentation - Pandas docs
pytest: if the expected df doesn’t match the one produced in a test run, print the actual result to take a look: print("🚨 DEBUG: filtered_df", filtered_df[["column_1", "column_2"]])
Coming from Pandas - Polars user guide • Pandas -> Polars migration guide • Polars 📚
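The note above about NaN propagating through string concatenation can be reproduced with a small sketch. The column names and values are invented for illustration, and fillna("") is just one possible workaround, not necessarily the right one for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row is missing its last name
df = pd.DataFrame({"first": ["Ada", "Grace"], "last": ["Lovelace", np.nan]})

# Element-wise concatenation: any NaN operand makes that row's result NaN
full = df["first"] + " " + df["last"]

# One possible workaround: replace NaN with an empty string before concatenating
safe = df["first"] + " " + df["last"].fillna("")

print(full)
print(safe)
```

When the result column has unexpected NaN values, calling isna() on each input column is a quick way to find which operand introduced them.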