PyData Seattle 2025

We don't dataframe shame: A love letter to dataframes
2025-11-07 , Talk Track 1

This lighthearted educational talk explores the wild west of dataframes. We discuss where dataframes originated (it wasn't R), how they have evolved over time, and why "dataframe" is such a confusing term (what even is a dataframe?). We will look at what makes dataframes special from both a theoretical computer science perspective (the math is brief, I promise!) and a technology landscape perspective. This talk doesn't advocate for any specific tool or technology, but instead surveys the field of dataframes as a whole.


What is a dataframe? The term is so common in data science, and used by so many different tools, that any two people's definitions are likely to differ widely. Despite this, “dataframe” has a concrete definition, and its history is more interesting and has deeper roots than many realize. This talk takes a comprehensive and accessible journey through the world of dataframes.

We'll begin by traveling back in time to uncover the origins of the dataframe, exploring its conceptual roots and original implementation, both of which predate even the R programming language. From there, we will trace its evolution, examining how different tools and communities have shaped modern dataframe technologies into the versatile libraries we use today.

The core of the talk will address the fundamental question: "What makes a dataframe special?" We'll look at this from two key angles:

A Gentle Introduction to the Theory: We'll briefly touch on the computer science principles, like relational algebra, that give dataframes their power and structure. (Don't worry, the focus will be on intuition, not dense math!)

A Tour of the Modern Landscape: We will survey the incredible diversity of dataframe libraries and tools available today (e.g., pandas, Polars, Dask, Modin, Spark). The goal isn't to declare a "winner," but to understand the trade-offs and design philosophies behind them.

Attendees will leave this talk with a deeper appreciation for the dataframe tools they use every day. They will gain a clear mental model of what a dataframe is, understand the historical context of its development, and have a framework for navigating the ever-growing ecosystem of dataframe technologies. This talk is perfect for anyone from new data scientists to seasoned practitioners who might be curious about the history and theory behind one of data's most fundamental structures.


Prior Knowledge Expected:

No previous knowledge expected

Devin Petersohn is a Software Engineer at Snowflake, focusing on dataframes and distributed systems. Prior to working at Snowflake, Devin completed a PhD at UC Berkeley, where he created a dataframe project called Modin and wrote his thesis on dataframes. Devin is passionate about making complex distributed systems more accessible and has contributed to multiple open source projects.