PyData Berlin 2025

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB
2025-09-01, B09

Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”


The goal of this tutorial is to help Python users understand and use DuckDB not just as a DataFrame interface, but as a fully featured analytics database embedded in their Python workflows. We'll highlight real-world patterns where DuckDB shines compared to traditional libraries, especially for medium-scale datasets that don’t justify a full data warehouse.
You’ll learn:
- When and why to reach for DuckDB instead of Pandas/Polars
- How DuckDB handles local files (CSV, Parquet, JSON, and more) and external databases such as Postgres
- Using DuckDB to build lightweight, SQL-based data pipelines
- Techniques for caching intermediate data in-process
- How to analyze data from remote sources via HTTP or S3
- Tips for using DuckDB with Jupyter, dbt, or your favorite Python tools


Prerequisites:

Basic SQL and Python skills

Expected audience expertise: Domain:

Novice

Abstract as a tweet (X) or toot (Mastodon):

Most Python devs use Pandas or Polars—but DuckDB is more than a DataFrame lib. In this tutorial, learn how to use DuckDB as an in-process SQL database for pipelines, caching, and analytics. It’s like SQLite, but for analytics.

I'm Mehdi, also known as mehdio, a data enthusiast with nearly a decade of experience in data engineering for companies of all sizes. I'm not your average data guy—I inject humor and fun into my work to make complex topics easier to digest. When I'm not actively contributing to the data community through my blog, YouTube, and social media, you can find me off-beat, marching to the beat of my own data drum.

Recently, I joined MotherDuck as a developer advocate, where I bring my data engineering expertise to supercharge DuckDB.