PyData Global 2025

Fast, Cost-Efficient Analytics on Blockchain data using DuckDB - Solana as a case study
2025-12-09, Data Engineering & Infrastructure

Abstract:

Blockchains generate millions of transactions daily, making them a rich yet complex source of data for developers, analysts, and researchers. While Google BigQuery offers public access to Solana’s historical data, repeated querying at scale can become costly and slow, especially during iterative exploration and analysis.

In this talk, I’ll demonstrate a practical workflow that combines the power of BigQuery for data extraction with the speed and flexibility of DuckDB for local, in-memory analytics. I’ll show how to efficiently query Solana data in BigQuery, export it to partitioned Parquet files, and use DuckDB to run fast, repeatable SQL queries without incurring additional cloud costs.

You'll learn:
- Basic terms for blockchain data structures and how transactions are stored.
- How to navigate and query Solana’s public datasets on BigQuery.
- How to export filtered blockchain data to efficient Parquet files.
- How DuckDB can serve as a lightweight analytics engine for on-chain data.
- Tips for partitioning, enriching, and automating your Solana data pipeline.

The demo will run entirely in Google Colab, which saves setup time and lets participants follow along during the session.

Whether you're working on blockchain analytics, wallet behavior analysis, or on-chain data engineering, this talk will equip you with a practical approach to blockchain data workflows using open tools.


This talk explores how to build a workflow for Solana blockchain data using BigQuery and DuckDB. You'll learn how to query Solana’s public datasets in BigQuery, export key data as Parquet files, and use DuckDB for high-speed local analytics. It is ideal for blockchain developers, data engineers, and analysts working with large on-chain datasets.


Prior Knowledge Expected:

Yes

Busirah Hammed is a data engineer at YellowCard financial with over 6 years of experience building data solutions. She's a data enthusiast whose experience spans data science and engineering.