Declarative Feature Engineering: Bridging Spark and Flink with a Unified DSL PyData Amsterdam 2025

09-26, 11:05–11:40 (Europe/Amsterdam), Nebula

Building ML features at scale shouldn’t require every ML scientist to become an expert in Spark or Flink. At Adyen, the Feature Platform team built a Python-based DSL that lets data scientists define features declaratively, while the platform automatically generates the necessary batch or real-time pipelines behind the scenes.

Adyen processes billions of payments globally, relying heavily on machine learning. As the demand for new features and faster experimentation grew across ML teams, it became clear that requiring data scientists to build and maintain their own pipelines was slowing development.

To address this, we built a Python-based domain-specific language (DSL) that allows ML scientists to define features declaratively — focusing on logic, not infrastructure. Behind the scenes, the DSL translates these definitions into production-ready pipelines: Spark jobs orchestrated by Airflow for batch processing, and Flink jobs for low-latency, real-time features. Features are stored and served via HDFS and/or Cassandra, ensuring consistency across training and inference.
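To make the idea of "one declarative definition, two execution modes" concrete, here is a minimal, self-contained sketch in plain Python. It is not Adyen's DSL: the `Feature` class, its fields, `compute_batch`, and `OnlineFeature` are hypothetical names invented for illustration, and in-memory lists stand in for Spark, Flink, HDFS, and Cassandra. The sketch assumes events reach the streaming path in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Feature:
    """Hypothetical declarative feature: what to compute, not how."""
    name: str
    entity_key: str      # field identifying the entity, e.g. a card
    value_field: str     # field to aggregate
    agg: str             # "sum" or "count"
    window: timedelta    # trailing window length


def _increment(feature, event):
    # Contribution of a single event to the aggregate.
    return event[feature.value_field] if feature.agg == "sum" else 1


def compute_batch(feature, events, as_of):
    """Batch-style evaluation: scan all events once, aggregate per entity
    over the window (as_of - window, as_of]."""
    start = as_of - feature.window
    out = {}
    for e in events:
        if start < e["ts"] <= as_of:
            key = e[feature.entity_key]
            out[key] = out.get(key, 0) + _increment(feature, e)
    return out


class OnlineFeature:
    """Streaming-style evaluation: ingest events one at a time and keep
    a per-entity buffer of (timestamp, increment) pairs."""

    def __init__(self, feature):
        self.feature = feature
        self.buffers = defaultdict(deque)

    def ingest(self, event):
        key = event[self.feature.entity_key]
        self.buffers[key].append((event["ts"], _increment(self.feature, event)))

    def read(self, key, now):
        buf = self.buffers[key]
        start = now - self.feature.window
        # Evict entries that have fallen out of the trailing window.
        while buf and buf[0][0] <= start:
            buf.popleft()
        return sum(v for ts, v in buf if ts <= now)


# One definition, evaluated both ways on made-up example events.
spend_1h = Feature("card_spend_1h", "card_id", "amount", "sum", timedelta(hours=1))

t0 = datetime(2025, 9, 26, 11, 0)
events = [
    {"card_id": "c1", "amount": 10.0, "ts": t0},
    {"card_id": "c1", "amount": 5.0, "ts": t0 + timedelta(minutes=30)},
    {"card_id": "c2", "amount": 7.0, "ts": t0 + timedelta(minutes=45)},
    {"card_id": "c1", "amount": 2.0, "ts": t0 + timedelta(minutes=90)},
]
as_of = t0 + timedelta(minutes=100)

batch = compute_batch(spend_1h, events, as_of)

online = OnlineFeature(spend_1h)
for e in events:
    online.ingest(e)

print(batch)
print({k: online.read(k, as_of) for k in ("c1", "c2")})
```

Because both paths are derived from the same `Feature` object, the batch result (as used for training) and the online read (as used at inference) agree for the same point in time, which is the consistency property the abstract describes. A real code generator would emit Spark and Flink jobs rather than interpret the definition directly.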

This talk presents an architectural overview of the platform, including key design decisions and trade-offs — from the DSL’s structure and code generation to deployment, orchestration, and online serving. We'll also share lessons learned from scaling this system in a high-throughput payments environment, and how close collaboration between ML scientists and engineers helped bridge the gap between experimentation and production.

Co-presented by members of Adyen’s ML and Platform Engineering teams, this session offers a practical look at building, operating, and evolving a robust feature platform for production-grade machine learning.
