Ingestify: Rethinking Ingestion for Complex Data PyData Eindhoven 2025

Ingestify: Rethinking Ingestion for Complex Data
2025-12-09 15:20–15:50, Planck-Bohr

Traditional data pipelines often tie ingestion and transformation together, forcing data into rows and columns early in the process. But modern workloads - from large XML documents to high-resolution video files - need transformations that are far from trivial and that require very different compute resources than ingestion. Separating the two concerns becomes essential: when a transform fails, you shouldn’t have to re-download data or hit the source system again.

Ingestify is a Metadata-First Ingestion Layer that stores raw data as-is, enriched with structured metadata, and ingests only when the source data has actually changed. This makes the ingest step fast, durable and independent of downstream processing. Transformations can evolve, fail or be retried freely while the original data remains intact.
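To make the idea concrete, here is a minimal sketch of metadata-first ingestion using only the Python standard library. It is not Ingestify's actual API: the function `ingest_if_changed`, the `source_version` field, the file layout and the `fake_fetch` helper are all hypothetical names chosen for illustration. The sketch assumes the source can report some cheap version marker (for example a last-modified timestamp) before the payload is downloaded.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def ingest_if_changed(store: Path, key: str, source_version: str, fetch) -> bool:
    """Store raw bytes for `key` unless `source_version` matches what is stored.

    `fetch` is a zero-argument callable that downloads the raw payload. It is
    only called when the source reports a version we have not stored yet.
    Returns True if data was ingested, False if the step was skipped.
    (Hypothetical helper, not part of Ingestify.)
    """
    raw_path = store / f"{key}.raw"
    meta_path = store / f"{key}.meta.json"

    if meta_path.exists():
        meta = json.loads(meta_path.read_text())
        if meta["source_version"] == source_version:
            return False  # unchanged at the source: skip the download

    payload = fetch()
    raw_path.write_bytes(payload)  # raw data is stored as-is, without parsing
    meta_path.write_text(json.dumps({
        "key": key,
        "source_version": source_version,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }))
    return True


# Stand-in for a real download; records how often the source is hit.
calls = []


def fake_fetch() -> bytes:
    calls.append(1)
    return b"<doc id='1'/>"


store = Path(tempfile.mkdtemp())
first = ingest_if_changed(store, "doc-1", "2025-12-01T10:00:00Z", fake_fetch)
second = ingest_if_changed(store, "doc-1", "2025-12-01T10:00:00Z", fake_fetch)
third = ingest_if_changed(store, "doc-1", "2025-12-02T08:30:00Z", fake_fetch)
print(first, second, third, len(calls))
```

In this sketch a transform would read `doc-1.raw` from the store, so a failed or changed transform can be rerun without calling `fetch` again.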

In this talk, I’ll show how metadata-first ingestion forms a clean and reliable foundation for working with large, complex or non-tabular data - and why it’s a good fit for modern data workflows.

Prior Knowledge Expected: Medium - Basic Understanding (read about it but never used it)

Koen Vossen
