06-08, 16:15–17:00 (Europe/London), Grand Hall
Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides a layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.
Apache Airflow, in turn, is an established data orchestration framework, downloaded more than 20 million times per month.
This talk presents the Airflow Ray provider package, which allows users to interact with Ray from Airflow workflows. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines in those clusters.
The talk will also discuss the benefits of using the provider package to orchestrate Ray pipelines with Airflow, including:
- Integration: Incorporate Ray jobs into Airflow DAGs for unified workflow management.
- Distributed computing: Use Ray's distributed capabilities within Airflow pipelines for scalable ETL and LLM fine-tuning.
- Monitoring: Track Ray job progress through Airflow's user interface.
- Dependency management: Define and manage dependencies between Ray jobs and other tasks in DAGs.
- Resource allocation: Run Ray jobs alongside other task types within a single pipeline.
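To make the integration concrete, here is a minimal sketch of what an Airflow DAG submitting a Ray job could look like. The operator name, import path, connection ID, and entrypoint are assumptions for illustration; consult the provider package's documentation for the exact API.

```python
# Hypothetical sketch: an Airflow DAG that submits a job to a Ray cluster.
# Import paths and operator parameters are assumptions, not the confirmed API.
from datetime import datetime

from airflow import DAG
from ray_provider.operators.ray import SubmitRayJob  # assumed import path

with DAG(
    dag_id="ray_pipeline_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Submit a job to a Ray cluster; Airflow tracks its progress,
    # so the job appears in the Airflow UI like any other task.
    submit = SubmitRayJob(
        task_id="submit_ray_job",
        conn_id="ray_conn",             # Airflow connection to the Ray cluster
        entrypoint="python script.py",  # command executed on the cluster
        runtime_env={"working_dir": "./ray_scripts"},  # code shipped to Ray
    )
```

A task like this can be wired into an existing DAG with the usual dependency operators, so upstream ETL tasks gate the Ray job and downstream tasks consume its results.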
Previous knowledge expected
Tatiana is a Staff Software Engineer at Astronomer and builds open-source tools to improve Apache Airflow.
Since graduating in Computer Engineering from Unicamp, Brazil, she has worked on multiple projects and contributed to a range of open-source communities. Before joining Astronomer, she worked for the Brazilian Ministry of Science and Technology, Globo, Education First, and the BBC.