04-18, 10:20–10:55 (US/Eastern), Auditorium 4
Data system interoperability remains a significant challenge in open source ecosystems, with high costs in development time and resources when moving data across complex infrastructures. The Apache Arrow project offers a standardized solution to reduce these integration challenges.
Will Ayd (Apache Arrow Committer and pandas maintainer) and Matt Topol (Apache Arrow PMC Member and author of "In-Memory Analytics with Apache Arrow") will discuss how Apache Arrow is changing the data landscape. They will give a brief overview of the Arrow standards and review real-world implementations in which the Arrow specification has driven down the cost of data interoperability.
The Apache Arrow project has been drastically improving the way analytical tools perform, interoperate, and scale. However, because Arrow is primarily used by developers, many of those improvements happen "behind the scenes," leaving many people unsure what exactly Apache Arrow is.
In this talk, we will provide a more formal definition of Apache Arrow, and discuss its various components that collectively are helping to revolutionize the data landscape. We will also take some time to explore how popular Python packages like pandas, polars, and pantab have been leveraging Apache Arrow for interoperability between utilities, while also having an open discussion as to what can still be done.
By the end of this talk, users will have an appreciation of how Apache Arrow is powering their Python (and non-Python!) libraries today, and how it will shape the data landscape going forward. Topics like Arrow Flight, Arrow Flight SQL, Arrow ADBC, and nanoarrow will be discussed, and attendees will gain a deeper understanding of how these technologies are evolving the way data is used in embedded environments, relational databases, HTTP exchanges, AI applications, and more.
No previous knowledge expected
Will Ayd is the author of the Pandas Cookbook, Third Edition, and has served as a maintainer of the pandas project since 2018. Will is also a Committer to the Apache Arrow project, and has helped improve many other open source data libraries.
In his day job, Will helps clients in the Retail and Apparel spaces optimize cloud data platforms in AWS and GCP, while also providing strategy and training around the use of open source technology in enterprise settings.
Hailing from the faraway land of Brentwood, NY and currently residing in the rolling hills of Connecticut, Matt Topol has always been passionate about software. After graduating from Brooklyn Polytechnic (now NYU-Poly), he joined FactSet Research Systems, Inc. in 2009 developing financial software. In the time since, Matt has worked in infrastructure and application development, has led development teams, and has architected large-scale distributed systems for processing analytics on financial data. Matt is a PMC member for the Apache Arrow project, frequently contributing enhancements to the Golang library, among others, and helping to grow the Arrow community. Recently, Matt wrote the first and only book on Apache Arrow, "In-Memory Analytics with Apache Arrow," and joined Voltron Data to work on the Apache Arrow libraries full time and grow the Arrow Golang community.
In his spare time, Matt likes to bash his head against a keyboard, develop/run delightfully demented games of fantasy for his victims--er--friends, and share his knowledge with anyone interested who'll listen to his rants.