PyData Tel Aviv 2025

Lukas Hafner

I am a biologist interested in the interface of AI, big data and life science. My current work comprises systems that automate data-driven science with AI, and reinforcement learning systems that control experimental workflows in microbiology and evolution.


Session

11-05, 14:30, 30 min
Autonomous LLM-driven research - from data to human-verifiable research papers
Tal Ifargan, Lukas Hafner

AI has led to major accelerations across many domains and is likely to become a cornerstone of future scientific discovery. Yet it remains unclear whether AI systems can perform fully autonomous research while also adhering to key scientific values such as transparency, traceability and verifiability.

Translating human scientific practices into a code workflow, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete, stepwise research process, from annotated data to comprehensive research papers. The platform can write, correct and execute code, perform literature searches, produce simple figures, and write and compile a scientific manuscript. To improve accuracy and enforce good scientific practice throughout the process, data-to-paper combines programmatic guardrails with LLM-based feedback. As a key feature, data-to-paper programmatically back-traces the information flow during the process, resulting in “data-chained” manuscripts in which each data element is linked to its source, making them highly readable and explainable for a human user. The platform can run fully autonomously but also allows human intervention.

Tested on diverse datasets, the platform autonomously produced correct papers in 80% to 90% of runs for simple datasets and research goals, yet human intervention became critical for more complex tasks. Data-to-paper is the first peer-reviewed, open-source system to present an agentic workflow of an LLM-driven AI scientist. It demonstrates the potential for AI-driven acceleration of scientific discovery in data-driven research and beyond, while setting, through “data-chaining”, a new standard of verifiability and traceability for the coming era of AI-driven science.
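To make the idea of “data-chaining” concrete, here is a minimal, illustrative sketch of how a number reported in a manuscript could carry links back to the raw data it was computed from. This is not data-to-paper's implementation: the class `Traced`, the helpers `traced_mean` and `cite`, and the `data.csv:rowN:age` source labels are all hypothetical, chosen only to show the general provenance-tracking pattern.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Traced:
    """Hypothetical value type: a number plus links to what produced it."""
    value: float
    origin: str                       # label of the step or raw field that produced this value
    parents: tuple[Traced, ...] = ()  # upstream values this one was derived from

    def sources(self) -> list[str]:
        """Walk back through the chain and return the raw data sources (leaves)."""
        if not self.parents:
            return [self.origin]
        found: list[str] = []
        for parent in self.parents:
            found.extend(parent.sources())
        return found


def traced_mean(values: list[Traced], label: str) -> Traced:
    # The derived value keeps a link to every input it was computed from.
    return Traced(mean(v.value for v in values), label, tuple(values))


def cite(x: Traced, fmt: str = "{:.2f}") -> str:
    # Render a number for the manuscript together with its provenance.
    return f"{fmt.format(x.value)} ({x.origin}; sources: {', '.join(x.sources())})"


# Usage: three raw values (hypothetical labels) chained into one reported statistic.
ages = [Traced(v, f"data.csv:row{i}:age") for i, v in enumerate([30.0, 40.0, 50.0])]
avg = traced_mean(ages, "mean(age)")
print(cite(avg))
# prints: 40.00 (mean(age); sources: data.csv:row0:age, data.csv:row1:age, data.csv:row2:age)
```

Keeping provenance inside the value itself, rather than in a separate log, means any number that reaches the text can be traced back without extra bookkeeping; a real system would also need to record the code and parameters of each step.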

AI