PyData Global 2025

Building Production-Ready Research AI Assistants with One-Command Setup
2025-12-10, Machine Learning & AI

Academic research is often fragmented across dense PDFs, complex jargon, and scattered media articles, making it hard to access for students, interns, and the broader public. To address this, we introduce SciChat: an open-source Research AI Assistant that unifies a lab’s papers and media coverage into a conversational system, where anyone can ask natural language questions and receive structured answers with full source citations.

This talk demonstrates how to build and deploy a production-ready RAG pipeline that uses Landing.AI for vision-based PDF parsing, Firecrawl for media extraction, and LangGraph for agentic orchestration. The entire system is containerized with FastAPI and Streamlit, launching with a single command: docker compose up.

Attendees will learn how to turn scattered research artifacts into a transparent, queryable knowledge base, making lab insights accessible, reproducible, and conversational for all.


In this talk, we introduce SciChat: an open-source framework for building Research AI Assistants that lets labs ingest scientific papers and media coverage, build a vector database, and query it in natural language, all with one reproducible command.
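To make the ingest, index, and query flow concrete, here is a minimal sketch in plain Python. It is not SciChat's code: the bag-of-words "embedding", the `ToyIndex` class, and the sample documents are all assumptions made for illustration, standing in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase token counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index; each entry keeps its source label."""

    def __init__(self):
        self.entries = []  # list of (vector, text, source)

    def add(self, text, source):
        self.entries.append((embed(text), text, source))

    def search(self, query, k=2):
        q = embed(query)
        scored = [(cosine(q, vec), text, source) for vec, text, source in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Drop entries with no overlap so every hit is backed by a source.
        return [(text, source) for score, text, source in scored[:k] if score > 0]


# Illustrative documents only; not real lab content.
index = ToyIndex()
index.add("We measure coral bleaching under heat stress.", "paper: example_paper.pdf")
index.add("Local news covers the lab's new telescope.", "media: example_article")

hits = index.search("coral bleaching", k=1)
```

Each hit carries its source label alongside the text, which is the property a citation-backed assistant depends on.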

This 30-minute talk will explore:

  • 🧠 Architecture: How LangGraph, FastAPI, and Streamlit are combined with agentic reasoning for document Q&A.
  • 📄 Multi-modal Ingestion: How SciChat uses Landing.AI's vision-based agentic document extraction and Firecrawl to extract content from complex PDFs and dynamic media pages.
  • 🤖 LLM Workflow: How intents are classified, documents retrieved, and responses synthesized with structured JSON output and source attribution.
  • 🔄 Reusability and Extensibility: How any lab or research group can plug in their own documents and deploy in minutes.
  • ⚙️ One-Command Setup: How a single YAML config and docker compose up sets up ingestion, vectorization, API, UI, and Slack bot integration.
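The LLM workflow named above (classify the intent, retrieve documents, synthesize a structured JSON answer with sources) can be sketched in plain Python. This is an assumption-laden illustration rather than SciChat's implementation: it does not use LangGraph or an LLM, a keyword rule stands in for the intent classifier, and every function name, field name, and sample document is hypothetical.

```python
import json

# Hypothetical sample corpus; entries and source labels are illustrative only.
CORPUS = [
    {"kind": "paper", "source": "example_paper.pdf, p. 3",
     "text": "The method reduces error on the benchmark."},
    {"kind": "media", "source": "example_news_article",
     "text": "The lab's method was featured in a news story."},
]


def classify_intent(question):
    """Keyword stand-in for an LLM intent classifier (assumption)."""
    q = question.lower()
    media_words = ("news", "press", "media", "coverage")
    return "media" if any(word in q for word in media_words) else "paper"


def retrieve(intent, corpus):
    """Select documents matching the intent (stand-in for vector search)."""
    return [doc for doc in corpus if doc["kind"] == intent]


def synthesize(question, intent, docs):
    """Build a structured JSON answer that always lists its sources."""
    payload = {
        "question": question,
        "intent": intent,
        "answer": " ".join(doc["text"] for doc in docs) or "No supporting source found.",
        "sources": [doc["source"] for doc in docs],
    }
    return json.dumps(payload, indent=2)


def answer(question, corpus=CORPUS):
    """Run the three steps in order: classify, retrieve, synthesize."""
    intent = classify_intent(question)
    docs = retrieve(intent, corpus)
    return synthesize(question, intent, docs)


response = json.loads(answer("What did the press say about the method?"))
```

Keeping `sources` as a required field of the output, rather than free text inside the answer, makes missing citations easy to detect downstream.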

We'll conclude with a live demo showing how SciChat answers real research questions using citation-backed reasoning, emphasizing transparency, reliability, and ease of use.

SciChat is designed for reproducibility, minimal setup, and immediate utility. If you're interested in bringing GenAI to your research workflow—or your research to the world—this talk will show you exactly how.

Target Audience: Researchers, students, and enthusiasts wanting practical AI tools.

Prerequisites:
- Python knowledge
- Familiarity with containerization concepts

Resources Provided: Complete open-source codebase with Docker configuration for immediate deployment.
The main goal is to make a lab's full set of documents (papers and media coverage) accessible, so that anyone can ask questions about them and receive answers with source citations.


Prior Knowledge Expected: Yes

I’m a data scientist and AI engineer with 10+ years of experience across academic research and industry, building GenAI and machine learning solutions for research labs, startups, and Fortune 500 companies. I’m also a passionate educator, contributing to data training programs as a professor and consultant, and an active open-source contributor and speaker at conferences like SciPy and PyData.