2025-11-09 – Tutorial Track 3
How can you use LLMs in professional settings where cloud APIs are off-limits due to cost, privacy, or compliance? In this talk, we’ll explore how to run powerful, open-source models like Mistral and LLaMA locally — and make them useful in the real world.
We’ll cover the engineering patterns, trade-offs, and deployment approaches that make local LLMs production-ready. You’ll learn how to build a private internal knowledge assistant that runs completely offline using RAG (retrieval-augmented generation), local embeddings, and quantized models. A short live demo will show it in action — answering organization-specific questions without sending a single token to the cloud.
This talk dives into the fast-growing space of local LLMs and their emerging role in secure, cost-sensitive, or regulated environments.
We’ll begin by examining:
- Why local LLMs are gaining adoption (privacy, control, cost)
- When it makes sense to use models like Mistral, Phi-3, or LLaMA 3 locally
- Key trade-offs: quality vs speed, quantization formats (GGUF, GPTQ), and hardware choices
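To make the hardware trade-off concrete, the memory footprint of a quantized model can be estimated from its parameter count and bits per weight. The helper below is a back-of-the-envelope sketch; the 1.2x overhead factor for KV cache and activations is an assumption, not a fixed rule:

```python
def approx_model_memory_gb(n_params_billion: float,
                           bits_per_weight: float,
                           overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for running a quantized model.

    overhead is a loose allowance for KV cache and activations
    (an illustrative assumption; tune for your context length).
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return round(bytes_total * overhead / 1e9, 1)

# A 7B model at ~4.5 bits/weight (roughly a Q4_K_M GGUF):
print(approx_model_memory_gb(7, 4.5))   # → 4.7 (GB)
# The same model unquantized in FP16:
print(approx_model_memory_gb(7, 16))    # → 16.8 (GB)
```

This is why 4-bit GGUF variants of 7B models fit comfortably on laptops, while FP16 weights of the same model already push past a 16 GB GPU.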
Then we shift to ML engineering and deployment architecture:
- Serving local models efficiently: Ollama, vLLM, Transformers
- Optimizing inference: batching, streaming, CPU vs GPU, latency tips
- Embedding pipelines with LlamaIndex or LangChain
- Indexing and chunking strategies for scalable RAG
- Deployment approaches: Docker containers, on-premise clusters, offline workflows
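As a taste of the chunking strategies above, a minimal sliding-window splitter looks like this. The character-based sizes and the 50-character overlap are illustrative defaults, not recommendations; production pipelines typically split on tokens or sentence boundaries instead:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Sliding-window chunking by character count.

    Overlapping windows keep context that straddles a chunk boundary
    retrievable from either side. size/overlap are illustrative values.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "A" * 500
pieces = chunk_text(doc, size=200, overlap=50)
print(len(pieces))  # → 3
```

Each chunk then gets embedded and indexed; the overlap means the tail of one chunk reappears at the head of the next, which trades a little index size for better recall.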
To ground these ideas, the talk includes a short live demo:
A fully offline AI system that answers team-specific questions by searching across Slack exports, documentation, and meeting notes using a local LLM and vector store — no external API, no internet access.
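The retrieval half of that demo reduces to "embed the query, rank the chunks by similarity." The sketch below is a deliberately tiny stand-in: it uses a bag-of-words vector where the real system would use a local embedding model (e.g. a sentence-transformers model run offline) and a vector store, but the cosine-ranking shape is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a stand-in for a real local
    # embedding model; chosen only to keep the sketch dependency-free.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The deploy pipeline runs nightly at 02:00 UTC.",
    "Expense reports are filed through the finance portal.",
]
print(retrieve("when does the deploy pipeline run", chunks))
```

The retrieved chunks are then stuffed into the local model's prompt as context; nothing in this loop touches a network.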
Target Audience
- ML engineers, MLOps and platform teams
- Backend/infra engineers deploying AI tools internally
- Organizations in regulated industries (healthcare, finance, legal, defense)
- Developers looking to reduce GenAI cost or improve privacy
Previous knowledge expected
As a Data and Applied Scientist at Microsoft with 7 years of experience spanning multiple geographies, I specialize in harnessing the power of AI to transform products and user experiences. My work ranges from developing on-device AI models to implementing large language models that revolutionize how data science is practiced at scale. With a Master's degree in Computer Science and Artificial Intelligence from the University of Massachusetts Amherst, I bring both academic rigor and practical expertise to every challenge, consistently pushing the boundaries of what AI can achieve.