Jyotinder Singh
I'm a software engineer working on model optimization techniques on the Keras team at Google. I spend my time writing open-source code, publishing new issues of my newsletter, or making YouTube videos!
Session
11-08
10:10
45min
Practical Quantization in Keras: Running Large Models on Small Devices
Jyotinder Singh
Large language models are often too large to run on personal machines and instead require specialized hardware with large amounts of memory. Quantization offers a way to shrink models, speed them up, and reduce memory usage, all while retaining most of their accuracy.
This talk introduces the fundamentals of neural network quantization, covers key techniques, and demonstrates how to apply them using Keras’s extensible quantization framework.
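To give a flavor of the core idea, here is a minimal sketch of symmetric abs-max int8 quantization in plain Python. It is an illustration only, not the Keras framework's implementation: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Hypothetical helper for illustration; not a Keras API.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Made-up sample weights, purely for demonstration.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight is stored as a single small integer plus one shared float scale, which is where the memory savings come from; the price is a rounding error of at most half a step per weight.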
Talk Track 2