PyData Global 2025

torchTextClassifiers: Modernizing Text Classification for French National Statistics
2025-12-09, Machine Learning & AI

Discover how Insee (French National Statistics Institute) transitioned from fastText to a PyTorch-based model for text classification by developing and open-sourcing the torchTextClassifiers Python package. This presentation will cover the creation, deployment, and practical applications of torchTextClassifiers in modernizing automatic coding systems, benefiting Insee and other European National Statistical Institutes (NSIs).


Insee, France's National Institute of Statistics and Economic Studies, has long relied on fastText for automatic coding tasks. Recognizing the need to modernize and future-proof this critical functionality, we developed torchTextClassifiers, an open-source Python package that enables easy training and deployment of a PyTorch-based model for text classification, paving the way for further innovation in this domain.

This session will delve into the motivations behind replacing the archived fastText package, the design and implementation of torchTextClassifiers, and its integration into Insee's production environment. We'll discuss the challenges faced during this transition, including model compatibility, performance optimization, and user adoption.

Attendees will gain insights into:

  • The rationale for moving from fastText to a PyTorch-based model in production

  • Packaging a PyTorch-based model architecture and open-source collaboration

  • Key features and architecture of torchTextClassifiers

  • Deployment strategies within a public administration (MLOps, cloud native tools, security)

  • Lessons learned and best practices for similar transitions

This talk is intended for data scientists, machine learning engineers, and practitioners interested in NLP, model deployment, and open-source tool development.


Prior Knowledge Expected:

No

Cédric Couralet, Data Scientist at Insee, is an open-source enthusiast, with expertise in software architecture and secure system design.