Visualization of higher-dimensional feature spaces during model training PyData Virginia 2025

04-18, 16:40–17:15 (US/Eastern), Auditorium 4

Modern machine learning models typically utilize extremely high-dimensional feature spaces, which inhibits robustness and explainability. Finer-grained control over model training requires more powerful tools for observing and interacting with latent features as they evolve over time. In this talk, we give several examples of visualizations of nearest-neighbor graphs that illuminate common training pitfalls and provide practical insights for diagnosing model performance issues.
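As a rough illustration of the kind of object the abstract refers to, the sketch below builds a k-nearest-neighbor graph over a set of feature vectors and summarizes it with one number: the fraction of graph edges that join two points with the same label. This is not taken from the talk. The function names `knn_indices` and `neighbor_label_agreement`, the choice of Euclidean distance, and the synthetic two-cluster data are all assumptions made for the example; in practice the features would come from a model's latent layer.

```python
import numpy as np


def knn_indices(features: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of its k nearest rows (Euclidean)."""
    # Squared pairwise distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def neighbor_label_agreement(features: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of kNN edges whose two endpoints share a label."""
    nbrs = knn_indices(features, k)
    return float(np.mean(labels[nbrs] == labels[:, None]))


# Hypothetical stand-in for latent features: two Gaussian clusters in 64 dimensions.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 64)),
    rng.normal(3.0, 1.0, size=(50, 64)),
])
labels = np.repeat([0, 1], 50)

agreement = neighbor_label_agreement(features, labels, k=5)
print(f"same-label edge fraction: {agreement:.2f}")
```

The brute-force distance matrix is only practical for a few thousand points; for larger feature sets an approximate nearest-neighbor index would be the more likely choice. The neighbor index array returned by `knn_indices` is also what a graph-drawing library would need as its edge list.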

The goal of this talk is to provide machine learning practitioners with a few simple visualizations for more effective model training. These techniques have been developed through several years of real-world experience with model training, validation, deployment, and maintenance. Since the internal workings of large models are usually somewhat opaque, model trainers often ask themselves a familiar set of questions:

When should I stop training my model?

Which one of my saved model checkpoints is the “best”?

What training data should I add (or remove) to achieve a given outcome?

How do I know if my model is giving the right answer for the wrong reasons, or vice versa?

How robust is my model to out-of-distribution data?

Why is there performance drift in my deployed model?

We argue that much greater emphasis on model observability and explainability is needed, and that the right sorts of visualizations can generate valuable insights and point toward specific improvements.
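One possible way to make questions like these measurable, offered here as an assumption rather than as the speaker's method, is to track how much each point's nearest-neighbor set changes between two saved checkpoints. The sketch below computes the mean Jaccard overlap of k-nearest-neighbor sets across two embeddings of the same inputs in the same row order. The names `knn_sets`, `mean_neighbor_overlap`, and the `ckpt_*` arrays are hypothetical, and the data is synthetic.

```python
import numpy as np


def knn_sets(features: np.ndarray, k: int) -> list:
    """Return each row's k nearest neighbors (Euclidean) as a set of indices."""
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # exclude self-matches
    idx = np.argsort(dists, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]


def mean_neighbor_overlap(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 5) -> float:
    """Mean Jaccard overlap of per-point kNN sets between two embeddings.

    Assumes row i of feats_a and row i of feats_b embed the same input.
    """
    sets_a, sets_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return float(np.mean([len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]))


# Synthetic stand-ins for features extracted at three hypothetical checkpoints.
rng = np.random.default_rng(0)
ckpt_1 = rng.normal(size=(100, 32))
ckpt_2 = ckpt_1 + 0.001 * rng.normal(size=(100, 32))  # nearly unchanged features
ckpt_3 = rng.normal(size=(100, 32))                   # unrelated features

stable = mean_neighbor_overlap(ckpt_1, ckpt_2)
unstable = mean_neighbor_overlap(ckpt_1, ckpt_3)
print(f"overlap 1 vs 2: {stable:.2f}, overlap 1 vs 3: {unstable:.2f}")
```

An overlap near 1 suggests the local structure of the feature space has settled between the two checkpoints, while a low value suggests it is still being rearranged. On its own this says nothing about which checkpoint is better, so it would need to be read alongside validation metrics.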

Prior Knowledge Expected: No previous knowledge expected

Vivek Dhand

Vivek Dhand uses his background in pure mathematics to address complex real-world problems. He has led and contributed to several applied research projects involving data fusion, computer vision, and natural language processing. He strives to develop robust and explainable systems with transparency and accountability, in order to minimize bias and protect individual privacy.

Vivek received his Ph.D. in mathematics from Northwestern University. His research interests include representation theory, category theory, algebraic combinatorics, and visualizations of mathematical structures.
