PyData Amsterdam 2025

Evaluating the alignment of LLMs to Dutch societal values
2025-09-26, Apollo

The City of Amsterdam is researching the responsible adoption of Large Language Models (LLMs) by evaluating their performance, environmental impact, and alignment with human values. In this talk, we will share how we develop tailored benchmarks and a dedicated assessment platform to raise awareness and guide responsible implementation.


The City of Amsterdam is known for continuously experimenting with AI and its commitment to the responsible adoption of emerging technologies. Naturally, this includes a strong focus on the use of Large Language Models (LLMs) to make municipal processes and services not only more efficient but also more transparent, inclusive, and user-friendly.

However, while LLMs offer plentiful opportunities, they also come with significant risks. One of the most critical decisions in designing LLM systems is the choice of an underlying model: one that aligns with the needs and values of the City and its citizens. But what are these values? And how do we ensure that LLMs meet them?

To address these questions, our team has been working to empower the City of Amsterdam and inspire the Dutch government to responsibly implement LLMs by creating a comprehensive overview of their performance, environmental impact, and alignment with human values.

In this talk, we will share our ethical considerations when employing LLMs and walk you through the aspects that experts and users find most important to evaluate. We will discuss how we translate these values into measurable benchmarks and curate datasets tailored to Dutch governmental needs. Finally, we will showcase our dedicated LLM benchmarking platform, which can be used to assess LLMs and inspire others to adopt them responsibly.

Iva is a data scientist at the City of Amsterdam, where she researches the responsible use of AI for municipal use cases. Her current work focuses on benchmarking Large Language Models to promote the adoption of more diverse open-source models and to help reduce the negative societal and environmental impacts of technology.

Laurens Samson leads a development team at the City of Amsterdam that guides the implementation of LLMs across municipal departments. His team also benchmarks Dutch-speaking language models on metrics such as bias, truthfulness, and sustainability to ensure responsible and ethical use.

In parallel, he is pursuing a PhD at the University of Amsterdam, focusing on safety in multimodal generative models.