Rodrigo Silva Ferreira PyData Global 2025

Rodrigo Silva Ferreira

Rodrigo Silva Ferreira is a QA Engineer at Posit, where he contributes to the quality and usability of open-source tools that empower data scientists working in R and Python. He focuses on both manual and automated testing strategies to ensure reliability, performance, and an excellent user experience.

Rodrigo holds a BSc. in Chemistry with minors in Applied Math and Arabic from NYU Abu Dhabi and an MSc. in Analytical Chemistry from the University of Pittsburgh. Multilingual and globally minded, he enjoys working at the intersection of data, science, and technology, especially when it means building tools that help people better understand and navigate the world through its increasingly complex data.

Session

12-09

16:00

30min

Text Mining Orkut’s Community Data with Python: Cultural Memory, Platform Neglect, and Digital Amnesia

Rodrigo Silva Ferreira

Orkut was once the emotional and cultural core of Brazil’s internet. Its scraps, testimonials, and communities gave users a way to publicly shape identity, build relationships, and engage with everything from music and religion to politics and humor. When Google shut it down in 2014, most of its data was deleted. What remains today is fragmented and buried in the Wayback Machine.

In this talk, I use Python to recover and analyze limited traces of Orkut’s digital legacy. I scraped thousands of community names from archived HTML using requests and BeautifulSoup, processed them with multilingual sentence embeddings from sentence-transformers, and applied scikit-learn and BERTopic to cluster the data, surface major social themes, and quantify them. These techniques reveal how users created meaning, formed subcultures, and expressed identity through online interactions.
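The parsing and clustering steps described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the speaker's actual code. The sample HTML, the `Community.aspx` link pattern, and the helper names `extract_community_names` and `cluster_names` are all hypothetical. To stay offline and lightweight, the sketch skips the `requests` fetch from the Wayback Machine and substitutes TF-IDF character n-grams for the multilingual sentence embeddings; a comment marks where a sentence-transformers model would plug in.

```python
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up stand-in for one archived page; in practice the HTML would be
# fetched with requests from a Wayback Machine snapshot.
SAMPLE_HTML = """
<html><body>
  <a href="/web/2010/http://www.orkut.com/Community.aspx?cmm=1">Eu amo chocolate</a>
  <a href="/web/2010/http://www.orkut.com/Community.aspx?cmm=2">Eu odeio acordar cedo</a>
  <a href="/web/2010/http://www.orkut.com/Community.aspx?cmm=3">Eu amo futebol</a>
  <a href="/web/2010/http://www.orkut.com/Profile.aspx?uid=9">Some user</a>
  <a href="/web/2010/http://www.orkut.com/Community.aspx?cmm=1">Eu amo chocolate</a>
</body></html>
"""


def extract_community_names(html: str) -> list[str]:
    """Return unique community names from link text, in page order.

    Assumption: community links contain "Community" in their href.
    The real archived markup may differ.
    """
    soup = BeautifulSoup(html, "html.parser")
    names = [
        a.get_text(strip=True)
        for a in soup.find_all("a", href=True)
        if "Community" in a["href"]
    ]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(n for n in names if n))


def cluster_names(names: list[str], n_clusters: int = 2) -> list[int]:
    """Assign each name to a cluster and return one label per name.

    Stand-in features: TF-IDF over character n-grams. To follow the talk
    more closely, replace `features` with the output of
    SentenceTransformer(<a multilingual model>).encode(names).
    """
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    features = vectorizer.fit_transform(names)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(features).tolist()


if __name__ == "__main__":
    community_names = extract_community_names(SAMPLE_HTML)
    for name, label in zip(community_names, cluster_names(community_names)):
        print(label, name)
```

Keeping extraction and clustering in separate functions means the feature step can be swapped for real sentence embeddings, or the clustering replaced with BERTopic, without touching the parsing code.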

Alongside the technical walkthrough, I draw on Cory Doctorow’s concept of enshittification, defined as the slow decline of platforms as they shift from serving users to exploiting them. Orkut is a case of enshittification by neglect: its shutdown led not just to the death of a platform, but to the erasure of a generation’s digital memory. According to Google's farewell announcement, over its 10 years of existence, Orkut hosted 51 million communities, 120 million discussion topics, and more than 1 billion interactions, most of which were permanently deleted.

This talk is for Python users interested not only in working with social media text data but also in uncovering the cultural narratives embedded within it. It invites the audience to see datasets as more than technical artifacts, viewing them instead as living records of online social life.

General Track
