PyData Berlin 2025

Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed
2025-09-01 , B07-B08

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we benchmarked 2,000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs; we will also share our experiences and challenges with the data collection, along with insights into general patterns.


Spare Cores is a vendor-independent, open-source, Python-based ecosystem offering a comprehensive inventory and performance evaluation of servers across cloud providers. We automate the discovery and provisioning of thousands of server types in public clouds, using GitHub Actions to run hardware inspection tools and benchmarks for different workloads, including:
- General performance (GeekBench, PassMark)
- Memory bandwidth and compression algorithms
- OpenSSL, Redis, and web serving speed
- DS/ML-specific benchmarks like GBM training and LLM inference on CPUs and GPUs
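At its core, the LLM inference benchmark amounts to timing prompt processing and text generation separately and reporting tokens per second. A minimal sketch of such a timing harness is below; the `generate` callable is a hypothetical stand-in for a real inference call (e.g. a llama.cpp or vLLM binding), not the actual Spare Cores benchmark code:

```python
import time


def tokens_per_second(generate, prompt_tokens, max_new_tokens):
    """Time one generation call and return throughput in tokens/sec.

    `generate` is a hypothetical stand-in for a real inference engine
    call; it must return the number of tokens actually produced.
    """
    start = time.perf_counter()
    produced = generate(prompt_tokens, max_new_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed


def dummy_generate(prompt_tokens, max_new_tokens):
    # Placeholder "model": pretend each new token takes 1 ms to produce.
    time.sleep(0.001 * max_new_tokens)
    return max_new_tokens


tps = tokens_per_second(dummy_generate, prompt_tokens=128, max_new_tokens=64)
print(f"{tps:.0f} tokens/sec")
```

Running the same measurement across a grid of prompt and generation lengths (and model sizes) yields the per-instance speed profiles discussed in the talk.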

All results and open-source tools (such as database dumps, APIs, and SDKs) are openly published to help users identify and launch the most cost-efficient instance type for their specific use case in their own cloud environment.

This talk introduces the open-source ecosystem, then highlights our latest benchmarking efforts, including the performance evaluation of ~2,000 server types to determine the largest LLM model (from 135M to 70B parameters) that can be loaded on each machine and the inference speeds achievable with various token lengths for prompt processing and text generation.

Slides: https://sparecores.com/assets/slides/pydata-berlin-2025.html#/cover-slide


Prerequisites:

-

Expected audience expertise: Domain:

Novice

Abstract as a tweet (X) or toot (Mastodon):

Spare Cores benchmarked 2,000+ cloud server types for LLM inference speed

Gergely Daroczi, PhD, has been a passionate R/Python user and package developer for two decades. With over 15 years in the industry, he has expertise in data science, engineering, cloud infrastructure, and data operations across SaaS, fintech, adtech, and healthtech startups in California and Hungary, focusing on building scalable data platforms. Gergely maintains a dozen open-source R and Python projects and organizes a tech meetup with 1,800 members in Hungary, along with other open-source and data conferences.