
2025 was a big year at Bodo. We delivered major performance improvements, richer APIs, tighter lakehouse integration, and practical support for AI workflows — and along the way, we launched entirely new products too.
Here’s a look back at what we shipped, shared, and learned this year.
This year, we open-sourced the Bodo compute engine. Bodo has always been shaped by the open-source ecosystem, so this wasn’t a philosophical pivot so much as a natural step forward. We wanted to give back to the community that has inspired us from the beginning, ensuring everyone can benefit from, and contribute to, what we’ve built.
At its core, our compute engine transforms standard Python code into efficient, parallelized execution without requiring changes to the codebase, making it possible to achieve performance at scale with no HPC expertise required. We’re excited to see more teams try it, push it in new directions, and help shape what comes next!
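For a concrete flavor, here’s a minimal sketch of the engine’s decorator-based model (the Parquet path and column name are illustrative):

```python
import bodo
import pandas as pd

# Standard Pandas code; the @bodo.jit decorator compiles it into
# parallel execution with no changes to the function body.
@bodo.jit
def mean_trip_fare(path):
    df = pd.read_parquet(path)
    return df["fare"].mean()

print(mean_trip_fare("trips.parquet"))
```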
🔗 Bodo Open Source Announcement [blog]
In May, we launched Bodo DataFrames with the goal of combining the simplicity and usability of Pandas with database-grade optimization and HPC-class performance.
It’s a drop-in replacement for Pandas that uses compiler-driven execution instead of task orchestration. Under the hood, it uses Bodo Engine’s JIT compilation and a database-grade query optimizer to turn DataFrame pipelines into efficient, parallel execution plans.
In practice, that means you adopt Bodo DataFrames by changing a single import and keeping the rest of your Pandas code unchanged, as shown in the sketch below. Throughout the year, we also kept expanding coverage, including native Iceberg reads via pd.read_iceberg.

🔗 Bodo DataFrames Launch Announcement [blog]
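Here’s that minimal drop-in sketch (the dataset path and column names are hypothetical):

```python
# Only the import changes; everything else is standard Pandas.
import bodo.pandas as pd

# Hypothetical dataset: the read, filter, and aggregation all run
# through Bodo's parallel, optimized execution path.
df = pd.read_parquet("s3://my-bucket/trips.parquet")
avg_by_vendor = df[df["fare"] > 10].groupby("vendor")["fare"].mean()
print(avg_by_vendor)
```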
🔗 Scaling Amazon S3 Vectors Workflows Effortlessly in Python with Bodo [blog]
Many teams want Pandas-level expressiveness on top of Iceberg tables, but in practice that often means slow scans, heavy metadata overhead, or awkward handoffs to other engines. So this year, we invested in closing that gap:
Bodo DataFrames now reads and writes Iceberg tables natively through pd.read_iceberg() and DataFrame.to_iceberg(), scaling transparently from a laptop to multi-node clusters, with no JVM required.

🔗 Bodo Native Integration in PyIceberg 0.10: Bringing Scalability to PyIceberg with Pandas APIs [blog]
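A minimal sketch of that round trip, assuming a pre-configured catalog (the catalog, table, and column names are placeholders):

```python
import bodo.pandas as pd

# Read an Iceberg table straight into a DataFrame; no JVM involved.
df = pd.read_iceberg("analytics.events", catalog_name="my_catalog")

# Ordinary Pandas-style transformation, executed in parallel.
daily = df.groupby("event_date")["user_id"].count().reset_index()

# Write the result back as a new Iceberg table.
daily.to_iceberg("analytics.daily_counts", catalog_name="my_catalog")
```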
We also shipped an AI toolkit that extends the Pandas DataFrame and Series APIs to support LLM and embedding workloads directly, using the same compiler-driven parallel execution model as the rest of Bodo DataFrames.
Users can run LLM inference and embedding generation directly over DataFrame columns, as ordinary steps in a Pandas-style pipeline.
Under the hood, these operations benefit from the same infrastructure as the rest of Bodo DataFrames: Bodo Engine’s JIT compilation, query optimization, MPI-backed distributed execution, and spill-to-disk support.
This means teams can treat LLMs as just another step in a Pandas workflow, instead of a separate system bolted on at the edges. Analytics, embeddings, and inference all run in the same execution substrate, making pipelines easier to build, easier to debug, and easier to scale.
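To make that concrete, here’s a conceptual sketch of the pattern rather than the toolkit’s actual API; classify_sentiment is a hypothetical helper standing in for an LLM call:

```python
import bodo.pandas as pd

def classify_sentiment(text: str) -> str:
    # Hypothetical helper: a real pipeline would call an LLM or
    # embedding endpoint here; stubbed out for illustration.
    return "positive" if "great" in text.lower() else "neutral"

df = pd.DataFrame({"review": ["Great product!", "Arrived late."]})

# The model call is just another column operation in the pipeline,
# parallelized like any other Pandas step.
df["sentiment"] = df["review"].map(classify_sentiment)
print(df)
```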
We demonstrated this end-to-end in a unified Iceberg-to-LoRA pipeline, where Iceberg tables serve as the system of record, Pandas-style transformations handle filtering and feature preparation, and Bodo scales the entire workflow through to LoRA-based fine-tuning. The same execution model powers analytics, training data preparation, and fine-tuning — no framework switching required.
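A rough outline of that shape (the table identifier is a placeholder, and the fine-tuning step is sketched with the Hugging Face peft library as one possible choice):

```python
import bodo.pandas as pd

# 1. Iceberg as the system of record (placeholder identifiers).
df = pd.read_iceberg("ml.support_tickets", catalog_name="my_catalog")

# 2. Pandas-style filtering and feature preparation, run in parallel.
train = df[df["label"].notna()][["text", "label"]]
train.to_parquet("train.parquet")

# 3. LoRA-based fine-tuning, e.g., with Hugging Face peft (sketch only):
# from peft import LoraConfig, get_peft_model
# model = get_peft_model(base_model, LoraConfig(r=8, lora_alpha=16))
```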
🔗 From Iceberg to LoRA: A Unified LLM Fine Tuning Pipeline [blog]
🔗 Using LLMs at Scale: A Pandas-Native Approach [blog]
We also introduced PyDough Community Edition, driven by a problem we kept seeing as natural language interfaces became more common in analytics: generating SQL is easy; generating correct, secure, trustworthy analytics is not.
At the center of PyDough is a formal but friendly, Python-native domain-specific language (DSL) built specifically for LLM-driven analytics. This DSL acts as a bridge between natural language and executable logic, making intent explicit rather than implicit.
PyDough is grounded in a knowledge graph of business semantics that resolves ambiguity, prevents invalid joins and overcounts, and enforces structure that mirrors how businesses actually reason about their data. And to further improve accuracy, PyDough also uses an AI ensemble to generate, review, and challenge proposed logic.
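For a hypothetical flavor of what LLM-generated PyDough logic can look like (the collection and field names are invented, and the snippet is illustrative rather than exact DSL syntax; see the PyDough docs for the real thing):

```python
# Each customer's order count, expressed against the knowledge
# graph rather than raw tables and hand-written joins.
result = customers.CALCULATE(
    customer_name=name,
    order_count=COUNT(orders),
)
```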
Early benchmark results have been encouraging, and we’re excited to share more as PyDough continues to mature!
🔗 Introducing PyDough CE: A Simpler, Safer Path for Natural Language Analytics [blog]
We published benchmarks and comparisons not as a humble brag, but as a way to understand where different approaches shine. We focused on realistic workloads and the kinds of tradeoffs engineers actually have to make.
Connecting with the community was a big part of 2025 for us.
We spoke at PyData Global, PyData Pittsburgh, and an Open Source Architect Community event about how we think about scaling Pandas in practice with Bodo DataFrames. At the Iceberg Summit, we dove into Iceberg I/O performance and the optimizations we’ve built into the Bodo compute engine, and we also spent time exhibiting with our friends at OpenTeams at PyTorch Con.
Beyond conferences, many of the most impactful discussions happened online. Our Slack community grew steadily this year, and the questions, feedback, and real-world use cases shared there have been invaluable.
Thank you to everyone who came to a talk, stopped by a booth, joined Slack, opened an issue, or asked a great question. Those conversations directly influence what we build next — and they’re a big part of what makes this work fun.
🔗 Iceberg Summit: Iceberg I/O Optimizations in Compute Engines [session recording]
🔗 OSA Community Event: Bodo DataFrames [session recording]
We’re incredibly proud of what we’ve built this year, and even more excited about what’s coming next. We’ve got big plans for 2026: CPU-GPU hybrid execution, advanced query optimizations, broader Pandas coverage, a new interactive BodoSQL, and improved I/O and Iceberg integration. Stay tuned!