Bodo 2025.7: Turbo‑Charging DataFrames for Production Lakes

July 4, 2025

Rohit Krishnan

Bodo’s July release refines the Bodo DataFrame library and adds new features, from native Iceberg and Parquet writes to compiled GroupBy aggregations that bring database-grade analytics within reach.

1. Native Iceberg & Parquet Writes

New features

DataFrame.to_iceberg() now writes Iceberg tables (including partition spec and sort order).
DataFrame.to_parquet() adds first‑class Parquet write.
read_iceberg() supports simple filesystem reads.

So what?

You can move computation-heavy Bodo pipelines straight into open-table-format data lakes, with no Spark detour and no flaky export scripts. That shrinks end-to-end latency and lets you query the same files immediately from DuckDB, Trino, or any Iceberg-aware engine.

2. Full‑Blooded GroupBy

New features

DataFrame.groupby() with sum, count, max.
DataFrameGroupBy and SeriesGroupBy.agg().

So what?

The workhorse of analytical code is now compiled and parallelized by Bodo. Complex aggregations that once forced a round‑trip to Pandas or PySpark stay in‑process and scale linearly with cores.
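
As a rough illustration of the aggregations listed above, here is a minimal sketch written against plain pandas so it runs anywhere. The data and column names are made up. The assumption, not verified here, is that the same calls work unchanged once the import is swapped for Bodo's DataFrame library (commonly written as `import bodo.pandas as pd`). Which `agg()` argument forms Bodo compiles is not stated in this post.

```python
import pandas as pd  # assumption: with Bodo installed, swap for `import bodo.pandas as pd`

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "units": [3, 5, 2, 7, 4],
    "price": [10.0, 12.0, 11.0, 9.0, 10.5],
})

# Single aggregation on one grouped column
units_by_region = df.groupby("region")["units"].sum()

# DataFrameGroupBy.agg(): a different aggregation per column
summary = df.groupby("region").agg({"units": "sum", "price": "max"})

# SeriesGroupBy.agg(): several aggregations on one column
units_stats = df.groupby("region")["units"].agg(["sum", "count", "max"])
```

Each result is indexed by `region`, so `summary.loc["west"]` returns that region's aggregated row.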

3. Very Large Pandas API Surface

Area	New Coverage	Impact
String ops	8 new `Series.str` methods → 96% coverage	Text ETL and log parsing just work
Reductions	`max()`, `sum()`, etc. on Series	One‑liner KPI calcs
Datetime	`pd.to_datetime()` + `timedelta`	Working with date and time data becomes trivial
Null checks	`pd.isnull()` top‑level	Cleaner NA handling in pipelines
Sorting + head()	Optimized	Faster “top‑N” queries at any scale
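
To make the table concrete, the sketch below strings several of these areas together in a small log-parsing pipeline. It is written against plain pandas with invented data. The specific `Series.str` methods used here (`strip`, `lower`, `startswith`) are picked for illustration and are not necessarily among the eight newly added ones. As before, swapping the import for Bodo's DataFrame library is an assumption.

```python
import pandas as pd  # assumption: with Bodo installed, swap for `import bodo.pandas as pd`

# Hypothetical log records, for illustration only
logs = pd.DataFrame({
    "ts": ["2025-07-01 10:00:00", "2025-07-01 10:05:00", "2025-07-02 09:30:00", None],
    "message": ["  ERROR disk full ", "info started", "ERROR timeout", "warn retry"],
    "latency_ms": [120, 15, 340, 48],
})

# String ops: normalize the text, then flag error lines
logs["message"] = logs["message"].str.strip().str.lower()
logs["is_error"] = logs["message"].str.startswith("error")

# Datetime: parse timestamps, then derive a timedelta column
logs["ts"] = pd.to_datetime(logs["ts"])
logs["elapsed"] = logs["ts"] - logs["ts"].min()

# Null checks: top-level pd.isnull() on a Series
missing_ts = pd.isnull(logs["ts"])

# Reductions: one-line KPIs
max_latency = logs["latency_ms"].max()
total_latency = logs["latency_ms"].sum()

# Sorting + head(): a "top-N" query
slowest = logs.sort_values("latency_ms", ascending=False).head(2)
```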

4. Quality‑of‑Life Tweaks

Column renaming, arithmetic column creation, boolean filtering, and constructors now mirror vanilla Pandas.
Fallback warnings instead of cryptic errors on unsupported ops keep you productive.
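
The operations named above look like this in plain pandas. The example is a sketch with invented data, and it assumes the Bodo DataFrame library accepts the same calls after an import swap. The fallback warning behavior is not shown, since its exact form is not described here.

```python
import pandas as pd  # assumption: with Bodo installed, swap for `import bodo.pandas as pd`

# Constructor: build a DataFrame from a dict (hypothetical order data)
orders = pd.DataFrame({"qty": [2, 0, 5], "unit_price": [4.0, 9.5, 1.5]})

# Column renaming
orders = orders.rename(columns={"qty": "quantity"})

# Arithmetic column creation
orders["total"] = orders["quantity"] * orders["unit_price"]

# Boolean filtering: keep only orders with a positive total
nonzero = orders[orders["total"] > 0]
```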

5. Modern Python, Fewer Locks

Python 3.13 support lets you adopt the newest language features immediately.
Relaxed dependency constraints mean easier integration into existing envs and CI pipelines.

Bottom Line

Bodo 2025.7 eliminates the “last‑mile” friction between high‑speed DataFrame computation and production‑grade data lakes, while pushing API coverage toward parity with Pandas. If you’re building heterogeneous pipelines that must both crunch numbers fast and land in Iceberg/Parquet for everyone else to consume, this release is the missing piece.

👉 Try it today: pip install bodo.

👉 Check out our GitHub repo