Bodo 2025.7: Turbo‑Charging DataFrames for Production Lakes

Date: July 4, 2025
Author: Rohit Krishnan

Bodo’s July release strengthens the Bodo DataFrame library and adds new features, from native Iceberg and Parquet writes to compiled GroupBy aggregations that bring database-grade analytics within reach.

1. Native Iceberg & Parquet Writes

New features

  • DataFrame.to_iceberg() now writes Iceberg tables (including partition spec and sort order).
  • DataFrame.to_parquet() adds first-class Parquet writes.
  • read_iceberg() supports simple filesystem reads.

So what?

You can land computation-heavy Bodo pipelines directly in open-table-format data lakes, with no Spark detour and no flaky export scripts. That shrinks end-to-end latency and lets you query the same files immediately from DuckDB, Trino, or any Iceberg-aware engine.

2. Full‑Blooded GroupBy

New features

  • DataFrame.groupby() with sum, count, max.
  • DataFrameGroupBy and SeriesGroupBy.agg().

So what?

The workhorse of analytical code is now compiled and parallelized by Bodo. Complex aggregations that once forced a round‑trip to Pandas or PySpark stay in‑process and scale linearly with cores.
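The aggregations listed above look like this in plain pandas syntax. This is a small sketch that assumes the Bodo import (`import bodo.pandas as pd`) accepts the same calls. The `orders` data is made up for illustration.

```python
import pandas as pd  # assumption: `import bodo.pandas as pd` is the drop-in under Bodo

# Hypothetical order data.
orders = pd.DataFrame(
    {
        "region": ["eu", "us", "eu", "us", "apac"],
        "amount": [10, 25, 7, 5, 3],
    }
)

# A single aggregation on one grouped column.
totals = orders.groupby("region")["amount"].sum()

# Several aggregations in one pass via SeriesGroupBy.agg().
summary = orders.groupby("region")["amount"].agg(["sum", "count", "max"])

print(summary)
```

`summary` has one row per region and one column per aggregation, which is the usual shape for feeding a report or a downstream join.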

3. Very Large Pandas API Surface

| Area | New Coverage | Impact |
|---|---|---|
| String ops | 8 new Series.str methods → 96% coverage | Text ETL and log parsing just work |
| Reductions | max(), sum(), etc. on Series | One-liner KPI calcs |
| Datetime | pd.to_datetime() + timedelta | Working with date and time data becomes trivial |
| Null checks | pd.isnull() top-level | Cleaner NA handling in pipelines |
| Sorting + head() | Optimized | Faster “top-N” queries at any scale |
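To make the table concrete, the sketch below touches each area once on a tiny, made-up log table. It uses plain pandas. Whether every individual call here is among those newly covered by Bodo is an assumption, since the post does not list the specific methods.

```python
import pandas as pd  # assumption: `import bodo.pandas as pd` is the drop-in under Bodo

# Hypothetical log records.
logs = pd.DataFrame(
    {
        "message": ["ERROR disk full", "info started", None, "ERROR timeout"],
        "ts": ["2025-07-01", "2025-07-03", "2025-07-02", "2025-07-04"],
        "latency_ms": [120, 15, 40, 300],
    }
)

# Datetime: parse strings, then derive a timedelta column.
logs["ts"] = pd.to_datetime(logs["ts"])
logs["age"] = pd.Timestamp("2025-07-04") - logs["ts"]

# Null checks: top-level pd.isnull() on a Series.
missing = pd.isnull(logs["message"])

# String ops: a Series.str method, treating missing values as non-matches.
is_error = logs["message"].str.startswith("ERROR", na=False)

# Sorting + head(): a "top-N" query.
worst = logs.sort_values("latency_ms", ascending=False).head(2)

# Reductions: a one-line KPI on a Series.
peak = logs["latency_ms"].max()

print(worst[["message", "latency_ms"]])
```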

4. Quality‑of‑Life Tweaks

  • Column renaming, arithmetic column creation, boolean filtering, and constructors now mirror vanilla Pandas.
  • Fallback warnings instead of cryptic errors on unsupported ops keep you productive.
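The four operations in the first bullet, written in plain pandas form. This is a sketch that assumes the Bodo import behaves the same way. The column names are hypothetical.

```python
import pandas as pd  # assumption: `import bodo.pandas as pd` is the drop-in under Bodo

# Constructor: build a DataFrame from a dict of columns.
df = pd.DataFrame({"qty": [2, 0, 5], "unit_price": [4.0, 9.5, 1.5]})

# Column renaming.
df = df.rename(columns={"qty": "quantity"})

# Arithmetic column creation.
df["revenue"] = df["quantity"] * df["unit_price"]

# Boolean filtering: keep rows with at least one unit sold.
sold = df[df["quantity"] > 0]

print(sold)
```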

5. Modern Python, Fewer Locks

  • Python 3.13 support lets you adopt the newest language features immediately.
  • Relaxed dependency constraints mean easier integration into existing envs and CI pipelines.

Bottom Line

Bodo 2025.7 eliminates the “last‑mile” friction between high‑speed DataFrame computation and production‑grade data lakes, while pushing API coverage toward parity with Pandas. If you’re building heterogeneous pipelines that must both crunch numbers fast and land in Iceberg/Parquet for everyone else to consume, this release is the missing piece.

👉 Try it today: pip install bodo.

👉 Check out our GitHub repo
