Bodo Dataframe Library: First Look

Date: May 19, 2025
Author: Rohit Krishnan

Last week, we shipped Bodo 2025.5, the first release to bundle our new Bodo DataFrame library: a drop-in replacement for Pandas that brings advanced database optimizations and Bodo's MPI backend to the Pandas API.

Why This Matters

Data scientists love the Pandas API but hit a wall as soon as their data no longer fits in memory or single-core processing becomes too slow. SQL and SQL-like engines solve the scale problem but abandon Python ergonomics. Bodo DataFrame closes that gap: easy as Pandas, fast as a data warehouse, on your laptop or on a 1,000-node cluster.

What’s Inside 2025.5

  • I/O: read_parquet, from_pandas
  • Transform: Series.map, DataFrame.apply, string ops (str.lower, str.strip)
  • Query: Column projection, filter & limit push-down, head()
  • Mutate: In-place column assignment
  • Engine: DuckDB optimizer, lazy plans, streaming execution to avoid out-of-memory (OOM) errors
  • Safety net: Automatic fallback to Pandas for unsupported ops
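To make the "lazy plans, streaming execution" entry concrete, here is a minimal pure-Python sketch of the general idea, not Bodo's engine. Operators are chained generators, so nothing runs until `head()` pulls rows, and the source is consumed in fixed-size batches, so memory use is bounded by the batch size rather than by the dataset size. The helper names (`scan_in_batches`, `filter_batches`, `project_batches`, `head`, `source`) and the batch size are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

# Hypothetical toy operators illustrating lazy, batched execution; not Bodo's API.
Row = dict
Batch = list

def scan_in_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Yield fixed-size batches so at most one batch of source rows is held at a time."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch

def filter_batches(batches: Iterator[Batch], pred: Callable[[Row], bool]) -> Iterator[Batch]:
    for batch in batches:
        yield [r for r in batch if pred(r)]

def project_batches(batches: Iterator[Batch], cols: list) -> Iterator[Batch]:
    for batch in batches:
        yield [{c: r[c] for c in cols} for r in batch]

def head(batches: Iterator[Batch], n: int) -> list:
    """Pull batches only until n rows are collected; later batches are never produced."""
    out: list = []
    for batch in batches:
        out.extend(batch)
        if len(out) >= n:
            break
    return out[:n]

produced = [0]  # counts how many rows the (hypothetical) source has emitted

def source(total: int) -> Iterator[Row]:
    for i in range(total):
        produced[0] += 1
        yield {"id": i, "trip_distance": i % 20, "fare_amount": i * 0.5}

# Building the pipeline executes nothing: generator functions are lazy.
plan = project_batches(
    filter_batches(scan_in_batches(source(1_000_000), batch_size=1000),
                   lambda r: r["trip_distance"] < 10),
    ["id", "fare_amount"],
)
print(produced[0])           # 0: nothing has executed yet
rows = head(plan, 5)
print(rows[0], produced[0])  # only the first 1000-row batch was read
```

Because `head()` stops once the first batch yields enough rows, the remaining source rows are never produced, which is also the intuition behind limit push-down.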

Under the hood, we use DuckDB's optimizer for logical plan optimization and the high-performance runtime behind Bodo and BodoSQL for execution. Any operation not yet supported falls back to Pandas, so you can start migrating notebooks today without rewriting them.
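The fallback behavior can be pictured with a small sketch. It assumes a design in which a lazy frame records the operations it supports in a plan and, when asked for anything else, executes the plan and hands the call to an eager object. `LazyFrame`, `EagerFrame`, and `sort_by` are hypothetical stand-ins (the eager side plays the role of Pandas); Bodo's actual fallback mechanism may work differently.

```python
from typing import Any, Callable

# Hypothetical sketch of "lazy where supported, fall back to eager otherwise".
# EagerFrame stands in for a pandas DataFrame; none of these names are Bodo APIs.

class EagerFrame:
    """Rows fully materialized in memory, like a pandas DataFrame."""

    def __init__(self, rows: list):
        self.rows = rows

    def sort_by(self, key: str) -> "EagerFrame":
        # An operation the lazy engine below does not implement.
        return EagerFrame(sorted(self.rows, key=lambda r: r[key]))


class LazyFrame:
    """Records supported operations in a plan instead of executing them."""

    def __init__(self, source: list, plan: tuple = ()):
        self._source = source
        self._plan = plan

    def filter(self, pred: Callable[[dict], bool]) -> "LazyFrame":
        return LazyFrame(self._source, self._plan + (("filter", pred),))

    def select(self, cols: list) -> "LazyFrame":
        return LazyFrame(self._source, self._plan + (("select", cols),))

    def collect(self) -> EagerFrame:
        """Execute the recorded plan and return a materialized frame."""
        rows = self._source
        for op, arg in self._plan:
            if op == "filter":
                rows = [r for r in rows if arg(r)]
            else:  # "select"
                rows = [{c: r[c] for c in arg} for r in rows]
        return EagerFrame(rows)

    def __getattr__(self, name: str) -> Any:
        # Python calls this only when normal lookup fails, i.e. for operations
        # LazyFrame does not support: run the plan so far, then delegate.
        return getattr(self.collect(), name)


lf = LazyFrame([{"a": 3, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "z"}])
result = lf.filter(lambda r: r["a"] > 1).select(["a"]).sort_by("a")
print(type(result).__name__, result.rows)  # EagerFrame [{'a': 2}, {'a': 3}]
```

Python's `__getattr__` hook fires only after regular attribute lookup fails, which makes it a convenient place to route "everything not implemented yet" without listing those operations explicitly.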

Quick Taste

import bodo.pandas as pd    # drop-in for `import pandas as pd`

taxi = pd.read_parquet("s3://nyc/trips_2024/")
short = taxi[taxi.trip_distance < 10][["fare_amount", "trip_distance"]]
print(short.head())         # compiles, optimizes, runs in parallel
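Per the post, the optimizer pushes column projection and filters down toward the Parquet scan. The toy rewrite below illustrates what that buys for the query above: the naive plan reads every column and filters afterwards, while the rewritten plan reads only `fare_amount` and `trip_distance` and applies `trip_distance < 10` inside the scan. The plan node classes, `push_down`, `execute`, and the in-memory `table` standing in for a Parquet file are all hypothetical; this is not DuckDB's or Bodo's code.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical toy plan nodes and optimizer rule; not DuckDB's or Bodo's code.
# `table` maps column name -> values and stands in for a Parquet file.

@dataclass
class Scan:
    table: dict
    columns: Optional[list] = None     # None = read every column
    predicate: Optional[tuple] = None  # (column, threshold): keep rows where column < threshold

@dataclass
class Filter:
    child: "Plan"
    column: str
    threshold: float

@dataclass
class Project:
    child: "Plan"
    columns: list

Plan = Union[Scan, Filter, Project]

def keep_rows(data: dict, column: str, threshold: float) -> dict:
    idx = [i for i, v in enumerate(data[column]) if v < threshold]
    return {c: [vals[i] for i in idx] for c, vals in data.items()}

def push_down(plan: Plan) -> Plan:
    """Rewrite Project(Filter(Scan)) so the scan reads only needed columns and filters itself."""
    if (isinstance(plan, Project) and isinstance(plan.child, Filter)
            and isinstance(plan.child.child, Scan)):
        f = plan.child
        needed = list(dict.fromkeys(plan.columns + [f.column]))  # ordered, de-duplicated
        scan = Scan(f.child.table, columns=needed, predicate=(f.column, f.threshold))
        return Project(scan, plan.columns)
    return plan

def execute(plan: Plan, reads: list) -> dict:
    """Evaluate the plan; `reads` records which columns the scan pulled from storage."""
    if isinstance(plan, Scan):
        cols = plan.columns if plan.columns is not None else list(plan.table)
        reads.extend(cols)
        data = {c: plan.table[c] for c in cols}
        return keep_rows(data, *plan.predicate) if plan.predicate else data
    if isinstance(plan, Filter):
        return keep_rows(execute(plan.child, reads), plan.column, plan.threshold)
    if isinstance(plan, Project):
        data = execute(plan.child, reads)
        return {c: data[c] for c in plan.columns}
    raise TypeError(f"unknown plan node: {plan!r}")

# Toy data, for illustration only.
table = {
    "fare_amount": [5.0, 40.0, 12.5],
    "trip_distance": [1.2, 18.0, 3.4],
    "passenger_count": [1, 2, 1],
    "payment_type": ["card", "cash", "card"],
}
naive = Project(Filter(Scan(table), "trip_distance", 10), ["fare_amount", "trip_distance"])
naive_reads, opt_reads = [], []
print(execute(naive, naive_reads), naive_reads)
print(execute(push_down(naive), opt_reads), opt_reads)
```

Both plans return the same rows; only the amount of data read differs. In a real engine, the same idea lets a Parquet reader skip unneeded column chunks entirely, and `head()` additionally allows a limit to be pushed down, which this toy does not model.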

What’s Next

Expect rapidly expanding coverage of the Pandas API, vectorized UDFs, and tighter Iceberg integration. As always, we value brutal feedback: file issues, benchmark us, break things.

Read the design backstory in "Rethinking DataFrames: Easy as Pandas, Fast as a Data Warehouse", and check the docs for the growing API support matrix.

Our goal is to provide a DataFrame experience that is intuitive for Pandas users while delivering the speed and scalability of a distributed data warehouse.

This is an early, experimental release of Bodo DataFrame, and we encourage you to try it out. Installing it takes a single command: pip install bodo. Visit our GitHub repository for more information, and join the conversation in our community Slack.
