Last week, we shipped Bodo 2025.5, the first release that bundles our new Bodo DataFrame library: a drop-in replacement for Pandas that provides advanced database optimizations and Bodo’s MPI backend.
Data scientists love the Pandas API but hit a wall the moment data stops fitting in memory or single-core processing becomes too slow. SQL and SQL-like engines solve scale but abandon Python ergonomics. Bodo DataFrame closes that gap: easy as Pandas, fast as a data warehouse, on your laptop or a 1,000-node cluster.
Covered so far: read_parquet and from_pandas; Series.map and DataFrame.apply; and string ops (str.lower, str.strip).
Under the hood, we integrate DuckDB’s optimizer for logical plan optimization and use Bodo and BodoSQL’s high-performance execution runtime. Anything not yet covered falls back to Pandas, so you can start migrating notebooks today without rewrites.
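To make "logical plan optimization" a little more concrete, here is a toy sketch of one classic rewrite, column pruning, where a scan is narrowed to the columns the rest of the query actually uses. This is not DuckDB's or Bodo's code. The Scan, Filter, and Project classes and the prune_columns function are hypothetical names invented for illustration, and a real optimizer applies many more rules than this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Scan:
    source: str
    columns: Optional[Tuple[str, ...]] = None  # None means "read every column"


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    column: str
    less_than: float  # keep rows where column < less_than


@dataclass(frozen=True)
class Project:
    child: "Plan"
    columns: Tuple[str, ...]


Plan = Union[Scan, Filter, Project]


def prune_columns(plan, needed=None):
    """Rewrite the plan so each Scan reads only the columns used above it."""
    if isinstance(plan, Project):
        # Only the projected columns are needed from the subtree below.
        return Project(prune_columns(plan.child, set(plan.columns)), plan.columns)
    if isinstance(plan, Filter):
        # The column the filter tests must be read as well.
        below = None if needed is None else needed | {plan.column}
        return Filter(prune_columns(plan.child, below), plan.column, plan.less_than)
    if isinstance(plan, Scan):
        return plan if needed is None else Scan(plan.source, tuple(sorted(needed)))
    raise TypeError(f"unknown plan node: {plan!r}")


# Hypothetical plan: filter on trip_distance, then keep two columns.
naive = Project(
    Filter(Scan("trips"), "trip_distance", 10.0),
    ("fare_amount", "trip_distance"),
)
optimized = prune_columns(naive)
print(optimized)
```

In the naive plan the scan reads every column; after the rewrite it reads only the two that the filter and projection touch. On wide Parquet files, that kind of rewrite is typically where lazy, plan-based engines save a lot of I/O.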
import bodo.pandas as bd # same shape as `import pandas as pd`
taxi = bd.read_parquet("s3://nyc/trips_2024/")
short = taxi[taxi.trip_distance < 10][["fare_amount", "trip_distance"]]
print(short.head()) # compiles, optimizes, runs in parallel
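For a slightly fuller picture, the sketch below exercises the kinds of operations named in this post (Series.map, DataFrame.apply, str.strip, str.lower, plus a filter and column selection). It is written against plain Pandas with made-up sample data so it runs anywhere. If the drop-in claim holds, swapping the import for Bodo's should be the only change needed, but that is an assumption and is not verified here.

```python
import pandas as pd  # assumption: Bodo's drop-in import could be swapped in here

# Made-up sample data, for illustration only.
trips = pd.DataFrame(
    {
        "vendor": ["  Acme ", "ZIPPY", " acme"],
        "trip_distance": [2.5, 14.0, 7.2],
        "fare_amount": [12.0, 48.5, 25.0],
    }
)

# String ops: normalize vendor names.
trips["vendor"] = trips["vendor"].str.strip().str.lower()

# Series.map: element-wise UDF converting miles to kilometers.
trips["distance_km"] = trips["trip_distance"].map(lambda miles: miles * 1.609344)

# DataFrame.apply: row-wise UDF computing fare per mile.
trips["fare_per_mile"] = trips.apply(
    lambda row: row["fare_amount"] / row["trip_distance"], axis=1
)

# Filter and column selection, as in the taxi example.
short = trips[trips.trip_distance < 10][["vendor", "fare_per_mile"]]
print(short.head())
```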
Expect rapid expansion of Pandas API coverage, vectorized UDFs, and tighter Iceberg integration. As always, we value brutal feedback: file issues, benchmark us, break things.
Read the design backstory in "Rethinking DataFrames: Easy as Pandas, Fast as a Data Warehouse" and check the docs for the growing API matrix.
Our goal is to provide a DataFrame experience that is intuitive for Pandas users while delivering the speed and scalability of a distributed data warehouse.
This is an early experimental release of the Bodo DataFrame library, and we encourage you to try it out. You can install Bodo with just pip install bodo. Visit our GitHub repository for more information and join the conversation in our community Slack.