Pandas 3 is one of the most significant releases in the project’s history. It modernizes core behavior, improves developer ergonomics, and expands interoperability with the broader data ecosystem.
Among its headline features are two new integrations: native acceleration of user-defined functions (UDFs) through the Bodo JIT engine, and built-in support for Apache Iceberg tables. In this post, we’ll walk through both integrations and show how to get started.
User-defined functions (UDFs) are very often a performance bottleneck in Pandas. When you use operations like DataFrame.apply() and Series.map(), Pandas executes your function in pure Python. This bypasses Pandas’ optimized C/NumPy vectorized routines and introduces per-element iteration and interpreter overhead, which can dramatically slow down large workloads.
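To see this overhead in isolation, compare a row-wise apply with its vectorized equivalent (a minimal illustration with made-up column names, not the example from the post):

```python
import pandas as pd

df = pd.DataFrame({"a": range(100_000), "b": range(100_000)})

# Row-wise apply: the Python lambda is invoked once per row,
# paying interpreter overhead on every call.
slow = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized equivalent: a single call into optimized C/NumPy routines.
fast = df["a"] + df["b"]

assert slow.equals(fast)
```

Both produce identical results, but the vectorized version avoids the per-row Python function calls entirely, which is exactly the cost a JIT engine eliminates for UDFs that cannot be expressed as vectorized operations.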
Pandas 3 introduces an “engine” parameter in DataFrame.apply() that allows you to plug in Bodo JIT as the execution backend for UDFs. With Bodo, your function is JIT-compiled to optimized native code and executed in parallel across available CPU cores. The result can be orders-of-magnitude performance improvements, depending on the workload.
The following example (from Marc Garcia’s excellent Pandas 3 blog post) transforms room descriptions such as:
"Superior Double Room with Patio View"
into a structured string like:
"property_type=hotel, room_type=superior double, view=patio"
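The parsing relies only on str.split and str.removesuffix; here is a quick standalone check of that logic on the sample description:

```python
# Parse one room description by hand, mirroring the UDF's string logic.
desc = "Superior Double Room with Patio View".lower()

# Split once on " with " to separate the room type from the extras.
before, after = desc.split(" with ", 1)
room_type = before.removesuffix(" room")
view = after.removesuffix(" view")

assert room_type == "superior double"
assert view == "patio"
```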
On a Mac laptop with 25 million rows, this example runs 7× faster with the Bodo JIT engine.
import pandas as pd
import bodo

def format_room_info(row):
    result = "property_type=" + row["property_type"]
    desc = row["name"].lower()
    if " with " not in desc:
        return result + ", room_type=" + desc.removesuffix(" room")
    before, after = desc.split(" with ", 1)
    result += ", room_type=" + before.removesuffix(" room")
    if after.endswith(" view"):
        result += ", view=" + after.removesuffix(" view")
    elif after.endswith(" bathroom"):
        result += ", bathroom=" + after.removesuffix(" bathroom")
    return result

df = pd.read_parquet("rooms.parquet")
df2 = df.apply(format_room_info, axis=1, engine=bodo.jit())

That’s it: no rewrites, no refactoring. Just add the engine argument.
It’s also possible to use the Numba JIT engine with DataFrame.apply(), but it is limited to numerical code, doesn’t support Pandas data structures, and doesn’t parallelize the computation. The example above fails with the Numba engine because of its string data types.
JIT compilation is powerful, but there are a few considerations to keep in mind, such as compilation overhead on the first call and the subset of Python features the compiler supports.
See the Bodo JIT documentation for full details on supported features and best practices.
While the native Pandas integration with Bodo JIT simplifies acceleration of UDFs, you can also scale all of your Pandas code by replacing

import pandas as pd

with

import bodo.pandas as pd

This enables automatic parallel execution and scalable performance across CPUs and clusters, without rewriting your code.
Apache Iceberg is a modern open table format designed to provide a robust foundation for managing complex data at scale, and it has become the table format of choice for many data teams. It brings database-like features to data lakes, such as ACID transactions, time travel, and fast querying. Pandas 3’s native Iceberg support substantially simplifies working with Iceberg data in Pandas.
Pandas 3 provides pd.read_iceberg() and DataFrame.to_iceberg() for reading and writing Iceberg tables. For example, the code below writes an Iceberg table and reads it back using these APIs. Before running it, create a warehouse directory with mkdir /tmp/warehouse and make sure the PyIceberg package is installed (available through pip and conda).
import pandas as pd
from pyiceberg.catalog import load_catalog

warehouse_path = "/tmp/warehouse"
catalog_properties = {
    "type": "sql",
    "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
    "warehouse": f"file://{warehouse_path}",
}
catalog = load_catalog("default", **catalog_properties)
catalog.create_namespace_if_not_exists("test")

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
df.to_iceberg("test.test_table", "default", catalog_properties=catalog_properties)
df2 = pd.read_iceberg("test.test_table", "default", catalog_properties=catalog_properties)
print(df2)
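As an alternative to passing catalog_properties in code, PyIceberg can read the same settings from a .pyiceberg.yaml configuration file in your home directory. A sketch of the equivalent configuration (same SQLite catalog and warehouse paths as above):

```yaml
catalog:
  default:
    type: sql
    uri: sqlite:////tmp/warehouse/pyiceberg_catalog.db
    warehouse: file:///tmp/warehouse
```

With this file in place, PyIceberg resolves the "default" catalog from configuration, so the catalog properties don’t need to be repeated in every script.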
The Bodo DataFrames library provides compatible APIs that accelerate and scale Iceberg reads and writes across any number of available CPU cores, from laptops to clusters. In addition, Bodo supports writing tables with Iceberg’s partition spec and sort order features, enabling predicate pushdown for large datasets. Moreover, Bodo supports a simple filesystem catalog so you can get started quickly without any catalog setup (not recommended for production).
Here is the same example written with Bodo DataFrames, using the filesystem catalog:
import bodo.pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
df.to_iceberg("test_table", location="./warehouse/")
df2 = pd.read_iceberg("test_table", location="./warehouse/")
print(df2)

This version scales seamlessly from a laptop to a distributed cluster. See the Bodo DataFrames documentation for more information.
Pandas 3 is a major leap forward for data processing with native UDF acceleration through Bodo JIT and native Iceberg integration. Bodo’s full compatibility with Pandas enhances the performance and scalability of Pandas code seamlessly and without code rewrites.
To get started using Bodo yourself, check out the Bodo documentation.