Iceberg I/O Optimizations in Compute Engines – Summit Recap

June 19, 2025

Isaac Warren

At this year’s Iceberg Summit, I had the opportunity to present a talk alongside my colleague Srinivas Lade on I/O optimizations in compute engines, particularly within the context of the Bodo engine.

We focused on practical strategies to reduce query times, optimize file access, and improve performance in Iceberg-backed environments. Our talk explored:

Why I/O is often the dominant cost in modern data workflows
How Bodo’s manager-less, SPMD architecture enables high-performance distributed computation
Core I/O optimizations like column pruning, filter pushdown, and runtime join filters
A behind-the-scenes look at how Bodo reads Parquet vs. Iceberg datasets
File redistribution strategies and table maintenance impacts
Experimental improvements like using Feather/IPC files and advanced heuristics for smarter file planning

The highlight for us was that some of these techniques cut query times from 900 seconds to just 60 (a 15x speedup), reducing the number of rows read from 60 billion to 60 million (a 1,000x reduction).
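
For readers new to runtime join filters, here is a minimal sketch of the general idea in plain Python. It is not Bodo's implementation: the `FileStats` structure, the file names, and the statistics are all made up for illustration. The assumption is that each data file carries min/max statistics for the join key, so a key range collected from the small side of a join can be used to skip files on the large side before reading them.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max statistics for the join key (hypothetical structure)."""
    path: str
    key_min: int
    key_max: int
    row_count: int


def runtime_join_filter(build_keys):
    """Summarize the build side's join keys as a (min, max) range."""
    return min(build_keys), max(build_keys)


def prune_files(files, key_range):
    """Keep only files whose key range overlaps the build-side range."""
    lo, hi = key_range
    return [f for f in files if f.key_max >= lo and f.key_min <= hi]


# Build side: join keys observed while scanning the small table (made-up values).
build_keys = [1005, 1010, 1042]

# Probe side: three data files with made-up statistics.
probe_files = [
    FileStats("part-0.parquet", 0, 999, 1_000_000),
    FileStats("part-1.parquet", 1000, 1999, 1_000_000),
    FileStats("part-2.parquet", 2000, 2999, 1_000_000),
]

key_range = runtime_join_filter(build_keys)
kept = prune_files(probe_files, key_range)

rows_before = sum(f.row_count for f in probe_files)
rows_after = sum(f.row_count for f in kept)
print(f"files: {len(probe_files)} -> {len(kept)}, rows: {rows_before} -> {rows_after}")
```

In this toy case only one of the three files overlaps the build-side key range, so two thirds of the rows are never read. A real engine would likely combine a range check like this with finer-grained filters, but the file-skipping principle is the same.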

Watch the full talk here: Iceberg I/O Optimizations in Compute Engines

We also discussed some of the forward-looking improvements we’re exploring, such as caching strategies, alternative file formats, and leveraging sort/partition information more aggressively at read time.

Whether you’re working on a new compute engine, optimizing queries on Iceberg, or just curious how these systems work under the hood, I think you’ll find something useful in the talk.

As always, we’re happy to chat. Feel free to reach out with questions or feedback!

To learn more, check out: