Key Takeaways from Iceberg Summit 2025

Date: April 22, 2025
Author: Bodo Engineering Team

A few of us from Bodo recently attended the Iceberg Summit, and it was great to see how Iceberg is rapidly evolving to meet the demands of modern data infrastructure. As long-time champions of open table formats and scalable analytics, we were excited not just to attend—but also to contribute. We had the opportunity to present a session on Iceberg I/O optimizations for compute engines.

It was energizing to see the growing enthusiasm around Iceberg, especially in conversations about performance, automation, AI workloads, and Python integration. In this post, we’re sharing some of our key takeaways and major themes from the summit, including the expanding ecosystem, new technical capabilities, and the increasing role of Iceberg in next-generation data platforms.

Iceberg's Growing Popularity

Iceberg adoption is booming, with large enterprises and dynamic startups alike jumping on board. We heard how major companies are already using Iceberg in production for massive datasets. 

The most common reasons for adopting Iceberg include interoperability with multiple compute engines, cost reduction (especially with streaming ingestion), and support for new use cases like near-real-time decision making. Companies like Slack and Bloomberg emphasized the ability to manage time travel, snapshotting, and idempotent merges—making Iceberg viable for high-reliability environments.
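
Idempotent merges are a big part of what makes streaming ingestion reliable: replaying the same batch after a failure must not duplicate rows. As a minimal, engine-agnostic sketch of that property (plain Python standing in for an Iceberg MERGE, with hypothetical row dicts keyed on `id`):

```python
def idempotent_merge(table, batch, key="id"):
    """Upsert each batch row into the table by key. Replaying the
    same batch leaves the table unchanged, so retries are safe."""
    merged = {row[key]: row for row in table}
    for row in batch:
        merged[row[key]] = row  # insert new keys, overwrite existing ones
    return list(merged.values())

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
batch = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]

once = idempotent_merge(table, batch)
twice = idempotent_merge(once, batch)  # replaying the batch is a no-op
assert once == twice
```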

Much of this growth is fueled by the Iceberg community’s continued technical innovation and momentum. Some that stood out:

  • Iceberg V3 introduced exciting new features like Variant and Geospatial data types, significantly broadening its use cases. The Variant data type in particular will make managing JSON data much easier.
  • Iceberg V4: There's a lot of anticipation for V4. A key change is likely to be a more adaptive metadata layout, which is expected to significantly speed up certain operations, particularly reads of small tables.

Python and UDFs are in High Demand

One clear trend we noticed was a strong demand from data science and AI teams for better Python support. This was music to our ears, since better Python support is exactly what Bodo is built for!

This was especially true in conversations about Python UDF compilation, an area where Bodo excels and which drew a lot of positive feedback. Many attendees shared the same challenge: they want to run custom Python logic directly on data in Iceberg, but doing so efficiently is still a major hurdle. Compilation techniques (like those used by Bodo) sparked real interest as a way to speed up these Python UDFs. While the core idea of compilation clicked, explaining deeper technical aspects (like Bodo's use of MPI) sometimes required more discussion.
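
To make the hurdle concrete, here is the shape of the problem in a pure-Python sketch (the function and data are illustrative only): a custom UDF applied row by row, which the interpreter executes with per-row overhead and which a JIT compiler, the approach Bodo takes, can lower to a native loop instead.

```python
# A custom scoring UDF a data scientist might want to run over rows
# scanned out of an Iceberg table. Interpreted Python pays per-call
# overhead here; JIT compilation turns the loop into native code.
def score(amount, is_premium):
    base = amount * 0.01
    return base * 2.0 if is_premium else base

# Stand-in for (amount, is_premium) rows from an Iceberg scan.
rows = [(100.0, True), (250.0, False), (40.0, True)]
scores = [score(a, p) for a, p in rows]
assert all(abs(s - e) < 1e-9 for s, e in zip(scores, [2.0, 2.5, 0.8]))
```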

Automation is the Way Forward

Automating routine tasks like table partitioning, sorting, and general maintenance was another hot topic. Multiple attendees expressed interest in solutions that simplify operations and cut down manual workload, reflecting a broader industry trend toward efficiency.

New File Formats and Catalog Improvements

While Parquet is still king, there's excitement about new file formats like Vortex and Lance, which could offer faster performance for certain types of data queries. We also heard a clear interest in data engines that can easily read multiple formats—whether it's Avro, Parquet, Lance, or others—depending on the specific need.

REST catalogs stood out as the emerging standard. Teams like Airbnb and Bloomberg have already built custom REST-compatible catalogs (often backed by Postgres), while others still use Hive or AWS Glue. There’s a clear push to unlock access control, cross-engine support, and metadata-based governance through REST interfaces.
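
As a rough illustration of what that standardization buys, this is approximately what pointing PyIceberg at a REST catalog looks like in a `.pyiceberg.yaml` config. The catalog name, URI, and credential are placeholders, and the exact keys depend on the catalog implementation:

```yaml
catalog:
  prod:
    type: rest
    uri: https://catalog.example.com/api/catalog
    credential: <client-id>:<client-secret>
```

Because the interface is a spec rather than a vendor API, the same client config works whether the backend is a managed service or a custom Postgres-backed catalog like the ones Airbnb and Bloomberg described.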

Governance, Streaming, and Operational Complexity

Data governance features like row-level security, column masking, and role-based access control are top of mind, especially for enterprise users like Microsoft, Bloomberg, and Autodesk. OSS catalogs are playing catch-up here.

Streaming ingestion is common (Airbnb, Pinterest, Slack, Wise), but batch processing remains dominant. The real challenge is operational: streaming ingestion plus table maintenance equals complexity—with issues like compaction, orphan files, and writer coordination being major concerns.

Teams are solving this by building DevOps-style interfaces: YAML config, pull requests, CI/CD flows, and abstraction layers are being used to give end-users frictionless access to data platforms.

Iceberg Meets AI and Multi-Format Compute

A surprising but exciting takeaway was seeing how Iceberg is fitting into AI workflows. There's a growing demand for compute engines that can seamlessly handle various data formats like Avro, Parquet, and Lance, highlighting the need for flexible, high-performance processing solutions.

Overall, the Iceberg Summit showed that Iceberg is becoming central to the future of data infrastructure, driven by strong community innovation, Python integration, smarter automation, and flexible multi-format support. We're happy to be part of this journey, providing high-performance ways to interact with Iceberg, from S3 Tables support to pandas-style interfaces.

If you're building on Iceberg or curious about what’s coming next—especially around Python UDF compilation and high-performance compute—be sure to join our Slack community to stay in the loop on our latest Iceberg features.

Ready to see Bodo in action?
Schedule a demo with a Bodo expert