A few of us from Bodo recently attended the Iceberg Summit, and it was great to see how Iceberg is rapidly evolving to meet the demands of modern data infrastructure. As long-time champions of open table formats and scalable analytics, we were excited not just to attend—but also to contribute. We had the opportunity to present a session on Iceberg I/O optimizations for compute engines.
It was energizing to see the growing enthusiasm around Iceberg, especially in conversations about performance, automation, AI workloads, and Python integration. In this post, we’re sharing some of our key takeaways and major themes from the summit, including the expanding ecosystem, new technical capabilities, and the increasing role of Iceberg in next-generation data platforms.
Iceberg adoption is booming, with large enterprises and dynamic startups alike jumping on board. We heard how major companies are already using Iceberg in production for massive datasets.
The most common reasons for adopting Iceberg include interoperability with multiple compute engines, cost reduction (especially with streaming ingestion), and support for new use cases like near-real-time decision making. Companies like Slack and Bloomberg emphasized capabilities such as time travel, snapshotting, and idempotent merges, which make Iceberg viable for high-reliability environments.
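For readers less familiar with these features, here's a minimal sketch of snapshot-based time travel using PyIceberg. The catalog and table names are placeholders, and the sketch assumes a catalog is already configured:

```python
from pyiceberg.catalog import load_catalog

# "default" and the table identifier below are placeholders.
catalog = load_catalog("default")
table = catalog.load_table("analytics.events")

# Every committed write creates a snapshot; the history gives you
# (timestamp, snapshot_id) pairs you can travel back to.
for entry in table.history():
    print(entry.timestamp_ms, entry.snapshot_id)

# Read the table as of an older snapshot instead of the latest state.
old_id = table.history()[0].snapshot_id
df = table.scan(snapshot_id=old_id).to_pandas()
```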
Much of this growth is fueled by the Iceberg community’s continued technical innovation and momentum. Here are some of the themes that stood out:
One clear trend we noticed was a strong demand from data science and AI teams for better Python support. This was music to our ears, since high-performance Python is exactly what Bodo is built for!
This was especially true in conversations about Python UDF compilation, an area where Bodo excels and which drew a lot of positive feedback. Many attendees shared the same challenge: they want to run custom Python logic directly on data in Iceberg, but doing so efficiently is still a major hurdle. Compilation techniques like those used by Bodo sparked real interest as a way to speed up these Python UDFs. While the core idea of compilation clicked quickly, deeper technical aspects (like Bodo's use of MPI) sometimes required more discussion.
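As a rough sketch of the pattern attendees asked about, here's what running a custom Python UDF over an Iceberg table looks like with Bodo's pandas-style read API. The table, schema, and UDF are hypothetical, and the connection string varies by catalog:

```python
import bodo
import pandas as pd

@bodo.jit  # the UDF itself is compiled, not interpreted row by row
def score(amount, region):
    return amount * 1.1 if region == "EMEA" else amount

@bodo.jit  # Bodo compiles the whole pipeline and runs it in parallel
def scored_total():
    # Pandas-style Iceberg read; the table, connection string, and
    # schema names here are placeholders.
    df = pd.read_sql_table("transactions", "iceberg+glue", "sales_db")
    return df.apply(lambda r: score(r["amount"], r["region"]), axis=1).sum()

print(scored_total())
```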
Automating routine tasks like table partitioning, sorting, and general maintenance was another hot topic. Multiple attendees expressed interest in solutions that simplify operations and cut down manual workload, reflecting a broader industry trend toward efficiency.
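Much of what these automation layers wrap already exists as built-in Iceberg Spark procedures. Here's a minimal sketch of compaction plus sorting, with placeholder catalog and table names and an Iceberg-enabled Spark session assumed:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session already configured with an Iceberg catalog
# named "my_catalog"; the table and sort column are placeholders.
spark = SparkSession.builder.getOrCreate()

# Compact small files and cluster data by a sort order in one pass.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'db.events',
        strategy => 'sort',
        sort_order => 'event_time DESC'
    )
""")
```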
While Parquet is still king, there's excitement about new file formats like Vortex and Lance, which could offer faster performance for certain types of data queries. We also heard a clear interest in data engines that can easily read multiple formats—whether it's Avro, Parquet, Lance, or others—depending on the specific need.
REST catalogs stood out as the emerging standard. Teams like Airbnb and Bloomberg have already built custom REST-compatible catalogs (often backed by Postgres), while others still use Hive or AWS Glue. There’s a clear push to unlock access control, cross-engine support, and metadata-based governance through REST interfaces.
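To give a concrete feel for why REST is attractive, pointing PyIceberg at a REST catalog takes only a few lines of configuration. The URI, warehouse, and token below are placeholders:

```python
from pyiceberg.catalog import load_catalog

# Replace the URI, warehouse, and token with your deployment's values.
catalog = load_catalog(
    "prod",
    **{
        "type": "rest",
        "uri": "https://iceberg-catalog.example.com",
        "warehouse": "s3://my-warehouse/",
        "token": "REPLACE_ME",
    },
)
print(catalog.list_namespaces())
```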
Data governance features like row-level security, column masking, and role-based access control are top of mind, especially for enterprise users like Microsoft, Bloomberg, and Autodesk. OSS catalogs are playing catch-up here.
Streaming ingestion is common (Airbnb, Pinterest, Slack, Wise), but batch processing remains dominant. The real challenge is operational: streaming ingestion plus table maintenance adds up to real complexity, with compaction, orphan files, and writer coordination being the main pain points.
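Concretely, much of that maintenance burden comes down to scheduling procedures like these (again a sketch with placeholder catalog and table names):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session

# Drop snapshots older than the retention window to keep metadata lean.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2025-01-01 00:00:00'
    )
""")

# Remove data files no longer referenced by any snapshot
# (e.g. left behind by failed streaming writers).
spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.events')")
```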
Teams are solving this by building DevOps-style interfaces: YAML config, pull requests, CI/CD flows, and abstraction layers are being used to give end-users frictionless access to data platforms.
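As one illustration of the pattern, a platform team might let users declare tables in YAML committed via pull request, with a CI job applying the spec. Everything in this sketch, from the field names to the helper function, is hypothetical:

```python
import yaml
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType, TimestampType

# Hypothetical spec a user would commit via pull request.
SPEC = yaml.safe_load("""
table: analytics.page_views
fields:
  - {name: user_id, type: long}
  - {name: url, type: string}
  - {name: viewed_at, type: timestamp}
""")

TYPES = {"long": LongType(), "string": StringType(), "timestamp": TimestampType()}

def apply_spec(spec):
    # CI job: turn the committed YAML spec into an Iceberg table.
    catalog = load_catalog("default")  # placeholder catalog name
    schema = Schema(*[
        NestedField(i + 1, f["name"], TYPES[f["type"]], required=False)
        for i, f in enumerate(spec["fields"])
    ])
    catalog.create_table(spec["table"], schema=schema)

apply_spec(SPEC)
```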
A surprising but exciting takeaway was seeing how Iceberg is fitting into AI workflows. AI and ML pipelines increasingly mix formats like Avro, Parquet, and Lance, reinforcing the need for flexible, high-performance compute engines that can handle all of them.
Overall, the Iceberg Summit showed that Iceberg is becoming central to the future of data infrastructure, driven by strong community innovation, Python integration, smarter automation, and flexible multi-format support. We're happy to be part of this journey, from high-performance Iceberg I/O (including Amazon S3 Tables) to pandas-style interfaces for working with Iceberg data.
If you're building on Iceberg or curious about what’s coming next—especially around Python UDF compilation and high-performance compute—be sure to join our Slack community to stay in the loop on our latest Iceberg features.