The latest news, insights, product release information, guides, and everything in between from the Bodo team.
For a long time, many data scientists and engineers have turned to Python's multiprocessing library for faster results when processing time becomes a pain point. I want to show you a much faster solution: Bodo.
Today I’m happy to announce our collaboration with Xilinx, Inc., which includes an investment by Xilinx in Bodo. This is meaningful to more than just Xilinx and Bodo customers: it marks another stage in democratizing access to large-scale parallel computing with Python.
What if we could improve analytics performance by 1,000x and cut aggregate operational costs to one tenth, using the programming techniques and hardware already in use today? With our Series A funding, that’s what we at Bodo are committed to doing.
A Fortune 10 enterprise evaluated Bodo for data engineering workloads in its new data infrastructure, and found Bodo much simpler to use and 10x faster than highly optimized Spark on the same cluster setup.
Using only a single core to process a dataset with millions or even billions of rows quickly becomes a bottleneck. The good news is that Bodo offers data scientists a solution when Pandas alone does not meet their expectations.
The Monte Carlo approximation of Pi is a popular benchmark and one of the first examples used to demonstrate the Spark RDD API. Pi Day seemed like the perfect occasion to try Bodo on this benchmark, and that is what this blog post is about.
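For readers unfamiliar with the benchmark, here is a minimal sketch of the Monte Carlo method in plain Python. The function name `calc_pi` is illustrative, not taken from the post; with Bodo, one would typically parallelize such a function by applying the `bodo.jit` decorator, which is omitted here to keep the sketch self-contained.

```python
import random

def calc_pi(n):
    """Estimate Pi by sampling n random points in the unit square
    and counting how many land inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers Pi/4 of the unit square,
    # so the hit ratio times 4 approximates Pi.
    return 4.0 * inside / n

estimate = calc_pi(200_000)
```

With 200,000 samples the estimate typically lands within a few hundredths of 3.14159; accuracy improves slowly, as the error shrinks proportionally to one over the square root of the sample count, which is exactly why parallel speedups matter for this workload.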