Bodo Benchmarks

Bodo's parallel computing architecture avoids the task overheads and sequential bottlenecks of driver-executor distributed systems, enabling extreme performance and linear scaling.

Performance Derived From the TPC-H Benchmark

Using queries derived from the standard TPC-H benchmark, we compared Bodo to Spark and Dask on data processing workloads. We used scale factor 1,000 (~1 TB dataset) on a cluster of 16 c5n.18xlarge AWS instances, which together have 576 physical CPU cores and 3 TB of total memory. Bodo provided a median speedup of 23x over Spark (95%+ infrastructure cost savings) and 148x over Dask. (Note: Equivalent Python versions of the TPC-H SQL queries were used to evaluate Python systems.)
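The cluster size and the cost-savings figure can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state: each c5n.18xlarge contributes 36 physical cores (which is what 576 / 16 implies), and infrastructure cost is proportional to runtime on the same cluster, so savings are 1 - 1/speedup. The function names are hypothetical.

```python
def total_physical_cores(nodes: int, cores_per_node: int = 36) -> int:
    """Total physical cores in a homogeneous cluster.

    The default of 36 cores per node is an assumption inferred from
    576 cores / 16 nodes in the text above.
    """
    return nodes * cores_per_node


def cost_savings_fraction(speedup: float) -> float:
    """Fraction of cost saved, assuming cost scales with runtime only.

    This assumes the same cluster and per-hour pricing for both systems,
    which is a simplification and not something the benchmark text states.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    print(total_physical_cores(16))              # 16-node TPC-H cluster
    print(f"{cost_savings_fraction(23):.1%}")    # savings at a 23x speedup
```

Under these assumptions, any speedup above 20x corresponds to savings above 95%, which is consistent with the "95%+" figure quoted for the 23x median speedup.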
"These are real benchmarks. We reviewed the Dask comparison and it is real. Dask will likely improve over time as a couple of bugs are fixed and additions are made. Also Bodo and Dask can work together on larger problems. Bodo is excellent and "easy scaling" straightforward pandas code — you let Bodo handle the parallelism."
Travis Oliphant
CEO at OpenTeams and Quansight, Founder of Anaconda, NumFOCUS and PyData.
Creator of NumPy, SciPy, and Numba.
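To make "equivalent Python versions of TPC-H SQL queries" concrete, here is a minimal sketch of how TPC-H Q6 (a filter followed by a sum) might be written in plain pandas. It is not the benchmark code, which this page does not show. The function name `q6` and the toy data are hypothetical, and the filter constants are the standard Q6 validation parameters. With Bodo, a function like this would typically be compiled with the `bodo.jit` decorator; that step is omitted here so the example runs with pandas alone.

```python
import pandas as pd


def q6(lineitem: pd.DataFrame) -> float:
    """Pandas sketch of TPC-H Q6: revenue from discounted line items.

    Assumes l_shipdate is a datetime column and uses the standard
    validation parameters (year 1994, discount 0.05 to 0.07, quantity < 24).
    """
    start = pd.Timestamp("1994-01-01")
    end = pd.Timestamp("1995-01-01")
    mask = (
        (lineitem["l_shipdate"] >= start)
        & (lineitem["l_shipdate"] < end)
        & (lineitem["l_discount"] >= 0.05)
        & (lineitem["l_discount"] <= 0.07)
        & (lineitem["l_quantity"] < 24)
    )
    selected = lineitem[mask]
    return float((selected["l_extendedprice"] * selected["l_discount"]).sum())


# Toy data for illustration only; real runs read the ~1 TB lineitem table.
toy_lineitem = pd.DataFrame(
    {
        "l_shipdate": pd.to_datetime(
            ["1994-03-01", "1995-01-01", "1994-06-15", "1994-12-31", "1994-01-01"]
        ),
        "l_discount": [0.06, 0.06, 0.10, 0.05, 0.05],
        "l_quantity": [10, 10, 5, 30, 23],
        "l_extendedprice": [1000.0, 2000.0, 500.0, 800.0, 200.0],
    }
)

if __name__ == "__main__":
    print(q6(toy_lineitem))
```

Only the first and last toy rows pass every filter; the others fail on ship date, discount, and quantity respectively.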

Data Engineering for ML: Data Derived from TPCx-BB Q26

Customer benchmark for data engineering (ETL and feature engineering): Bodo is 16.5x faster than optimized Spark on a 125-node cluster (AWS c5n.18xlarge) with 4,500 CPU cores. The input data is at scale factor 40,000, with 52 billion rows (2.5 TB in compressed Parquet format). Note: Equivalent Python versions of the benchmark SQL queries were used to evaluate Python systems.
Chart showing performance comparison at scale factor 40,000 for TPCx-BB

Data Engineering: TeraSort

Customer benchmark for data engineering: Bodo is 9x faster than optimized Spark on a 125-node cluster (AWS c5n.18xlarge) with 4,500 CPU cores. The input data is TeraSort at scale factor 10,000, with 100 billion rows (4 TB in compressed Parquet format).
Chart showing performance comparison on 4 TB of data for TeraSort

Retail Product Analytics

Customer benchmark for filtering data using customized user-input filters and joining the resulting group back with the original dataset. Bodo is 11x faster than optimized PySpark on a 16-node cluster (AWS c5n.18xlarge) with 576 CPU cores. The input data is 120 GB in compressed Parquet format.

End-to-End Machine Learning

Customer benchmark for an end-to-end ML pipeline including data load, data prep, feature engineering, ML training, and ML prediction. Bodo is 120x faster than PySpark on an r5d.16xlarge AWS node (32 CPU cores).
Chart showing linear scaling of Bodo

Retail Price Image Management

Customer benchmark for retail price image management using simulations. Bodo on a 4-node cluster (AWS m5.24xlarge) is 85x faster than multiprocessing Python on a single node.
Chart showing linear scaling of Bodo
© Bodo, Inc