Amazon S3 is one of the most popular services data engineers use to store data in a data lake. A typical task is reading compressed Parquet files during the extract step of an ETL (extract-transform-load) pipeline...
Python's multiprocessing library has long been a go-to solution for data scientists and engineers who need faster results when processing time is a pain point. I want to show you a much faster alternative: Bodo.
Today I’m happy to announce our collaboration with Xilinx, Inc., including an investment by Xilinx in Bodo. This will be meaningful to more than just Xilinx and Bodo customers... It marks another stage in our mission to democratize access to large-scale parallel computing using Python.
Previously, I benchmarked Bodo using a popular example: the Monte Carlo approximation of Pi. In this post, I test how Bodo performs on another popular data analytics benchmark: a word count over beer reviews.
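For readers unfamiliar with the earlier benchmark, here is a minimal plain-NumPy sketch of the Monte Carlo approximation of Pi. It is not the benchmark code from that post; the function name `estimate_pi` and its `seed` parameter are illustrative assumptions. With Bodo, a function like this would typically be compiled with the `@bodo.jit` decorator; the sketch leaves that out so it runs with NumPy alone.

```python
import numpy as np


def estimate_pi(n: int, seed: int = 0) -> float:
    """Estimate Pi by sampling n points uniformly in the unit square.

    The fraction of points that fall inside the quarter circle
    x^2 + y^2 <= 1 approximates pi / 4, so multiplying it by 4
    gives an estimate of Pi.
    """
    # Hypothetical helper for illustration; seeded for reproducibility.
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    # Count samples inside the quarter circle.
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n


if __name__ == "__main__":
    print(estimate_pi(10_000_000))
```

The statistical error shrinks roughly as 1/sqrt(n), which is why this benchmark uses very large sample counts and benefits from parallel execution.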
What if we could improve analytics performance by 1,000x and cut aggregate operating costs to one tenth, using the programming techniques and hardware already in use today? With our Series A funding, that’s what we at Bodo are committed to doing.
A large Fortune 10 enterprise evaluated Bodo for data engineering workloads in its new data infrastructure. It found Bodo much simpler to use and 10x faster than highly optimized Spark on the same cluster setup.