What are shuffle files and "tiny tasks"?

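In a driver/executor distributed computing model, tasks run asynchronously and do not communicate directly with each other. When an operation such as a join or groupby needs to move data between tasks, the tasks that produce the data must fall back on a workaround: they write their intermediate results to shuffle files, and new tasks must then be created to read those files. Because the data is typically split into many partitions, each of these reader tasks may process only a small amount of data ("tiny tasks"), so the cost of scheduling them and of writing, tracking, and reading many intermediate files becomes significant. This overhead can lead to problems such as JVM crashes, crashes or instability of the Linux host, and errors from hitting file descriptor limits.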
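To make the file and task overhead concrete, here is a minimal sketch of a file-based shuffle for a groupby-sum. It assumes a simplified, hypothetical model in which every map task writes one file per reduce partition and a separate reduce step then reads them back; real engines use more elaborate file layouts, and the function name `shuffle_via_files` is purely illustrative.

```python
import json
import os
import tempfile
from collections import defaultdict


def shuffle_via_files(map_outputs, num_reducers, workdir):
    """Hypothetical file-based shuffle: returns (per-key sums, number of files written).

    map_outputs holds one list of (key, value) pairs per map task.
    """
    files_written = 0
    # Map side: each map task hash-partitions its rows and writes one file per reducer.
    for m, records in enumerate(map_outputs):
        buckets = defaultdict(list)
        for key, value in records:
            buckets[hash(key) % num_reducers].append((key, value))
        for r in range(num_reducers):
            path = os.path.join(workdir, f"shuffle_{m}_{r}.json")
            with open(path, "w") as f:
                json.dump(buckets[r], f)
            files_written += 1
    # Reduce side: a new task per partition must open every map task's file for it.
    totals = {}
    for r in range(num_reducers):
        for m in range(len(map_outputs)):
            path = os.path.join(workdir, f"shuffle_{m}_{r}.json")
            with open(path) as f:
                for key, value in json.load(f):
                    totals[key] = totals.get(key, 0) + value
    return totals, files_written


# Three map tasks and four reduce partitions: 12 files for only 5 records.
map_outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
with tempfile.TemporaryDirectory() as d:
    totals, n_files = shuffle_via_files(map_outputs, num_reducers=4, workdir=d)
print(sorted(totals.items()), n_files)  # [('a', 4), ('b', 7), ('c', 4)] 12
```

In this model the number of intermediate files grows as (map tasks) × (reduce partitions), independently of how much data there is, which is why many small partitions translate into many open files and many small reader tasks.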

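All-to-all communication takes a different approach. Instead of creating and managing shuffle files to hold intermediate task results for other dependent tasks, every process sends the relevant portion of its data directly to every other process in a single collective exchange, and each process then continues working on the data it received.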
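The same groupby-sum can be sketched with an all-to-all exchange. This is a single-process simulation under stated assumptions: the hypothetical helper `all_to_all` stands in for an MPI-style collective (for example `MPI_Alltoallv`, or `comm.alltoall` in mpi4py), and it is not meant to describe Bodo's actual internals.

```python
from collections import defaultdict


def all_to_all(send_buffers):
    """Simulated collective: send_buffers[i][j] is what rank i sends to rank j.

    Returns recv, where recv[j][i] is what rank j received from rank i.
    """
    num_ranks = len(send_buffers)
    return [[send_buffers[i][j] for i in range(num_ranks)] for j in range(num_ranks)]


def groupby_sum_all_to_all(local_data):
    """Hash-partition each rank's rows, exchange once, then aggregate locally."""
    num_ranks = len(local_data)
    # Each rank splits its rows into one in-memory buffer per destination rank.
    send = []
    for rows in local_data:
        buffers = [[] for _ in range(num_ranks)]
        for key, value in rows:
            buffers[hash(key) % num_ranks].append((key, value))
        send.append(buffers)
    # One exchange step: no intermediate files and no extra reader tasks.
    recv = all_to_all(send)
    # Each rank now holds every row for its keys and aggregates them in place.
    results = []
    for chunks in recv:
        totals = defaultdict(int)
        for chunk in chunks:
            for key, value in chunk:
                totals[key] += value
        results.append(dict(totals))
    return results


local_data = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
per_rank = groupby_sum_all_to_all(local_data)
# Each key lands on exactly one rank, so merging the per-rank results has no collisions.
merged = {k: v for part in per_rank for k, v in part.items()}
print(sorted(merged.items()))  # [('a', 4), ('b', 7), ('c', 4)]
```

The long-running processes that own the data also perform the aggregation, so the exchange replaces both the intermediate files and the extra tasks needed to read them.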

Because Bodo uses all-to-all communication, it avoids the overhead of writing, tracking, and reading shuffle files and of scheduling tiny tasks, which makes it more efficient than traditional approaches that rely on them.