What Is Distributed vs. Parallel Computing?

Distributed computing: Each task runs on a single core, and all the tasks collectively are spread across many machines, to be executed in a concurrent manner via a driver/executor architecture. Tasks are scheduled and managed by the driver and processed by the executors. As this approach is based upon internet technologies, it has inherent inefficiencies and overhead, both in the driver/executor model as well as communication between machines using inefficient communications (such as TCP), that compound as the number of nodes increases. Distributed computing was popularized to address big data analytics challenges in the enterprise, first with Apache Hadoop using MapReduce and subsequently with Apache Spark. Inefficiency challenges increasingly arise in this approach as the number of nodes increases.

Parallel computing: Processes are carried out simultaneously on all available cores and memory. Large problems can often be divided into smaller ones, which can then be solved at the same time. Single Program Multiple Data (SPMD), the most common style of parallel programming, is a technique to achieve parallelism whereby processes are split up and run simultaneously on multiple processors with different input in order to obtain results faster. Communication between processes is typically done via Message Passing Interface (MPI). Parallel computing was developed based upon techniques proven over the past several decades in high-performance computing (HPC) and supercomputing research and development. Applications historically applied to scientific research such as climate and weather predictions, fluid dynamics with highly trained experts developing the code. Parallel computing technologies and techniques are becoming mainstream with HPC computing in the cloud by providers such as AWS and Microsoft Azure.

what is distributed vs parallel computing

Bodo uses both core techniques of parallel computing: SPMD as the style of parallel programming and MPI for communication between processes. Bodo is the first platform to combine the simplicity of a scripting language, using Python, with the power of parallel programming that can run on commonly available hardware. Bodo is more efficient than distributed computing with: SPMD, providing no overhead of task management as in driver/executor approaches; MPI, yielding highly efficient communication between processes; and high reliability, due to faster processing and less failure points.