Massively Parallel Processing

Paola Reyes
Jul 18, 2023

April 27, 2023, by Paola Reyes

I found it simple to understand Massively Parallel Processing (MPP) by picturing a group of people working together to complete a task.

Let's say I want to help a family rebuild their home after a natural disaster has destroyed it. Despite my best efforts and good intentions, it would take me several months to complete the job alone. But what if we enlist some friends who are willing to volunteer? With the work shared, the family can return to their home in fewer months.

This is what MPP does: it finishes a task faster by splitting it into multiple subtasks and working on them at the same time.
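The idea of splitting one task into subtasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_chunks` and `parallel_sum` are made up for this example, and threads on one machine stand in for the separate servers a real MPP system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Split a list into n_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, n_workers=4):
    """Sum a list by giving one chunk to each worker, then combining."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker sums its own chunk (the subtask).
        partial_sums = list(pool.map(sum, chunks))
    # Combining the partial results gives the answer to the full task.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # prints 5050
```

Like the volunteers rebuilding the house, each worker handles only its share, and the final step simply puts the pieces together.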

In MPP, “the data and processing power are split up among several different nodes (servers), with one leader node and one or many compute nodes.” (Dearmer, 2021).

Figure 1 shows the general MPP diagram, divided into 5 steps. First, the client submits a query, which is transferred to the master node. The master node holds the information (data dictionary, session information) needed to build a plan that sends a task to each node. Next comes parallel execution, which is the execution of the plan created in the previous step. Each node then runs its specific query and returns its results to the master node. Each node has its own RAM, disk space, and operating system. Finally, the master node integrates the results and sends them to the client (Packt, MPP).

Figure 1. General Massively Parallel Processing Architecture (Packt, MPP).
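The flow in Figure 1 could look roughly like the following toy model. It is a sketch, not how any particular MPP product is implemented: the classes `LeaderNode` and `ComputeNode`, the `average` query, and the sample rows are all hypothetical, and threads again stand in for separate servers. Each node sees only its own rows, which mirrors the point that every node has its own memory and disk.

```python
from concurrent.futures import ThreadPoolExecutor


class ComputeNode:
    """A node that owns its own slice of the data."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # local storage, not visible to other nodes

    def execute(self, task):
        # Run this node's part of the plan against its local rows only.
        values = [row[task["column"]] for row in self.rows]
        return {"sum": sum(values), "count": len(values)}


class LeaderNode:
    """Builds the plan, dispatches tasks, and merges partial results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def average(self, column):
        # Build the plan: one task per compute node.
        plan = [{"node": node, "column": column} for node in self.nodes]
        # Parallel execution: every node works on its task at the same time.
        with ThreadPoolExecutor(max_workers=len(plan)) as pool:
            partials = list(pool.map(lambda t: t["node"].execute(t), plan))
        # Integrate the partial results into the final answer for the client.
        total = sum(p["sum"] for p in partials)
        count = sum(p["count"] for p in partials)
        return total / count if count else None


if __name__ == "__main__":
    nodes = [
        ComputeNode("node-1", [{"amount": 10}, {"amount": 20}]),
        ComputeNode("node-2", [{"amount": 30}]),
        ComputeNode("node-3", [{"amount": 40}, {"amount": 50}, {"amount": 60}]),
    ]
    print(LeaderNode(nodes).average("amount"))  # prints 35.0
```

Note that the nodes return a sum and a count rather than their own averages. Averaging the per-node averages would give the wrong answer when nodes hold different numbers of rows, so the leader needs partial results it can combine correctly.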

MPP Architecture & Components

MPP can be divided into two types of architecture: grid computing and cluster computing.

Grid computing:

Different computers work as one system on independent subtasks. Each device has different hardware and a different operating system, which is why a grid is considered a heterogeneous network (Lithmee, 2018).

As seen in Figure 2, a grid computing network is built from three types of machine. The first is the user, the computer that creates the task based on the resources on the network and sends it to the control server. The control server coordinates and manages the tasks and sends them to the grid nodes. Each grid node is assigned a subtask to complete the main query (Kanade, 2022).

Figure 2. Grid Computing

Cluster computing:

Two or more computers work together to solve a task. As seen in Figure 3, each node has the same hardware and operating system, so a cluster is considered a homogeneous system.

Figure 3. Cluster Computing

Several factors, such as the availability of low-cost microprocessors, fast networks, and software for high-performance distributed computing, led to the creation of cluster computing. It is used in settings ranging from fast supercomputers to small businesses. In general, cluster computing is more cost-effective than using a collection of individual computers, and it also improves performance (Lithmee, 2018).

Azure Synapse | MPP Solution

Azure Synapse is a Microsoft analytics service that provides Massively Parallel Processing and is used by a variety of companies, such as Walgreens and Coca-Cola (HG-Insights, Companies). As seen in Figure 4, the architecture components are the user connection, the control node, the compute nodes, and Azure Storage (Microsoft, 2022).

Figure 4. Azure Synapse MPP Architecture (Microsoft, 2022)

The data is kept safe in Azure Storage, and a fee is charged for this service. The client can choose among three sharding patterns for distributing the data: hash, round robin, and replicate. The control node is the brain of the system; it runs the distributed query engine, which optimizes and coordinates parallel queries. The compute nodes provide the computational power, and their number can range from 1 to 60 depending on the service level. Finally, the Data Movement Service coordinates the movement of data between the compute nodes (Microsoft, 2022).
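To make the three sharding patterns more concrete, here is a small Python sketch of how rows might be spread out under each one. It is an illustration only and makes several assumptions: the function names and the sample rows are invented, and `zlib.crc32` is just a stand-in, since the hash function Azure Synapse actually uses is internal to the service.

```python
import zlib


def hash_distribute(rows, key, n_distributions):
    """Hash: rows with the same key value always land in the same place."""
    buckets = [[] for _ in range(n_distributions)]
    for row in rows:
        # crc32 is a stand-in hash chosen for this example only.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        buckets[digest % n_distributions].append(row)
    return buckets


def round_robin_distribute(rows, n_distributions):
    """Round robin: rows are dealt out evenly, one distribution at a time."""
    buckets = [[] for _ in range(n_distributions)]
    for i, row in enumerate(rows):
        buckets[i % n_distributions].append(row)
    return buckets


def replicate(rows, n_nodes):
    """Replicate: every node receives a full copy of the table."""
    return [list(rows) for _ in range(n_nodes)]


if __name__ == "__main__":
    sales = [
        {"customer": "a", "amount": 1},
        {"customer": "b", "amount": 2},
        {"customer": "a", "amount": 3},
        {"customer": "c", "amount": 4},
        {"customer": "b", "amount": 5},
    ]
    print([len(b) for b in round_robin_distribute(sales, 3)])  # prints [2, 2, 1]
    print([len(b) for b in hash_distribute(sales, "customer", 3)])
    print([len(b) for b in replicate(sales, 2)])  # prints [5, 5]
```

The trade-off this hints at: hash keeps related rows together, which helps joins and aggregations on the key; round robin spreads rows evenly without looking at their content; replicate avoids moving data between nodes at the cost of storing the table several times, so it tends to suit small tables.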

MPP processing starts when the user sends a query to the control node, which optimizes it and sends each subtask to the compute nodes. The compute nodes work in parallel to solve the query. The compute nodes also store the data in Azure Storage.

References

Dearmer. (2022). What is an MPP Database? Intro to Massively Parallel Processing. https://www.integrate.io/blog/what-is-an-mpp-database/

Dommata, S. (2017, May 30th). Computer Cluster. https://medium.com/lvs-load-balance-clustering-configuration-on/what-is-a-computer-cluster-c4c219f4f6d9

HG-Insights. Companies currently using Azure Synapse. https://discovery.hgdata.com/product/azure-synapse

Lithmee. (2018, September 18th). Difference Between Cluster and Grid Computing. https://pediaa.com/difference-between-cluster-and-grid-computing/

Microsoft. (2022, July 26th). Dedicated SQL pool (formerly SQL DW) architecture in Azure Synapse Analytics. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture/

Packt. Massively Parallel Processing. https://subscription.packtpub.com/book/data/9781789533880/6/ch06lvl1sec51/massively-parallel-processing

Kanade. (2022, January 19th). What Is Grid Computing? Key Components, Types, and Applications. https://www.spiceworks.com/tech/cloud/articles/what-is-grid-computing/
