Ernesto Soto Gómez  
Universidad de las Ciencias Informáticas  
RECIBIDO 22/09/2020 ● ACEPTADO 23/09/2020 ● PUBLICADO 30/09/2020  
RESUMEN  
Algunas de las herramientas más populares hoy en día para la programación paralela son Interfaz  
de Paso de Mensajes y Multiprocesamiento Abierto. Es de interés comparar estas herramientas  
en la resolución de los mismos tipos de problemas, debido a la utilización de diferentes enfoques  
en la comunicación entre tareas. Este trabajo tiene como objetivo contribuir a este empeño al  
ejecutar pruebas en una arquitectura de memoria compartida y centralizada en el caso de  
problemas con una solución completamente paralela. El caso de estudio seleccionado fue la  
computación paralela del conjunto de Mandelbrot. Las pruebas se realizaron para diferentes  
límites de iteración, cantidad de procesadores y variantes de implementación en C++. Los  
resultados muestran un mejor desempeño en el caso de Multiprocesamiento Abierto.  
Palabras clave: C++, computación paralela, conjunto de Mandelbrot, MPI, OpenMP.
ABSTRACT  
Nowadays, some of the most popular tools for parallel programming are Message Passing  
Interface and Open Multi-Processing. It is of interest to compare these tools in solving the same  
kind of problems, because of the use of different approaches to inter-task communication. This  
work attempts to contribute to this goal by running trials in a centralized shared memory  
architecture in the case of problems with an entirely parallel solution. The selected case study  
was the parallel computation of the Mandelbrot set. Trials were conducted for different iteration limits, numbers of processors, and C++ implementation variants. The results show better performance in
the case of Open Multi-Processing.  
Keywords: C++, Mandelbrot set, MPI, OpenMP, parallel computing.  
INTRODUCTION  
There are diverse tools for parallel programming. Some of the most popular nowadays are Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). Both tools are essentially dissimilar because of the use of different approaches to inter-task communication: OpenMP uses shared memory (tasks are realized by threads within the same operating system process) [1,2], whereas MPI uses message passing (tasks are realized by different operating system processes) [3,4]. For this reason, it is of interest to compare these tools in solving the same kind of problems. That is, which one performs best when computing the same kind of solution for the same kind of problem, given that the tools use different inter-task communication mechanisms? This article attempts to contribute to the answer to this question in a centralized shared memory architecture [5] in the case of problems with an entirely parallel solution, that is, a solution that requires no synchronization except for gathering the partial solutions of several subtasks into one final solution.
To accomplish this goal, the parallel generation of the Mandelbrot set has been chosen as an  
example. This case has been studied in the parallel computing context, usually as a didactic  
example [1,6,7] because it can be generated from a simple mathematical expression. Also, the  
Mandelbrot set is a fractal: a figure that possesses detailed structure across a wide range of scales. Fractal geometrical relations are found in several natural structures, so fractals are of great interest to science [8]. This adds to the motivation for studying this example.
The parallel computing of the Mandelbrot set has already been studied for MPI and OpenMP independently of each other [1, 9, 10]. The current work makes comparisons between the straightforward sequential implementation and corresponding parallel versions implemented in MPI and OpenMP with different schedule strategies. C++ has been used as the programming language, and the comparisons were made for different iteration limits and numbers of processors. The code used may be found in the folders referenced throughout this article.
The current document is structured in the following manner. First, the fundamental theoretical elements, the proposed sequential algorithm, and the corresponding parallel versions are presented. Second, the characteristics of the experiment and the obtained results are described. Last, final remarks are made.
METHODS AND MATERIALS  
Sequential implementation  
The Mandelbrot set is the set of all $c \in \mathbb{C}$ for which the recurrence relation (Equation 1):

$$z_n = z_{n-1}^2 + c \qquad (1)$$

does not diverge, with $z_n \in \mathbb{C}$ and $z_0 = 0$.

It is known [6,7] that such a sequence does not diverge when (Equation 2):

$$|z_n| \leq 2 \qquad (2)$$

holds for all $n$, where (Equation 3):

$$|z_n| = \sqrt{\Re(z_n)^2 + \Im(z_n)^2} \qquad (3)$$

Here $\Re(z)$ and $\Im(z)$ stand for the real and imaginary parts of $z$, respectively.
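As a brief added illustration of the criterion in (Equation 2) (these values are not part of the reported computations), the first iterations for $c = 1$ and $c = -1$ give:

$$c = 1:\quad z_1 = 1,\; z_2 = 2,\; z_3 = 5 > 2 \;\Rightarrow\; c = 1 \text{ is not in the set,}$$

$$c = -1:\quad z_1 = -1,\; z_2 = 0,\; z_3 = -1,\ldots \;\Rightarrow\; |z_n| \leq 2 \text{ for all } n,\text{ so } c = -1 \text{ is in the set.}$$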
As a way of visualization, the values of c that are members of the Mandelbrot set may be drawn  
in the complex plane. Figure 1 shows images of the Mandelbrot set. The images were generated  
by using the C++ solution developed for this research.  
Figure 1. Representation of the Mandelbrot set in the complex plane, for iteration limits 10 (a), 20 (b), 40 (c), and 80 (d).
A straightforward algorithm that gives an approximation of the Mandelbrot set is to move along a subset of a discrete version of the domain of $c$ and verify that $z_n$ does not diverge by using (Equation 2). The generation of the sequence determined by (Equation 1) proceeds while a given iteration limit is not exceeded [6,7]. A sequential C++ implementation of this algorithm is given in (Listing 1).
Listing 1. Sequential C++ implementation of the straightforward algorithm to approximately  
compute the Mandelbrot set.  
#include <complex>
using std::complex;
using std::norm;

void compute_mandelbrot_set(int* result, int iter_limit, int x_resolution, int y_resolution,
                            double x_begin, double x_end, double y_begin, double y_end) {
    double x_step = (x_end - x_begin) / (x_resolution - 1);
    double y_step = (y_end - y_begin) / (y_resolution - 1);
    int i, j;
    complex<double> c, z;
    for (i = 0; i < x_resolution * y_resolution; i++) {
        // map the flat index i to a point c of the discretized complex plane
        c = complex<double>(x_begin + (i % x_resolution) * x_step,
                            y_begin + (i / x_resolution) * y_step);
        z = 0; j = 0;
        // iterate z = z^2 + c until |z| > 2 (i.e. norm(z) > 4) or the iteration limit is reached
        while (norm(z) <= 4 && j < iter_limit) { z = z*z + c; j++; }
        result[i] = j;
    }
}
Procedure compute_mandelbrot_set in (Listing 1) receives an array result where the computed set will be stored. Although it represents the complex plane, result is a unidimensional array. This will allow the implementation of similar parallel versions for MPI and OpenMP, even though, at the moment of visualizing the set in a two-dimensional space, some transformations must be done. The Mandelbrot set and its complement are given as an array of integers. Each of these values is the number of iterations performed before the condition in (Equation 2) stops holding, or the iteration limit if it never does. This is useful when visualizing the Mandelbrot set. Figure 1 shows some examples. The images were generated for different iteration limits with a procedure similar to those described in [10] and [7, pp. 103–108]. The iteration limit is given by parameter iter_limit. Parameters x_resolution and y_resolution determine how big the computed set is, that is, the amount of computed detail. In this case, the length of result is the product of x_resolution and y_resolution. Parameters x_begin, x_end, y_begin, and y_end
denote the domains of the real and imaginary dimensions, respectively. That is, if c = x + yi then x ∈ [x_begin, x_end] and y ∈ [y_begin, y_end]. When visualizing the set, the ranges are usually around x ∈ [-2.5, 1] and y ∈ [-1, 1]. Variables x_step and y_step determine the level of discretization of the plane, that is, the width of the steps taken in each dimension. The full C++ sequential implementation may be found in folder code/mandelbrot_sequential.
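As an illustration, a minimal driver for the sequential procedure might look as follows; the concrete parameter values and the handling of result are assumptions for this sketch, not code taken from the repository.

// Hypothetical driver for the sequential version (illustrative sketch only).
// compute_mandelbrot_set as defined in Listing 1
void compute_mandelbrot_set(int* result, int iter_limit, int x_resolution, int y_resolution,
                            double x_begin, double x_end, double y_begin, double y_end);

int main() {
    const int x_resolution = 1024, y_resolution = 1024, iter_limit = 100;
    // typical visualization window of the complex plane
    const double x_begin = -2.5, x_end = 1.0, y_begin = -1.0, y_end = 1.0;
    int* result = new int[x_resolution * y_resolution];
    compute_mandelbrot_set(result, iter_limit, x_resolution, y_resolution,
                           x_begin, x_end, y_begin, y_end);
    // pixel (px, py) of a rendered image corresponds to result[py * x_resolution + px]
    delete[] result;
    return 0;
}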
Parallel implementation  
The parallel computation of $z_n$ may be difficult due to the nonlinear character of (Equation 1). Moreover, if (Equation 1) is expanded, the following relations hold (Equations 4 and 5):

$$\Re(z_n) = \Re(z_{n-1})^2 - \Im(z_{n-1})^2 + \Re(c) \qquad (4)$$

$$\Im(z_n) = 2\,\Re(z_{n-1})\,\Im(z_{n-1}) + \Im(c) \qquad (5)$$

It may be observed that (Equation 4) and (Equation 5) reference each other recursively, which makes the parallel computation of the sequence for a single point even more difficult. For these reasons, the parallel computation of the Mandelbrot set is normally realized by computing the iterations for different points of the plane in parallel; that is, the plane is divided into parts. In the proposed sequential procedure, result is a unidimensional array, which means that only one loop must be parallelized.
Implementation in OpenMP is straightforward by using directive omp for [1, pp. 53–78]. The fact that a unidimensional array has been chosen to store the solution simplifies the division of its range, making it possible to use the same procedure code in the sequential version as well as in the OpenMP implementation and in each subtask of the MPI implementation. In both parallel versions, because the parts are independent of each other, it is not necessary to synchronize the execution of the tasks. The C++ code for this procedure is shown in (Listing 2). Its implementation may be found in file code/common/compute_mandelbrot_subset.cpp.
In this case, parameters start and end mark the beginning and the end of the corresponding part. This will allow using the procedure in each subtask of the MPI implementation. In the sequential version, the procedure is called with start=0 and end=x_resolution*y_resolution. When the procedure is used by the sequential and MPI variants, the OpenMP directives have no effect because the OpenMP compiler flags are not used. Also, in this general procedure, all the other referenced variables (x_resolution, x_begin, y_begin, x_step, and y_step) are defined as global constants because their values do not change during the execution of the programs. The full C++ implementation using OpenMP may be found in folder code/mandelbrot_openmp.
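A minimal sketch of how these global constants might be declared is shown next; the concrete values are illustrative assumptions and may differ from those used in the repository.

// Assumed declarations of the shared global constants (illustrative values).
const int x_resolution = 1024;
const int y_resolution = 1024;
const double x_begin = -2.5, x_end = 1.0;
const double y_begin = -1.0, y_end = 1.0;
const double x_step = (x_end - x_begin) / (x_resolution - 1);
const double y_step = (y_end - y_begin) / (y_resolution - 1);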
Listing 2. C++ procedure used to compute the Mandelbrot set in the sequential implementation  
as well as in the parallel ones.  
void compute_mandelbrot_subset(int* result, int iter_limit, int start, int end) {
    int i, j;
    complex<double> c, z;
    #pragma omp parallel shared(result, iter_limit, start, end) private(i, j, c, z)
    #pragma omp for schedule(runtime)
    for (i = start; i < end; i++) {
        // map the flat index i to a point c of the discretized complex plane
        c = complex<double>(x_begin + (i % x_resolution) * x_step,
                            y_begin + (i / x_resolution) * y_step);
        z = 0; j = 0;
        while (norm(z) <= 4 && j < iter_limit) { z = z*z + c; j++; }
        result[i - start] = j;  // stored relative to start so each subtask can use a local buffer
    }
}
In the case of MPI, the partition of the loop has to be done by hand, following the master-slave procedure [11]:

1. Divide the range of the array into p parts of approximately the same size, where p is the number of available processors.
2. Compute the i-th part by using processor i.
3. Group the results of the partial computations together into one array.
In MPI, two variants may be considered to realize this procedure. The first is to use the MPI_Send and MPI_Recv functions to send and receive messages directly between the processors [12,13]. One of the processors, the master, distributes the tasks among the others and groups the results together into one array. That processor also computes a part of the whole solution. The C++ code for this processor is shown in (Listing 3). The other processors, the slaves, only receive the indexes that define the part to be computed. Once a part is generated, they send it to the master. The C++ code for these processors is shown in (Listing 4). In both cases (master and slaves), part_width = result_size / processors_amount. The full C++ implementation using MPI
with the MPI_Send and MPI_Recv functions may be found in folder code/mandelbrot_mpi_send_recv.
The other variant in MPI is to use the MPI_Gather function, which allows gathering the partial computations of each process into one array [12]. Its use in this case is very concise, as can be seen in (Listing 5). After space has been reserved for arrays result and partial_result, it only remains to compute the part in each processor, including the master, and then gather the results by using the MPI_Gather function. In this case, part_width = result_size / processors_amount and start = current_processor * part_width. The complete C++ implementation may be found in folder code/mandelbrot_mpi_gather.
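Before the listings, a minimal sketch of the setup that both MPI variants rely on is given here; the structure and variable handling are assumptions for illustration and may differ from the repository code.

// Assumed MPI setup common to both variants (illustrative sketch).
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int current_processor, processors_amount;
    MPI_Comm_rank(MPI_COMM_WORLD, &current_processor);  // rank of this process
    MPI_Comm_size(MPI_COMM_WORLD, &processors_amount);  // total number of processes
    const int result_size = 1024 * 1024;                // x_resolution * y_resolution
    const int part_width = result_size / processors_amount;
    // the send/receive variant runs the master code (Listing 3) on rank 0 and the slave
    // code (Listing 4) on the other ranks; the gather variant runs Listing 5 on every rank
    MPI_Finalize();
    return 0;
}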
Listing 3. C++ code executed by the master in one of the MPI implementation variants.  
// distribute tasks
int start, end = 0;
for (int i = 1; i < processors_amount; i++) {
    start = end; end += part_width;
    int message[2] = {start, end};
    MPI_Send(message, 2, MPI_INT, i, 0, MPI_COMM_WORLD);
}
// compute the master's own (last) part; current initially points to the result array
compute_mandelbrot_subset(current + end, iter_limit, end, result_size);
// join pieces together
for (int i = 1; i < processors_amount; i++) {
    MPI_Recv(current, part_width, MPI_INT, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    current += part_width;
}
Listing 4. C++ code executed by the slaves in the MPI implementation.  
int message[2];
MPI_Recv(message, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  // receive [start, end)
int* partial_result = new int[part_width];
compute_mandelbrot_subset(partial_result, iter_limit, message[0], message[1]);
MPI_Send(partial_result, part_width, MPI_INT, 0, 0, MPI_COMM_WORLD);     // send the computed part to the master
delete[] partial_result;
Listing 5. C++ code for MPI using the MPI_Gather function.

compute_mandelbrot_subset(partial_result, iter_limit, start, start + part_width);
MPI_Gather(partial_result, part_width, MPI_INT, result, part_width, MPI_INT, 0, MPI_COMM_WORLD);
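The following sketch shows the buffer handling that Listing 5 assumes around these two lines; it is an illustration under the stated relations for part_width and start, not the repository's exact code.

// Illustrative context for the MPI_Gather variant (assumed, see Listing 5).
int* partial_result = new int[part_width];
int* result = nullptr;
if (current_processor == 0)
    result = new int[result_size];           // the receive buffer only matters on the root
int start = current_processor * part_width;  // beginning of this process' part
compute_mandelbrot_subset(partial_result, iter_limit, start, start + part_width);
// every rank contributes part_width integers; rank 0 receives them concatenated in rank order
MPI_Gather(partial_result, part_width, MPI_INT, result, part_width, MPI_INT, 0, MPI_COMM_WORLD);
delete[] partial_result;
// on rank 0, result would be used (e.g., rendered to an image) before being freed
if (current_processor == 0) delete[] result;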
RESULTS AND DISCUSSION  
Execution environment  
The trials consisted of running each implementation for iteration limits 100, 1000, 10000, and 100000 and with one, two, four, and eight processors. In the case of OpenMP, the schedule strategies static, dynamic, and guided were considered. The scheduling strategy and the number of processors were passed to the program through the environment variables OMP_SCHEDULE and OMP_NUM_THREADS [12]. Each combination of program, iteration limit, and number of processors was executed three times, and the average of the results was studied by using high-performance computing metrics. The runs were executed in random order with respect to program, iteration limit, and number of processors, on a machine dedicated solely to the running of the trials. Also, the considered resolution, that is, the size of the computed set, was 1024x1024.
The running machine was an HP Notebook 15-db0069wm. Tables 1 and 2 show relevant information about the running machine and operating system, and about the programming and execution tools and libraries, respectively. File data/info.txt contains information that was automatically recorded at the beginning of the whole experiment by using the program inxi in root mode (see the Linux man page via command man inxi).
A Python 3 script was developed for the purpose of automatically recording the results to a CSV file called data/run_data.csv and plotting the data by using the Python libraries pandas [14] and
seaborn [14], respectively. These results may be found in folder data. The whole Python program may be found in folder trials_runner.
Table 1. Characteristics of the running machine and operating system.

Model            | HP Notebook - 15-db0069wm
Microprocessor   | AMD Ryzen™ 5 2500U Quad-Core
RAM              | 8 GB DDR4-2400 SDRAM
Operating System | OpenSUSE Leap 15.1
Type             | 64 bits
Kernel           | 4.12.14-lp151.28.40-default
Table 2. Development and execution tools and libraries.

Tools and libraries                         | Name                                                 | Version | OpenSUSE package
C++ compiler                                | GNU Compiler for C/C++                               | 7.5.0   | gcc 7-lp151.3.5
Building system                             | CMake                                                | 3.10.2  | cmake 3.10.2-lp151.4.1
MPI development and programming environment | Local Area Multicomputer (LAM)                       | 7.1.4   | lam 7.1.4-lp151.2.38
MPI library                                 | OpenMPI                                              | 1.10.7  | openmpi 1.10.7-lp151.11.4
OpenMP library                              | GNU Offloading and Multi Processing Runtime Library  | 8.2.1   | libgomp1 8.2.1+r264010-lp151.1.33
Execution time  
Although OpenMP and MPI provide specialized functions to measure the execution time of a program [12,15], the execution time was measured by using a method that is valid for all the considered implementations. The function clock_gettime with the clock CLOCK_MONOTONIC_RAW was used to obtain a raw hardware-based monotonic time that is not subject to adjustments of the system clock. This allowed a normalized and unbiased way of measuring time. Only the master and the main thread execution time were measured in the case of MPI and OpenMP, respectively. Also, in all variants, only the code involved in computing the Mandelbrot set was measured. That is, initialization and finalization code, including the allocation and deallocation of the result array, was not measured. The function used to obtain the current time is shown in (Listing 6). This function may be found in file code/common/now.cpp.
Listing 6. C++ function used to obtain the current time.

#include <ctime>
#include <cstdlib>

double now() {
    struct timespec tp;
    if (clock_gettime(CLOCK_MONOTONIC_RAW, &tp) != 0) exit(1);
    return tp.tv_sec + tp.tv_nsec / (double)1000000000;
}
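As an illustration of how this function would typically wrap the measured region (a sketch; the exact call in each variant may differ):

// Illustrative timing of the computing code only (sketch).
double t_start = now();
compute_mandelbrot_subset(result, iter_limit, 0, x_resolution * y_resolution);
double elapsed = now() - t_start;  // execution time in seconds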
The obtained execution times are shown in Figure 2. The graphics show that the MPI variants have the worst execution time, while the OpenMP implementation is best when using a dynamic schedule. Also, it is important to notice that the three OpenMP schedule variants show different performance. Moreover, in spite of the fact that both use the same basic partitioning strategy, OpenMP with a static schedule obtained better results than the MPI implementations in these trials. This may be due to the fact that each slave has to allocate memory to store the computed part, send it to the master, and deallocate it at the end. This may cause an overhead that is not present in the OpenMP variants. Finally, it is observed that the MPI variant with the MPI_Send and MPI_Recv functions obtained better results than the variant with the MPI_Gather function. This suggests that, in some cases, it is better to use low-level functions than high-level functions to build a concrete solution in order to achieve better performance.
Figure 2. Execution time in seconds. Lower is better.  
Speedup  
Speedup is a high-performance computing metric that gives an idea of how much better the parallel execution time is than the sequential execution time. The obtained value is better the closer it is to the number of available processors. The speedup for $p$ processors is (Equation 6):

$$S(p) = \frac{t(1)}{t(p)} \qquad (6)$$

Here $t(1)$ is the sequential execution time and $t(p)$ is the execution time when $p$ processors are available in the considered parallel alternative [16–18].
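As a purely illustrative example with made-up numbers (not measured values from these trials), a program that takes $t(1) = 8$ s sequentially and $t(4) = 2.5$ s on four processors would have:

$$S(4) = \frac{8}{2.5} = 3.2$$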
The obtained results for speedup are shown in Figure 3. The graphics show the performance difference between the variants more clearly. Also, it is noticed that the speedups for four and
eight processors do not come near these values. This suggests that an increase in the number of processors will not bring much more improvement to performance in the case of the considered resolution (1024x1024).
Figure 3. Speedup. Higher is better.  
Parallel efficiency  
Parallel efficiency is a high-performance computing metric that gives an idea of how close the speedup is to the number of available processors, that is, how well the parallel program has used the available computational resources (processors in this case). The best-case scenario is when the speedup equals the number of available processors, meaning that the parallel program made maximum use of the available processing units.

The parallel efficiency for $p$ processors is (Equation 7) [16–18]:

$$E(p) = \frac{S(p)}{p} \qquad (7)$$
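Continuing the illustrative numbers used above for speedup (again, not measured values), $S(4) = 3.2$ on four processors gives:

$$E(4) = \frac{3.2}{4} = 0.8$$

that is, 80% of the available processing capacity would be exploited.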
The obtained results are shown in Figure 4. The graphics show the decrease of parallel efficiency with the increase in the number of processors. The results are consistent for each iteration limit. This reaffirms the idea that an increase in the number of processors will not bring better performance, which is more obvious in the case of MPI. In this case, the decrease in efficiency may be due to the fact that the resolution has been kept constant in these trials, and there will be a moment when the parts to compute become too small. As a consequence, little gain in performance is obtained by computing the parts in a parallel manner, because the time it takes to transmit a message is almost the same as the time it takes to compute a part.
Figure 4. Parallel efficiency. Higher is better.  
CONCLUSIONS  
In the present work, a comparison of the parallel generation of the Mandelbrot set by using OpenMP and MPI has been conducted. The trials were executed for different iteration limits, numbers of processors, and C++ implementation variants. In this case, and in general, OpenMP obtained better performance results than the MPI implementations. It is worth noticing that, although the present work is a case study and its results should therefore not be taken as conclusive, the conducted trials may contribute to further research and study. Also, running scripts, images, and C++ source code are provided to allow the reproduction and enhancement of the experiments. Moreover, the current work may be used as a didactic example for the study of the performance of parallel programs.
REFERENCES  
[1] R. Trobec, B. Slivnik, P. Bulić, and B. Robič, "Programming Multi-core and Shared Memory Multiprocessors Using OpenMP," in Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms, ser. Undergraduate Topics in Computer Science, R. Trobec, B. Slivnik, P. Bulić, and B. Robič, Eds. Cham: Springer International Publishing, 2018, pp. 47–86.

[2] M. J. Quinn, "Shared-Memory Programming," in Parallel Programming in C with MPI and OpenMP. McGraw-Hill Education, 2003, pp. 404–435.

[3] R. Trobec, B. Slivnik, P. Bulić, and B. Robič, "MPI Processes and Messaging," in Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms, ser. Undergraduate Topics in Computer Science, R. Trobec, B. Slivnik, P. Bulić, and B. Robič, Eds. Cham: Springer International Publishing, 2018, pp. 87–132.

[4] M. J. Quinn, "Message-Passing Programming," in Parallel Programming in C with MPI and OpenMP. McGraw-Hill Education, 2003, pp. 93–114.

[5] P. Czarnul, "Generic Taxonomy of Parallel Computing Systems," in Parallel Programming for Modern High Performance Computing Systems. Chapman & Hall/CRC, 2018, pp. 11–12.

[6] M. McCool, J. Reinders, and A. Robison, "Mandelbrot," in Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann, Jun. 2012, pp. 131–143.

[7] J. M. Stewart, "Two-Dimensional Graphics," in Python for Scientists, 2nd ed. Cambridge University Press, 2017, pp. 82–108.

[8] I. Stewart and A. C. Clarke, "The Nature of Fractal Geometry," in The Colours of Infinity: The Beauty and Power of Fractals. Clear Press Ltd, 2004, pp. 2–23.

[9] M. Tracolli, "Parallel generation of a Mandelbrot set," VIRT&L-COMM, Apr. 2016. [Online]. Available: http://services.chm.unipg.it/ojs/index.php/virtlcomm/article/view/112

[10] J. Burkardt, "MANDELBROT - ASCII Portable Pixel Map (PPM) Image of the Mandelbrot Set," Mar. 2020. [Online]. Available: https://people.sc.fsu.edu/~jburkardt/cpp_src/mandelbrot_openmp/mandelbrot_openmp.html

[11] P. Czarnul, "Master-Slave," in Parallel Programming for Modern High Performance Computing Systems. Chapman & Hall/CRC, 2018, pp. 35–39.

[12] P. Czarnul, "Message Passing Interface (MPI)," in Parallel Programming for Modern High Performance Computing Systems. Chapman & Hall/CRC, 2018, pp. 74–102.

[13] M. J. Quinn, "Floyd's Algorithm," in Parallel Programming in C with MPI and OpenMP. McGraw-Hill Education, 2003, pp. 137–158.

[14] W. McKinney, "Plotting and Visualization," in Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd ed. O'Reilly Media, Inc., 2017, pp. 250–283.

[15] P. Czarnul, "OpenMP," in Parallel Programming for Modern High Performance Computing Systems. Chapman & Hall/CRC, 2018, pp. 102–118.