Parallel source code transformation techniques using design patterns
International Mention in the doctoral degree.
In recent years, the traditional approaches for improving performance, such as increasing
the clock frequency, have come to a dead end. To tackle this issue, parallel architectures,
such as multi-/many-core processors, have been envisioned to increase
performance by providing greater processing capabilities. However, programming
efficiently for these architectures demands considerable effort in order to transform sequential
applications into parallel ones and to optimize them. Compared to
sequential programming, designing and implementing parallel applications for
modern hardware poses a number of new challenges to developers, such
as data races, deadlocks and load imbalance.
To pave the way, parallel design patterns provide a way to encapsulate algorithmic
aspects, allowing users to implement robust, readable and portable solutions
with such high-level abstractions. Basically, these patterns instantiate parallelism
while hiding away the complexity of concurrency mechanisms, such as thread management,
synchronization or data sharing. Nonetheless, frameworks following this
philosophy do not share the same interface, and users need to understand different
libraries and their capabilities, not only to decide which best fits their
purposes but also to leverage them properly. Furthermore, in order to parallelize
these applications, it is necessary to analyze the sequential code to detect the
regions that can be parallelized, which is a time-consuming and complex task.
Additionally, different libraries targeted at specific devices provide algorithm
implementations that are already parallel and highly tuned. In these situations, it is
also necessary to analyze and determine which routine implementation is the most
suitable for a given problem.
To tackle these issues, this thesis aims at simplifying and minimizing the
effort needed to transform sequential applications into parallel ones. This way, the resulting
codes will improve their performance by fully exploiting the available resources
while the development effort will be considerably reduced. Specifically, this thesis
makes the following contributions. First, we propose a technique to detect potential
parallel patterns in sequential code. Second, we provide a novel generic C++ interface
for parallel patterns which acts as a switch among existing frameworks. Third,
we implement a framework that is able to transform sequential code into parallel code
using the proposed pattern discovery technique and pattern interface. Finally, we
propose mechanisms that are able to select the most suitable device and routine implementation
to solve a given problem based on previous performance information.
The evaluation demonstrates that using the proposed techniques can minimize the
refactoring and optimization time while improving the performance of the resulting
applications with respect to the original code.
Official Doctoral Programme in Computer Science and Technology. President: David Expósito Singh. Secretary: Rafael Asenjo Plaza. Member: Marco Aldinucc
Simulación epidemiológica y visualización (Epidemiological simulation and visualization)
Epidemiology has gained great importance in society, mainly due to the spread of the influenza A virus in 2009. This epidemic had a great social impact because of its fast expansion, becoming a global pandemic in a short time, and the resulting fear in society. This provoked great social alarm, and uncertainty emerged among health authorities on how to control the disease. For this reason, different tools have been developed to make predictions about the spread of diseases in order to take measures that may reduce the impact on society. Among these tools are a large number of influenza virus simulators. These simulators are able to generate a forecast of the spread of the disease using mathematical models and population representation models. The epidemiological simulators developed so far are mainly divided into two groups according to the disease model they implement: stochastic, if they implement probabilistic models, or deterministic, if they implement deterministic models, which are usually based on differential equations. This project starts from the EpiGraph simulator, which implements a stochastic model able to simulate the behaviour of influenza, together with a system that represents the population by means of graphs based on social networks. However, the resulting forecasts are difficult to analyse with the naked eye, and the system that generates the population models as graphs, using samples of social network graphs, can be significantly improved. For this reason, this project seeks, on the one hand, to design and implement a graph-sampling algorithm that generates the population models used in the simulator while preserving the properties of the original graph and, on the other hand, to design an application able to display the results of the forecasts, allowing a complete analysis with the least possible difficulty.
Ingeniería Informátic
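To make the idea of a stochastic disease model running on a population graph more concrete, the following is a minimal sketch in C++. It is not EpiGraph's model: the `State` enum, the `ContactGraph` alias, the `step` and `count_state` functions and the one-step infectious period are all assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Health state of one individual in a simple S-I-R scheme.
enum class State { Susceptible, Infected, Recovered };

// Adjacency list: contacts[i] lists the neighbours of individual i.
using ContactGraph = std::vector<std::vector<std::size_t>>;

// Advance the epidemic by one time step. Every infected individual tries to
// infect each susceptible neighbour with probability p_infect, then recovers.
// This rule is an illustrative assumption, not the model used by EpiGraph.
std::vector<State> step(const ContactGraph& contacts,
                        const std::vector<State>& current,
                        double p_infect, std::mt19937& rng) {
    assert(contacts.size() == current.size());
    std::bernoulli_distribution transmits(p_infect);
    std::vector<State> next = current;
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (current[i] != State::Infected) continue;
        for (std::size_t j : contacts[i]) {
            if (current[j] == State::Susceptible && transmits(rng)) {
                next[j] = State::Infected;
            }
        }
        next[i] = State::Recovered;  // infectious for exactly one step
    }
    return next;
}

// Count how many individuals are in the given state.
std::size_t count_state(const std::vector<State>& population, State s) {
    std::size_t n = 0;
    for (State x : population) n += (x == s) ? 1 : 0;
    return n;
}
```

Calling `step` repeatedly with a seeded `std::mt19937` and recording `count_state` for each state after every call gives one stochastic trajectory. A deterministic model would instead integrate differential equations and return the same curve on every run.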
A Generic Parallel Pattern Interface for Stream and Data Processing
Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable effort and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions in those frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GrPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications. GrPPI provides a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, such as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GrPPI allows users to easily expose parallelism via standalone patterns or pattern compositions that match sequential applications. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations. This work has been partially supported by the EU project ICT 644235 “REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications” and the Spanish “Ministerio de Economía y Competitividad” under the grant TIN2016-79673-P “Towards Unification of HPC and Big Data Paradigms.”
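The idea of a single pattern interface sitting between the developer and several back ends can be illustrated with a small sketch. The code below is not GrPPI's API: the policy types `sequential_execution` and `thread_execution` and the function `map_pattern` are hypothetical names, and only two back ends (plain sequential code and ISO C++ threads) are shown, assuming C++14 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical execution policies; the names are illustrative, not GrPPI's.
struct sequential_execution {};
struct thread_execution { std::size_t workers = 2; };

// Map pattern, sequential back end: apply f to every element in order.
template <typename T, typename F>
std::vector<T> map_pattern(sequential_execution, const std::vector<T>& in, F f) {
    std::vector<T> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
    return out;
}

// Map pattern, thread back end: split the input into contiguous chunks and
// let each worker thread transform its own chunk. Chunks never overlap, so
// no synchronization beyond join() is needed.
template <typename T, typename F>
std::vector<T> map_pattern(thread_execution ex, const std::vector<T>& in, F f) {
    assert(ex.workers > 0);
    std::vector<T> out(in.size());
    const std::size_t chunk = (in.size() + ex.workers - 1) / ex.workers;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, in.size());
        pool.emplace_back([&in, &out, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        });
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

With this shape, switching back ends is a one-argument change at the call site: `map_pattern(sequential_execution{}, v, f)` versus `map_pattern(thread_execution{4}, v, f)`. The user-supplied function `f` stays untouched, which is the property the abstract attributes to a generic pattern layer.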
Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends
Proceedings of: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, 14-17 May 2019.
In recent years, on-line processing of data streams has been established as a major computing paradigm. This is mainly due to two reasons: first, more and more data that need to be processed are generated in near real-time; second, there is a need for efficient parallel applications. However, these areas pose a tough challenge to traditional data-analysis techniques, which have been forced to evolve towards a stream perspective. In this work we present a comparative study of a stream-aware multi-staged application, which has been implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programmability, performance, and scalability. This work was supported by the EU project “ASPIDE: Exascale Programing Models for Extreme Data Processing” under grant 80109
Paving the way towards high-level parallel pattern interfaces for data stream processing
The emergence of Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, encapsulating algorithmic aspects in pattern-based constructions, have considerably reduced the development and parallelization effort for this type of application. An example of such a parallel pattern interface is GrPPI, a generic high-level C++ library that acts as a layer between developers and existing parallel programming frameworks, such as C++ threads, OpenMP and Intel TBB. In this paper, we complement the basic patterns supported by GrPPI with the new stream operators Split-Join and Window, and the advanced parallel patterns Stream-Pool, Windowed-Farm and Stream-Iterator for the aforementioned back ends. Thanks to these new stream operators, complex compositions among streaming patterns can be expressed. On the other hand, the collection of advanced patterns allows users to tackle some domain-specific applications, ranging from the evolutionary to the real-time computing areas, where compositions of basic patterns are not capable of fully mimicking the algorithmic behavior of their original sequential codes. The experimental evaluation of the new advanced patterns and the stream operators on a set of domain-specific use cases, using different back ends and pattern-specific parameters, reports considerable performance gains with respect to the sequential versions. Additionally, we demonstrate the benefits of the GrPPI pattern interface from the usability, flexibility and readability points of view. This work was partially supported by the EU project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the project TIN2013-41350-P “Scalable Data Management Techniques for High-End Computing Systems” from the Ministerio de Economía y Competitividad, Spai
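As a rough illustration of what a Window operator does to a stream, the sketch below groups items into count-based sliding windows. It is not the GrPPI operator: the function name `sliding_windows`, its `size` and `slide` parameters and the policy of dropping an incomplete trailing window are assumptions made for this example, and the stream is modelled as an in-memory vector rather than an unbounded source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Group a stream into count-based windows of `size` items, starting a new
// window every `slide` items. Trailing items that cannot fill a complete
// window are dropped; that policy is an assumption made for this sketch.
template <typename T>
std::vector<std::vector<T>> sliding_windows(const std::vector<T>& stream,
                                            std::size_t size,
                                            std::size_t slide) {
    assert(size > 0 && slide > 0);
    std::vector<std::vector<T>> windows;
    for (std::size_t start = 0; start + size <= stream.size(); start += slide) {
        const auto first = stream.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(size);
        windows.emplace_back(first, last);
    }
    return windows;
}
```

With `slide` smaller than `size` the windows overlap, and with `slide` equal to `size` they tile the stream without overlap. Each resulting window could then be handed to a worker, which is roughly the combination a windowed farm suggests.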
Challenging the abstraction penalty in parallel patterns libraries
In recent years, pattern-based programming has been recognized as a good practice for efficiently exploiting parallel hardware resources. Following this approach, multiple libraries have been designed to provide such high-level abstractions and ease parallel programming. However, those libraries do not share a common interface. To pave the way, GrPPI has been designed to provide an intermediate abstraction layer between application developers and existing parallel programming frameworks like OpenMP, Intel TBB or ISO C++ threads. On the other hand, FastFlow has been adopted as an efficient object-based programming framework that may benefit from being supported as an additional GrPPI backend. However, the object-based approach presents some major challenges to being incorporated under the GrPPI type-safe functional programming style. In this paper, we present the integration of FastFlow as a new GrPPI backend to demonstrate that structured parallel programming frameworks perfectly fit the GrPPI design. Additionally, we demonstrate that GrPPI does not incur additional overheads for providing its abstraction layer, and we study the programmability in terms of lines of code and cyclomatic complexity. In general, the presented work acts as a reciprocal validation of both FastFlow (as an efficient, native structured parallel programming framework) and GrPPI (as an efficient abstraction layer on top of existing parallel programming frameworks). This work has been partially supported by the European Commission EU H2020-ICT-2014-1 Project RePhrase (No. 644235) and by the Spanish Ministry of Economy and Competitiveness through TIN2016-79637-P “Towards Unification of HPC and Big Data Paradigms”
Detecting semantic violations of lock-free data structures through C++ contracts
The use of synchronization mechanisms in multithreaded applications is essential on shared-memory multi-core architectures. However, debugging parallel applications to avoid potential failures, such as data races or deadlocks, can be challenging. Race detectors are key to spotting such concurrency bugs; nevertheless, if lock-free data structures are used, race detectors may emit a significant number of false positives. In this paper, we present a framework for semantic violation detection of lock-free data structures which makes use of contracts, a novel feature of the upcoming C++20, and a customized version of the ThreadSanitizer race detector. We evaluate the detection accuracy of the framework in terms of false positives and false negatives, leveraging some synthetic benchmarks which make use of the SPSC and MPMC lock-free queue structures from the Boost C++ library. Thanks to this framework, we are able to check the correct use of lock-free data structures, thus reducing the number of false positives. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness through Project Grant TIN2016-79637-P (BigHPC - Towards Unification of HPC and Big Data Paradigms) and the European Commission through Grant No. 801091 (ASPIDE - Exascale programmIng models for extreme data processing)
An adaptive offline implementation selector for heterogeneous parallel platforms
Heterogeneous parallel platforms, comprising multiple processing units and architectures, have become a cornerstone in improving the overall performance and energy efficiency of scientific and engineering applications. Nevertheless, taking full advantage of their resources comes with a variety of difficulties: developers require technical expertise in using different parallel programming frameworks and prior knowledge of the algorithms used underneath by the application. To alleviate this burden, we present an adaptive offline implementation selector that allows users to better exploit the resources provided by heterogeneous platforms. Specifically, this framework selects, at compile time, the device-implementation tuple that delivers the best performance on a given platform. The user interface of the framework leverages two C++ language features: attributes and concepts. To evaluate the benefits of this framework, we analyse the global performance and convergence of the selector using two different use cases. The experimental results demonstrate that the proposed framework allows users to enhance performance while minimizing the effort needed to tune applications targeted at heterogeneous platforms. Furthermore, we also demonstrate that our framework delivers comparable performance figures with respect to other approaches. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been partially supported by the Spanish ‘Ministerio de Economía y Competitividad’ under the project grant TIN2016-79637-P ‘Towards Unification of High Performance Computing (HPC) and Big Data Paradigms’ and the EU Projects ICT 644235 ‘RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications’ and the FP7 609666 ‘Repara: Reengineering and Enabling Performance And poweR of Applications’
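The core decision such a selector makes, choosing the device-implementation pair with the best recorded performance, can be sketched as follows. This is a deliberately simplified run-time version: the framework described above works at compile time through C++ attributes and concepts, which this sketch does not attempt to reproduce. The `Candidate` struct, the `record` and `select_best` functions and the device and implementation names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One candidate: a routine implementation bound to a device, together with
// the running mean of its measured execution times. Names are hypothetical.
struct Candidate {
    std::string device;
    std::string implementation;
    double mean_seconds = 0.0;
    std::size_t runs = 0;
};

// Fold a new timing into the running mean of a candidate.
void record(Candidate& c, double seconds) {
    assert(seconds >= 0.0);
    ++c.runs;
    c.mean_seconds += (seconds - c.mean_seconds) / static_cast<double>(c.runs);
}

// Return the index of the measured candidate with the lowest mean time,
// or candidates.size() if no candidate has been measured yet.
std::size_t select_best(const std::vector<Candidate>& candidates) {
    std::size_t best = candidates.size();  // sentinel: nothing measured yet
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].runs == 0) continue;
        if (best == candidates.size() ||
            candidates[i].mean_seconds < candidates[best].mean_seconds) {
            best = i;
        }
    }
    return best;
}
```

For example, after recording timings for a `{"cpu", "blocked"}` candidate and a `{"gpu", "tiled"}` candidate, `select_best` returns the index of whichever has the lower mean, and the choice can change as further measurements arrive. That changing choice is the adaptive part of the idea, reduced here to its simplest form.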
Towards automatic parallelization of stream processing applications
Parallelizing and optimizing code for recent multi-/many-core processors is recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel ones and discover optimization opportunities are crucial to relieve the burden on developers. In this paper, we present a compile-time framework to (semi-)automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel ones using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can deliver performance comparable to that of the Intel TBB programming framework. This work was supported in part by the Spanish Ministerio de Economía y Competitividad through the project Toward Unification of HPC and Big Data Paradigms under Grant TIN2016-79637-P and in part by the EU project RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications under Grant ICT 644235
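The Pipeline pattern mentioned above can be sketched with standard C++17 threads. This is a hand-written illustration of the target structure, not code produced by the framework and not GrPPI's API: the `Channel` class and the `run_pipeline` function are hypothetical, the pipeline is fixed at three stages, and no stage balancing is attempted.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Unbounded blocking queue connecting two pipeline stages.
// An empty optional marks the end of the stream.
template <typename T>
class Channel {
public:
    void push(std::optional<T> item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        std::optional<T> item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::optional<T>> items_;
};

// Run generator -> transform -> sink. The generator and the transform each
// get their own thread; the sink runs on the calling thread. The generator
// signals the end of the stream by returning an empty optional.
template <typename Gen, typename Fn, typename Sink>
void run_pipeline(Gen generate, Fn transform, Sink consume) {
    using In = typename decltype(generate())::value_type;
    using Out = std::decay_t<decltype(transform(std::declval<In>()))>;
    Channel<In> first;
    Channel<Out> second;
    std::thread producer([&] {
        while (auto item = generate()) first.push(std::move(item));
        first.push(std::nullopt);
    });
    std::thread worker([&] {
        while (auto item = first.pop()) second.push(transform(std::move(*item)));
        second.push(std::nullopt);
    });
    while (auto item = second.pop()) consume(std::move(*item));
    producer.join();
    worker.join();
}
```

Because every stage runs on a single thread and the channels are FIFO, items reach the sink in generation order. A Farm would replicate the middle stage across several workers, at which point ordering has to be restored explicitly or given up.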