9 research outputs found
Model-driven development of data intensive applications over cloud resources
The proliferation of sensors over the last years has generated large amounts of raw data, forming data streams that need to be processed. In many cases, cloud resources are used for such processing, exploiting their flexibility, but these sensor streaming applications often need to support operational and control actions that have real-time and low-latency requirements that go beyond the cost-effective and flexible solutions supported by existing cloud frameworks, such as Apache Kafka, Apache Spark Streaming, or Map-Reduce Streams. In this paper, we describe a model-driven and stepwise refinement methodological approach for streaming applications executed over clouds. The central role is assigned to a set of Petri Net models for specifying functional and non-functional requirements. They support model reuse, and a way to combine formal analysis, simulation, and approximate computation of minimal and maximal boundaries of non-functional requirements when the problem is either mathematically or computationally intractable. We show how our proposal can assist developers in their design and implementation decisions from a performance perspective. Our methodology supports performance analysis across all stages of the engineering process: we can (i) analyse how an application can be mapped onto cloud resources, and (ii) obtain key performance indicators, including throughput or economic cost, so that developers are assisted in their development tasks and in their decision making. In order to illustrate our approach, we make use of the pipelined wavefront array.
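The abstract above centres on Petri Net models for specifying requirements. As an illustration only (not the authors' actual models), a minimal place/transition net can be simulated in a few lines: a transition is enabled when every input place holds enough tokens, and firing it consumes tokens from input places and produces tokens in output places. All names here are hypothetical.

```python
# Minimal place/transition Petri net sketch (illustrative, hypothetical names).

class PetriNet:
    def __init__(self, marking):
        # marking: tokens per place, e.g. {"sensor_queue": 2, "done": 0}
        self.marking = dict(marking)
        self.transitions = {}

    def add_transition(self, name, inputs, outputs):
        # inputs/outputs: {place: number of tokens consumed/produced}
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        # enabled iff every input place holds enough tokens
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} not enabled")
        inputs, outputs = self.transitions[name]
        for p, n in inputs.items():
            self.marking[p] -= n
        for p, n in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + n

net = PetriNet({"sensor_queue": 2, "done": 0})
net.add_transition("process", {"sensor_queue": 1}, {"done": 1})
while net.enabled("process"):
    net.fire("process")
print(net.marking)  # {'sensor_queue': 0, 'done': 2}
```

Formal analysis of such nets (reachability, boundedness) builds on exactly this firing rule; the models in the paper additionally cover non-functional requirements.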
Autotuning wavefront patterns for heterogeneous architectures
Manual tuning of applications for heterogeneous parallel systems is tedious and complex.
Optimizations are often not portable, and the whole process must be repeated when moving
to a new system, or sometimes even to a different problem size.
Pattern-based parallel programming models were originally designed to provide programmers
with an abstraction layer, hiding tedious parallel boilerplate code and allowing them to
focus only on application-specific issues. However, the constrained algorithmic model associated with
each pattern also enables the creation of pattern-specific optimization strategies. These can
capture more complex variations than would be accessible by analysis of equivalent unstructured
source code. These variations create complex optimization spaces. Machine learning
offers well established techniques for exploring such spaces.
In this thesis we use machine learning to create autotuning strategies for heterogeneous
parallel implementations of applications which follow the wavefront pattern. In a wavefront,
computation starts from one corner of the problem grid and proceeds diagonally like a wave
to the opposite corner in either two or three dimensions. Our framework partitions and
optimizes the work created by these applications across systems comprising multicore CPUs
and multiple GPU accelerators. The tuning opportunities for a wavefront include controlling
the amount of computation to be offloaded onto GPU accelerators, choosing the number of
CPU and GPU threads to process tasks, tiling for both CPU and GPU memory structures,
and trading redundant halo computation against communication for multiple GPUs.
Our exhaustive search of the problem space shows that these parameters are very sensitive
to the combination of architecture, wavefront instance and problem size. We design and
investigate a family of autotuning strategies, targeting single and multiple CPU + GPU
systems, and both two and three dimensional wavefront instances. These yield an average
of 87% of the performance found by offline exhaustive search, with up to 99% in some cases.
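The diagonal sweep described above can be sketched as follows. This is an illustrative sequential version, not the thesis' framework: each cell depends on its north and west neighbours, so all cells on one anti-diagonal are independent, and those diagonals are the units a tuner could partition across CPU threads and GPU accelerators. The `update` kernel is hypothetical.

```python
# Illustrative 2D wavefront sweep: visit the grid anti-diagonal by
# anti-diagonal, so cells on the same diagonal have no mutual dependency.

def wavefront(n, m, update):
    grid = [[0] * m for _ in range(n)]
    for d in range(n + m - 1):                       # one anti-diagonal per step
        for i in range(max(0, d - m + 1), min(n, d + 1)):
            j = d - i
            north = grid[i - 1][j] if i > 0 else 0   # dependency above
            west = grid[i][j - 1] if j > 0 else 0    # dependency to the left
            grid[i][j] = update(i, j, north, west)
    return grid

# Hypothetical kernel: each cell = north + west + 1
g = wavefront(3, 3, lambda i, j, north, west: north + west + 1)
```

In a parallel implementation, the inner loop over `i` is the part that can be distributed, since every iteration reads only values computed on earlier diagonals.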
Pattern Operators for Grid Environments
The definition and programming of distributed applications has become a major research
issue due to the increasing availability of (large-scale) distributed platforms
and the requirements posed by economic globalization. However, such a task
requires a huge effort due to the complexity of distributed environments: large
numbers of users may communicate and share information across different authority
domains; moreover, the “execution environment” or “computations” are dynamic,
since the number of users and the computational infrastructure change over time. Grid
environments, in particular, promise to be an answer to such complexity, by
providing high-performance execution support to large numbers of users, and resource
sharing across different organizations. Nevertheless, programming in Grid environments
is still a difficult task. There is a lack of high-level programming paradigms
and support tools that may guide the application developer and allow reuse of
state-of-the-art solutions.
Specifically, the main goal of the work presented in this thesis is to contribute to
the simplification of the development cycle of applications for Grid environments by
bringing structure and flexibility to three stages of that cycle through a common model.
The stages are: the design phase, the execution phase, and the reconfiguration phase.
The common model is based on the manipulation of patterns through pattern operators,
and the division of both patterns and operators into two categories, namely
structural and behavioural. Moreover, both structural and behavioural patterns are
first class entities at each of the aforesaid stages. At the design phase, patterns can
be manipulated like other first class entities such as components. This allows a more
structured way to build applications by reusing and composing state-of-the-art patterns.
At the execution phase, patterns are units of execution control: it is possible, for
example, to start or stop and to resume the execution of a pattern as a single entity. At
the reconfiguration phase, patterns can also be manipulated as single entities with the
additional advantage that it is possible to perform a structural reconfiguration while
keeping some of the behavioural constraints, and vice-versa. For example, it is possible
to replace a behavioural pattern, which was applied to some structural pattern,
with another behavioural pattern.
In this thesis, besides the proposal of the methodology for distributed application
development, as sketched above, a definition of a relevant set of pattern operators
was made. The methodology and the expressivity of the pattern operators were assessed
through the development of several representative distributed applications. To
support this validation, a prototype was designed and implemented, encompassing
some relevant patterns and a significant part of the pattern operators defined. This
prototype was based on the Triana environment; Triana supports the development and
deployment of distributed applications in the Grid through a dataflow-based programming
model. Additionally, this thesis also presents the analysis of a mapping of some
operators for execution control onto the Distributed Resource Management Application
API (DRMAA).
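As an illustration of the pattern-operator idea sketched above (the class and operator names here are hypothetical, not the thesis' concrete API), patterns can be modelled as first-class values, and structural operators as functions that return new patterns:

```python
# Hypothetical sketch: patterns as first-class values, structural
# operators as transformations producing new patterns.

class Pattern:
    def __init__(self, name, members):
        self.name, self.members = name, list(members)

    def increase(self, member):
        # structural operator: add a member (stage) to the pattern
        return Pattern(self.name, self.members + [member])

    def replace(self, old, new):
        # structural operator: swap a member for another member or a
        # whole sub-pattern, enabling hierarchical composition
        return Pattern(self.name,
                       [new if m == old else m for m in self.members])

pipeline = Pattern("pipeline", ["read", "filter", "write"])
# refine the "filter" stage into a farm of four workers
pipeline = pipeline.replace("filter", Pattern("farm", ["filter"] * 4))
# extend the pipeline with a logging stage
pipeline = pipeline.increase("log")
```

Behavioural operators in the thesis (e.g. for starting, stopping, or resuming a pattern as a single entity) would act on the same first-class values, which is what makes structural and behavioural manipulation composable.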
This assessment confirmed the suitability of the proposed model, as well as the
generality and flexibility of the defined pattern operators.
Departamento de Informática and Faculdade de Ciências e Tecnologia of the Universidade
Nova de Lisboa;
Centro de Informática e Tecnologias da Informação of the FCT/UNL;
Reitoria da Universidade Nova de Lisboa;
Distributed Collaborative Computing Group, Cardiff University, United Kingdom;
Fundação para a Ciência e Tecnologia;
Instituto de Cooperação Científica e Tecnológica Internacional;
French Embassy in Portugal;
European Union Commission through the Agentcities.NET and Coordina projects;
and the European Science Foundation, EURESCO.
On the performance characterization and evaluation of RNA structure prediction algorithms for high performance systems
Ph.D. thesis (Doctor of Philosophy).
Generating parallel programs from the wavefront design pattern
Object-oriented programming, design patterns, and frameworks are common techniques that have been used to reduce the complexity of sequential programming. We have applied these techniques to the more difficult domain of parallel programming. This paper describes CO2P3S, a pattern-based parallel programming system that generates parallel programs from parallel design patterns. We demonstrate CO2P3S by applying a new design pattern called the Wavefront pattern to three problems. We show that it is quick and easy to use CO2P3S to generate structurally correct parallel programs with good speed-ups on shared-memory computers.