90 research outputs found
SPar: A DSL for High-Level and Productive Stream Parallelism
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness
Finding parallel patterns through static analysis in C++ applications
Since The 'Free Lunch' Of Processor Performance Is Over, Parallelism Has Become The New Trend In Hardware And Architecture Design. However, Parallel Resources Deployed In Data Centers Are Underused In Many Cases, Given That Sequential Programming Is Still Deeply Rooted In Current Software Development. To Address This Problem, New Methodologies And Techniques For Parallel Programming Have Been Progressively Developed. For Instance, Parallel Frameworks, Offering Programming Patterns, Allow Expressing Concurrency In Applications To Better Exploit Parallel Hardware. Nevertheless, A Large Portion Of Production Software, From A Broad Range Of Scientific And Industrial Areas, Is Still Developed Sequentially. Considering That These Software Modules Contain Thousands, Or Even Millions, Of Lines Of Code, An Extremely Large Amount Of Effort Is Needed To Identify Parallel Regions. To Pave The Way In This Area, This Paper Presents Parallel Pattern Analyzer Tool, A Software Component That Aids The Discovery And Annotation Of Parallel Patterns In Source Codes. This Tool Simplifies The Transformation Of Sequential Source Code To Parallel. Specifically, We Provide Support For Identifying Map, Farm, And Pipeline Parallel Patterns And Evaluate The Quality Of The Detection For A Set Of Different C++ Applications.This work was partially supported by the EU Projects ICT 644235 “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications” and the FP7 609666 “Repara: Reengineering and Enabling Performance and Power of Application
Parallel source code transformation techniques using design patterns
Mención Internacional en el título de doctorIn recent years, the traditional approaches for improving performance, such as increasing
the clock frequency, has come to a dead-end. To tackle this issue, parallel architectures,
such as multi-/many-core processors, have been envisioned to increase
the performance by providing greater processing capabilities. However, programming
efficiently for this architectures demands big efforts in order to transform sequential
applications into parallel and to optimize such applications. Compared to
sequential programming, designing and implementing parallel applications for operating
on modern hardware poses a number of new challenges to developers such
as data races, deadlocks, load imbalance, etc.
To pave the way, parallel design patterns provide a way to encapsulate algorithmic
aspects, allowing users to implement robust, readable and portable solutions
with such high-level abstractions. Basically, these patterns instantiate parallelism
while hiding away the complexity of concurrency mechanisms, such as thread management,
synchronizations or data sharing. Nonetheless, frameworks following this
philosophy does not share the same interface and users require understanding different
libraries, and their capabilities, not only to decide which fits best for their
purposes but also to properly leverage them. Furthermore, in order to parallelize
these applications, it is necessary to analyze the sequential code in order to detect the
regions of code that can be parallelized that is a time consuming and complex task.
Additionally, different libraries targeted to specific devices provide some algorithms
implementations that are already parallel and highly-tuned. In these situations, it is
also necessary to analyze and determine which routine implementation is the most
suitable for a given problem.
To tackle these issues, this thesis aims at simplifying and minimizing the necessary
efforts to transform sequential applications into parallel. This way, resulting
codes will improve their performance by fully exploiting the available resources
while the development efforts will be considerably reduced. Basically, in this thesis,
we contribute with the following. First, we propose a technique to detect potential
parallel patterns in sequential code. Second, we provide a novel generic C++ interface
for parallel patterns which acts as a switch among existing frameworks. Third,
we implement a framework that is able to transform sequential code into parallel
using the proposed pattern discovery technique and pattern interface. Finally, we
propose mechanisms that are able to select the most suitable device and routine implementation
to solve a given problem based on previous performance information.
The evaluation demonstrates that using the proposed techniques can minimize the
refactoring and optimization time while improving the performance of the resulting
applications with respect to the original code.En los últimos años, las técnicas tradicionales para mejorar el rendimiento, como es
el caso del incremento de la frecuencia de reloj, han llegado a sus límites. Con el
fin de seguir mejorando el rendimiento, se han desarrollado las arquitecturas paralelas,
las cuales proporcionan un incremento del rendimiento al estar provistas de
mayores capacidades de procesamiento. Sin embargo, programar de forma eficiente
para estas arquitecturas requieren de grandes esfuerzos por parte de los desarrolladores.
Comparado con la programación secuencial, diseñar e implementar aplicaciones
paralelas enfocadas a trabajar en estas arquitecturas presentan una gran
cantidad de dificultades como son las condiciones de carrera, los deadlocks o el incorrecto
balanceo de la carga.
En este sentido, los patrones paralelos son una forma de encapsular aspectos
algorítmicos de las aplicaciones permitiendo el desarrollo de soluciones robustas,
portables y legibles gracias a las abstracciones de alto nivel. En general, estos patrones
son capaces de proporcionar el paralelismo a la vez que ocultan las complejidades
derivadas de los mecanismos de control de concurrencia necesarios como el
manejo de los hilos, las sincronizaciones o la compartición de datos. No obstante,
los diferentes frameworks que siguen esta filosofía no comparten una única interfaz
lo que conlleva que los usuarios deban conocer múltiples bibliotecas y sus capacidades,
con el fin de decidir cuál de ellos es mejor para una situación concreta y
como usarlos de forma eficiente. Además, con el fin de paralelizar aplicaciones existentes,
es necesario analizar e identificar las regiones del código que pueden ser paralelizadas,
lo cual es una tarea ardua y compleja. Además, algunos algoritmos ya se
encuentran implementados en paralelo y optimizados para arquitecturas concretas
en diversas bibliotecas. Esto da lugar a que sea necesario analizar y determinar que
implementación concreta es la más adecuada para solucionar un problema dado.
Para paliar estas situaciones, está tesis busca simplificar y minimizar el esfuerzo
necesario para transformar aplicaciones secuenciales en paralelas. De esta forma,
los códigos resultantes serán capaces de explotar los recursos disponibles a la vez
que se reduce considerablemente el esfuerzo de desarrollo necesario. En general,
esta tesis contribuye con lo siguiente. En primer lugar, se propone una técnica de
detección de patrones paralelos en códigos secuenciales. En segundo lugar, se presenta
una interfaz genérica de patrones paralelos para C++ que permite seleccionar
la implementación de dichos patrones proporcionada por frameworks ya existentes.
En tercer lugar, se introduce un framework de transformación de código secuencial
a paralelo que hace uso de las técnicas de detección de patrones y la interfaz
presentadas. Finalmente, se proponen mecanismos capaces de seleccionar la implementación
más adecuada para solucionar un problema concreto basándose en el
rendimiento obtenido en ejecuciones previas. Gracias a la evaluación realizada se ha
podido demostrar que uso de las técnicas presentadas pueden minimizar el tiempo
necesario para transformar y optimizar el código a la vez que mejora el rendimiento
de las aplicaciones transformadas.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: David Expósito Singh.- Secretario: Rafael Asenjo Plaza.- Vocal: Marco Aldinucc
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) Timisoara, Romania. February 8-11, 2016.The PhD Symposium was a very good opportunity for the young researchers to share information and knowledge, to
present their current research, and to discuss topics with other students in order to look for synergies and common research
topics. The idea was very successful and the assessment made by the PhD Student was very good. It also helped to
achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable
solutions for ultrascale computing aiming at cross fertilization among HPC, large scale distributed systems, and big
data management, training, contributing to glue disparate researchers working across different areas and provide a meeting
ground for researchers in these separate areas to exchange ideas, to identify synergies, and to pursue common activities in
research topics such as sustainable software solutions (applications and system software stack), data management, energy
efficiency, and resilience.European Cooperation in Science and Technology. COS
PiCo: A Domain-Specific Language for Data Analytics Pipelines
In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common under- lying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks.
From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics.
The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.
Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world
Transactional memory on heterogeneous architectures
Tesis Leida el 9 de Marzo de 2018.Si observamos las necesidades computacionales de hoy, y tratamos de predecir
las necesidades del mañana, podemos concluir que el procesamiento heterogéneo
estará presente en muchos dispositivos y aplicaciones.
El motivo es lógico: algoritmos diferentes y datos de naturaleza diferente encajan mejor
en unos dispositivos de cómputo que en otros. Pongamos como ejemplo una
tecnología de vanguardia como son los vehículos inteligentes. En este tipo de
aplicaciones la computación heterogénea no es una opción, sino un requisito.
En este tipo de vehículos se recolectan y analizan imágenes, tarea para la cual
los procesadores gráficos (GPUs) son muy eficientes.
Muchos de estos vehículos utilizan algoritmos sencillos,
pero con grandes requerimientos de tiempo real, que deben
implementarse directamente en hardware utilizando FPGAs.
Y, por supuesto, los procesadores multinúcleo tienen un
papel fundamental en estos sistemas, tanto organizando el trabajo de otros coprocesadores
como ejecutando tareas en las que ningún otro procesador
es más eficiente. No obstante, los procesadores tampoco siguen siendo dispositivos
homogéneos. Los diferentes núcleos de un procesador pueden
ofrecer diferentes características en términos de potencia y consumo
energético que se adapten a las necesidades de cómputo de la aplicación.
Programar este conjunto de dispositivos es una tarea compleja, especialmente
en su sincronización.
Habitualmente, esta sincronización se basa en operaciones atómicas, ejecución y
terminación de kernels, barreras y señales. Con estas primitivas de sincronización
básicas se pueden construir otras estructuras más complejas.
Sin embargo, la programación de estos
mecanismos es tediosa y propensa a fallos. La memoria transaccional
(TM por sus siglas en inglés) se ha propuesto como un mecanismo
avanzado a la vez que simple para garantizar la exclusión mutua
Open reWall: Survey-to-production workflow for building renovation
A reabilitação de espaços interiores, num contexto de personalização em série, requer uma mudança na forma como os sistemas construtivos são desenhados, construídos e reutilizados. Recorrendo a plataformas digitais para a participação os arquitetos, em colaboração com outros atores na indústria AEC, podem desenvolver e oferecer soluções personalizadas e desmontáveis a utilizadores genéricos.
Esta investigação propõe o uso de sistemas de construção personalizada em série (CPS) para fornecer sistemas de divisórias desmontáveis fabricadas digitalmente usando metodologias do levantamento à produção ligadas a configuradores online, em que os utilizadores co-projetam soluções para a reabilitação de espaços interiores.
A metodologia de investigação socorre-se de pesquisa e análise teórica para definir critérios e objetivos a serem explorados em resolução de problemas de projeto. A partir destas experiências são sintetizados princípios e uma metodologia para a conceção de sistemas CPS de sistemas de divisórias personalizáveis e desmontáveis para a reabilitação. A metodologia clarifica os papeis dos atores, passos, e arquitetura do sistema para implementar um sistema CPS do levantamento à produção.
A investigação demonstra que a metodologia de levantamento proposta é utilizável por utilizadores especialistas e não-especialistas, com os últimos a apresentarem em média melhores resultados, e que estes levantamentos têm precisão suficiente para processos do desenho à produção.
Também se demonstra que a metodologia do levantamento à produção, a gramática genérica, e os critérios são úteis para os arquitetos conceberem sistemas de divisórias desmontáveis e personalizáveis para sistemas CPS abertos.Building renovation of interior spaces, in the context of mass customization, requires a shift in how construction systems are designed, built, and reused. Leveraging digital frameworks for user participation, architects in collaboration with other stakeholders in the AEC industry may design anddeliver customized and disassemble-able solutions to generic end-users.
The research proposes mass customization construction (MCC) systems can deliver cost-effective digitally fabricated and disassemble-able construction systems using survey-to-production workflows deployed in web configurators for end-users to co-design solutions in building renovation.
The research methodology uses theoretical inquiry and analysis to define criteria and objectives to be explored in design problem solving. From these experiments generalizable principles and a lowkey workflow for the design of MCC systems of customizable and disassemble-able partition wall construction systems for open building renovation are synthetized. The workflow clarifies stakeholder roles, steps, and system architecture to implement an MCC system from survey to production.
This investigation demonstrates the proposed survey workflow is usable by non-expert and expert instance-designers, with the former having on-average better results, and that these can survey spaces with sufficient precision for design-to-production workflows. It is also shown the survey-to-production workflow, the generic grammar, and criteria are useful for architects to design customizable and disassemble-able partition wall systems for open MCC systems
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
- …