228 research outputs found
Multi-Path Bound for DAG Tasks
This paper studies the response time bound of a DAG (directed acyclic graph)
task. Recently, the idea of using multiple paths to bound the response time of
a DAG task, instead of the single longest path used in previous results, was
proposed, leading to the so-called multi-path bound. Multi-path bounds can
greatly reduce the response time bound and significantly improve the
schedulability of DAG tasks. This paper derives a new multi-path bound and
proposes an optimal algorithm to compute it. We further present a
systematic analysis of the dominance and the sustainability of three existing
multi-path bounds and the proposed multi-path bound. Our bound theoretically
dominates and empirically outperforms all existing multi-path bounds. Moreover,
the proposed bound is the only multi-path bound that is proved to be
self-sustainable.
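For context, the single-longest-path baseline that the abstract above contrasts with can be sketched as follows. This is the classic Graham-style bound R <= len + (vol - len) / m for a work-conserving scheduler on m identical cores, not the multi-path bound derived in the paper. The function name `graham_bound` and the dictionary-based DAG encoding are assumptions made for illustration.

```python
from collections import defaultdict, deque


def graham_bound(wcet, edges, m):
    """Single-path response time bound: len + (vol - len) / m.

    wcet  : dict mapping each vertex to its worst-case execution time
    edges : iterable of (u, v) precedence constraints
    m     : number of identical cores
    (Hypothetical interface, chosen only for this sketch.)
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in wcet}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # finish[v] = length of the longest path that ends at v.
    finish = dict(wcet)
    queue = deque(v for v in wcet if indeg[v] == 0)
    visited = 0
    while queue:  # Kahn's topological traversal
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + wcet[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(wcet):
        raise ValueError("precedence graph contains a cycle")

    length = max(finish.values())  # longest (critical) path
    volume = sum(wcet.values())    # total workload
    return length + (volume - length) / m
```

For a diamond-shaped task with execution times a=1, b=3, c=2, d=1 and edges a->b, a->c, b->d, c->d, the longest path is 5 and the volume is 7, so on two cores the bound evaluates to 5 + 2/2 = 6.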
Performance analysis and tuning in multicore environments
Performance analysis is the task of monitoring the behavior of a program's execution. Its main goal is to identify adjustments that could improve performance, which requires finding the different causes of overhead (excess unproductive computation). We are already in the multicore era, but there is a gap between the level of development of the two main divisions of multicore technology (hardware and software). Multicore systems are also shared-memory systems, and this master's thesis addresses the issues involved in the performance analysis and tuning of applications running specifically on a shared-memory system. We take performance analysis a step further by analyzing the structure and patterns of the applications. We also present some tools specifically aimed at the performance analysis of multithreaded OpenMP applications. Finally, we present the results of experiments performed with a set of scientific OpenMP applications.
Extreme scale parallel NBody algorithm with event driven constraint based execution model
Traditional scientific applications, such as Computational Fluid Dynamics and numerical methods based on Partial Differential Equations (like Finite Difference Methods and Finite Element Methods), achieve sufficient efficiency on state-of-the-art high performance computing systems and have been widely studied and implemented using conventional programming models. For emerging application domains such as graph applications, scalability and efficiency are significantly constrained by conventional systems and their supporting programming models. Furthermore, technology trends like multicore, manycore, and heterogeneous system architectures are introducing new challenges and possibilities. Emerging technologies require a rethinking of approaches to expose the underlying parallelism more effectively to applications and end users. This thesis explores the space of effective parallel execution of ephemeral graphs that are dynamically generated. The standard particle-based simulation, solved using the Barnes-Hut algorithm, is chosen to exemplify the dynamic workloads. In this thesis the workloads are expressed using sequential execution semantics, shared-memory semantics (a conventional parallel programming model), and the semantics of ParalleX, an innovative execution model designed for efficient, scalable performance towards Exascale computing. The main outcomes of this research are parallel processing of dynamic ephemeral workloads, dynamic load balancing at runtime, and the use of advanced semantics for exposing parallelism in scaling-constrained applications.
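The Barnes-Hut workload mentioned in the abstract above can be illustrated with a minimal sequential sketch in two dimensions. It is not the thesis's shared-memory or ParalleX implementation. The names `Cell`, `build_tree`, and `acceleration` are hypothetical, and the sketch assumes positive masses and distinct body positions inside the root square. The tree is rebuilt from scratch for each call to `build_tree`, which is what makes the graph ephemeral.

```python
import math


class Cell:
    """Square quadtree cell with centre (cx, cy) and half-width `half`."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0  # mass-weighted coordinate sums
        self.body = None         # (x, y, m) when this is an occupied leaf
        self.children = None     # four sub-cells once the cell is split

    def _split(self):
        h = self.half / 2.0
        self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child(self, x, y):
        # Index matches the creation order in _split.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)            # empty leaf: store the body
        else:
            if self.children is None:        # occupied leaf: split it
                bx, by, bm = self.body
                self.body = None
                self._split()
                self._child(bx, by).insert(bx, by, bm)
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies, half):
    """Build a fresh tree over bodies [(x, y, m), ...] in [-half, half]^2."""
    root = Cell(0.0, 0.0, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def acceleration(cell, x, y, theta=0.5, G=1.0):
    """Approximate gravitational acceleration at (x, y)."""
    if cell.mass == 0.0:
        return 0.0, 0.0
    dx = cell.mx / cell.mass - x
    dy = cell.my / cell.mass - y
    d = math.hypot(dx, dy)
    far = d > 0.0 and (2.0 * cell.half) / d < theta  # opening criterion
    if cell.children is None or far:
        if d < 1e-12:                # the query body itself
            return 0.0, 0.0
        a = G * cell.mass / d ** 3
        return a * dx, a * dy
    ax = ay = 0.0
    for child in cell.children:
        cax, cay = acceleration(child, x, y, theta, G)
        ax += cax
        ay += cay
    return ax, ay
```

With `theta=0.0` no cell is ever treated as far away, so the traversal degenerates to exact pairwise summation. Larger values of `theta` trade accuracy for fewer cell visits.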
Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis
Geo-spatial computing and data analysis is the branch of computer science that deals with real-world location-based data. Computational geometry algorithms, which process geometry and shapes, are one of the pillars of geo-spatial computing. Real-world map and location-based data can be huge, and the data structures used to process them extremely large, leading to high computational costs. Furthermore, geo-spatial datasets are growing on all V’s (Volume, Variety, Value, etc.) and are becoming larger and more complex to process, in turn demanding more computational resources. High Performance Computing breaks the problem down so that it can run in parallel on large computers with massive processing power, reducing computing time while delivering the same results. This dissertation explores different techniques to accelerate computational geometry algorithms and geo-spatial computing: many-core Graphics Processing Units (GPUs), multi-core Central Processing Units (CPUs), multi-node setups with the Message Passing Interface (MPI), cache optimizations, memory and communication optimizations, load balancing, algorithmic modifications, directive-based parallelization with OpenMP or OpenACC, and vectorization with compiler intrinsics (AVX). At least one of these techniques is applied to each of the following problems. A novel method to parallelize plane-sweep-based geometric intersection for GPUs with directives is presented. Parallelization of plane-sweep-based Voronoi construction, segment tree construction, segment tree queries, and segment-tree-based operations is presented. Spatial autocorrelation and the computation of Getis-Ord hotspots are also presented. Acceleration performance and speedup results are given in each corresponding chapter.
Response-Time Analysis of Limited-Preemptive Parallel DAG Tasks Under Global Scheduling
Most recurrent real-time applications can be modeled as a set of sequential code segments (or blocks) that must be (repeatedly) executed in a specific order. This paper provides a schedulability analysis for such systems modeled as a set of parallel DAG tasks executed under any limited-preemptive global job-level fixed-priority scheduling policy. More precisely, we derive response-time bounds for a set of jobs subject to precedence constraints, release jitter, and execution-time uncertainty, which enables support for a wide variety of parallel, limited-preemptive execution models (e.g., periodic DAG tasks, transactional tasks, generalized multi-frame tasks, etc.). Our analysis explores the space of all possible schedules using a powerful new state abstraction and state-pruning technique. An empirical evaluation shows that the analysis identifies between 10 and 90 percentage points more schedulable task sets than the state-of-the-art schedulability test for limited-preemptive sporadic DAG tasks. It scales to systems of up to 64 cores with 20 DAG tasks. Moreover, while our analysis is almost as accurate as the state-of-the-art exact schedulability test based on model checking (for sequential non-preemptive tasks), it is three orders of magnitude faster and hence capable of analyzing task sets with more than 60 tasks on 8 cores in a few seconds.
MIMOPack: A High Performance Computing Library for MIMO Communication Systems
[EN] Nowadays, several communication standards are emerging and evolving in pursuit of
higher transmission rates, reliability and coverage. This expansion is
primarily driven by the continued increase in consumption of mobile multimedia services
due to the emergence of new handheld devices such as smartphones and tablets.
One of the most significant techniques employed to meet these demands is the use
of multiple transmit and receive antennas, known as MIMO systems. This technology increases
both the transmission rate and the quality of the transmission.
MIMO technologies have become essential in several wireless standards such as WLAN, WiMAX and LTE.
These technologies will also be incorporated into future standards, so a great deal of
research in this field is expected in the coming years.
Clearly, the study of MIMO systems is critical in current research;
however, the problems that arise from this technology are very complex.
High Performance Computing (HPC) systems, and specifically modern hardware
architectures such as multi-core and many-core processors (e.g., Graphics Processing Units (GPUs)),
are playing a key role in the development of efficient and low-complexity
algorithms for MIMO transmissions. Proof of this is that the number of
scientific contributions and research projects related to their use has increased in recent years.
Also, some high performance libraries have been implemented as
tools for researchers involved in the development of future
communication standards. Two of the most popular libraries are IT++,
which is built on optimized libraries for multi-core
processors, and the Communications System Toolbox, designed for use with MATLAB, which uses GPU computing. However, no library is able to
run on a heterogeneous platform using all the available resources.
In view of the high computational requirements of MIMO application research and
the shortage of tools able to satisfy them, we have made a special effort to develop a
library that eases the development of parallel applications adaptable
to the different architectures of the executing platform. The library, called MIMOPack, aims to implement efficiently, using parallel computing, a set of functions that perform some of the critical stages of MIMO communication system simulation.
The main contribution of the thesis is the implementation of efficient hard-output and soft-output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable, and many of them include preprocessing techniques that reduce the computational cost and increase the performance.
The proposed library has three important features: portability,
efficiency and ease of use. The current release supports GPU and multi-core computation, separately or
simultaneously, since it is designed for use on heterogeneous machines. The function interfaces are common to all environments
in order to simplify the use of the library. Moreover, some of the functions are callable from MATLAB, increasing the portability of developed codes between different computing environments.
According to the library design and the performance assessment, we consider that MIMOPack may help
industrial and academic researchers implement scientific codes without having to know different programming
languages and machine architectures. This will allow them to include more complex
algorithms in their simulations and obtain their results faster. This is
particularly important in industry, since manufacturers work
to analyze and propose their own technologies with the aim of having them
approved as a standard. This allows them to enforce their intellectual property
rights over their competitors, who must obtain the corresponding licenses
to include these technologies in their products.
Ramiro Sánchez, C. (2015). MIMOPack: A High Performance Computing Library for MIMO Communication Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53930
TESIS; Premios Extraordinarios de tesis doctorales
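The hard-output detection stage highlighted in the MIMOPack abstract above can be illustrated with a brute-force maximum-likelihood detector. This is a generic textbook sketch, not MIMOPack's API or one of its detectors. The function name `ml_detect`, the real-valued channel model, and the BPSK alphabet are assumptions made for illustration. The exhaustive search grows exponentially with the number of transmit antennas, which is one reason detection is regarded as the most complex stage.

```python
from itertools import product


def ml_detect(H, y, alphabet=(-1.0, 1.0)):
    """Hard-output maximum-likelihood detection by exhaustive search.

    H        : channel matrix as a list of rows (n_rx x n_tx), real-valued
    y        : received vector of length n_rx
    alphabet : symbols each transmit antenna may send (BPSK assumed here)
    Returns the symbol vector s that minimises ||y - H s||^2.
    """
    n_tx = len(H[0])
    best, best_metric = None, float("inf")
    for s in product(alphabet, repeat=n_tx):
        metric = 0.0
        for row, y_i in zip(H, y):
            residual = y_i - sum(h * s_j for h, s_j in zip(row, s))
            metric += residual * residual
        if metric < best_metric:
            best, best_metric = s, metric
    return list(best)
```

For example, with H = [[1.0, 0.5], [0.3, 1.2]] and a slightly noisy observation y = [0.6, -0.8] of the transmitted vector [1, -1], the candidate [1, -1] has the smallest squared distance and is returned.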
An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor
Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment that enables heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results from an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture up to eight cores, at which point the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (x9 for the bitonic sort benchmark with 8 dual-issue cores), with further improvements from compiler optimisations (x14 for bitonic with the same configuration).
- …