
    Statistical methodologies for the control of dynamic remapping

    Following an initial mapping of a problem onto a multiprocessor machine or computer network, system performance often deteriorates with time. In order to maintain high performance, it may be necessary to remap the problem. The decision to remap must take into account measurements of performance deterioration, the cost of remapping, and the estimated benefits achieved by remapping. We examine the tradeoff between the costs and the benefits of remapping for two qualitatively different kinds of problems: one assumes that performance deteriorates gradually, the other that it deteriorates suddenly. We consider a variety of policies for governing when to remap. In order to evaluate these policies, statistical models of problem behavior are developed. Simulation results are presented which compare simple policies with computationally expensive optimal decision policies; these results demonstrate that, for each problem type, the proposed simple policies are effective and robust.
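    As a rough illustration of the kind of policy trade-off studied here, the sketch below simulates a simple threshold-style remapping rule against gradually deteriorating performance. The degradation model, remapping cost, and threshold values are illustrative assumptions, not parameters from the paper.

        # Toy simulation of a threshold remapping policy under gradual deterioration.
        # All numbers (drift rate, remap cost, thresholds) are illustrative assumptions.
        import random

        def simulate(threshold=0.25, remap_cost=50.0, steps=1000, drift=0.01, seed=0):
            rng = random.Random(seed)
            imbalance = 0.0      # fraction of each step lost to load imbalance
            total_time = 0.0
            remaps = 0
            for _ in range(steps):
                total_time += 1.0 + imbalance              # steps get slower as imbalance grows
                imbalance += rng.uniform(0.0, 2 * drift)   # gradual, noisy deterioration
                if imbalance > threshold:                  # policy: remap once past the threshold
                    total_time += remap_cost               # pay the remapping cost
                    imbalance = 0.0                        # remapping restores balance
                    remaps += 1
            return total_time, remaps

        if __name__ == "__main__":
            for th in (0.1, 0.25, 0.5):
                t, n = simulate(threshold=th)
                print(f"threshold={th:.2f}  time={t:8.1f}  remaps={n}")

    Sweeping the threshold in this way mirrors, in miniature, the comparison between simple policies and more expensive optimal decision policies.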

    Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. The algorithms can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms are controlled-error approximations of kinetic Monte Carlo algorithms, departing from the predominant paradigm of creating parallel KMC algorithms with exactly the same master equation as the serial one. Our methodology relies on a spatial decomposition of the Markov operator underlying the KMC algorithm into a hierarchy of operators corresponding to the processors' structure in the parallel architecture. Based on this operator decomposition, we formulate Fractional Step Approximation schemes by employing the Trotter Theorem and its random variants; these schemes (a) determine the communication schedule between processors, and (b) are run independently on each processor through a serial KMC simulation, called a kernel, on each fractional-step time window. Furthermore, the proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules.
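    The sketch below is a minimal, hypothetical illustration of the fractional-step idea: a lattice is split into two interleaved groups of blocks and, within each fractional time window, a serial KMC kernel advances one group while the other is frozen. The lattice model (independent spin flips at a uniform rate) and all parameters are toy assumptions chosen only to make the schedule concrete.

        # Toy Lie/Trotter-style fractional-step schedule for a lattice KMC simulation.
        # The spin-flip model, rates and window length are illustrative assumptions.
        import math
        import random

        def kmc_kernel(state, sites, t_window, rate, rng):
            """Serial KMC on the given sites for a time window of length t_window."""
            t = 0.0
            while True:
                total_rate = rate * len(sites)
                dt = -math.log(1.0 - rng.random()) / total_rate   # exponential waiting time
                if t + dt > t_window:
                    break
                t += dt
                site = rng.choice(sites)
                state[site] ^= 1                                  # flip the chosen spin

        def fractional_step_kmc(n_sites=16, block=4, t_final=1.0, dt=0.1, rate=1.0, seed=1):
            rng = random.Random(seed)
            state = [0] * n_sites
            blocks = [list(range(i, i + block)) for i in range(0, n_sites, block)]
            group_a = [s for b in blocks[0::2] for s in b]   # even blocks ("processor" A)
            group_b = [s for b in blocks[1::2] for s in b]   # odd blocks  ("processor" B)
            t = 0.0
            while t < t_final:
                kmc_kernel(state, group_a, dt, rate, rng)    # fractional step on group A
                kmc_kernel(state, group_b, dt, rate, rng)    # then on group B
                t += dt
            return state

        print(fractional_step_kmc())

    In the actual framework the kernels would run on different processors and the alternation of windows defines the communication schedule; the toy above only shows the operator-splitting structure.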

    Efficient Parallel Video Encoding on Heterogeneous Systems

    Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014), Porto (Portugal), August 27-28, 2014. In this study we propose an efficient method for collaborative H.264/AVC inter-loop encoding in heterogeneous CPU+GPU systems. The method relies on a specifically developed, extensive library of highly optimized parallel algorithms, covering both CPU and GPU architectures and all inter-loop modules. In order to minimize the overall encoding time, the method integrates adaptive load balancing for the most computationally intensive inter-prediction modules, based on dynamically built functional performance models of the heterogeneous devices and inter-loop modules. The proposed method also introduces efficient communication-aware techniques, which maximize data reuse and decrease the overhead of expensive data transfers in collaborative video encoding. The experimental results show that the proposed method is able to achieve real-time video encoding for very demanding video coding parameters, i.e., full HD video format, a 64×64-pixel search area, and exhaustive motion estimation. This work was supported by national funds through FCT – Fundação para a Ciência e a Tecnologia, under projects PEst-OE/EEI/LA0021/2013, PTDC/EEI-ELC/3152/2012 and PTDC/EEA-ELC/117329/2010.
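    As a hypothetical sketch of the functional-performance-model idea, the snippet below splits a frame's workload between a CPU and a GPU in proportion to throughputs measured on earlier frames. The device names, throughput numbers, and the use of macroblock rows as the work unit are assumptions for illustration, not details taken from the paper.

        # Performance-model-driven split of a frame's macroblock rows between devices.
        # Throughput figures below are assumed, not measured values from the paper.

        def split_workload(total_rows, throughput):
            """Distribute rows proportionally to each device's measured throughput."""
            total = sum(throughput.values())
            shares = {dev: int(round(total_rows * tp / total)) for dev, tp in throughput.items()}
            diff = total_rows - sum(shares.values())   # fix rounding so shares add up exactly
            if diff:
                shares[max(throughput, key=throughput.get)] += diff
            return shares

        # Example: the GPU was measured roughly 3x faster than the CPU on inter-prediction.
        measured = {"cpu": 120.0, "gpu": 360.0}    # rows per second (assumed)
        print(split_workload(68, measured))        # e.g. {'cpu': 17, 'gpu': 51}

    Re-measuring throughput as encoding proceeds and re-splitting each frame is one way such a model can be kept adaptive.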

    Mapping Framework for Heterogeneous Reconfigurable Architectures: Combining Temporal Partitioning and Multiprocessor Scheduling


    The hArtes Tool Chain

    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped onto the hArtes platform.

    Model-driven load balancing of data-parallel kernels on high-performance heterogeneous platforms

    Data-parallel applications are composed of several processes that apply the same computation (kernel) to different portions of the data and need to communicate partial results during execution. Heterogeneous platforms are those in which each computational resource of the system may differ from the others and which include accelerators; the elements are connected through networks of differing performance and characteristics, and they have to work together to execute an application or solve a problem, which is what makes this scenario difficult. The load-balancing problem for data-parallel applications on heterogeneous platforms is therefore investigated and solved through non-uniform distributions of the workload among all available resources. The objective is to find a partition that minimizes the cost of computation and communication, which is not trivial; the problem has been shown to be NP-complete. The literature has developed several heuristics to find optimal solutions, in which computation and communication performance models are used as metrics in the partitioning algorithms. The models allow us to describe the behavior of the system, while the heuristics are the approach used to find a satisfactory solution. We discuss the role of these models and, finally, to improve these heuristic approaches, we replace metrics based on communication volume with a metric based on communication times. These times are obtained from an analytical model through a symbolic tool that manipulates, evaluates, and represents the communication cost of a partition as an analytic expression using the τ-Lop communication performance model. (Master's thesis, Máster Universitario en Ingeniería Informática, Universidad de Extremadura.)
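    To make the partitioning idea concrete, the sketch below shows a greedy heuristic that assigns data blocks to heterogeneous devices by estimated finish time, where the estimate combines a computation-time term with a communication-time term rather than a communication volume. The linear per-block costs are placeholder assumptions standing in for a real analytical model such as τ-Lop.

        # Greedy partitioning heuristic driven by estimated times (computation plus
        # communication time per block). The per-block costs are placeholder
        # assumptions standing in for an analytical model such as tau-Lop.

        def partition(n_blocks, devices):
            finish = {d: 0.0 for d in devices}     # accumulated estimated time per device
            assignment = {}
            for b in range(n_blocks):
                best = min(
                    devices,
                    key=lambda d: finish[d]
                                  + devices[d]["comp_per_block"]    # compute-time estimate
                                  + devices[d]["comm_per_block"],   # communication-time estimate
                )
                assignment[b] = best
                finish[best] += devices[best]["comp_per_block"] + devices[best]["comm_per_block"]
            return assignment, finish

        devices = {
            "cpu":  {"comp_per_block": 4.0, "comm_per_block": 0.1},   # assumed ms per block
            "gpu0": {"comp_per_block": 1.0, "comm_per_block": 0.8},
            "gpu1": {"comp_per_block": 1.0, "comm_per_block": 0.8},
        }
        assignment, finish = partition(24, devices)
        print(finish)    # per-device estimated completion times

    Replacing the two time terms with volume-based estimates recovers the volume-metric baseline that the work sets out to improve on.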

    Optimal dynamic remapping of parallel computations

    A large class of computations is characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem of remapping workload to processors in a parallel computation is considered when both the utility of remapping and the future behavior of the workload are uncertain: execution requirements are stable during a given phase but may change radically between phases. For such problems, a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases, addressing the fundamental problem of balancing the expected performance gain from remapping against the delay cost. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimation of the gain is examined. The heuristic's feasibility is demonstrated by its use in an adaptive fluid dynamics code on a multiprocessor. The results suggest that, except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether a gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases.
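    A minimal sketch of the kind of simple decision rule discussed here: after a suspected phase change, remap once the time lost to imbalance would have paid for the remapping delay. The step times, balanced baseline, and delay below are illustrative assumptions, and the rule is only a stand-in for the stochastic-dynamic-programming policy of the paper.

        # Toy decision rule: remap when accumulated excess time since a phase change
        # exceeds the remapping delay. All numbers are illustrative assumptions.

        def should_remap(step_times, balanced_time, remap_delay):
            """True once the excess over the balanced step time exceeds the remap delay."""
            excess = sum(max(0.0, t - balanced_time) for t in step_times)
            return excess > remap_delay

        # Example: a phase change makes steps cost 1.4 s instead of the balanced 1.0 s,
        # and remapping would stall the computation for 3 s.
        observed = [1.4] * 8
        print(should_remap(observed, balanced_time=1.0, remap_delay=3.0))   # True after 8 steps

    Like the heuristic examined in the paper, a rule of this shape needs no explicit estimate of the future gain, only observations of what has already been lost.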

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports; 7551)

    ReCoSoC is intended to be an annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multi-billion-transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.