
    Statistical methodologies for the control of dynamic remapping

    Following an initial mapping of a problem onto a multiprocessor machine or computer network, system performance often deteriorates with time. In order to maintain high performance, it may be necessary to remap the problem. The decision to remap must take into account measurements of performance deterioration, the cost of remapping, and the estimated benefits achieved by remapping. We examine the tradeoff between the costs and the benefits of remapping for two qualitatively different kinds of problems: one assumes that performance deteriorates gradually, the other that it deteriorates suddenly. We consider a variety of policies for governing when to remap. In order to evaluate these policies, statistical models of problem behavior are developed. Simulation results are presented which compare simple policies with computationally expensive optimal decision policies; these results demonstrate that, for each problem type, the proposed simple policies are effective and robust.
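    As a rough illustration of the kind of policy trade-off studied here, the sketch below simulates a simple threshold-style remapping rule against gradually deteriorating performance. The degradation model, remapping cost, and threshold values are illustrative assumptions, not parameters from the paper.

        # Toy simulation of a threshold remapping policy under gradual deterioration.
        # All numbers (drift rate, remap cost, thresholds) are illustrative assumptions.
        import random

        def simulate(threshold=0.25, remap_cost=50.0, steps=1000, drift=0.01, seed=0):
            rng = random.Random(seed)
            imbalance = 0.0      # fraction of each step lost to load imbalance
            total_time = 0.0
            remaps = 0
            for _ in range(steps):
                total_time += 1.0 + imbalance              # steps get slower as imbalance grows
                imbalance += rng.uniform(0.0, 2 * drift)   # gradual, noisy deterioration
                if imbalance > threshold:                  # policy: remap once past the threshold
                    total_time += remap_cost               # pay the remapping cost
                    imbalance = 0.0                        # remapping restores balance
                    remaps += 1
            return total_time, remaps

        if __name__ == "__main__":
            for th in (0.1, 0.25, 0.5):
                t, n = simulate(threshold=th)
                print(f"threshold={th:.2f}  time={t:8.1f}  remaps={n}")

    Sweeping the threshold in this way mirrors, in miniature, the comparison between simple policies and more expensive optimal decision policies.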

    Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. The algorithms can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms are controlled-error approximations of kinetic Monte Carlo algorithms, departing from the predominant paradigm of creating parallel KMC algorithms with exactly the same master equation as the serial one. Our methodology relies on a spatial decomposition of the Markov operator underlying the KMC algorithm into a hierarchy of operators corresponding to the processors' structure in the parallel architecture. Based on this operator decomposition, we formulate Fractional Step Approximation schemes by employing the Trotter Theorem and its random variants; these schemes (a) determine the communication schedule between processors, and (b) are run independently on each processor through a serial KMC simulation, called a kernel, on each fractional-step time window. Furthermore, the proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules.
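    The sketch below is a minimal, hypothetical illustration of the fractional-step idea: a lattice is split into two interleaved groups of blocks and, within each fractional time window, a serial KMC kernel advances one group while the other is frozen. The lattice model (independent spin flips at a uniform rate) and all parameters are toy assumptions chosen only to make the schedule concrete.

        # Toy Lie/Trotter-style fractional-step schedule for a lattice KMC simulation.
        # The spin-flip model, rates and window length are illustrative assumptions.
        import math
        import random

        def kmc_kernel(state, sites, t_window, rate, rng):
            """Serial KMC on the given sites for a time window of length t_window."""
            t = 0.0
            while True:
                total_rate = rate * len(sites)
                dt = -math.log(1.0 - rng.random()) / total_rate   # exponential waiting time
                if t + dt > t_window:
                    break
                t += dt
                site = rng.choice(sites)
                state[site] ^= 1                                  # flip the chosen spin

        def fractional_step_kmc(n_sites=16, block=4, t_final=1.0, dt=0.1, rate=1.0, seed=1):
            rng = random.Random(seed)
            state = [0] * n_sites
            blocks = [list(range(i, i + block)) for i in range(0, n_sites, block)]
            group_a = [s for b in blocks[0::2] for s in b]   # even blocks ("processor" A)
            group_b = [s for b in blocks[1::2] for s in b]   # odd blocks  ("processor" B)
            t = 0.0
            while t < t_final:
                kmc_kernel(state, group_a, dt, rate, rng)    # fractional step on group A
                kmc_kernel(state, group_b, dt, rate, rng)    # then on group B
                t += dt
            return state

        print(fractional_step_kmc())

    In the actual framework the kernels would run on different processors and the alternation of windows defines the communication schedule; the toy above only shows the operator-splitting structure.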

    Efficient Parallel Video Encoding on Heterogeneous Systems

    Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014), Porto (Portugal), August 27-28, 2014. In this study we propose an efficient method for collaborative H.264/AVC inter-loop encoding in heterogeneous CPU+GPU systems. The method relies on a specifically developed, extensive library of highly optimized parallel algorithms, covering both CPU and GPU architectures and all inter-loop modules. In order to minimize the overall encoding time, the method integrates adaptive load balancing for the most computationally intensive inter-prediction modules, based on dynamically built functional performance models of the heterogeneous devices and inter-loop modules. The proposed method also introduces efficient communication-aware techniques, which maximize data reuse and decrease the overhead of expensive data transfers in collaborative video encoding. The experimental results show that the proposed method is able to achieve real-time video encoding for very demanding video coding parameters, i.e., full HD video format, a 64×64-pixel search area, and exhaustive motion estimation. This work was supported by national funds through FCT – Fundação para a Ciência e a Tecnologia, under projects PEst-OE/EEI/LA0021/2013, PTDC/EEI-ELC/3152/2012 and PTDC/EEA-ELC/117329/2010.
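    As a hypothetical sketch of the functional-performance-model idea, the snippet below splits a frame's workload between a CPU and a GPU in proportion to throughputs measured on earlier frames. The device names, throughput numbers, and the use of macroblock rows as the work unit are assumptions for illustration, not details taken from the paper.

        # Performance-model-driven split of a frame's macroblock rows between devices.
        # Throughput figures below are assumed, not measured values from the paper.

        def split_workload(total_rows, throughput):
            """Distribute rows proportionally to each device's measured throughput."""
            total = sum(throughput.values())
            shares = {dev: int(round(total_rows * tp / total)) for dev, tp in throughput.items()}
            diff = total_rows - sum(shares.values())   # fix rounding so shares add up exactly
            if diff:
                shares[max(throughput, key=throughput.get)] += diff
            return shares

        # Example: the GPU was measured roughly 3x faster than the CPU on inter-prediction.
        measured = {"cpu": 120.0, "gpu": 360.0}    # rows per second (assumed)
        print(split_workload(68, measured))        # e.g. {'cpu': 17, 'gpu': 51}

    Re-measuring throughput as encoding proceeds and re-splitting each frame is one way such a model can be kept adaptive.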

    Mapping Framework for Heterogeneous Reconfigurable Architectures: Combining Temporal Partitioning and Multiprocessor Scheduling


    The hArtes Tool Chain

    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped onto the hArtes platform.

    Model-driven load balancing of data-parallel kernels on high-performance heterogeneous platforms

    Data-parallel applications are composed of several processes that apply the same computation (kernel) to different portions of the data and need to communicate partial results during execution. Heterogeneous platforms are those in which each computational resource of the system may differ from the others and which include accelerators; the elements are connected through networks of differing performance and characteristics, and they have to work together to execute an application or solve a problem, which is what makes this scenario difficult. The load-balancing problem for data-parallel applications on heterogeneous platforms is therefore investigated and solved through non-uniform distributions of the workload among all available resources. The objective is to find a partition that minimizes the cost of computation and communication, which is not trivial; the problem has been shown to be NP-complete. The literature has developed several heuristics to find optimal solutions, in which computation and communication performance models are used as metrics in the partitioning algorithms. The models allow us to describe the behavior of the system, while the heuristics are the approach used to find a satisfactory solution. We discuss the role of these models and, finally, to improve these heuristic approaches, we replace metrics based on communication volume with a metric based on communication times. These times are obtained from an analytical model through a symbolic tool that manipulates, evaluates, and represents the communication cost of a partition as an analytic expression using the τ-Lop communication performance model. (Master's thesis, Máster Universitario en Ingeniería Informática, Universidad de Extremadura.)
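    To make the partitioning idea concrete, the sketch below shows a greedy heuristic that assigns data blocks to heterogeneous devices by estimated finish time, where the estimate combines a computation-time term with a communication-time term rather than a communication volume. The linear per-block costs are placeholder assumptions standing in for a real analytical model such as τ-Lop.

        # Greedy partitioning heuristic driven by estimated times (computation plus
        # communication time per block). The per-block costs are placeholder
        # assumptions standing in for an analytical model such as tau-Lop.

        def partition(n_blocks, devices):
            finish = {d: 0.0 for d in devices}     # accumulated estimated time per device
            assignment = {}
            for b in range(n_blocks):
                best = min(
                    devices,
                    key=lambda d: finish[d]
                                  + devices[d]["comp_per_block"]    # compute-time estimate
                                  + devices[d]["comm_per_block"],   # communication-time estimate
                )
                assignment[b] = best
                finish[best] += devices[best]["comp_per_block"] + devices[best]["comm_per_block"]
            return assignment, finish

        devices = {
            "cpu":  {"comp_per_block": 4.0, "comm_per_block": 0.1},   # assumed ms per block
            "gpu0": {"comp_per_block": 1.0, "comm_per_block": 0.8},
            "gpu1": {"comp_per_block": 1.0, "comm_per_block": 0.8},
        }
        assignment, finish = partition(24, devices)
        print(finish)    # per-device estimated completion times

    Replacing the two time terms with volume-based estimates recovers the volume-metric baseline that the work sets out to improve on.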

    Optimal dynamic remapping of parallel computations

    A large class of computations is characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem of remapping workload to processors in a parallel computation is considered when both the utility of remapping and the future behavior of the workload are uncertain: execution requirements are stable during a given phase but may change radically between phases. For such problems, a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases, addressing the fundamental problem of balancing the expected performance gain from remapping against the delay cost. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimation of the gain is examined. The heuristic's feasibility is demonstrated by its use in an adaptive fluid dynamics code on a multiprocessor. The results suggest that, except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether a gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases.
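    A minimal sketch of the kind of simple decision rule discussed here: after a suspected phase change, remap once the time lost to imbalance would have paid for the remapping delay. The step times, balanced baseline, and delay below are illustrative assumptions, and the rule is only a stand-in for the stochastic-dynamic-programming policy of the paper.

        # Toy decision rule: remap when accumulated excess time since a phase change
        # exceeds the remapping delay. All numbers are illustrative assumptions.

        def should_remap(step_times, balanced_time, remap_delay):
            """True once the excess over the balanced step time exceeds the remap delay."""
            excess = sum(max(0.0, t - balanced_time) for t in step_times)
            return excess > remap_delay

        # Example: a phase change makes steps cost 1.4 s instead of the balanced 1.0 s,
        # and remapping would stall the computation for 3 s.
        observed = [1.4] * 8
        print(should_remap(observed, balanced_time=1.0, remap_delay=3.0))   # True after 8 steps

    Like the heuristic examined in the paper, a rule of this shape needs no explicit estimate of the future gain, only observations of what has already been lost.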

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports; 7551)

    ReCoSoC is intended to be an annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multi-billion-transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.