3,576 research outputs found

    A Review on Software Architectures for Heterogeneous Platforms

    Full text link
    The increasing demands for computing performance have been a reality regardless of the requirements for smaller and more energy efficient devices. Throughout the years, the strategy adopted by industry was to increase the robustness of a single processor by increasing its clock frequency and mounting more transistors so more calculations could be executed. However, it is known that the physical limits of such processors are being reached, and one way to fulfill such increasing computing demands has been to adopt a strategy based on heterogeneous computing, i.e., using a heterogeneous platform containing more than one type of processor. This way, different types of tasks can be executed by processors that are specialized in them. Heterogeneous computing, however, poses a number of challenges to software engineering, especially in the architecture and deployment phases. In this paper, we conduct an empirical study that aims at discovering the state-of-the-art in software architecture for heterogeneous computing, with focus on deployment. We conduct a systematic mapping study that retrieved 28 studies, which were critically assessed to obtain an overview of the research field. We identified gaps and trends that can be used by both researchers and practitioners as guides to further investigate the topic

    Solving Large-Scale Markov Decision Processes on Low-Power Heterogeneous Platforms

    Get PDF
    Markov Decision Processes (MDPs) provide a framework for a machine to act autonomously and intelligently in environments where the effects of its actions are not deterministic. MDPs have numerous applications. We focus on practical applications for decision making, such as autonomous driving and service robotics, that have to run on mobile platforms with scarce computing and power resources. In our study, we use Value Iteration to solve MDPs, a core method of the paradigm to find optimal sequences of actions, which is well known for its high computational cost. In order to solve these computationally complex problems efficiently in platforms with stringent power consumption constraints, high-performance accelerator hardware and parallelised software come to the rescue. We introduce a generalisable approach to implement practical applications for decision making, such as autonomous driving on mobile and embedded low-power heterogeneous SoC platforms that integrate an accelerator (GPU) with a multicore. We evaluate three scheduling strategies that enable concurrent execution and efficient use of resources on a variety of SoCs embedding a multicore CPU and integrated GPU, namely Oracle, Dynamic, and LogFit. We compare these strategies for solving an MDP modelling the use-case of autonomous robot navigation in indoor environments on four representative platforms for mobile decision-making applications with a power use ranging from 4 to 65 Watts. We provide a rigorous analysis of the results to better understand their behaviour depending on the MDP size and the computing platform. Our experimental results show that by using CPU-GPU heterogeneous strategies, the computation time and energy required are considerably reduced with respect to multicore implementation, regardless of the computational platform.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was partially supported by the Spanish project TIN 2016-80920-R

    Parallel prefix operations on heterogeneous platforms

    Get PDF
    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01[Resumo] As tarxetas gráficas, coñecidas como GPUs, aportan grandes vantaxes no rendemento computacional e na eficiencia enerxética, sendo un piar clave para a computación de altas prestacións (HPC). Sen embargo, esta tecnoloxía tamén é custosa de programar, e ten certos problemas asociados á portabilidade entre as diferentes tarxetas. Por autra banda, os algoritmos de prefixo paralelo son un conxunto de algoritmos paralelos regulares e moi empregados nas ciencias compuacionais, cuxa eficiencia é esencial en moita."3 aplicacións. Neste eiclo, aínda que as GPUs poden acelerar a computación destes algoritmos, tamén poden ser unha limitación cando non explotan axeitadamente o paralelismo da arquitectura CPU. Esta Tese presenta dúas perspectivas. Dunha parte, deséñanse novos algoritmos de prefixo paralelo para calquera paradigma de programación paralela. Pola outra banda, tamén se propón unha metodoloxÍa xeral que implementa eficientemente algoritmos de prefixo paralelos, de xeito doado e portable, sobre arquitecturas GPU CUDA, mais que se centrar nun algoritmo particular ou nun modelo concreto de tarxeta. Para isto, a metodoloxía identifica os paramétros da GPU que inflúen no rendemento e, despois, seguindo unha serie de premisas teóricas, obtéñense os valores óptimos destes parámetros dependendo do algoritmo, do tamaño do problema e da arquitectura GPU empregada. Ademais, esta Tese tamén prové unha serie de fUllciólls GPU compostas de bloques de código CUDA modulares e reutilizables, o que permite a implementación de calquera algoritmo de xeito sinxelo. Segundo o tamaño do problema, propóñense tres aproximacións. As dúas primeiras resolven problemas pequenos, medios e grandes nunha única GPU) mentras que a terceira trata con tamaños extremad8.1nente grandes, usando varias GPUs. As nosas propostas proporcionan uns resultados moi competitivos a nivel de rendemento, mellorando as propostas existentes na bibliografía para as operacións probadas: a primitiva sean, ordenación e a resolución de sistemas tridiagonais.[Resumen] Las tarjetas gráficas (GPUs) han demostrado gmndes ventajas en el rendimiento computacional y en la eficiencia energética, siendo una tecnología clave para la computación de altas prestaciones (HPC). Sin embargo, esta tecnología también es costosa de progTamar, y tiene ciertos problemas asociados a la portabilidad de sus códigos entre diferentes generaciones de tarjetas. Por otra parte, los algoritmos de prefijo paralelo son un conjunto de algoritmos regulares y muy utilizados en las ciencias computacionales, cuya eficiencia es crucial en muchas aplicaciones. Aunque las GPUs puedan acelerar la computación de estos algoritmos, también pueden ser una limitación si no explotan correctamente el paralelismo de la arquitectura CPU. Esta Tesis presenta dos perspectivas. De un lado, se han diseñado nuevos algoritmos de prefijo paralelo que pueden ser implementados en cualquier paradigma de programación paralela. Por otra parte, se propone una metodología general que implementa eficientemente algoritmos de prefijo paralelo, de forma sencilla y portable, sobre cualquier arquitectura GPU CUDA, sin centrarse en un algoritmo particular o en un modelo de tarjeta. Para ello, la metodología identifica los parámetros GPU que influyen en el rendimiento y, siguiendo un conjunto de premisas teóricas, obtiene los valores óptimos para cada algoritmo, tamaño de problema y arquitectura. Además, las funciones GPU proporcionadas están compuestas de bloques de código CUDA reutilizable y modular, lo que permite la implementación de cualquier algoritmo de prefijo paralelo sencillamente. Dependiendo del tamaño del problema, se proponen tres aproximaciones. Las dos primeras resuelven tamaños pequeños, medios y grandes, utilizando para ello una única GPU i mientras que la tercera aproximación trata con tamaños extremadamente grandes, usando varias GPUs. Nuestras propuestas proporcionan resultados muy competitivos, mejorando el rendimiento de las propuestas existentes en la bibliografía para las operaciones probadas: la primitiva sean, ordenación y la resolución de sistemas tridiagonales.[Abstract] Craphics Processing Units (CPUs) have shown remarkable advantages in computing performance and energy efficiency, representing oue of the most promising trends fúr the near-fnture of high perfonnance computing. However, these devices also bring sorne programming complexities, and many efforts are required tú provide portability between different generations. Additionally, parallel prefix algorithms are a 8et of regular and highly-used parallel algorithms, whose efficiency is crutial in roany computer sCience applications. Although GPUs can accelerate the computation of such algorithms, they can also be a limitation when they do not match correctly to the CPU architecture or do not exploit the CPU parallelism properly. This dissertation presents two different perspectives. Gn the Oile hand, new parallel prefix algorithms have been algorithmicany designed for any paranel progrannning paradigm. On the other hand, a general tuning CPU methodology is proposed to provide an easy and portable mechanism tú efficiently implement paranel prefix algorithms on any CUDA CPU architecture, rather than focusing on a particular algorithm or a CPU mode!. To accomplish this goal, the methodology identifies the GPU parameters which influence on the performance and, following a set oí performance premises, obtains the cOllvillient values oí these parameters depending on the algorithm, the problem size and the CPU architecture. Additionally, the provided CPU functions are composed of modular and reusable CUDA blocks of code, which allow the easy implementation of any paranel prefix algorithm. Depending on the size of the dataset, three different approaches are proposed. The first two approaches solve small and medium-large datasets on a single GPU; whereas the third approach deals with extremely large datasets on a Multiple-CPU environment. OUT proposals provide very competitive performance, outperforming the stateof- the-art for many parallel prefix operatiOllS, such as the sean primitive, sorting and solving tridiagonal systems

    The value of incumbency for heterogeneous platforms

    Get PDF
    We study the dynamics of competition in a model with network effects, an incumbent and entry. We propose a new way of representing the strategic advantages of incumbency in a static model and embed it in a dynamic framework with heterogeneous consumers. We completely identify the conditions under which inefficient equilibria with two platforms emerge at equilibrium; explore the reasons why these inefficient equilibria arise; compute the profits of the incumbent and demonstrate that the incumbency advantage does not improve much, if at all, when going from a static to a dynamic framework

    The value of incumbency for heterogeneous platforms

    Get PDF
    We study the dynamics of competition in a model with network effects, an incumbent and entry. We propose a new way of representing the strategic advantages of incumbency in a static model and embed it in a dynamic framework with heterogeneous consumers. We completely identify the conditions under which inefficient equilibria with two platforms emerge at equilibrium; explore the reasons why these inefficient equilibria arise; compute the profits of the incumbent and demonstrate that the incumbency advantage does not improve much, if at all, when going from a static to a dynamic framework

    A Process-oriented Approach for Migrating Software to Heterogeneous Platforms

    Get PDF
    Context: Heterogeneous computing, i.e., computing performed on processors of different types - such as combination of CPUs and GPUs, or CPUs and FPGAs - has shown to be a feasible path towards higher performance and less energy consumption. However, this approach imposes a number of challenges on the software side that must be addressed in order to achieve the aforementioned advantages.Objective: The objective of this thesis is to improve the process of software deployment on heterogeneous platforms. Through a detailed analysis of the state-of-the-art and state-of-the-practice, we aim to provide a reasoning framework for engineers to migrate software to be executed on such platforms.Method: To achieve our goal, we conducted: (i) a literature review in the form of a systematic mapping study on software deployment on heterogeneous platforms; (ii) a multiple case study in industry that highlights the main challenges and concerns in the state-of-the-practice in the area; and (iii) a study in which we propose and evaluate a decision framework to guide engineers in migrating software for execution on heterogeneous platforms, with a case study in the automotive domain.Results: In the mapping study, we provided a thorough classification of the identified concerns and approaches to deploying software on heterogeneous platforms. Among other findings, we discovered a lack of holistic approaches that include development processes, as well as few validation studies in industrial contexts. In the second study, we discovered and analyzed common practices and challenges that companies face when using heterogeneous platforms. One of such challenges is related to the lack of approaches that cover the software development lifecycle. In the third study, we proposed a decision framework that guides engineers in the process of reasoning for migrating software for execution on heterogeneous platforms. It consists of five stages (assessing, re-architecting, developing, deploying, evaluating), each containing a set of aspects to be addressed through the answers to predefined questions.Conclusions: This thesis addresses a gap that was identified in both theory and practice concerning the lack of holistic approaches to migrate software for execution on heterogeneous platforms. Our proposed approach addresses the problem through systematic guidance for engineers.Future work: In the future, we intend to further refine the proposed framework through case studies in domains other than automotive. We will explore its integration with existing software engineering processes in industrial contexts, performing in-depth analysis of the required adaptations and providing detailed solutions within the stages of the framework

    A C-DAG task model for scheduling complex real-time tasks on heterogeneous platforms: preemption matters

    Full text link
    Recent commercial hardware platforms for embedded real-time systems feature heterogeneous processing units and computing accelerators on the same System-on-Chip. When designing complex real-time application for such architectures, the designer needs to make a number of difficult choices: on which processor should a certain task be implemented? Should a component be implemented in parallel or sequentially? These choices may have a great impact on feasibility, as the difference in the processor internal architectures impact on the tasks' execution time and preemption cost. To help the designer explore the wide space of design choices and tune the scheduling parameters, in this paper we propose a novel real-time application model, called C-DAG, specifically conceived for heterogeneous platforms. A C-DAG allows to specify alternative implementations of the same component of an application for different processing engines to be selected off-line, as well as conditional branches to model if-then-else statements to be selected at run-time. We also propose a schedulability analysis for the C-DAG model and a heuristic allocation algorithm so that all deadlines are respected. Our analysis takes into account the cost of preempting a task, which can be non-negligible on certain processors. We demonstrate the effectiveness of our approach on a large set of synthetic experiments by comparing with state of the art algorithms in the literature
    corecore