84 research outputs found

    A survey of emerging architectural techniques for improving cache energy consumption

    The search continues for another groundbreaking technique to reduce the ever-increasing disparity between CPU performance and storage performance. There have been encouraging breakthroughs in CPU performance through fabrication technologies and changes in chip design, but storage has seen far less progress, which materially degrades system performance. Much research effort has gone into techniques that improve the energy efficiency of cache architectures. This work surveys energy-saving techniques, grouped by whether they save dynamic energy, leakage energy, or both. Its aim is to compile a quick reference guide to energy-saving techniques from 2013 to 2016 for engineers, researchers and students.

    Planificación consciente de la contención y gestión de recursos en arquitecturas multicore emergentes (Contention-aware scheduling and resource management in emerging multicore architectures)

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 14-12-2021. Chip multicore processors (CMPs) currently constitute the architecture of choice for most general-purpose computing systems, and they will likely continue to be dominant in the near future. Advances in technology have made it possible to pack an increasing number of cores and bigger caches on the same chip. Nevertheless, contention on shared resources in CMPs, present since the advent of these architectures, still poses a big challenge. Cores in a CMP typically share a last-level cache (LLC) and other memory-related resources with the remaining cores, such as a DRAM controller and an interconnection network. As a result, co-running applications may compete intensively with each other for these shared resources, leading to substantial and uneven performance degradation...

    A Study of Dynamic Phase Adaptation Using a Dynamic Multicore Processor


    GPU devices for safety-critical systems: a survey

    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, integrating complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources creates several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices' random hardware failures, systematic failures, and independence of execution. This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020-045931-I). Peer reviewed. Postprint (author's final draft).

    Heterogeneity-aware scheduling and data partitioning for system performance acceleration

    Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating system optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but existing work either focuses on a specific type of heterogeneous system or algorithm, or offers a high-level framework without insight into how heterogeneity differs between types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to designing general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaboration-based approach to co-designing the task selector and core allocator in the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connections, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a schedule based on a topological-ranking heuristic for a multi-FPGA based reconfigurable cluster.
    Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on a variety of workloads. "This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement

    Hardware-Software Stack for an RC car for testing autonomous driving algorithms

    In this paper, we report our ongoing work on developing hardware and software support for a toy RC car. This toy car can be used as a platform for evaluating algorithms and accelerators for autonomous driving vehicles (ADVs). We describe the sensors and actuators used and how they interface with two processors, viz., Jetson Nano and Raspberry Pi. Where possible, we have used ROS nodes for interfacing. We discuss the advantages and limitations of different sensors and processors and issues related to their compatibility. We include both software (e.g., Python code, Linux commands) and hardware (e.g., pin configuration) information which will be useful for reproducing the experiments. This paper will be useful for robotics enthusiasts and researchers in the area of autonomous driving.

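    Optimización de justicia y rendimiento en procesadores multicore asimétricos mediante planificación consciente de la contención (Optimizing fairness and performance on asymmetric multicore processors through contention-aware scheduling)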

    Asymmetric multicore processors (AMPs) with a common instruction set are a more energy-efficient alternative to symmetric multicores for diverse workloads. AMPs combine fast, high-performance cores with slower, low-power ones. Asymmetry-aware scheduling at the operating-system level has been shown to be essential for obtaining significant gains in overall performance and for guaranteeing fairness on these systems. To achieve this, however, the scheduler must accurately estimate the progress each thread makes when running on the different core types during execution. Although schedulers that optimize fairness or performance on AMPs exist, existing proposals usually depend on special hardware extensions or on platform-specific prediction models and, moreover, do not account for the performance degradation caused by contention on shared resources (e.g., a shared cache or the memory bus). This can limit the scheduler's portability and significantly degrade fairness and overall performance. This Master's thesis presents the design and implementation, in the Linux kernel, of a scheduler for AMPs that is aware of shared-resource contention and aims to optimize fairness. The scheduler also exposes a configuration parameter that gradually improves overall performance at the expense of fairness. The proposed scheduler was evaluated experimentally on real asymmetric multicore hardware.

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the computing systems community to explore new design approaches. Approximate Computing appears as an emerging solution, allowing designers to tune the quality of results in the design of a system in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions. Comment: Under review at ACM Computing Surveys.

    Ecosystem-Driven Design of In-Home Terminals Based on Open Platform for the

    In-home healthcare services based on the Internet-of-Things (IoT) have great business potential. To turn this potential into reality, a business ecosystem should be established first. Technical solutions should therefore aim for a cooperative ecosystem by meeting the interoperability, security, and system integration requirements. In this paper, we propose an ecosystem-driven design strategy and apply it in the design of an open-platform-based in-home healthcare terminal. A cooperative business ecosystem is formulated by merging the traditiona