1,330 research outputs found

    Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms

    Get PDF
    Multicore processors are becoming ubiquitous, and it is becoming increasingly common to run multiple real-time systems on a shared multicore platform. While this trend helps to reduce cost and to increase performance, it also makes it more challenging to achieve timing guarantees and functional isolation. One approach to achieving functional isolation is to use virtualization. However, virtualization also introduces many challenges to the multicore timing analysis; for instance, the overhead due to cache misses becomes harder to predict, since it depends not only on the direct interference between tasks but also on the indirect interference between virtual processors and the tasks executing on them. In this paper, we present a cache-aware compositional analysis technique that can be used to ensure timing guarantees of components scheduled on a multicore virtualization platform. Our technique improves on previous multicore compositional analyses by accounting for the cache-related overhead in the components’ interfaces, and it addresses the new virtualization-specific challenges in the overhead analysis. To demonstrate the utility of our technique, we report results from an extensive evaluation based on randomly generated workload

    Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms

    Get PDF
    Multicore processors are becoming ubiquitous, and it is becoming increasingly common to run multiple real-time systems on a shared multicore platform. While this trend helps to reduce cost and to increase performance, it also makes it more challenging to achieve timing guarantees and functional isolation. One approach to achieving functional isolation is to use virtualization. However, virtualization also introduces many challenges to the multicore timing analysis; for instance, the overhead due to cache misses becomes harder to predict, since it depends not only on the direct interference between tasks but also on the indirect interference between virtual processors and the tasks executing on them. In this paper, we present a cache-aware compositional analysis technique that can be used to ensure timing guarantees of components scheduled on a multicore virtualization platform. Our technique improves on previous multicore compositional analyses by accounting for the cache-related overhead in the components’ interfaces, and it addresses the new virtualization-specific challenges in the overhead analysis. To demonstrate the utility of our technique, we report results from an extensive evaluation based on randomly generated workloads

    Cache-aware Interfaces for Compositional Real-Time Systems

    Get PDF
    Interface-based compositional analysis is by now a fairly established area of research in real-time systems. However, current research has not yet fully considered practical aspects, such as the effects of cache interferences on multicore platforms. This position paper discusses the analysis challenges and motivates the need for cache scheduling in this setting, and it highlights several research questions towards cache-aware interfaces for compositional systems on multicore platforms

    Cache-Aware Real-Time Virtualization

    Get PDF
    Virtualization has been adopted in diverse computing environments, ranging from cloud computing to embedded systems. It enables the consolidation of multi-tenant legacy systems onto a multicore processor for Size, Weight, and Power (SWaP) benefits. In order to be adopted in timing-critical systems, virtualization must provide real-time guarantee for tasks and virtual machines (VMs). However, existing virtualization technologies cannot offer such timing guarantee. Tasks in VMs can interfere with each other through shared hardware components. CPU cache, in particular, is a major source of interference that is hard to analyze or manage. In this work, we focus on challenges of the impact of cache-related interferences on the real-time guarantee of virtualization systems. We propose the cache-aware real-time virtualization that provides both system techniques and theoretical analysis for tackling the challenges. We start with the challenge of the private cache overhead and propose the private cache-aware compositional analysis. To tackle the challenge of the shared cache interference, we start with non-virtualization systems and propose a shared cache-aware scheduler for operating systems to co-allocate both CPU and cache resources to tasks and develop the analysis. We then investigate virtualization systems and propose a dynamic cache management framework that hierarchically allocates shared cache to tasks. After that, we further investigate the resource allocation and analysis technique that considers not only cache resource but also CPU and memory bandwidth resources. Our solutions are applicable to commodity hardware and are essential steps to advance virtualization technology into timing-critical systems

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    Effective data parallel computing on multicore processors

    Get PDF
    The rise of chip multiprocessing or the integration of multiple general purpose processing cores on a single chip (multicores), has impacted all computing platforms including high performance, servers, desktops, mobile, and embedded processors. Programmers can no longer expect continued increases in software performance without developing parallel, memory hierarchy friendly software that can effectively exploit the chip level multiprocessing paradigm of multicores. The goal of this dissertation is to demonstrate a design process for data parallel problems that starts with a sequential algorithm and ends with a high performance implementation on a multicore platform. Our design process combines theoretical algorithm analysis with practical optimization techniques. Our target multicores are quad-core processors from Intel and the eight-SPE IBM Cell B.E. Target applications include Matrix Multiplications (MM), Finite Difference Time Domain (FDTD), LU Decomposition (LUD), and Power Flow Solver based on Gauss-Seidel (PFS-GS) algorithms. These applications are popular computation methods in science and engineering problems and are characterized by unit-stride (MM, LUD, and PFS-GS) or 2-point stencil (FDTD) memory access pattern. The main contributions of this dissertation include a cache- and space-efficient algorithm model, integrated data pre-fetching and caching strategies, and in-core optimization techniques. Our multicore efficient implementations of the above described applications outperform nai¨ve parallel implementations by at least 2x and scales well with problem size and with the number of processing cores

    Cache-aware Parallel Programming for Manycore Processors

    Full text link
    With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to investigate challenges in parallel computing. The TILEPro64 is a manycore accelerator, composed of 64 tiles interconnected via multiple 8x8 mesh networks. It contains per-tile caches and supports cache-coherent shared memory by default. In this paper we present a programming technique to take advantages of distributed caching facilities in manycore processors. However, unlike other work in this area, our approach does not use architecture-specific libraries. Instead, we provide the programmer with a novel technique on how to program future Non-Uniform Cache Architecture (NUCA) manycore systems, bearing in mind their caching organisation. We show that our localised programming approach can result in a significant improvement of the parallelisation efficiency (speed-up).Comment: This work was presented at the international symposium on Highly- Efficient Accelerators and Reconfigurable Technologies (HEART2013), Edinburgh, Scotland, June 13-14, 201

    Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain

    Get PDF
    Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process on the timing and functional correctness. This process includes the timing analysis that provides Worst-Case Execution Time (WCET) estimates to provide evidence that the execution time of the system, or parts of it, remain within the deadlines. A key design principle for CRTES is the incremental qualification, whereby each software component can be subject to verification and validation independently of any other component, with obvious benefits for cost. At timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing an unprecedented growth with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the performance required and bring the potential of integrating several software functions onto the same hardware. However, multicore contention in the access to shared hardware resources creates a dependence of the execution time of a task with the rest of the tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied on current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and allow to have time composability by design, also in the context of mixed-criticality systems. At analysis level, we propose contention prediction models that factor the impact of contenders and don¿t need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention. We give an special emphasis on the Space domain focusing on the Cobham Gaisler NGMP multicore processor, which is currently assessed by the European Space Agency for its future missions.Los Sistemas Críticos Empotrados de Tiempo Real (CRTES) siguen un proceso de verificación y validación para su correctitud funcional y temporal. Este proceso incluye el análisis temporal que proporciona estimaciones de el peor caso del tiempo de ejecución (WCET) para dar evidencia de que el tiempo de ejecución del sistema, o partes de él, permanecen dentro de los límites temporales. Un principio de diseño clave para los CRTES es la cualificación incremental, por la que cada componente de software puede ser verificado y validado independientemente del resto de componentes, con beneficios obvios para el coste. A nivel temporal, esto requiere composabilidad temporal, por la que el comportamiento temporal de una función no se ve afectado por otras funciones. CRTES están experimentando un crecimiento sin precedentes con crecientes demandas de rendimiento que han motivado el uso the arquitecturas multi-núcleo (multicore). Los procesadores multi-núcleo pueden proporcionar el rendimiento requerido y tienen el potencial de integrar varias funcionalidades software en el mismo hardware. A pesar de ello, la interferencia entre los diferentes núcleos que aparece en los recursos compartidos de os procesadores multi núcleo crea una dependencia del tiempo de ejecución de una tarea con el resto de tareas ejecutándose simultáneamente en el procesador. Esta dependencia amenaza la predictabilidad temporal y compromete la composabilidad temporal. En esta tésis analizamos y proponemos soluciones hardware para ser aplicadas en los diseños multi núcleo actuales para CRTES que mejoran la predictabilidad y composabilidad temporal, centrándose en el bus y el controlador de memoria internos al chip. A nivel de hardware, proponemos nuevos diseños de buses y controladores de memoria que controlan y mitigan la interferencia entre los diferentes núcleos y permiten tener composabilidad temporal por diseño, también en el contexto de sistemas de criticalidad mixta. A nivel de análisis, proponemos modelos de predicción de la interferencia que factorizan el impacto de los núcleos y no necesitan modificaciones hardware. También proponemos un conjunto de Contadores de Control del Rendimiento (PMC) que proporcionoan evidencia de la interferencia. En esta tésis, damós especial importancia al dominio espacial, centrándonos en el procesador mutli núcleo Cobham Gaisler NGMP, que está siendo actualmente evaluado por la Agencia Espacial Europea para sus futuras misiones
    • …
    corecore