1,101 research outputs found

    ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY

    Get PDF
    Over the last two decades, functions of the embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information processing applications need. However, due to the power issue, the easy way to gain increasing performance by scaling up chip frequencies is no longer feasible. Recently, low-power architecture designs have been the main trend in embedded system designs. In this dissertation, we present our approaches to attack the energy-related issues in embedded system designs, such as thermal issues in the 3D chip multiprocessor (CMP), the endurance issue in the phase-change memory(PCM), the battery issue in the embedded system designs, the impact of inaccurate information in embedded system, and the cloud computing to move the workload to remote cloud computing facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, including an online 3D CMP temperature prediction model and a set of algorithm for scheduling tasks to different cores in order to minimize the peak temperature on chip. To address the challenging issues in applying PCM in embedded systems, we propose a PCM main memory optimization mechanism through the utilization of the scratch pad memory (SPM). Furthermore, we propose an MLC/SLC configuration optimization algorithm to enhance the efficiency of the hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in mobile systems powered by batteries. When scheduling tasks in embedded systems, we make the scheduling decisions based on information, such as estimated execution time of tasks. Therefore, we design an evaluation method for impacts of inaccurate information on the resource allocation in embedded systems. Finally, in order to move workload from embedded systems to remote cloud computing facility, we present a resource optimization mechanism in heterogeneous federated multi-cloud systems. And we also propose two online dynamic algorithms for resource allocation and task scheduling. We consider the resource contention in the task scheduling

    Performance and Memory Space Optimizations for Embedded Systems

    Get PDF
    Embedded systems have three common principles: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance, as well as reducing resource usage and energy consumption for embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. Main contributions of our work are: We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code. We propose and evaluate a compiler-directed code scheduling scheme, which considers both parallelism and data locality. It analyzes the code using a locality parallelism graph representation, and assigns the nodes of this graph to processors.We also introduce an Integer Linear Programming based formulation of the scheduling problem. We propose a compiler-based SPM conscious loop scheduling strategy for array/loop based embedded applications. The method is to distribute loop iterations across parallel processors in an SPM-conscious manner. The compiler identifies potential SPM hits and misses, and distributes loop iterations such that the processors have close execution times. We present an SPM management technique using Markov chain based data access. We propose a compiler directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories. We present a memory bank aware dynamic loop scheduling scheme for array intensive applications.The goal is to minimize the number of memory banks needed for executing the group of loop iterations

    Energy Saving Techniques for Phase Change Memory (PCM)

    Full text link
    In recent years, the energy consumption of computing systems has increased and a large fraction of this energy is consumed in main memory. Towards this, researchers have proposed use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power; and nearly zero leakage power. However, the write latency and power of PCM are very high and this, along with limited write endurance of PCM present significant challenges in enabling wide-spread adoption of PCM. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is encourage researchers to propose even better techniques for improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM

    A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules

    Full text link
    System-level emulators have been used extensively for system design, debugging and evaluation. They work by providing a system-level virtual machine to support a guest operating system (OS) running on a platform with the same or different native OS that uses the same or different instruction-set architecture. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based DBT approach has shown a significantly improved performance with a higher quality of translated code using automatically learned translation rules. However, it has only been applied to user-level emulation, and not yet to system-level emulation. In this paper, we explore the feasibility of applying this approach to improve system-level emulation, and use QEMU to build a prototype. ... To achieve better performance, we leverage several optimizations that include coordination overhead reduction to reduce the overhead of each coordination, and coordination elimination and code scheduling to reduce the coordination frequency. Experimental results show that it can achieve an average of 1.36X speedup over QEMU 6.1 with negligible coordination overhead in the system emulation mode using SPEC CINT2006 as application benchmarks and 1.15X on real-world applications.Comment: 10 pages, 19 figures, to be published in International Symposium on Code Generation and Optimization (CGO) 202

    A Survey on Thread-Level Speculation Techniques

    Get PDF
    Producción CientíficaThread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time-dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)

    Towards a Time-predictable Dual-Issue Microprocessor: The Patmos Approach

    Get PDF
    Current processors are optimized for average case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase the average case performance are hard to be modeled for the WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average case performance. Patmos is a dual-issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average case performance

    Modeling and scheduling heterogeneous multi-core architectures

    Get PDF
    Om de prestatie van toekomstige processors en processorarchitecturen te evalueren wordt vaak gebruik gemaakt van een simulator die het gedrag en de prestatie van de processor modelleert. De prestatie bepalen van de uitvoering van een computerprogramma op een gegeven processorarchitectuur m.b.v. een simulator duurt echter vele grootteordes langer dan de werkelijke uitvoeringstijd. Dit beperkt in belangrijke mate de hoeveelheid experimenten die gedaan kunnen worden. In dit doctoraatswerk werd het Multi-Program Performance Model (MPPM) ontwikkeld, een innovatief alternatief voor traditionele simulatie, dat het mogelijk maakt om tot 100.000x sneller een processorconfiguratie te evalueren. MPPM laat ons toe om nooit geziene exploraties te doen. Gebruik makend van dit raamwerk hebben we aangetoond dat de taakplanning cruciaal is om heterogene meerkernige processors optimaal te benutten. Vervolgens werd een nieuwe manier voorgesteld om op een schaalbare manier de taakplanning uit te voeren, namelijk Performance Impact Estimation (PIE). Tijdens de uitvoering van een draad op een gegeven processorkern schatten we de prestatie op een ander type kern op basis van eenvoudig op te meten prestatiemetrieken. Zo beschikken we op elk moment over alle nodige informatie om een efficiënte taakplanning te doen. Dit laat ons bovendien toe te optimaliseren voor verschillende criteria zoals uitvoeringstijd, doorvoersnelheid of fairness
    corecore