2,112 research outputs found

    Using data compression for increasing memory system utilization

    Get PDF
    Cataloged from PDF version of article.The memory system presents one of the critical challenges in embedded system design and optimization. This is mainly due to the ever-increasing code complexity of embedded applications and the exponential increase seen in the amount of data they manipulate. The memory bottleneck is even more important for multiprocessor-system-on-a-chip (MPSoC) architectures due to the high cost of off-chip memory accesses in terms of both energy and performance. As a result, reducing the memory-space occupancy of embedded applications is very important and will be even more important in the next decade. While it is true that the on-chip memory capacity of embedded systems is continuously increasing, the increases in the complexity of embedded applications and the sizes of the data sets they process are far greater. Motivated by this observation, this paper presents and evaluates a compiler-driven approach to data compression for reducing memory-space occupancy. Our goal is to study how automated compiler support can help in deciding the set of data elements to compress/decompress and the points during execution at which these compressions/decompressions should be performed. We first study this problem in the context of single-core systems and then extend it to MPSoCs where we schedule compressions and decompressions intelligently such that they do not conflict with application execution as much as possible. Particularly, in MPSoCs, one needs to decide which processors should participate in the compression and decompression activities at any given point during the course of execution. We propose both static and dynamic algorithms for this purpose. In the static scheme, the processors are divided into two groups: those performing compression/decompression and those executing the application, and this grouping is maintained throughout the execution of the application. In the dynamic scheme, on the other hand, the execution starts with some grouping but this grouping can change during the course of execution, depending on the dynamic variations in the data access pattern. Our experimental results show that, in a single-core system, the proposed approach reduces maximum memory occupancy by 47.9% and average memory occupancy by 48.3% when averaged over all the benchmarks. Our results also indicate that, in an MPSoC, the average energy saving is 12.7% when all eight benchmarks are considered. While compressions and decompressions and related bookkeeping activities take extra cycles and memory space and consume additional energy, we found that the improvements they bring from the memory space, execution cycles, and energy perspectives are much higher than these overheads

    A Survey on Compiler Autotuning using Machine Learning

    Full text link
    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    Performance and Memory Space Optimizations for Embedded Systems

    Get PDF
    Embedded systems have three common principles: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is lack of efficient software support for CMPs; in particular, automated code parallelizers are needed. The aim of this study is to explore various ways to increase performance, as well as reducing resource usage and energy consumption for embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work is focused on loop scheduling. Main contributions of our work are: We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code. We propose and evaluate a compiler-directed code scheduling scheme, which considers both parallelism and data locality. It analyzes the code using a locality parallelism graph representation, and assigns the nodes of this graph to processors.We also introduce an Integer Linear Programming based formulation of the scheduling problem. We propose a compiler-based SPM conscious loop scheduling strategy for array/loop based embedded applications. The method is to distribute loop iterations across parallel processors in an SPM-conscious manner. The compiler identifies potential SPM hits and misses, and distributes loop iterations such that the processors have close execution times. We present an SPM management technique using Markov chain based data access. We propose a compiler directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns the sets of loop iterations to processing cores and sets of data blocks to on-chip memories. We present a memory bank aware dynamic loop scheduling scheme for array intensive applications.The goal is to minimize the number of memory banks needed for executing the group of loop iterations

    On the side-effects of code abstraction

    Get PDF

    Reductie van het geheugengebruik van besturingssysteemkernen Memory Footprint Reduction for Operating System Kernels

    Get PDF
    In ingebedde systemen is er vaak maar een beperkte hoeveelheid geheugen beschikbaar. Daarom wordt er veel aandacht besteed aan het produceren van compacte programma's voor deze systemen, en zijn er allerhande technieken ontwikkeld die automatisch het geheugengebruik van programma's kunnen verkleinen. Tot nu toe richtten die technieken zich voornamelijk op de toepassingssoftware die op het systeem draait, en werd het besturingssysteem over het hoofd gezien. In dit proefschrift worden een aantal technieken beschreven die het mogelijk maken om op een geautomatiseerde manier het geheugengebruik van een besturingssysteemkern gevoelig te verkleinen. Daarbij wordt in eerste instantie gebruik gemaakt van compactietransformaties tijdens het linken. Als we de hardware en software waaruit het systeem samengesteld is kennen, is het mogelijk om nog verdere reducties te bekomen. Daartoe wordt de kern gespecialiseerd voor een bepaalde hardware-software combinatie. Overbodige functionaliteit wordt opgespoord en uit de kern verwijderd, terwijl de resterende functionaliteit wordt aangepast aan de specifieke gebruikspatronen die uit de hardware en software kunnen afgeleid worden. Als laatste worden technieken voorgesteld die het mogelijk maken om weinig of niet uitgevoerde code (bijvoorbeeld code voor het afhandelen van slechts zeldzaam optredende foutcondities) uit het geheugen te verwijderen. Deze code wordt dan enkel ingeladen op het moment dat ze effectief nodig is. Voor ons testsysteem kunnen we met de gecombineerde technieken het geheugengebruik van een Linux 2.4 kern met meer dan 48% verminderen

    Trends in hardware architecture for mobile devices

    Get PDF
    In the last ten years, two main factors have fueled the steady growth in sales of mobile computing and communication devices: a) the reduction of the footprint of the devices themselves, such as cellular handsets and small computers; and b) the success in developing low-power hardware which allows the devices to operate autonomously for hours or even days. In this review, I show that the first generation of mobile devices was DSP centric – that is, its architecture was based in fast processing of digitized signals using low- power, yet numerically powerful DSPs. However, the next generation of mobile devices will be built around DSPs and low power microprocessor cores for general processing applications. Mobile devices will become data-centric. The main challenge for designers of such hybrid architectures is to increase the computational performance of the computing unit, while keeping power constant, or even reducing it. This report shows that low-power mobile hardware architectures design goes hand in hand with advances in compiling techniques. We look at the synergy between hardware and software, and show that a good balance between both can lead to innovative lowpower processor architectures

    Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

    Get PDF
    Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements
    • …
    corecore