24,034 research outputs found

    Retargetable Compilers for Embedded DSPs

    Programmable devices are a key technology for the design of embedded systems, such as those in the consumer electronics market. Processor cores are used as building blocks for more and more embedded system designs, since they provide a unique combination of features: flexibility and reusability. Processor-based design implies that compilers capable of generating efficient machine code are necessary. However, highly efficient compilers for embedded processors are rarely available. In particular, this holds for digital signal processors (DSPs). This contribution outlines different aspects of DSP compiler technology. First, we cover the demands on compilers for embedded DSPs, which are partially in sharp contrast to traditional compiler construction. Second, we present recent advances in DSP code optimization techniques, which explore a comparatively large search space in order to achieve high code quality. Finally, we discuss the different approaches to retargetability of compilers, that is, techniques for the automatic generation of compilers from processor models.

    Run-time implementation issues for real-time embedded Ada

    A motivating factor in the development of Ada as the Department of Defense standard language was the high cost of embedded system software development. It was with embedded system requirements in mind that many of the features of the language were incorporated. Yet it is the designers of embedded systems who seem to make up the majority of the Ada community dissatisfied with the language. There are a variety of reasons for this dissatisfaction, but many seem to be related in some way to the Ada run-time support system. Some of the areas in which the inconsistencies were found to have the greatest impact on performance from the standpoint of real-time systems are presented. In particular, a large part of the duties of the tasking supervisor is subject to the design decisions of the implementer. These include scheduling, rendezvous, delay processing, and task activation and termination. Some of the more general issues presented include time and space efficiencies, generic expansions, memory management, pragmas, and tracing features. As validated compilers become available for bare computer targets, it is important for a designer to be aware that, at least for many real-time issues, not all validated Ada compilers are created equal.

    Storage constraint satisfaction for embedded processor compilers

    Increasing interest in the high-volume, high-performance embedded processor market motivates the stand-alone processor world to consider issues like design flexibility (synthesizable processor cores), energy consumption, and silicon efficiency. Implications for embedded processor architectures and compilers are the exploitation of hardware acceleration, instruction-level parallelism (ILP), and distributed storage files. In this context, VLIW architectures have been acclaimed for their parallelism in the architecture while orthogonality of the associated instruction sets is maintained. Code generation methods for such processors will be pressured towards an efficient use of scarce resources while satisfying tight real-time constraints imposed by DSP and multimedia applications. Limited availability of storage (e.g. registers) poses a problem for traditional methods that perform code generation in separate stages, e.g. operation scheduling followed by register allocation. This is because the objectives of scheduling and register allocation cause conflicts in code generation in several ways. Firstly, register reuse can create dependencies that did not exist in the original code, but can also avoid spilling values to memory. Secondly, while a particular ordering of instructions may increase the potential for ILP, the reordering due to instruction scheduling may also extend the lifetime of certain values, which can increase the register requirement. Furthermore, the instruction scheduler requires an adequate number of local registers to avoid register reuse (since reuse limits the opportunity for ILP), while the register allocator would prefer sufficient global registers in order to avoid spills. Finally, an effective scheduler can lose its achieved degree of instruction-level parallelism when spill code is inserted afterwards.
    Without any communication of information and cooperation between the scheduling and storage allocation phases, the compiler writer faces the problem of determining which of these phases should run first to generate the most efficient final code. The lack of communication and cooperation between instruction scheduling and storage allocation can result in code that contains an excess of register spills and/or a lower degree of ILP than is actually achievable. This problem, called phase coupling, cannot be ignored when constraints are tight and efficient solutions are desired. Traditional methods that perform code generation in separate stages are often not able to find an efficient or even a feasible solution. Therefore, those methods need an increasing amount of help from the programmer (or designer) to arrive at a feasible solution. Because this requires an excessive amount of design time and extensive knowledge of the processor architecture, there is a need for automated techniques that can cope with the different kinds of constraints during scheduling. This thesis proposes an approach for instruction scheduling and storage allocation that makes extensive use of timing, resource, and storage constraints to prune the search space for scheduling. The method supports VLIW architectures with (distributed) storage files containing random-access registers, rotating registers to exploit the available ILP in loops, and stacks or FIFOs to exploit larger storage capacities with lower addressing costs. Potential access conflicts between values are analyzed before and during scheduling, according to the type of storage they are assigned to. Using constraint analysis techniques and properties of colored conflict graphs, essential information is obtained to identify the bottlenecks for satisfying the storage file constraints. To reduce the identified bottlenecks, the method performs partial scheduling by ordering value accesses so as to allow better reuse of storage.
    Without enforcing any specific storage assignment of values, the method continues until it can guarantee that any completion of the partial schedule will also result in a feasible storage allocation. In this way, the scheduling freedom is exploited to satisfy storage, resource, and timing constraints in one phase.
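
The interaction between instruction order and register requirement described in this abstract can be illustrated with a small sketch. It is not the thesis's method: the function `peak_live_values`, the value names, and the simple liveness model (a value occupies a register from its definition until its last use, one instruction per step) are all assumptions made for illustration.

```python
def peak_live_values(schedule, operands):
    """Peak number of simultaneously live values for a sequential schedule.

    schedule: list of value names, in the order their defining
              instructions are issued (hypothetical names).
    operands: dict mapping each value to the values its instruction reads.
    A value is assumed live from its definition until its last use;
    values that are never read are treated as outputs, live to the end.
    """
    position = {v: t for t, v in enumerate(schedule)}
    last_use = {}
    for v in schedule:
        for op in operands[v]:
            last_use[op] = max(last_use.get(op, -1), position[v])

    end = len(schedule)
    peak = 0
    for t in range(end):
        # Count values already defined whose last use lies after step t.
        live = sum(1 for v in schedule[: t + 1] if last_use.get(v, end) > t)
        peak = max(peak, live)
    return peak


# r = (l1 + l2) * (l3 + l4), with four loads feeding two additions.
operands = {
    "l1": [], "l2": [], "l3": [], "l4": [],
    "s1": ["l1", "l2"],
    "s2": ["l3", "l4"],
    "r": ["s1", "s2"],
}

loads_first = ["l1", "l2", "l3", "l4", "s1", "s2", "r"]
interleaved = ["l1", "l2", "s1", "l3", "l4", "s2", "r"]

print(peak_live_values(loads_first, operands))  # 4
print(peak_live_values(interleaved, operands))  # 3
```

Hoisting all loads to the front exposes more independent work to a parallel machine but lengthens value lifetimes, so the same computation needs one more register under this model.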

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field.
    Comment: version 5.0 (updated September 2018). Preprint version of the article accepted at ACM CSUR 2018 (42 pages). This survey will be updated quarterly. History: received November 2016; revised August 2017; revised February 2018; accepted March 2018.
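
To give a feel for how the two problems named in this abstract differ in size, the following sketch enumerates both search spaces for a tiny pass list. The pass names are hypothetical placeholders, and the model is an assumption: selection treats each pass as simply on or off, and phase-ordering applies every pass exactly once. Real compilers allow repeated passes, which makes the ordering space larger still.

```python
from itertools import combinations, permutations


def selection_space(passes):
    """Every on/off choice: all subsets, each kept in the given pass order."""
    return [
        subset
        for k in range(len(passes) + 1)
        for subset in combinations(passes, k)
    ]


def ordering_space(passes):
    """Every ordering in which each pass is applied exactly once."""
    return list(permutations(passes))


passes = ["p1", "p2", "p3", "p4"]  # hypothetical pass names

print(len(selection_space(passes)))  # 16, i.e. 2**4
print(len(ordering_space(passes)))   # 24, i.e. 4!
```

With n passes the selection space has 2^n members and this restricted ordering space has n! members, so exhaustive search is impractical for realistic pass counts.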

    Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

    This paper presents the observation that by performing fewer of the optimizations available in a standard compiler optimization level such as -O2, while preserving their original ordering, significant savings can be achieved in both execution time and energy consumption. This observation has been validated on two embedded processors, namely the ARM Cortex-M0 and the ARM Cortex-M3, using two different versions of the LLVM compilation framework: v3.8 and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks for both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. The savings that can be achieved are in the same range as those achieved by state-of-the-art compilation approaches that use iterative compilation or machine learning to select flags or to determine phase orderings that result in more efficient code. In contrast to these time-consuming and expensive-to-apply techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, our approach can support multi-criteria optimization, as it targets execution time, energy consumption, and code size at the same time.
    Comment: 15 pages, 3 figures, 71 benchmarks used for evaluation
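
The abstract does not say how the reduced configurations are chosen, so the sketch below is only one possible reading. It assumes, purely for illustration, that each configuration is a prefix of the ordered pass list, which keeps the original ordering and yields one configuration per pass plus the empty one. The pass names and both helper functions are hypothetical.

```python
def prefix_configurations(passes):
    """Order-preserving configurations built as prefixes of the pass list.

    This prefix scheme is an assumption for illustration only.
    """
    return [passes[:k] for k in range(len(passes) + 1)]


def preserves_order(config, passes):
    """True if config is a subsequence of passes (same relative order)."""
    remaining = iter(passes)
    # 'p in remaining' advances the iterator, so later passes in config
    # can only match later positions in the original list.
    return all(p in remaining for p in config)


passes = ["pass_a", "pass_b", "pass_c", "pass_d", "pass_e"]  # hypothetical

configs = prefix_configurations(passes)
print(len(configs))  # 6
print(all(preserves_order(c, passes) for c in configs))  # True
print(preserves_order(["pass_c", "pass_a"], passes))     # False
```

Under this assumption the number of configurations grows linearly with the number of passes, in contrast to the exponential number of arbitrary subsets.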