    Model-Driven Engineering and Optimizing Compilers: A bridge too far?

    International audience. A primary goal of Model Driven Engineering (MDE) is to reduce the cost and effort of developing complex software systems using techniques for transforming abstract views of software to concrete implementations. The rich set of tools that have been developed, especially the growing maturity of model transformation technologies, opens the possibility of applying MDE technologies to transformation-based problems in other domains. In this paper, we present our experience with using MDE technologies to build and evolve compiler infrastructures in the optimizing compiler domain. We illustrate, through our two ongoing research compiler projects for C and a functional language, the challenging aspects of optimizing compiler research and show how mature MDE technologies can be used to address them. We also identify some of the pitfalls that arise from unrealistic expectations of what can be accomplished using MDE and discuss how they can lead to unsuccessful and frustrating application of MDE technologies.

    Cognitive Radio Programming: Existing Solutions and Open Issues

    Software defined radio (SDR) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, many issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and present current architectures and solutions for programming SDR. We also list the challenges to overcome in order to master future cognitive radio systems.

    RISPP: A Run-time Adaptive Reconfigurable Embedded Processor

    This Ph.D. thesis describes a new approach for adaptive processors using a reconfigurable fabric (embedded FPGA) to implement application-specific accelerators. A novel modular Special Instruction composition is presented along with a run-time system that exploits the provided adaptivity. The approach was simulated and prototyped using an FPGA. Comparisons with state-of-the-art application-specific and reconfigurable processors demonstrate significant improvements in performance and efficiency.

    The co-design methodologies on click router application system

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.

    Compilation Techniques for High-Performance Embedded Systems with Multiple Processors

    Institute for Computing Systems Architecture. Despite the progress made in developing more advanced compilers for embedded systems, programming of embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C due to the employed programming style. Parallelisation is hampered by the complex multiple address space memory architecture, which can be found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well-known in the field of scientific computing. Iterative feedback-driven search for “good” transformation sequences is being investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other hand. The proposed methodology is evaluated against two benchmark suites (DSPstone & UTDSP) and four different high-performance DSPs, one of which is part of a commercial four processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation.
Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. The parallelisation scheme is – in conjunction with a set of locality optimisations – able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.

    NASA STI Program Coordinating Council Twelfth Meeting: Standards

    The theme of this NASA Scientific and Technical Information Program Coordinating Council Meeting was standards and their formation and application. Topics covered included scientific and technical information architecture, the Open Systems Interconnection Transmission Control Protocol/Internet Protocol, Machine-Readable Cataloging (MARC) open system environment procurement, and the Government Information Locator Service

    System Synthesis for Embedded Multiprocessors

    Modern embedded systems must increasingly accommodate dynamically changing operating environments, high computational requirements, and tight time-to-market windows. Such trends and the ever-increasing design complexity of embedded systems have challenged designers to raise the level of abstraction and replace traditional ad-hoc approaches with more efficient synthesis techniques. Additionally, since embedded multiprocessor systems are typically designed as final implementations for dedicated functions, modifications to embedded system implementations are rare, and this allows embedded system designers to spend significantly larger amounts of time to optimize the architecture and the employed software. This dissertation presents several system-level synthesis algorithms that employ time-intensive optimization techniques that allow the designer to explore a significantly larger part of the design space. It looks at critical issues that are at the core of the synthesis process --- selecting the architecture, partitioning the functionality over the components of the architecture, and scheduling activities such that design constraints and optimization objectives are satisfied. More specifically for the scheduling step, a new solution to the two-step multiprocessor scheduling problem is proposed. For the first step of clustering a highly efficient genetic algorithm is proposed. Several techniques for the second step of merging are proposed and finally a complete two-step effective solution is presented. Also, a randomization technique is applied to existing deterministic techniques to extend these techniques so that they can utilize arbitrary increases in available optimization time. This novel framework for extending deterministic algorithms in our context allows for accurate and fair comparison of our techniques against the state of the art. 
To further generalize the proposed clustering-based scheduling approach, a complementary two-step multiprocessor scheduling approach for heterogeneous multiprocessor systems is presented. This work is among the first to formally study the application of clustering to heterogeneous system scheduling. Several techniques are proposed and compared and conclusive results are presented. A modular system-level synthesis framework is then proposed. It synthesizes multi-mode, multi-task embedded systems under a number of hard constraints; optimizes a comprehensive set of objectives; and provides a set of alternative trade-off points in a given multi-objective design evaluation space. An extension of the framework is proposed to better address dynamic voltage scaling (DVS), memory optimization, and efficient mappings onto dynamically reconfigurable hardware. An integrated framework for energy-driven scheduling onto embedded multiprocessor systems is proposed. It employs a solution representation that encodes both task assignment and ordering into a single chromosome and hence significantly reduces the search space and problem complexity. It is shown that a task assignment and schedule that result in better performance do not necessarily save power, and hence, integrating task scheduling and voltage scheduling is crucial for fully exploiting the energy-saving potential of an embedded multiprocessor implementation.

    Increasing the efficacy of automated instruction set extension

    The use of Instruction Set Extension (ISE) in customising embedded processors for a specific application has been studied extensively in recent years. The addition of a set of complex arithmetic instructions to a baseline core has proven to be a cost-effective means of meeting design performance requirements. This thesis proposes and evaluates a reconfigurable ISE implementation called “Configurable Flow Accelerators” (CFAs), a number of refinements to an existing Automated ISE (AISE) algorithm called “ISEGEN”, and the effects of source form on AISE. The CFA is demonstrated repeatedly to be a cost-effective design for ISE implementation. A temporal partitioning algorithm called “staggering” is proposed and demonstrated on average to reduce the area of CFA implementation by 37% for only an 8% reduction in acceleration. This thesis then turns to concerns within the ISEGEN AISE algorithm. A methodology for finding a good static heuristic weighting vector for ISEGEN is proposed and demonstrated. Up to 100% of merit is shown to be lost or gained through the choice of vector. ISEGEN early-termination is introduced and shown to improve the runtime of the algorithm by up to 7.26x, and 5.82x on average. An extension to the ISEGEN heuristic to account for pipelining is proposed and evaluated, increasing acceleration by up to an additional 1.5x. An energy-aware heuristic is added to ISEGEN, which reduces the energy used by a CFA implementation of a set of ISEs by an average of 1.6x, up to 3.6x. This result directly contradicts the frequently espoused notion that “bigger is better” in ISE. The last stretch of work in this thesis is concerned with source-level transformation: the effect of changing the representation of the application on the quality of the combined hardware-software solution. A methodology for combined exploration of source transformation and ISE is presented, and demonstrated to improve the acceleration of the result by an average of 35% versus ISE alone.
Floating point is demonstrated to perform worse than fixed point, for all design concerns and applications studied here, regardless of ISEs employed.

    Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects
