396 research outputs found

    Scheduling, Binding and Routing System for a Run-Time Reconfigurable Operator Based Multimedia Architecture

    Get PDF
    International audienceThis article presents an integrated environment for application scheduling, binding and routing used for the run-time reconfigurable, operator based, ROMA multimedia architecture. The environment is very flexible and after a minor modification can support other reconfigurable architectures. Currently, it supports the architecture model composed of a bank of single (double) port memories, two communication networks (with different topologies) and a set of run-time functionally reconfigurable non-pipelined and pipelined operators. The main novelty of this work is simultaneous solving of the scheduling, binding and routing tasks. This frequently generates optimal results, which has been shown by extensive experiments using the constraint programming paradigm. In order to show flexibility of our environment, we have used it in this article for optimization of application scheduling, binding and routing (the case of the non-pipelined execution model) and for space exploration (case of the pipelined execution model)

    Cognitive Radio Programming: Existing Solutions and Open Issues

    Get PDF
    Software defined radio (sdr) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, a lot of issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and we present current architectures and solutions for programming sdr. We also list the challenges to overcome in order to reach mastery of future cognitive radios systems.La radio logicielle a évolué rapidement pour atteindre la maturité nécessaire pour être mise sur le marché, offrant de nouvelles solutions pour les applications de radio cognitive. Cependant, beaucoup de problèmes restent à étudier. Dans ce papier, nous présentons les contraintes imposées par les nouveaux protocoles radios, les architectures matérielles existantes ainsi que les solutions pour les programmer. De plus, nous listons les difficultés à surmonter pour maitriser les futurs systèmes de radio cognitive

    Extensible microprocessor without interlocked pipeline stages (emips), the reconfigurable microprocessor

    Get PDF
    In this thesis we propose to realize the performance benefits of applicationspecific hardware optimizations in a general-purpose, multi-user system environment using a dynamically extensible microprocessor architecture. We have called our dynamically extensible microprocessor design the Extensible Microprocessor without Interlocked Pipeline Stages, or eMIPS. The eMIPS architecture uses the interaction of fixed and configurable logic available in modern Field Programmable Gate Array (FPGA). This interaction is used to address the limitations of current microprocessor architectures based solely on Application Specific Integrated Circuits (ASIC). These limitations include inflexibility, size, and application specific performance optimization. The eMIPS system allows multiple secure extensions to load dynamically and to plug into the stages of a pipelined central processing unit (CPU) data path, thereby extending the core instruction set of the microprocessor. Extensions can also be used to realize on-chip peripherals, and if area permits, even multiple cores. Extension instructions reduce dramatically the execution time of frequently executed instruction patterns. These new functionalities we have developed can be exploited by patching the binaries of existing applications, without any changes to the compilers. A FPGA based workstation prototype and a flexible simulation system implementating this design demonstrates speedups of 2x-3x on a set of applications that include video games, real-time programs and the SPEC2000 integer benchmarks. eMIPS is the first realized workstation based entirely on a dynamically extensible microprocessor that is safe for general purpose, multi-user applications. By exposing the individual stages of the data path, eMIPS allows optimizations not previously possible. This includes permitting safe and coherent accesses to memory from within an extension, optimizing multi-branched blocks, and throwing precise and restart able exceptions from within an extension. This work describes a simplified implementation of an extensible microprocessor architecture based on the Microprocessor without Interlocked Pipeline Stages (MIPS) Reduced Instruction Set Computer (RISC) architecture. The concepts and methods contained within this thesis may be applied to other similar architectures. Given this simplified prototype we look forward to propose how this architecture will be expanded as it matures

    Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

    Get PDF
    The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Performance analysis of massively parallel embedded hardware architectures for retinal image processing

    Get PDF
    This paper examines the implementation of a retinal vessel tree extraction technique on different hardware platforms and architectures. Retinal vessel tree extraction is a representative application of those found in the domain of medical image processing. The low signal-to-noise ratio of the images leads to a large amount of low-level tasks in order to meet the accuracy requirements. In some applications, this might compromise computing speed. This paper is focused on the assessment of the performance of a retinal vessel tree extraction method on different hardware platforms. In particular, the retinal vessel tree extraction method is mapped onto a massively parallel SIMD (MP-SIMD) chip, a massively parallel processor array (MPPA) and onto an field-programmable gate arrays (FPGA)This work is funded by Xunta de Galicia under the projects 10PXIB206168PR and 10PXIB206037PR and the program Maria BarbeitoS

    A Modular Approach to Adaptive Reactive Streaming Systems

    Get PDF
    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects

    Parallelization of dynamic programming recurrences in computational biology

    Get PDF
    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms
    corecore