1,151 research outputs found

    A Micro Power Hardware Fabric for Embedded Computing

    Get PDF
    Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

    Memory Hierarchy Hardware-Software Co-design in Embedded Systems

    Get PDF
    The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.Singapore-MIT Alliance (SMA

    EPICURE: A partitioning and co-design framework for reconfigurable computing

    Get PDF
    This paper presents a new design methodology able to bridge the gap between an abstract specification and a heterogeneous reconfigurable architecture. The EPICURE contribution is the result of a joint study on abstraction/refinement methods and a smart reconfigurable architecture within the formal Esterel design tools suite. The original points of this work are: (i) a generic HW/SW interface model, (ii) a specification methodology that handles the control, and includes efficient verification and HW/SW synthesis capabilities, (iii) a method for parallelism exploration based on abstract resources/performance estimation expressed in terms of area/delay tradeoffs, (iv) a HW/SW partitioning approach that refines the specification into explicit HW configurations and the associated SW control. The EPICURE framework shows how a cooperation of complementary methodologies and CAD tools associated with a relevant architecture can signficantly improve the designer productivity, especially in the context of reconfigurable architectures

    High-level Modelling and Exploration of Coarse-grained Re-configurable Architectures

    Full text link

    Design and resource management of reconfigurable multiprocessors for data-parallel applications

    Get PDF
    FPGA (Field-Programmable Gate Array)-based custom reconfigurable computing machines have established themselves as low-cost and low-risk alternatives to ASIC (Application-Specific Integrated Circuit) implementations and general-purpose microprocessors in accelerating a wide range of computation-intensive applications. Most often they are Application Specific Programmable Circuiits (ASPCs), which are developer programmable instead of user programmable. The major disadvantages of ASPCs are minimal programmability, and significant time and energy overheads caused by required hardware reconfiguration when the problem size outnumbers the available reconfigurable resources; these problems are expected to become more serious with increases in the FPGA chip size. On the other hand, dominant high-performance computing systems, such as PC clusters and SMPs (Symmetric Multiprocessors), suffer from high communication latencies and/or scalability problems. This research introduces low-cost, user-programmable and reconfigurable MultiProcessor-on-a-Programmable-Chip (MPoPC) systems for high-performance, low-cost computing. It also proposes a relevant resource management framework that deals with performance, power consumption and energy issues. These semi-customized systems reduce significantly runtime device reconfiguration by employing userprogrammable processing elements that are reusable for different tasks in large, complex applications. For the sake of illustration, two different types of MPoPCs with hardware FPUs (floating-point units) are designed and implemented for credible performance evaluation and modeling: the coarse-grain MIMD (Multiple-Instruction, Multiple-Data) CG-MPoPC machine based on a processor IP (Intellectual Property) core and the mixed-mode (MIMD, SIMD or M-SIMD) variant-grain HERA (HEterogeneous Reconfigurable Architecture) machine. In addition to alleviating the above difficulties, MPoPCs can offer several performance and energy advantages to our data-parallel applications when compared to ASPCs; they are simpler and more scalable, and have less verification time and cost. Various common computation-intensive benchmark algorithms, such as matrix-matrix multiplication (MMM) and LU factorization, are studied and their parallel solutions are shown for the two MPoPCs. The performance is evaluated with large sparse real-world matrices primarily from power engineering. We expect even further performance gains on MPoPCs in the near future by employing ever improving FPGAs. The innovative nature of this work has the potential to guide research in this arising field of high-performance, low-cost reconfigurable computing. The largest advantage of reconfigurable logic lies in its large degree of hardware customization and reconfiguration which allows reusing the resources to match the computation and communication needs of applications. Therefore, a major effort in the presented design methodology for mixed-mode MPoPCs, like HERA, is devoted to effective resource management. A two-phase approach is applied. A mixed-mode weighted Task Flow Graph (w-TFG) is first constructed for any given application, where tasks are classified according to their most appropriate computing mode (e.g., SIMD or MIMD). At compile time, an architecture is customized and synthesized for the TFG using an Integer Linear Programming (ILP) formulation and a parameterized hardware component library. Various run-time scheduling schemes with different performanceenergy objectives are proposed. A system-level energy model for HERA, which is based on low-level implementation data and run-time statistics, is proposed to guide performance-energy trade-off decisions. A parallel power flow analysis technique based on Newton\u27s method is proposed and employed to verify the methodology

    Design methodology addressing static/reconfigurable partitioning optimizing software defined radio (SDR) implementation through FPGA dynamic partial reconfiguration and rapid prototyping tools

    Get PDF
    The characteristics people request for communication devices become more and more demanding every day. And not only in those aspects dealing with communication speed, but also in such different characteristics as different communication standards compatibility, battery life, device size or price. Moreover, when this communication need is addressed by the industrial world, new characteristics such as reliability, robustness or time-to-market appear. In this context, Software Defined Radios (SDR) and evolutions such as Cognitive Radios or Intelligent Radios seem to be the technological answer that will satisfy all these requirements in a short and mid-term. Consequently, this PhD dissertation deals with the implementation of this type of communication system. Taking into account that there is no limitation neither in the implementation architecture nor in the target device, a novel framework for SDR implementation is proposed. This framework is made up of FPGAs, using dynamic partial reconfiguration, as target device and rapid prototyping tools as designing tool. Despite the benefits that this framework generates, there are also certain drawbacks that need to be analyzed and minimized to the extent possible. On this purpose, a SDR design methodology has been designed and tested. This methodology addresses the static/reconfigurable partitioning of the SDRs in order to optimize their implementation in the aforementioned framework. In order to verify the feasibility of both the design framework and the design methodology, several implementations have been carried out making use of them. A multi-standard modulator implementing WiFi, WiMAX and UMTS, a small-form-factor cognitive video transmission system and the implementation of several data coding functions over R3TOS, a hardware operating system developed by the University of Edinburgh, are these implementations.Las características que la gente exige a los dispositivos de comunicaciones son cada día más exigentes. Y no solo en los aspectos relacionados con la velocidad de comunicación, sino que también en diferentes características como la compatibilidad con diferentes estándares de comunicación, autonomía, tamaño o precio. Es más, cuando esta necesidad de comunicación se traslada al mundo industrial, aparecen nuevas características como fiabilidad, robustez o plazo de comercialización que también es necesario cubrir. En este contexto, las Radios Definidas por Software (SDR) y evoluciones como las Radios Cognitivas o Radios Inteligentes parecen la respuesta tecnológica que va a satisfacer estas necesidades a corto y medio plazo. Por ello, esta tesis doctoral aborda la implementación de este tipo de sistemas de comunicaciones. Teniendo en cuenta que no existe una limitación, ni en la arquitectura de implementación, ni en el tipo de dispositivo a usar, se propone un nuevo entrono de diseño formado por las FPGAs, haciendo uso de la reconfiguración parcial dinámica, y por las herramientas de prototipado rápido. A pesar de que este entorno de diseño ofrece varios beneficios, también genera algunos inconvenientes que es necesario analizar y minimizar en la medida de lo posible. Con este objetivo, se ha diseñado y verificado una metodología de diseño de SDRs. Esta metodología se encarga del particionado estático/reconfigurable de las SDRs para optimizar su implementación sobre el entrono de diseño antes comentado. Para verificar la viabilidad tanto del entorno, como de la metodología de diseño propuesta, se han realizado varias implementaciones que hacen uso de ambas cosas. Estas implementaciones son: un modulador multi-estándar que implementa WiFi, WiMAX y UMTS, un sistema cognitivo y compacto de transmisión de video y la implementación de varias funciones de codificación de datos sobre R3TOS, un sistema operativo hardware desarrollado por la Universidad de Edimburgo

    Modeling and Mapping of Optimized Schedules for Embedded Signal Processing Systems

    Get PDF
    The demand for Digital Signal Processing (DSP) in embedded systems has been increasing rapidly due to the proliferation of multimedia- and communication-intensive devices such as pervasive tablets and smart phones. Efficient implementation of embedded DSP systems requires integration of diverse hardware and software components, as well as dynamic workload distribution across heterogeneous computational resources. The former implies increased complexity of application modeling and analysis, but also brings enhanced potential for achieving improved energy consumption, cost or performance. The latter results from the increased use of dynamic behavior in embedded DSP applications. Furthermore, parallel programming is highly relevant in many embedded DSP areas due to the development and use of Multiprocessor System-On-Chip (MPSoC) technology. The need for efficient cooperation among different devices supporting diverse parallel embedded computations motivates high-level modeling that expresses dynamic signal processing behaviors and supports efficient task scheduling and hardware mapping. Starting with dynamic modeling, this thesis develops a systematic design methodology that supports functional simulation and hardware mapping of dynamic reconfiguration based on Parameterized Synchronous Dataflow (PSDF) graphs. By building on the DIF (Dataflow Interchange Format), which is a design language and associated software package for developing and experimenting with dataflow-based design techniques for signal processing systems, we have developed a novel tool for functional simulation of PSDF specifications. This simulation tool allows designers to model applications in PSDF and simulate their functionality, including use of the dynamic parameter reconfiguration capabilities offered by PSDF. With the help of this simulation tool, our design methodology helps to map PSDF specifications into efficient implementations on field programmable gate arrays (FPGAs). Furthermore, valid schedules can be derived from the PSDF models at runtime to adapt hardware configurations based on changing data characteristics or operational requirements. Under certain conditions, efficient quasi-static schedules can be applied to reduce overhead and enhance predictability in the scheduling process. Motivated by the fact that scheduling is critical to performance and to efficient use of dynamic reconfiguration, we have focused on a methodology for schedule design, which complements the emphasis on automated schedule construction in the existing literature on dataflow-based design and implementation. In particular, we have proposed a dataflow-based schedule design framework called the dataflow schedule graph (DSG), which provides a graphical framework for schedule construction based on dataflow semantics, and can also be used as an intermediate representation target for automated schedule generation. Our approach to applying the DSG in this thesis emphasizes schedule construction as a design process rather than an outcome of the synthesis process. Our approach employs dataflow graphs for representing both application models and schedules that are derived from them. By providing a dataflow-integrated framework for unambiguously representing, analyzing, manipulating, and interchanging schedules, the DSG facilitates effective codesign of dataflow-based application models and schedules for execution of these models. As multicore processors are deployed in an increasing variety of embedded image processing systems, effective utilization of resources such as multiprocessor systemon-chip (MPSoC) devices, and effective handling of implementation concerns such as memory management and I/O become critical to developing efficient embedded implementations. However, the diversity and complexity of applications and architectures in embedded image processing systems make the mapping of applications onto MPSoCs difficult. We help to address this challenge through a structured design methodology that is built upon the DSG modeling framework. We refer to this methodology as the DEIPS methodology (DSG-based design and implementation of Embedded Image Processing Systems). The DEIPS methodology provides a unified framework for joint consideration of DSG structures and the application graphs from which they are derived, which allows designers to integrate considerations of parallelization and resource constraints together with the application modeling process. We demonstrate the DEIPS methodology through cases studies on practical embedded image processing systems
    corecore