479 research outputs found

    Multiple voltage scheme with frequency variation for power minimization of pipelined circuits at high-level synthesis

    Full text link
    High-Level Synthesis (HLS) is defined as a translation process from a behavioral description into structural description. The high-level synthesis process consists of three interdependent phases: scheduling, allocation and binDing The order of the three phases varies depending on the design flow. There are three important quality measures used to support design decision, namely size, performance and power consumption. Recently, with the increase in portability, the power consumption has become a very dominant factor in the design of circuits. The aim of low-power high-level synthesis is to schedule operations to minimize switching activity and select low power modules while satisfying timing constraints. This thesis presents a heuristic that helps minimize power consumption by operating the functional units at multiple voltages and varied clock frequencies. The algorithm presented here deals with pipelined operations where multiple instance of the same operation are carried out. The algorithm was implemented using C++, on LINUX platform

    Hardware Acceleration Using Functional Languages

    Get PDF
    Cílem této práce je prozkoumat možnosti využití funkcionálního paradigmatu pro hardwarovou akceleraci, konkrétně pro datově paralelní úlohy. Úroveň abstrakce tradičních jazyků pro popis hardwaru, jako VHDL a Verilog, přestáví stačit. Pro popis na algoritmické či behaviorální úrovni se rozmáhají jazyky původně navržené pro vývoj softwaru a modelování, jako C/C++, SystemC nebo MATLAB. Funkcionální jazyky se s těmi imperativními nemůžou měřit v rozšířenosti a oblíbenosti mezi programátory, přesto je předčí v mnoha vlastnostech, např. ve verifikovatelnosti, schopnosti zachytit inherentní paralelismus a v kompaktnosti kódu. Pro akceleraci datově paralelních výpočtů se často používají jednotky FPGA, grafické karty (GPU) a vícejádrové procesory. Praktická část této práce rozšiřuje existující knihovnu Accelerate pro počítání na grafických kartách o výstup do VHDL. Accelerate je možno chápat jako doménově specifický jazyk vestavěný do Haskellu s backendem pro prostředí NVIDIA CUDA. Rozšíření pro vysokoúrovňovou syntézu obvodů ve VHDL představené v této práci používá stejný jazyk a frontend.The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration with an emphasis on data-parallel tasks. The level of abstraction of the traditional hardware description languages, such as VHDL or Verilog, is becoming to low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description on the algorithmic or behavioral level. Functional Languages are not so commonly used, but they outperform imperative languages in verification, the ability to capture inherent paralellism and the compactness of code. Data-parallel task are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use a library for general-purpose GPU programs called Accelerate and extend it to produce VHDL. Accelerate is a domain-specific language embedded into Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.

    A Behavioral Design Flow for Synthesis and Optimization of Asynchronous Systems

    Get PDF
    Asynchronous or clockless design is believed to hold the promise of alleviating many of the challenges currently facing microelectronic design. Distributing a high-speed clock signal across an entire chip is an increasing challenge, particularly as the number of transistors on chip continues to rise. With increasing heterogeneity in massively multi- core processors, the top-level system integration is already elastic in nature. Future computing technologies (e.g., nano, quantum, etc.) are expected to have unpredictable timing as well. Therefore, asynchronous design techniques are gaining relevance in mainstream design. Unfortunately, the field of asynchronous design lacks mature design tools for creating large-scale, high-performance or energy-efficient systems. This thesis attempts to fill the void by contributing a set of design methods and automated tools for synthesizing asynchronous systems from high-level specifications. In particular, this thesis provides methods and tools for: (i) generating high-speed pipelined implementations from behavioral specifications, (ii) sharing and scheduling resources to conserve area while providing high performance, and (iii) incorporating energy and power considerations into high-level design. These methods are incorporated into a comprehensive design flow that provides a choice of synthesis paths to the designer, and a mechanism to explore the spectrum between them. The first path specifically targets the highest-performance implementations using data-driven pipelined circuits. The second path provides an alternative approach that targets low-area implementations, providing for optimal resource sharing and optimal scheduling techniques to achieve performance targets. Finally, the third path through the design flow allows the entire spectrum between the two extremes to be explored. In particular, it is a hybrid approach that preserves a pipelined architecture but still allows sharing of resources. By varying performance targets, a wide range of designs can be realized. A variety of metrics are incorporated as constraints or cost functions: area, latency, cycle time, energy consumption, and peak power. Experimental results demonstrate the capability of the proposed design flow to quickly produce optimized specifications. By automating synthesis and optimization, this thesis shows that the designer effort necessary to produce a high-quality solution can be significantly reduced. It is hoped that this work provides a path towards more mature automation and design tools for asynchronous design

    Optimising runtime reconfigurable designs for high performance applications

    Get PDF
    This thesis proposes novel optimisations for high performance runtime reconfigurable designs. For a reconfigurable design, the proposed approach investigates idle resources introduced by static design approaches, and exploits runtime reconfiguration to eliminate the inefficient resources. The approach covers the circuit level, the function level, and the system level. At the circuit level, a method is proposed for tuning reconfigurable designs with two analytical models: a resource model for computational and memory resources and memory bandwidth, and a performance model for estimating execution time. This method is applied to tuning implementations of finite-difference algorithms, optimising arithmetic operators and memory bandwidth based on algorithmic parameters, and eliminating idle resources by runtime reconfiguration. At the function level, a method is proposed to automatically identify and exploit runtime reconfiguration opportunities while optimising resource utilisation. The method is based on Reconfiguration Data Flow Graph, a new hierarchical graph structure enabling runtime reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and runtime solution generation. At the system level, a method is proposed for optimising reconfigurable designs by dynamically adapting the designs to available runtime resources in a reconfigurable system. This method includes two steps: compile-time optimisation and runtime scaling, which enable efficient workload distribution, asynchronous communication scheduling, and domain-specific optimisations. It can be used in developing effective servers for high performance applications.Open Acces

    Power and memory optimization techniques in embedded systems design

    Get PDF
    Embedded systems incur tight constraints on power consumption and memory (which impacts size) in addition to other constraints such as weight and cost. This dissertation addresses two key factors in embedded system design, namely minimization of power consumption and memory requirement. The first part of this dissertation considers the problem of optimizing power consumption (peak power as well as average power) in high-level synthesis (HLS). The second part deals with memory usage optimization mainly targeting a restricted class of computations expressed as loops accessing large data arrays that arises in scientific computing such as the coupled cluster and configuration interaction methods in quantum chemistry. First, a mixed-integer linear programming (MILP) formulation is presented for the scheduling problem in HLS using multiple supply-voltages in order to optimize peak power as well as average power and energy consumptions. For large designs, the MILP formulation may not be suitable; therefore, a two-phase iterative linear programming formulation and a power-resource-saving heuristic are presented to solve this problem. In addition, a new heuristic that uses an adaptation of the well-known force-directed scheduling heuristic is presented for the same problem. Next, this work considers the problem of module selection simultaneously with scheduling for minimizing peak and average power consumption. Then, the problem of power consumption (peak and average) in synchronous sequential designs is addressed. A solution integrating basic retiming and multiple-voltage scheduling (MVS) is proposed and evaluated. A two-stage algorithm namely power-oriented retiming followed by a MVS technique for peak and/or average power optimization is presented. Memory optimization is addressed next. Dynamic memory usage optimization during the evaluation of a special class of interdependent large data arrays is considered. Finally, this dissertation develops a novel integer-linear programming (ILP) formulation for static memory optimization using the well-known fusion technique by encoding of legality rules for loop fusion of a special class of loops using logical constraints over binary decision variables and a highly effective approximation of memory usage

    DESIGN AUTOMATION FOR LOW POWER RFID TAGS

    Get PDF
    Radio Frequency Identification (RFID) tags are small, wireless devices capable of automated item identification, used in a variety of applications including supply chain management, asset management, automatic toll collection (EZ Pass), etc. However, the design of these types of custom systems using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, an automated, low-power flow for the design of RFID tags has been developed, implemented and validated. This dissertation presents the RFID Compiler, which permits high-level design entry using a simple description of the desired primitives and their behavior in ANSI-C. The compiler has different back-ends capable of targeting microprocessor-based or custom hardware-based tags. For the hardware-based tag, the back-end automatically converts the user-supplied behavior in C to low power synthesizable VHDL optimized for RFID applications. The compiler also integrates a fast, high-level power macromodeling flow, which can be used to generate power estimates within 15% accuracy of industry CAD tools and to optimize the primitives and / or the behaviors, compared to conventional practices. Using the RFID Compiler, the user can develop the entire design in a matter of days or weeks. The compiler has been used to implement standards such as ANSI, ISO 18000-7, 18000-6C and 18185-7. The automatically generated tag designs were validated by targeting microprocessors such as the AD Chips EISC and FPGAs such as Xilinx Spartan 3. The corresponding ASIC implementation is comparable to the conventionally designed commercial tags in terms of the energy and area. Thus, the RFID Compiler permits the design of power efficient, custom RFID tags by a wider audience with a dramatically reduced design cycle

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    Studies on Core-Based Testing of System-on-Chips Using Functional Bus and Network-on-Chip Interconnects

    Get PDF
    The tests of a complex system such as a microprocessor-based system-onchip (SoC) or a network-on-chip (NoC) are difficult and expensive. In this thesis, we propose three core-based test methods that reuse the existing functional interconnects-a flat bus, hierarchical buses of multiprocessor SoC's (MPSoC), and a N oC-in order to avoid the silicon area cost of a dedicated test access mechanism (TAM). However, the use of functional interconnects as functional TAM's introduces several new problems. During tests, the interconnects-including the bus arbitrator, the bus bridges, and the NoC routers-operate in the functional mode to transport the test stimuli and responses, while the core under tests (CUT) operate in the test mode. Second, the test data is transported to the CUT through the functional bus, and not directly to the test port. Therefore, special core test wrappers that can provide the necessary control signals required by the different functional interconnect are proposed. We developed two types of wrappers, one buffer-based wrapper for the bus-based systems and another pair of complementary wrappers for the NoCbased systems. Using the core test wrappers, we propose test scheduling schemes for the three functionally different types of interconnects. The test scheduling scheme for a flat bus is developed based on an efficient packet scheduling scheme that minimizes both the buffer sizes and the test time under a power constraint. The schedulingscheme is then extended to take advantage of the hierarchical bus architecture of the MPSoC systems. The third test scheduling scheme based on the bandwidth sharing is developed specifically for the NoC-based systems. The test scheduling is performed under the objective of co-optimizing the wrapper area cost and the resulting test application time using the two complementary NoC wrappers. For each of the proposed methodology for the three types of SoC architec .. ture, we conducted a thorough experimental evaluation in order to verify their effectiveness compared to other methods
    corecore