25,166 research outputs found

    Automated Exploration of the ASIC Design Space for Minimum Power-Delay-Area Product at the Register Transfer Level

    Get PDF
    Exploring the integrated circuit design space for minimum power-delay-area (PDA) product can be time-consuming and tedious, especially when the target standard-cell library has hundreds of options. In this dissertation, heuristic algorithms that automate this process have been developed, implemented and validated at the reg- ister transfer level. In some cases, the PDA product was 1.9 times better than the initial baseline solution. The parallel search algorithm exhibited 9x speed up when executed on 10 machines simultaneously. These two new methods also characterize the design space for the given RTL code by generating power-delay-area points in addition to the minimum PDA point in case the designer wishes to select a different solution that is a tradeoff among these metrics. As a final step, these two search algorithms are integrated into a fully automated ASIC design flow

    KAPow: A System Identification Approach to Online Per-Module Power Estimation in FPGA Designs

    Get PDF
    In a modern FPGA system-on-chip design, it is often insufficient to simply assess the total power consumption of the entire circuit by design-time estimation or runtime power rail measurement. Instead, to make better runtime decisions, it is desirable to understand the power consumed by each individual module in the system. In this work, we combine boardlevel power measurements with register-level activity counting to build an online model that produces a breakdown of power consumption within the design. Online model refinement avoids the need for a time-consuming characterisation stage and also allows the model to track long-term changes to operating conditions. Our flow is named KAPow, a (loose) acronym for ‘K’ounting Activity for Power estimation, which we show to be accurate, with per-module power estimates as close to ±5mW of true measurements, and to have low overheads. We also demonstrate an application example in which a permodule power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by over 8%

    Modelling and Automated Implementation of Optimal Power Saving Strategies in Coarse-Grained Reconfigurable Architectures

    Get PDF
    This paper focuses on how to efficiently reduce power consumption in coarse-grained reconfigurable designs, to allow their effective adoption in heterogeneous architectures supporting and accelerating complex and highly variable multifunctional applications. We propose a design flow for this kind of architectures that, besides their automatic customization, is also capable of determining their optimal power management support. Power and clock gating implementation costs are estimated in advance, before their physical implementation, on the basis of the functional, technological, and architectural parameters of the baseline design. Experimental results, on 90 and 45 nm CMOS technologies, demonstrate that the proposed approach guides the designer towards optimal implementation

    Energy-efficient hardware design based on high-level synthesis

    Get PDF
    This dissertation describes research activities broadly concerning the area of High-level synthesis (HLS), but more specifically, regarding the HLS-based design of energy-efficient hardware (HW) accelerators. HW accelerators, mostly implemented on FPGAs, are integral to the heterogeneous architectures employed in modern high performance computing (HPC) systems due to their ability to speed up the execution while dramatically reducing the energy consumption of computationally challenging portions of complex applications. Hence, the first activity was regarding an HLS-based approach to directly execute an OpenCL code on an FPGA instead of its traditional GPU-based counterpart. Modern FPGAs offer considerable computational capabilities while consuming significantly smaller power as compared to high-end GPUs. Several different implementations of the K-Nearest Neighbor algorithm were considered on both FPGA- and GPU-based platforms and their performance was compared. FPGAs were generally more energy-efficient than the GPUs in all the test cases. Eventually, we were also able to get a faster (in terms of execution time) FPGA implementation by using an FPGA-specific OpenCL coding style and utilizing suitable HLS directives. The second activity was targeted towards the development of a methodology complementing HLS to automatically derive power optimization directives (also known as "power intent") from a system-level design description and use it to drive the design steps after HLS, by producing a directive file written using the common power format (CPF) to achieve power shut-off (PSO) in case of an ASIC design. The proposed LP-HLS methodology reduces the design effort by enabling designers to infer low power information from the system-level description of a design rather than at the RTL. This methodology required a SystemC description of a generic power management module to describe the design context of a HW module also modeled in SystemC, along with the development of a tool to automatically produce the CPF file to accomplish PSO. Several test cases were considered to validate the proposed methodology and the results demonstrated its ability to correctly extract the low power information and apply it to achieve power optimization in the backend flow

    Limits on Fundamental Limits to Computation

    Full text link
    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis

    Full text link
    Even with impressive advances in automated formal methods, certain problems in system verification and synthesis remain challenging. Examples include the verification of quantitative properties of software involving constraints on timing and energy consumption, and the automatic synthesis of systems from specifications. The major challenges include environment modeling, incompleteness in specifications, and the complexity of underlying decision problems. This position paper proposes sciduction, an approach to tackle these challenges by integrating inductive inference, deductive reasoning, and structure hypotheses. Deductive reasoning, which leads from general rules or concepts to conclusions about specific problem instances, includes techniques such as logical inference and constraint solving. Inductive inference, which generalizes from specific instances to yield a concept, includes algorithmic learning from examples. Structure hypotheses are used to define the class of artifacts, such as invariants or program fragments, generated during verification or synthesis. Sciduction constrains inductive and deductive reasoning using structure hypotheses, and actively combines inductive and deductive reasoning: for instance, deductive techniques generate examples for learning, and inductive reasoning is used to guide the deductive engines. We illustrate this approach with three applications: (i) timing analysis of software; (ii) synthesis of loop-free programs, and (iii) controller synthesis for hybrid systems. Some future applications are also discussed

    Automated Hardware Prototyping for 3D Network on Chips

    Get PDF
    Vor mehr als 50 Jahren stellte IntelÂź MitbegrĂŒnder Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gĂŒltig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz mĂŒssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche AnsĂ€tze fĂŒr "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs). Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden TechnologiegrĂ¶ĂŸen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die WĂ€rmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating fĂŒr 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. DafĂŒr wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. ZusĂ€tzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt. Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool fĂŒr den Entwurf neuer Hardware Projekte entwickelt. Eine grafische BenutzeroberflĂ€che zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die GrĂ¶ĂŸe der Architektur fĂŒr die Erstellung eines Prototypen eine besondere Herausforderung dar. FrĂŒhere Arbeiten haben es versĂ€umt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren
    • 

    corecore