37 research outputs found

    Mutual Impact between Clock Gating and High Level Synthesis in Reconfigurable Hardware Accelerators

    Get PDF
    With the diffusion of cyber-physical systems and internet of things, adaptivity and low power consumption became of primary importance in digital systems design. Reconfigurable heterogeneous platforms seem to be one of the most suitable choices to cope with such challenging context. However, their development and power optimization are not trivial, especially considering hardware acceleration components. On the one hand high level synthesis could simplify the design of such kind of systems, but on the other hand it can limit the positive effects of the adopted power saving techniques. In this work, the mutual impact of different high level synthesis tools and the application of the well known clock gating strategy in the development of reconfigurable accelerators is studied. The aim is to optimize a clock gating application according to the chosen high level synthesis engine and target technology (Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA)). Different levels of application of clock gating are evaluated, including a novel multi level solution. Besides assessing the benefits and drawbacks of the clock gating application at different levels, hints for future design automation of low power reconfigurable accelerators through high level synthesis are also derived

    Design and Implementation of Software Defined Radios on a Homogeneous Multi-Processor Architecture

    Get PDF
    In the wireless communications domain, multi-mode and multi-standard platforms are becoming increasingly the central focus of system architects. In fact, mobile terminal users require more and more mobility and throughput, pushing towards a fully integrated radio system able to support different communication protocols running concurrently on the platform. A new concept of radio system was introduced to meet the users' expectations. Flexible radio platforms have became an indispensable requirement to meet the expectations of the users today and in the future. This thesis deals with issues related to the design of flexible radio platforms. In particular, the flexibility of the radio system is achieved through the concept of software defined radios (SDRs). The research work focuses on the utilization of homogeneous multi-processor (MP) architectures as a feasible way to efficiently implement SDR platforms. In fact, platforms based on MP architectures are able to deliver high performance together with a high degree of flexibility. Moreover, homogeneous MP platforms are able to reduce design and verification costs as well as provide a high scalability in terms of software and hardware. However, homogeneous MP architectures provide less computational efficiency when compared to heterogeneous solutions. This thesis can be divided into two parts: the first part is related to the implementation of a reference platform while the second part of the thesis introduces the design and implementation of flexible, high performance, power and energy efficient algorithms for wireless communications. The proposed reference platform, Ninesilica, is a homogeneous MP architecture composed of a 3x3 mesh of processing nodes (PNs), interconnected by a hierarchical Network-on-Chip (NoC). Each PN hosts as Processing Element (PE) a processor core. To improve the computational efficiency of the platform, different power and energy saving techniques have been investigated. In the design, implementation and mapping of the algorithms, the following constraints were considered: energy and power efficiency, high scalability of the platform, portability of the solutions across similar platforms, and parallelization efficiency. Ninesilica architecture together with the proposed algorithm implementations showed that homogeneous MP architectures are highly scalable platforms, both in terms of hardware and software. Furthermore, Ninesilica architecture demonstrated that homogeneous MPs are able to achieve high parallelization efficiency as well as high energy and power savings, meeting the requirements of SDRs as well as enabling cognitive radios. Ninesilica can be utilized as a stand-alone block or as an elementary building block to realize clustered many-core architectures. Moreover, the obtained results, in terms of parallelization efficiency as well as power and energy efficiency are independent of the type of PE utilized, ensuring the portability of the results to similar architectures based on a different type of processing element

    Design and Implementation of Software Defined Radios on a Homogeneous Multi-Processor Architecture

    Get PDF
    In the wireless communications domain, multi-mode and multi-standard platforms are becoming increasingly the central focus of system architects. In fact, mobile terminal users require more and more mobility and throughput, pushing towards a fully integrated radio system able to support different communication protocols running concurrently on the platform. A new concept of radio system was introduced to meet the users' expectations. Flexible radio platforms have became an indispensable requirement to meet the expectations of the users today and in the future. This thesis deals with issues related to the design of flexible radio platforms. In particular, the flexibility of the radio system is achieved through the concept of software defined radios (SDRs). The research work focuses on the utilization of homogeneous multi-processor (MP) architectures as a feasible way to efficiently implement SDR platforms. In fact, platforms based on MP architectures are able to deliver high performance together with a high degree of flexibility. Moreover, homogeneous MP platforms are able to reduce design and verification costs as well as provide a high scalability in terms of software and hardware. However, homogeneous MP architectures provide less computational efficiency when compared to heterogeneous solutions. This thesis can be divided into two parts: the first part is related to the implementation of a reference platform while the second part of the thesis introduces the design and implementation of flexible, high performance, power and energy efficient algorithms for wireless communications. The proposed reference platform, Ninesilica, is a homogeneous MP architecture composed of a 3x3 mesh of processing nodes (PNs), interconnected by a hierarchical Network-on-Chip (NoC). Each PN hosts as Processing Element (PE) a processor core. To improve the computational efficiency of the platform, different power and energy saving techniques have been investigated. In the design, implementation and mapping of the algorithms, the following constraints were considered: energy and power efficiency, high scalability of the platform, portability of the solutions across similar platforms, and parallelization efficiency. Ninesilica architecture together with the proposed algorithm implementations showed that homogeneous MP architectures are highly scalable platforms, both in terms of hardware and software. Furthermore, Ninesilica architecture demonstrated that homogeneous MPs are able to achieve high parallelization efficiency as well as high energy and power savings, meeting the requirements of SDRs as well as enabling cognitive radios. Ninesilica can be utilized as a stand-alone block or as an elementary building block to realize clustered many-core architectures. Moreover, the obtained results, in terms of parallelization efficiency as well as power and energy efficiency are independent of the type of PE utilized, ensuring the portability of the results to similar architectures based on a different type of processing element

    Course grained low power design flow using UPF

    Get PDF
    Increased system complexity has led to the substitution of the traditional bottom-up design flow by systematic hierarchical design flow. The main motivation behind the evolution of such an approach is the increasing difficulty in hardware realization of complex systems. With decreasing channel lengths, few key problems such as timing closure, design sign-off, routing complexity, signal integrity, and power dissipation arise in the design flows. Specifically, minimizing power dissipation is critical in several high-end processors. In high-end processors, the design complexity contributes to the overall dynamic power while the decreasing transistor size results in static power dissipation. This research aims at optimizing the design flow for power and timing using the unified power format (UPF). UPF provides a strategic format to specify power-aware design information at every stage in the flow. The low power reduction techniques enforced in this research are multi-voltage, multi-threshold voltage (Vth), and power gating with state retention. An inherent design challenge addressed in this research is the choice of power optimization techniques as the flow advances from synthesis to physical design. A top-down digital design flow for a 32 bit MIPS RISC processor has been implemented with and without UPF synthesis flow for 65nm technology. The UPF synthesis is implemented with two voltages, 1.08V and 0.864V (Multi-VDD). Area, power and timing metrics are analyzed for the flows developed. Power savings of about 20 % are achieved in the design flow with \u27multi-threshold\u27 power technique compared to that of the design flow with no low power techniques employed. Similarly, 30 % power savings are achieved in the design flow with the UPF implemented when compared to that of the design flow with \u27multi-threshold\u27 power technique employed. Thus, a cumulative power savings of 42% has been achieved in a complete power efficient design flow (UPF) compared to that of the generic top-down standard flow with no power saving techniques employed. This is substantiated by the low voltage operation of modules in the design, reduction in clock switching power by gating clocks in the design and extensive use of HVT and LVT standard cells for implementation. The UPF synthesis flow saw the worst timing slack and more area when compared to those of the `multi-threshold\u27 or the generic flow. Percentage increase in the area with UPF is approximately 15%; a significant source for this increase being the additional power controlling logic added

    Automated Hardware Prototyping for 3D Network on Chips

    Get PDF
    Vor mehr als 50 Jahren stellte Intel® Mitbegründer Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gültig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz müssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche Ansätze für "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs). Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden Technologiegrößen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die Wärmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating für 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. Dafür wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. Zusätzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt. Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool für den Entwurf neuer Hardware Projekte entwickelt. Eine grafische Benutzeroberfläche zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die Größe der Architektur für die Erstellung eines Prototypen eine besondere Herausforderung dar. Frühere Arbeiten haben es versäumt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren

    Hardware Algorithm Implementation for Mission Specific Processing

    Get PDF
    There is a need to expedite the process of designing military hardware to stay ahead of the adversary. The core of this project was to build reusable, synthesizeable libraries to make this a possibility. In order to build these libraries, Matlab® commands and functions, such as Conv2, Round, Floor, Pinv, etc., had to be converted into reusable VHDL modules. These modules make up reusable libraries for the Mission Specific Process (MSP) which will support AFRL/RY. The MSP allows the VLSI design process to be completed in a mere matter of days or months using an FPGA or ASIC design, as opposed to the current way of developing a system which can take 1-2 years to complete. By having the libraries built, the components can be implemented in an FPGA or ASIC design over and over again. The libraries make it possible to make upgrades to weapons systems to meet the ever-changing needs the War Fighter faces. MSP makes it possible to develop various algorithms, including algorithms implemented in Matlab®. The MSP libraries were built and tested using TSMC 250-nmrtechnology library from the Taiwan Semiconductor Manufacturing Company. They were also synthesized for an FPGA. The modules were all synthesized using the CAD tools from Cadence® and Mentor Graphics®. Power, area, and delay results for each module were presented

    A Survey of FPGA Optimization Methods for Data Center Energy Efficiency

    Get PDF
    This article provides a survey of academic literature about field programmable gate array (FPGA) and their utilization for energy efficiency acceleration in data centers. The goal is to critically present the existing FPGA energy optimization techniques and discuss how they can be applied to such systems. To do so, the article explores current energy trends and their projection to the future with particular attention to the requirements set out by the European Code of Conduct for Data Center Energy Efficiency. The article then proposes a complete analysis of over ten years of research in energy optimization techniques, classifying them by purpose, method of application, and impacts on the sources of consumption. Finally, we conclude with the challenges and possible innovations we expect for this sector.Comment: Accepted for publication in IEEE Transactions on Sustainable Computin

    Research on low power technology by AC power supply circuits

    Get PDF
    制度:新 ; 報告番号:甲3692号 ; 学位の種類:博士(工学) ; 授与年月日:2012/9/15 ; 早大学位記番号:新6060Waseda Universit

    DESIGN SPACE EXPLORATION FOR SIGNAL PROCESSING SYSTEMS USING LIGHTWEIGHT DATAFLOW GRAPHS

    Get PDF
    Digital signal processing (DSP) is widely used in many types of devices, including mobile phones, tablets, personal computers, and numerous forms of embedded systems. Implementation of modern DSP applications is very challenging in part due to the complex design spaces that are involved. These design spaces involve many kinds of configurable parameters associated with the signal processing algorithms that are used, as well as different ways of mapping the algorithms onto the targeted platforms. In this thesis, we develop new algorithms, software tools and design methodologies to systematically explore the complex design spaces that are involved in design and implementation of signal processing systems. To improve the efficiency of design space exploration, we develop and apply compact system level models, which are carefully formulated to concisely capture key properties of signal processing algorithms, target platforms, and algorithm-platform interactions. Throughout the thesis, we develop design methodologies and tools for integrating new compact system level models and design space exploration methods with lightweight dataflow (LWDF) techniques for design and implementation of signal processing systems. LWDF is a previously-introduced approach for integrating new forms of design space exploration and system-level optimization into design processes for DSP systems. LWDF provides a compact set of retargetable application programming interfaces (APIs) that facilitates the integration of dataflow-based models and methods. Dataflow provides an important formal foundation for advanced DSP system design, and the flexible support for dataflow in LWDF facilitates experimentation with and application of novel design methods that are founded in dataflow concepts. Our developed methodologies apply LWDF programming to facilitate their application to different types of platforms and their efficient integration with platform-based tools for hardware/software implementation. Additionally, we introduce novel extensions to LWDF to improve its utility for digital hardware design and adaptive signal processing implementation. To address the aforementioned challenges of design space exploration and system optimization, we present a systematic multiobjective optimization framework for dataflow-based architectures. This framework builds on the methodology of multiobjective evolutionary algorithms and derives key system parameters subject to time-varying and multidimensional constraints on system performance. We demonstrate the framework by applying LWDF techniques to develop a dataflow-based architecture that can be dynamically reconfigured to realize strategic configurations in the underlying parameter space based on changing operational requirements. Secondly, we apply Markov decision processes (MDPs) for design space exploration in adaptive embedded signal processing systems. We propose a framework, known as the Hierarchical MDP framework for Compact System-level Modeling (HMCSM), which embraces MDPs to enable autonomous adaptation of embedded signal processing under multidimensional constraints and optimization objectives. The framework integrates automated, MDP-based generation of optimal reconfiguration policies, dataflow-based application modeling, and implementation of embedded control software that carries out the generated reconfiguration policies. Third, we present a new methodology for design and implementation of signal processing systems that are targeted to system-on-chip (SoC) platforms. The methodology is centered on the use of LWDF concepts and methods for applying principles of dataflow design at different layers of abstraction. The development processes integrated in our approach are software implementation, hardware implementation, hardware-software co-design, and optimized application mapping. The proposed methodology facilitates development and integration of signal processing hardware and software modules that involve heterogeneous programming languages and platforms. Through three case studies involving complex applications, we demonstrate the effectiveness of the proposed contributions for compact system level design and design space exploration: a digital predistortion (DPD) system, a reconfigurable channelizer for wireless communication, and a deep neural network (DNN) for vehicle classification
    corecore