
    Design of asynchronous microprocessor for power proportionality

    PhD Thesis. Microprocessors continue to get exponentially cheaper for end users following Moore’s law, while the costs involved in their design keep growing, also at an exponential rate. The reason is the ever increasing complexity of processors, which modern EDA tools struggle to keep up with. This makes further performance scaling subject to a high reliability risk. To keep this risk low, yet improve performance, CPU designers try to optimise various parts of the processor. The Instruction Set Architecture (ISA) is a significant part of the whole processor design flow, and its optimal design for a particular combination of available hardware resources and software requirements is crucial for building processors with high performance and efficient energy utilisation. This is a challenging task involving a lot of heuristics and high-level design decisions. Another issue impacting CPU reliability is the continuous scaling of power consumption. For the last few decades CPU designers have mainly focused on improving performance while “keeping energy and power consumption in mind”. The consequence was the development of energy-efficient systems, in which energy was treated as a resource whose consumption should be optimised. As CMOS technology progressed, with feature sizes decreasing and the power delivered to circuit components becoming less stable, the energy resource turned from an optimisation criterion into a constraint, sometimes a critical one. At this point power proportionality becomes one of the most important aspects of system design. Developing methods and techniques that address the problem of designing a power-proportional microprocessor, capable of adapting at runtime to varying operating conditions (such as low or even unstable voltage levels) and application requirements, is one of today’s grand challenges. In this thesis this challenge is addressed by proposing a new design flow for the development of an ISA for microprocessors, which can be altered to suit a particular hardware platform or a specific operating mode. The flow uses an expressive and powerful formalism for the specification of processor instruction sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures large sets of behavioural scenarios at the microarchitectural level in a computationally efficient form amenable to formal transformations for synthesis, verification and automated derivation of asynchronous hardware for the CPU microcontrol. The feasibility of the methodology, the novel design flow and a number of optimisation techniques was demonstrated with a full-size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The chip showed the ability to work over a wide range of operating voltages and environmental conditions. Depending on application requirements and the power budget, our ASIC supports several operating modes: one optimised for energy consumption and another for performance. This was achieved by extending a traditional datapath structure with an auxiliary control layer for adaptable and fault-tolerant operation. These and other optimisations resulted in a reconfigurable and adaptable implementation, which was proven by measurements, analysis and evaluation of the chip. EPSR
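    The CPOG idea can be pictured with a small sketch: vertices are micro-operations, arcs are ordering constraints, and both carry a condition that selects which scenarios they belong to; fixing an instruction's opcode projects the graph onto that instruction's partial order. The sketch below is a simplified, hypothetical encoding (conditions reduced to opcode bitmasks), not the thesis's actual tooling.

        /* Minimal sketch of a CPOG-style encoding: vertices are micro-operations,
         * arcs are ordering constraints, and both carry a condition -- here a
         * bitmask of opcodes that enable them. Fixing the opcode projects the
         * graph onto one instruction's partial order. Names and the bitmask
         * encoding are illustrative assumptions, not the thesis's notation. */
        #include <stdio.h>

        enum { OP_ADD = 1 << 0, OP_LOAD = 1 << 1 };   /* hypothetical opcodes */

        typedef struct { const char *name; unsigned cond; } Vertex;
        typedef struct { int from, to;     unsigned cond; } Arc;

        static const Vertex vertices[] = {
            { "fetch_operands", OP_ADD | OP_LOAD },
            { "alu_add",        OP_ADD           },
            { "mem_read",       OP_LOAD          },
            { "write_back",     OP_ADD | OP_LOAD },
        };

        static const Arc arcs[] = {
            { 0, 1, OP_ADD  },   /* fetch -> alu_add  only for ADD  */
            { 0, 2, OP_LOAD },   /* fetch -> mem_read only for LOAD */
            { 1, 3, OP_ADD  },
            { 2, 3, OP_LOAD },
        };

        /* Print the ordering constraints of the scenario selected by one opcode. */
        static void project(unsigned opcode)
        {
            for (size_t i = 0; i < sizeof arcs / sizeof arcs[0]; i++)
                if (arcs[i].cond & opcode)
                    printf("%s -> %s\n",
                           vertices[arcs[i].from].name, vertices[arcs[i].to].name);
        }

        int main(void)
        {
            puts("ADD scenario:");  project(OP_ADD);
            puts("LOAD scenario:"); project(OP_LOAD);
            return 0;
        }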

    HAL-ASOS - Hardware-accelerated Linux for application-dedicated operating systems

    Doctoral Programme in Electronics and Computer Engineering (PDEEC), specialisation in Industrial Informatics and Embedded Systems. Today’s embedded systems ecosystem has become huge, covering many different computer-based systems that demand performance and complete mobility while delivering ever longer battery life. But the rising clock frequencies that once yielded ever faster devices began hitting a wall even before transistors stopped shrinking. Field Programmable Gate Array (FPGA) platforms are an alternative solution for implementing complete reconfigurable systems. They provide computational power and efficiency in a lightweight solution that serves application requirements and increases overall system performance. Several FPGA-assisted Operating Systems (OS) have been proposed, but by narrowing their focus to datapath synthesis of the hardware accelerator, they largely ignore the deep semantic integration of these accelerators into the OS. State-of-the-art High-Level Synthesis (HLS) environments have raised the level of abstraction beyond Register Transfer Language (RTL) by following a domain-specific approach while mixing ad hoc software and hardware abstractions, which makes performance optimizations harder. Furthermore, the programming models for software and reconfigurable hardware lack commonalities, which in time will hinder Design Space Exploration (DSE) and lower the potential for code reuse. To overcome these issues, we propose HAL-ASOS, a framework for implementing Linux-based embedded systems that provides (1) elasticity by design to comply with the evolutive nature of Linux, (2) deep semantic integration of hardware tasks in the Linux programming models, (3) easy complexity management using methodology and tools that fully support design, verification and deployment, and (4) hybrid and efficiency-oriented design principles. To evaluate the framework’s functionalities, a cryptographic application was implemented and demonstrates the achievable performance while following the promoted application-driven design methodology. To demonstrate the new levels of performance that can be reached, a Computer Vision application explores several mixed asynchronous-synchronous programming features. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration, and performance that increases consistently as processing resources or clock frequencies rise. Financial support was received from the Portuguese Foundation for Science and Technology (FCT) through PhD grant SFRH/BD/82732/2011
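    The kind of Linux-side integration described above can be illustrated with a generic sketch: a user-space program opens a character device that exposes a hardware task, maps its register window, and drives it like a memory-mapped peripheral. The device node, register offsets and bit meanings below are assumptions for illustration, not HAL-ASOS's actual interface.

        /* Generic sketch of driving a memory-mapped hardware task from Linux
         * user space. The device node, register offsets and bit meanings are
         * hypothetical; HAL-ASOS's real interface may differ. */
        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define CTRL_REG   0x00u   /* assumed: write 1 to start the task   */
        #define STATUS_REG 0x04u   /* assumed: bit 0 set when task is done */

        int main(void)
        {
            int fd = open("/dev/hwtask0", O_RDWR | O_SYNC);   /* hypothetical node */
            if (fd < 0) { perror("open"); return 1; }

            volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                           MAP_SHARED, fd, 0);
            if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

            regs[CTRL_REG / 4] = 1;                    /* kick the hardware task */
            while ((regs[STATUS_REG / 4] & 1) == 0)    /* poll for completion    */
                ;                                      /* a real driver would block */

            munmap((void *)regs, 4096);
            close(fd);
            return 0;
        }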

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

    Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main obstacles to the widespread use of hybrid architectures. Thus, an automated process that aids programmers with hardware/software partitioning and the generation of application-specific accelerators is an important goal. The method presented in this thesis requires neither restrictions of the high-level language used nor special source-code annotations, both of which usually form an entry barrier for programmers without a deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows the general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements
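    As a rough illustration of the plugin-based approach, the skeleton below uses the public GCC plugin API: it declares GPL compatibility, checks the compiler version, and registers a callback at which analyses such as hot-spot detection could hook in. It is a minimal sketch, not PIRANHA's actual source code.

        /* Minimal GCC plugin skeleton; build against the GCC plugin headers
         * and load with -fplugin=./sketch.so. The analysis itself is only a
         * placeholder comment -- PIRANHA's real passes are far more involved. */
        #include "gcc-plugin.h"
        #include "plugin-version.h"

        int plugin_is_GPL_compatible;   /* symbol required by GCC for plugins */

        /* Called once per translation unit when its compilation finishes. */
        static void on_finish_unit(void *gcc_data, void *user_data)
        {
            (void)gcc_data;
            (void)user_data;
            /* An HLS plugin could inspect the intermediate representation here,
             * pick hot spots, and emit an accelerator description. */
        }

        int plugin_init(struct plugin_name_args *plugin_info,
                        struct plugin_gcc_version *version)
        {
            if (!plugin_default_version_check(version, &gcc_version))
                return 1;   /* built against an incompatible GCC version */

            register_callback(plugin_info->base_name, PLUGIN_FINISH_UNIT,
                              on_finish_unit, NULL);
            return 0;
        }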

    Methodology and Ecosystem for the Design of a Complex Network ASIC

    The performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier was breached in 2011, the next large step, into the exascale era, is expected sometime between 2018 and 2020. The EXTOLL project will be an integral part of this venture. Originally designed as an FPGA-based research project, it will make the transition to an ASIC to improve its already excellent performance even further. This transition poses many challenges that are presented in this thesis. Nowadays it is not enough to look only at single components in a system. EXTOLL is part of a complex ecosystem that must be optimized as a whole, since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail. This thesis examines four different aspects of the design hierarchy and proposes efficient solutions or improvements for each of them. First, it looks at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically, without the user’s input and in a transparent way, so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on the physical and logical constraints of the EXTOLL design. In addition, a workflow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed as the optimal solution. The last part develops a cost model for wafer- and package-level test and raises technological concerns that will affect the testing methodology. In order to run testing internally, it proposes the development of a stand-alone test platform able to test packaged EXTOLL chips in every aspect
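    The automatic memory-ECC step mentioned above rests on a well-known sizing rule: a SECDED (single-error-correcting, double-error-detecting) code for d data bits needs the smallest r with 2^r >= d + r + 1 Hamming check bits, plus one overall parity bit. The sketch below only computes that width and is illustrative, not the thesis's ECC generator.

        /* Number of check bits a SECDED code needs for a given data width:
         * smallest r with 2^r >= d + r + 1 (Hamming), plus one overall parity
         * bit for double-error detection. E.g. 32 data bits -> 7 check bits
         * (a 39-bit word), 64 data bits -> 8 check bits (a 72-bit word). */
        #include <stdio.h>

        static unsigned secded_check_bits(unsigned data_bits)
        {
            unsigned r = 1;
            while ((1u << r) < data_bits + r + 1)
                r++;
            return r + 1;   /* +1 for the extra overall parity bit */
        }

        int main(void)
        {
            for (unsigned d = 8; d <= 64; d *= 2)
                printf("%u data bits -> %u check bits\n", d, secded_check_bits(d));
            return 0;
        }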

    Vector-thread architecture and implementation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 181-186). This thesis proposes vector-thread architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality. VT architectures can efficiently exploit a wide variety of loop-level parallelism, including non-vectorizable loops with cross-iteration dependencies or internal control flow. The Scale VT architecture is an instantiation of the vector-thread paradigm designed for low-power and high-performance embedded systems. Scale includes a scalar RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and supports up to 128 simultaneously active virtual processor threads. Scale provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-port, non-blocking, 32-way set-associative, 32 KB cache. A prototype Scale VT processor was implemented in 180 nm technology using an ASIC-style design flow. The chip has 7.1 million transistors and a core area of 16.6 mm², and it runs at 260 MHz while consuming 0.4-1.1 W. This thesis evaluates Scale using a diverse selection of embedded benchmarks, including example kernels for image processing, audio processing, text and data processing, cryptography, network processing, and wireless communication. Larger applications also include a JPEG image encoder and an IEEE 802.11a wireless transmitter. Scale achieves high performance on a range of different types of codes, generally executing 3-11 compute operations per cycle. Unlike other architectures which improve performance at the expense of increased energy consumption, Scale is generally even more energy efficient than a scalar RISC processor. By Ronny Meir Krashinsky. Ph.D.
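    The kind of loop referred to above can be made concrete with a small, purely illustrative C example: the value carried across iterations creates a cross-iteration dependency and the branch creates internal control flow, so the loop does not map onto plain vector instructions, whereas a vector-thread machine could assign each iteration to a virtual processor, pass the carried value between VPs, and let each VP thread-fetch its own data-dependent path. The mapping comments are informal; this is not Scale code.

        /* Illustrative loop of the kind the thesis targets: `carry` creates a
         * cross-iteration dependency and the branch creates internal control
         * flow, so the loop does not vectorize in the classic sense. The
         * comments describe the vector-thread mapping only informally. */
        #include <stdio.h>

        #define N 8

        int main(void)
        {
            int in[N] = { 3, -1, 4, -1, 5, -9, 2, 6 };
            int out[N];
            int carry = 0;                 /* value carried across iterations */

            for (int i = 0; i < N; i++) {  /* each iteration -> one virtual processor */
                int x = in[i] + carry;     /* carried value would travel VP to VP     */
                if (x < 0)                 /* data-dependent control flow: each VP    */
                    x = 0;                 /* would thread-fetch its own path         */
                carry = x / 2;
                out[i] = x;
            }

            for (int i = 0; i < N; i++)
                printf("%d ", out[i]);
            printf("\n");
            return 0;
        }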

    The 1991 3rd NASA Symposium on VLSI Design

    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    Built-In Self-Test (BIST) for Multi-Threshold NULL Convention Logic (MTNCL) Circuits

    This dissertation proposes a Built-In Self-Test (BIST) hardware implementation for Multi-Threshold NULL Convention Logic (MTNCL) circuits. Two different methods are proposed: an area-optimized topology that requires minimal area overhead, and a test-performance-optimized topology that utilizes parallelism and internal hardware to reduce overall test time through additional controllability points. Furthermore, an automated software flow is proposed to insert, simulate, and analyze an input MTNCL netlist in order to reach a desired fault coverage, where possible, through iterative digital and fault simulations. The proposed automated flow can produce both area-optimized and test-performance-optimized BIST circuits, along with scripts for digital and fault simulation using commercial software, which can be used for further manual verification or adjustment if desired
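    The automated flow described above is essentially a coverage-driven loop. The sketch below shows only its control structure; every helper function is a hypothetical placeholder for the real netlist insertion and the commercial digital and fault simulation steps.

        /* Control skeleton of a coverage-driven BIST insertion loop. All helper
         * functions are hypothetical placeholders for the real netlist editing,
         * digital simulation and fault simulation steps. */
        #include <stdbool.h>
        #include <stdio.h>

        static void   insert_bist(int extra_control_points) { (void)extra_control_points; }
        static bool   digital_sim_passes(void)              { return true; }
        static double fault_coverage_percent(void)          { static double c = 90.0; return c += 3.0; }

        int main(void)
        {
            const double target = 98.0;   /* desired fault coverage */
            double coverage = 0.0;

            for (int ctrl_points = 0; ctrl_points < 8 && coverage < target; ctrl_points++) {
                insert_bist(ctrl_points);              /* insert BIST plus controllability points */
                if (!digital_sim_passes())             /* functional check of the modified netlist */
                    continue;
                coverage = fault_coverage_percent();   /* result of fault simulation */
                printf("iteration %d: coverage %.1f%%\n", ctrl_points, coverage);
            }

            puts(coverage >= target ? "target coverage reached" : "target not reached");
            return 0;
        }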