18 research outputs found

    Design of a dedicated IFT microcontroller

    The design of a Dedicated IFT Microcontroller originated from the successful implementation of the Iterative Feedback Tuning (IFT) technique on a Digital Signal Processor microcontroller (DSP56F807C) at the University of Cape Town in 2006. However, implementing the IFT technique on a general-purpose microcontroller is neither optimal nor cost-effective, as most of the microcontroller peripherals remain unused yet continue to drain energy. In addition, microcontrollers and DSPs are software-driven devices that execute algorithms sequentially, which significantly limits the bandwidth of the closed-loop control. To mitigate these problems, the design of a Dedicated IFT Microcontroller is proposed in this thesis. To accomplish this goal, the preliminary task was to explore IFT theory and its applications, followed by a review of the literature on FPGA design methodology for industrial control systems, microcontroller design principles, and FPGA theory and trends. A survey of electronic design automation (EDA) tools and other application software was also conducted. After the literature review, IFT was investigated exhaustively by applying it to three types of plants, namely: a DC motor, an oscillatory plant, and an unstable plant. Each of these plants was tested using three types of initial controllers, namely heavily damped, critically damped, and under-damped initial controllers. The plants were also tested by varying the amplitude of the reference signal, followed by using a single-step signal of constant amplitude of one volt. Exploring all of these possibilities was intended to firmly establish the boundaries of IFT applicability, so that the final product would not be vulnerable to unnecessary post-production discoveries. The design methodology adopted in this research was the popular hierarchical, modular, top-down procedure, an array of abstraction levels detailed as: system level, behavioural level, Register-Transfer Level (RTL), and gate level. At system level, the Dedicated IFT Microcontroller was defined. Thereafter, at behavioural level, the design was simulated in VHDL, created by porting the LabVIEW IFT code to the Xilinx EDA tool. At the RTL, synthesisable VHDL code utilising fixed-point number representation was written. The compiled bit file was downloaded onto a National Instruments (NI) Digital Electronics FPGA Board featuring the Spartan-3 series FPGA, and tested using a method known as simulation in hardware. The key contribution of this thesis is the experimental validation of the IFT technique on FPGA hardware, which has not been published before; this work is described in chapters four and five. The other contribution is the analysis of the 1DOF IFT technique in terms of its limits of applicability for correct implementation, which is the main work of chapter three. This work could be extended to other computational methods, such as floating-point number representation for higher resolution and accuracy in numerical computations. Another avenue that could be exploited is Xilinx's recent Vivado methodology, which supports traditional programming languages like C and C++, as these have built-in floating-point capability. Finally, two papers arising from this work have already been published by Springer and IEEE Xplore, and a journal paper has been written for publication in the Control Systems Technology journal.
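    The parameter update at the heart of IFT can be summarised in a few lines. The C sketch below shows one 1DOF IFT update step under simplifying assumptions: the error signal and its gradient estimates (obtained from the standard IFT gradient experiment) are assumed already available, the Gauss-Newton scaling matrix R is replaced by a plain gradient step, and all names, sizes, and data are illustrative rather than taken from the thesis.

    ```c
    /* Minimal sketch of one 1DOF IFT update step (hypothetical signal data).
     * Assumes the standard IFT experiments have been run, so the error
     * e[t] = y[t] - yd[t] and its estimated gradients de_drho[p][t] with
     * respect to each controller parameter are given. */
    #include <stdio.h>

    #define N  256   /* samples per experiment (illustrative) */
    #define NP 2     /* controller parameters, e.g. PI gains Kp, Ki */

    void ift_update(double rho[NP], const double e[N],
                    const double de_drho[NP][N], double gamma)
    {
        double grad[NP] = {0.0};
        /* Gradient estimate of J(rho) = (1/2N) * sum e[t]^2 */
        for (int p = 0; p < NP; p++) {
            for (int t = 0; t < N; t++)
                grad[p] += e[t] * de_drho[p][t];
            grad[p] /= N;
        }
        /* Plain gradient step; the full method would scale by an
         * approximate inverse Hessian R^-1 (Gauss-Newton). */
        for (int p = 0; p < NP; p++)
            rho[p] -= gamma * grad[p];
    }

    int main(void)
    {
        static double e[N], de[NP][N];
        double rho[NP] = {1.0, 0.5};       /* initial Kp, Ki (illustrative) */
        for (int t = 0; t < N; t++) {      /* dummy experiment data */
            e[t] = 0.01 * (N / 2 - t);
            de[0][t] = 0.5 * e[t];
            de[1][t] = 0.1 * e[t];
        }
        ift_update(rho, e, de, 0.1);
        printf("updated rho: Kp = %f, Ki = %f\n", rho[0], rho[1]);
        return 0;
    }
    ```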

    Self-healing concepts involving fine-grained redundancy for electronic systems

    The digital revolution began with the metal-oxide-semiconductor field-effect transistor (MOSFET) in 1959, followed by massive integration onto a silicon die by means of constant down-scaling of individual components. Digital systems for certain applications require fault tolerance against faults caused by temporary or permanent influences. The most widely used technique is triple modular redundancy (TMR) in conjunction with a majority voter, which is regarded as a passive fault mitigation strategy. Design by functional resilience has been applied to circuit structures for increased fault tolerance and towards self-diagnostic, triggered self-healing. The focus of this thesis is therefore to develop new design strategies for fault detection and mitigation at the transistor, gate, and cell design levels. The research described in this thesis makes three contributions. The first contribution is based on adding fine-grained transistor-level redundancy to logic gates in order to accomplish stuck-at fault tolerance. The objective is to realise maximum fault masking for a logic gate with minimal added redundant transistors. In the case of non-maskable stuck-at faults, the gate structure generates an intrinsic indication signal that is suitable for autonomous self-healing functions. As a result, logic circuitry utilising this design is able to differentiate between gate faults and faults occurring in inter-gate connections. This distinction between fault types can then be used to trigger selective self-healing responses. The second contribution is a logic matrix element which applies the three core redundancy concepts of spatial, temporal, and data redundancy. This logic structure is composed of quad-modular redundant structures and is capable of selective fault masking and localisation depending on fault type at the cell level; it is referred to as a spatiotemporal quadded logic cell (QLC) structure. The QLC structure has the capability of cellular self-healing. Through the combination of fault-tolerant and masking logic features, the QLC is designed with a fault behaviour equal to existing quadded logic designs while using only 33.3% of the equivalent transistor resources. The inherent self-diagnosing feature of the QLC is capable of identifying individual faulty cells and can trigger self-healing features. The final contribution is focused on the conversion of finite state machines (FSMs) into memory to achieve better state transition timing, minimal memory utilisation, and fault protection compared to common FSM designs. A novel implementation based on content-addressable memory (CAM) is used to achieve this. The FSM is further enhanced by building the design out of the logic gates of the first contribution, thereby achieving stuck-at fault resilience. By applying cross-data parity checking, the FSM becomes equipped with single-bit fault detection and correction.
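    As a point of reference for the fault-masking ideas above, the following minimal C sketch models the classic TMR majority vote at word level, together with a disagreement flag of the kind a self-diagnosing structure can use as a healing trigger. It illustrates the general concept only, not the thesis's transistor-level designs.

    ```c
    /* Bitwise 2-of-3 majority voter as used in triple modular redundancy:
     * three replicas a, b, c of the same word are voted bit by bit, so any
     * single faulty replica is out-voted. */
    #include <stdint.h>
    #include <stdio.h>

    static inline uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
    {
        return (a & b) | (a & c) | (b & c);
    }

    /* A disagreement flag like this is the kind of intrinsic fault
     * indication that can trigger selective self-healing responses. */
    static inline uint32_t tmr_disagree(uint32_t a, uint32_t b, uint32_t c)
    {
        return (a ^ b) | (a ^ c);   /* nonzero if any replica deviates */
    }

    int main(void)
    {
        uint32_t good = 0xCAFEF00D;
        uint32_t faulty = good ^ 0x00000100;   /* single stuck bit */
        printf("voted: 0x%08X, fault detected: %s\n",
               (unsigned)tmr_vote(good, faulty, good),
               tmr_disagree(good, faulty, good) ? "yes" : "no");
        return 0;
    }
    ```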

    GROK-FPGA: Generating Real on-Chip Knowledge for FPGA Fine-Grain Delays Using Timing Extraction

    Circuit variation is one of the biggest problems to overcome if Moore's Law is to continue. It is no longer possible to maintain an abstraction of identical devices without huge yield losses, performance penalties, and energy costs. Current techniques such as margining and grade binning are used to deal with this problem. However, they tend to be conservative, offering limited solutions that will not scale as variation increases. Conventional circuits use limited tests and statistical models to determine the margining and binning required to counteract variation. If the limited tests fail, the whole chip is discarded. On the other hand, reconfigurable circuits, such as FPGAs, can use more fine-grained, aggressive techniques that carefully choose which resources to use in order to mitigate variation. Knowing which resources to use and avoid, however, requires measurement of the underlying variation. We present Timing Extraction, a methodology that allows measurement of process variation without expensive testers or highly invasive techniques, relying instead only on resources already available on conventional FPGAs. It takes advantage of the fact that we can measure the delay of logic paths between any two registers. Measuring enough paths provides the information necessary to decompose the delay of each path into individual components, essentially forming a system of linear equations. Determining which paths to measure requires simple graph transformation algorithms applied to a representation of the FPGA circuit. Ultimately, this process decomposes the FPGA into individual components and identifies which paths to measure for computing the delay of individual components. We apply Timing Extraction to 18 commercially available Altera Cyclone III (65 nm) FPGAs. We measure 22×28 logic clusters and the interconnect within and between clusters. Timing Extraction decomposes this region into 1,356,182 components, classified into 10 categories, requiring 2,736,556 path measurements. With an accuracy of ±3.2 ps, our measurements reveal regional variation on the order of 50 ps, systematic variation from 30 ps to 70 ps, and random variation in the clusters with σ = 15 ps and in the interconnect with σ = 62 ps.
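    The decomposition idea is easy to see on a toy case. In the hypothetical three-path example below, each measured register-to-register delay is the sum of two unknown component delays, so three measurements determine three components exactly; the real methodology solves a far larger, overdetermined system (2,736,556 measurements for 1,356,182 components), but the algebra is the same in spirit. All numbers here are invented for illustration.

    ```c
    /* Toy instance of the path-decomposition idea behind Timing Extraction:
     * measured path delays are sums of unknown component delays, giving a
     * linear system. Three hypothetical paths cover components a, b, c:
     *   p1 = a + b,  p2 = b + c,  p3 = a + c
     * which inverts in closed form. */
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical measured path delays in picoseconds */
        double p1 = 310.0, p2 = 290.0, p3 = 300.0;

        double a = ( p1 - p2 + p3) / 2.0;  /* p1 - p2 + p3 = 2a */
        double b = ( p1 + p2 - p3) / 2.0;  /* p1 + p2 - p3 = 2b */
        double c = (-p1 + p2 + p3) / 2.0;  /* -p1 + p2 + p3 = 2c */

        printf("a = %.1f ps, b = %.1f ps, c = %.1f ps\n", a, b, c);
        return 0;
    }
    ```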

    Heterogeneous multicore systems for signal processing

    This thesis explores the capabilities of heterogeneous multi-core systems based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated deskside computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software package was implemented. Thanks to this acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
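    For readers unfamiliar with the TMI kernel, the sketch below gives a plain sequential C baseline that inverts an upper-triangular matrix column by column via back substitution. The thesis's contribution is a dual-GPU parallelization of this kind of routine with a compressed triangular storage scheme; this toy version attempts neither and exists only to show what the kernel computes.

    ```c
    /* Sequential reference for triangular matrix inversion (TMI): each
     * column j of X = T^-1 solves T x_j = e_j by back substitution. */
    #include <stdio.h>

    #define N 3

    void tmi_upper(const double t[N][N], double x[N][N])
    {
        for (int j = 0; j < N; j++) {          /* column j of the inverse */
            for (int i = N - 1; i >= 0; i--) { /* back substitution */
                double s = (i == j) ? 1.0 : 0.0;
                for (int k = i + 1; k < N; k++)
                    s -= t[i][k] * x[k][j];
                x[i][j] = s / t[i][i];
            }
        }
    }

    int main(void)
    {
        double t[N][N] = {{2, 1, 0}, {0, 4, 2}, {0, 0, 5}}, x[N][N];
        tmi_upper(t, x);
        for (int i = 0; i < N; i++)
            printf("%8.4f %8.4f %8.4f\n", x[i][0], x[i][1], x[i][2]);
        return 0;
    }
    ```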

    Outsourcing trends in semiconductor industry

    Thesis (S.M. in Engineering and Management)--Massachusetts Institute of Technology, Engineering Systems Division, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 82-83). Microelectronic devices were traditionally manufactured by companies that both designed and produced integrated chips. This process was important in the 1970s and 1980s, when the manufacturing processes required tweaking the design, an understanding of the manufacturing processes, and the occasional need to redesign. As manufacturing techniques and standards evolved, companies changed their business models and started to outsource their manufacturing to merchant foundries. Semiconductor companies have also started to outsource the design and verification of their chips to third-party design service companies and to focus on core competencies such as research and development of new technologies and the definition of protocols. This trend has continued even though chips have become much more complex, harder to design, and harder to manufacture. This thesis studies the different players in the supply chain, how each player has evolved, and the challenges companies face in making decisions about outsourcing internal processes. It was found that advancements in downstream industries such as EDA, design suppliers, and EMS have helped fabless companies remain competitive with IDMs (Integrated Device Manufacturers). The fabless companies compete in different markets that do not need the most advanced processing technologies used by leading-edge companies. by Karthikeyan Malli Mohan. S.M. in Engineering and Management.

    Communication and energy delivery architectures for personal medical devices

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 219-232). Advances in sensor technologies and integrated electronics are revolutionizing how humans access and receive healthcare. However, many envisioned wearable or implantable systems are not deployable in practice due to high energy consumption and anatomically limited size constraints, necessitating large form factors for external devices or eventual surgical re-implantation procedures for in-vivo applications. Since communication and energy-management sub-systems often dominate the power budgets of personal biomedical devices, this thesis explores alternative use cases, system architectures, and circuit solutions to reduce their energy burden. For wearable applications, a system-on-chip is designed that both communicates and delivers power over an eTextiles network. The transmitter and receiver front-ends are at least an order of magnitude more efficient than conventional body-area networks. For implantable applications, two separate systems are proposed that avoid re-implantation requirements. The first system extracts energy from the endocochlear potential, an electrochemical gradient found naturally within the inner ear of mammals, in order to power a wireless sensor. Since extractable energy levels are limited, novel sensing, communication, and energy management solutions are proposed that leverage duty-cycling to achieve enabling power consumptions at least an order of magnitude lower than previous work. Clinical measurements show this to be the first system demonstrated to sustain itself with a mammalian-generated electrochemical potential as the only source of energy into the system. The second system leverages the essentially unlimited number of recharge cycles offered by ultracapacitors. To ease patient usability, a rapid wireless capacitor-charging architecture is proposed that employs a multi-tapped secondary inductive coil to provide charging times significantly faster than conventional approaches. by Patrick Philip Mercier. Ph.D.
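    A quick back-of-the-envelope calculation shows why duty-cycling is the enabling lever for a source as weak as the endocochlear potential: when the active fraction is small, average power collapses toward the sleep floor. The figures in this C sketch are purely illustrative and not taken from the thesis.

    ```c
    /* Duty-cycled power budget with hypothetical numbers: the node is
     * active only a tiny fraction of the time and sleeps otherwise. */
    #include <stdio.h>

    int main(void)
    {
        double p_active_nw = 5000.0;  /* power while sensing/transmitting */
        double p_sleep_nw  = 0.5;     /* leakage while asleep */
        double duty        = 0.001;   /* active 0.1% of the time */

        double p_avg_nw = duty * p_active_nw + (1.0 - duty) * p_sleep_nw;
        printf("average power: %.2f nW\n", p_avg_nw);  /* ~5.5 nW */
        return 0;
    }
    ```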

    Understanding Quantum Technologies 2022

    Understanding Quantum Technologies 2022 is a Creative Commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms; history, science, research, implementation, and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, component fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers, and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021. Comment: 1132 pages, 920 figures, Letter format.

    An Efficient Programming Model for OpenMP on a Cluster-Based Many-Core System

    As the complexity of systems-on-chip (SoCs) continues to increase, it is no longer possible to ignore the challenges caused by the convergence of software and hardware development. This involves attempts to deal with hierarchical designs, in which several cores are grouped in clusters or tiles, to ensure low-latency, high-bandwidth local communication by relying on fast local memories. From a programmer's perspective, it is desirable to make use of these peculiarities of the hardware, which must be clearly and carefully taken into account when designing the support for high-level parallel programming models. This dissertation overcomes many scalability bottlenecks in cluster-based many-core systems and introduces the OpenMP programming model as a means of simplifying application development. OpenMP abstracts from the programmer's view by providing directives that decompose the loops of sequential programs, as a basis for parallel programming.
    In this work, the full OpenMP model is implemented on a specific instance of a cluster-based many-core system: the Intel Single-chip Cloud Computer (SCC). A lightweight and highly optimized runtime layer for OpenMP execution is presented, together with a memory model; on top of this runtime layer, parallel code is generated automatically and compiled by a native back-end compiler (GCC 4.6) linked against the runtime library. The dissertation develops an efficient design approach for the OpenMP programming model, using the Intel SCC as an example of cluster-based systems. The SCC OpenMP runtime library is designed to cope with three main challenges in a non-cache-coherent system: 1. Executing unmodified legacy OpenMP programs on such a system. 2. Porting the OpenMP memory model to the SCC. 3. Synchronizing parallel threads, which accounts for a sizeable fraction of an application's execution time. Furthermore, the effectiveness of OpenMP is demonstrated on a set of widely used kernels and real-world applications. An extensive set of experiments shows how this model achieves significant parallel speedups, up to 48x, in several applications.
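    For context, the kind of unmodified legacy program such a runtime must execute looks like the canonical worksharing loop below: a sequential loop annotated with an OpenMP directive, compiled with GCC's -fopenmp flag and linked against the runtime library. This is generic OpenMP C, not code from the dissertation.

    ```c
    /* Canonical OpenMP worksharing loop: the directive asks the runtime to
     * decompose the iteration space across threads; on a clustered machine
     * each thread would run on a core of a tile. Build: gcc -fopenmp */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double sum = 0.0;

        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            b[i] = a[i] * a[i];
            sum += b[i];
        }

        printf("threads: %d, sum: %e\n", omp_get_max_threads(), sum);
        return 0;
    }
    ```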

    Tiled microprocessors

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 251-258). Current-day microprocessors have reached the point of diminishing returns due to inherent scalability limitations. This thesis examines the tiled microprocessor, a class of microprocessor which is physically scalable but inherits many of the desirable properties of conventional microprocessors. Tiled microprocessors are composed of an array of replicated tiles connected by a special class of network, the Scalar Operand Network (SON), which is optimized for low-latency, low-occupancy communication between remote ALUs on different tiles. Tiled microprocessors can be constructed to scale to hundreds or thousands of functional units. This thesis identifies seven key criteria for achieving physical scalability in tiled microprocessors. It employs an archetypal tiled microprocessor to examine the challenges in achieving these criteria and to explore the properties of Scalar Operand Networks. The thesis develops the field of SONs in three major ways: it introduces the 5-tuple performance metric, it describes a complete, high-frequency SON implementation, and it proposes a taxonomy, called AsTrO, for categorizing them. To develop these ideas, the thesis details the design, implementation, and analysis of a tiled microprocessor prototype, the Raw Microprocessor, which was implemented at MIT in 180 nm technology. Overall, compared to Raw, recent commercial processors with half the transistors required 30x as many lines of code, occupied 100x as many designers, contained 50x as many pre-tapeout bugs, and resulted in 33x as many post-tapeout bugs. At the same time, the Raw microprocessor proves to be more versatile in exploiting ILP, stream, and server-farm workloads with modest to large amounts of parallelism. by Michael Bedford Taylor. Ph.D.