
    Application-Specific Heterogeneous Network-on-Chip Design

    As a result of increasing communication demands, application-specific and scalable Networks-on-Chip (NoCs) have emerged to connect processing cores and subsystems in Multiprocessor Systems-on-Chip. A challenge in application-specific NoC design is finding the right balance among competing tradeoffs such as communication latency, power consumption and chip area. We propose a novel approach that generates a latency-aware heterogeneous NoC topology. Experimental results show that our approach reduces total communication latency by up to 27% with modest power consumption. © 2013 The Author 2013. Published by Oxford University Press on behalf of The British Computer Society.

    Entwicklung einer Fully-Convolutional-Netzwerkarchitektur für die Detektion von defekten LED-Chips in Photolumineszenzbildern

    Nowadays, light-emitting diodes (LEDs) can be found in a wide variety of applications, from standard LEDs in domestic lighting to advanced chip designs in automobiles, smart watches and video walls. Advances in chip design also affect the test processes: certain contact measurements are made harder by ever-decreasing chip dimensions, or even rendered impossible by the chip design. For instance, wafer probing determines the electrical and optical properties of all LED chips on a wafer by contacting every chip with a prober needle. Chip designs without a contact pad on the surface, however, elude wafer probing, and while electrical and optical properties can be determined by sample measurements, defective LED chips are distributed randomly over the wafer. Here, advanced data analysis methods provide a new approach to gathering defect information from already available non-contact measurements. Photoluminescence measurements, for example, record a brightness image of an LED wafer in which conspicuous brightness values indicate defective chips. To extract this defect information, a computer-vision algorithm is required that transforms photoluminescence images into defect maps. In other words, every pixel of a photoluminescence image must be classified via semantic segmentation, for which so-called fully convolutional networks represent the state of the art. However, this task poses several challenges. First, each pixel in a photoluminescence image represents an LED chip, so pixel-fine output resolution is required. Second, photoluminescence images show a variety of brightness values from wafer to wafer in addition to local areas of differing brightness.
Additionally, clusters of defective chips assume various shapes, sizes and brightness gradients, so the algorithm must reliably recognise objects at multiple scales. Finally, not all salient brightness values correspond to defective LED chips, so the algorithm must distinguish between salient brightness values caused by measurement artefacts, non-defect structures and defects. In this dissertation, a novel fully convolutional network architecture was developed that allows the accurate segmentation of defective LED chips in highly variable photoluminescence wafer images. For this purpose, the basic fully convolutional network architecture was modified for the given application, and advanced architectural concepts were incorporated to enable a pixel-fine output resolution and a reliable segmentation of defect structures at multiple scales. Altogether, the developed dense ASPP Vaughan architecture, trained on a dataset of 136 input-label pairs, achieved a pixel accuracy of 97.5%, a mean pixel accuracy of 96.2% and a defect-class accuracy of 92.0%, showing that fully convolutional networks can be a valuable contribution to data analysis in industrial manufacturing.
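The abstract reports a pixel accuracy, a mean pixel accuracy and a defect-class accuracy without defining them. The sketch below shows how such figures are commonly computed from a confusion matrix in semantic segmentation. It is an illustration only: the function name, the three-class layout and the toy counts are assumptions, and the dissertation's exact definitions may differ.

```python
def segmentation_metrics(confusion, defect_class):
    """Compute common segmentation accuracies from a confusion matrix.

    confusion[t][p] is the number of pixels whose true class is t and
    whose predicted class is p. The class indices are hypothetical.
    """
    n_classes = len(confusion)
    total_pixels = sum(sum(row) for row in confusion)
    correct_pixels = sum(confusion[k][k] for k in range(n_classes))

    # Per-class accuracy: correctly labelled pixels of a class divided by
    # all pixels that truly belong to that class. Empty classes are skipped.
    per_class = {
        k: confusion[k][k] / sum(confusion[k])
        for k in range(n_classes)
        if sum(confusion[k]) > 0
    }

    return {
        "pixel_accuracy": correct_pixels / total_pixels,
        "mean_pixel_accuracy": sum(per_class.values()) / len(per_class),
        "defect_class_accuracy": per_class[defect_class],
    }


# Toy example (made-up counts): 0 = background, 1 = good chip, 2 = defect.
toy_confusion = [
    [50, 0, 0],
    [0, 40, 0],
    [0, 2, 8],
]
metrics = segmentation_metrics(toy_confusion, defect_class=2)
print(metrics)
```

In the toy data the defect class is rare, so the overall pixel accuracy stays high even though one in five defect pixels is missed. This is one reason a separate defect-class figure is informative alongside the overall one.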

    Energy consumption in networks on chip : efficiency and scaling

    Computer architecture design is in a new era in which performance is increased by replicating processing cores on a chip rather than by making CPUs larger and faster. This design strategy is motivated by the superior energy efficiency of the multi-core architecture compared to the traditional monolithic CPU. If the trend continues as expected, the number of cores on a chip is predicted to grow exponentially over time as the density of transistors on a die increases. A major challenge to the efficiency of multi-core chips is the energy used for communication among cores over a Network on Chip (NoC). As the number of cores increases, this energy also increases, imposing serious constraints on the design and performance of both applications and architectures. Therefore, understanding the impact of different design choices on NoC power and energy consumption is crucial to the success of multi- and many-core designs. This dissertation proposes methods for modeling and optimizing energy consumption in multi- and many-core chips, with special focus on the energy used for communication on the NoC. We present a number of tools and models to optimize energy consumption and model its scaling behavior as the number of cores increases. We use synthetic traffic patterns and full system simulations to test and validate our methods. Finally, we take a step back and look at the evolution of computer hardware in the last 40 years and, using a scaling theory from biology, present a predictive theory for power-performance scaling in microprocessor systems.

    Wireless Interconnects for Intra-chip & Inter-chip Transmission

    With the emergence of the Internet of Things and the information revolution, the demand for high-performance computing systems is increasing. The copper interconnects inside computing chips have evolved into a sophisticated network known as a Network on Chip (NoC), comprising routers, switches and repeaters, just like a computer network. When a network on chip is implemented on a large scale, as in Multicore Multichip (MCMC) systems for High Performance Computing (HPC), the length of the interconnects increases, and with it problems such as power dissipation, interconnect delay, clock synchronization and electrical noise. In this thesis, wireless interconnects are chosen as a substitute for wired copper interconnects. Wireless interconnects offer easy integration with CMOS fabrication and chip packaging. Using wireless interconnects working in the unlicensed mm-wave band (57-64 GHz), data rates in the Gbps range can be achieved. This thesis presents a study of transmission between zigzag antennas used as wireless interconnects for MCMC systems and 3D ICs. For MCMC systems, a four-chip, 16-core model is analyzed with only four wireless interconnects in three configurations with different antenna orientations and locations. Return loss and transmission coefficients are simulated in ANSYS HFSS. Moreover, wireless interconnects are designed, fabricated and tested on a 6-inch silicon wafer with a resistivity of 55 Ω-cm using a basic standard CMOS process. The wireless interconnects are designed in ANSYS HFSS to work at 30 GHz. The fabricated antennas resonate at around 20 GHz with a return loss of less than -10 dB. The transmission coefficient between an antenna pair within a 20 mm x 20 mm silicon die is found to vary between -45 dB and -55 dB. Furthermore, the wireless interconnect approach is extended to 3D ICs, where the wireless interconnects are again implemented as zigzag antennas.
This thesis extends the analysis of wireless interconnects in 3D ICs to different configurations of antenna orientations and coolants. The return loss and transmission coefficients are simulated using ANSYS HFSS.

    Network-on-Chip

    Addresses the challenges associated with system-on-chip integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-Chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design.
This text comprises 12 chapters and covers: the evolution of NoC from SoC, with its research and developmental challenges; NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces; the router design strategies followed in NoCs; the evaluation mechanism of NoC architectures; the application mapping strategies followed in NoCs; low-power design techniques specifically followed in NoCs; the signal integrity and reliability issues of NoC; the details of NoC testing strategies reported so far; the problem of synthesizing application-specific NoCs; reconfigurable NoC design issues; and the direction of future research and development in the field of NoC. Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.

    Energy-Efficient Neural Network Hardware Design and Circuit Techniques to Enhance Hardware Security

    University of Minnesota Ph.D. dissertation. May 2019. Major: Electrical Engineering. Advisor: Chris Kim. 1 computer file (PDF); ix, 108 pages. Artificial intelligence (AI) algorithms and hardware are being developed at a rapid pace for emerging applications such as self-driving cars, speech/image/video recognition and deep learning. Today's AI tasks are mostly performed at remote datacenters, while in the future more AI workloads are expected to run on edge devices. To fulfill this goal, innovative design techniques are needed to improve the energy efficiency, form factor and security of AI chips. This dissertation focuses on two topics to address these challenges: building energy-efficient AI chips based on various neural network architectures, and designing "chip fingerprint" circuits as well as counterfeit chip sensors to improve hardware security. First, in order to deploy AI tasks on edge devices, we propose energy- and area-efficient computing platforms. One is a novel time-domain computing scheme for fully connected multi-layer perceptron (MLP) neural networks, and the other is an efficient binarized architecture for long short-term memory (LSTM) neural networks. Second, to enhance hardware security and ensure secure data communication between edge devices, the authenticity of the chip must be verified. A Physical Unclonable Function (PUF) is a circuit primitive that can serve as a chip "fingerprint" by generating a unique ID for each chip. Another security concern comes from counterfeit ICs: recycled and remarked ICs account for more than 80% of counterfeit electronics. To effectively detect counterfeit chips that have been physically compromised, we developed a passive IC tamper sensor.
The proposed sensor is demonstrated to efficiently and reliably detect suspicious activities such as high temperature cycling, a rise in ambient humidity, and increased dust particles in the chip cavity.

    Address-Event based Platform for Bio-inspired Spiking Systems

    Address Event Representation (AER) is an emergent neuromorphic interchip communication protocol that allows real-time virtual massive connectivity between huge numbers of neurons located on different chips. By exploiting high-speed digital communication circuits (with nanosecond timings), synaptic neural connections can be time multiplexed, while neural activity signals (with millisecond timings) are sampled at low frequencies. Neurons generate "events" according to their activity levels. More active neurons generate more events per unit time and access the interchip communication channel more frequently, while neurons with low activity consume less communication bandwidth. When building multi-chip, multi-layered AER systems, it is absolutely necessary to have a computer interface that allows (a) reading AER interchip traffic into the computer and visualizing it on the screen, and (b) converting a conventional frame-based video stream in the computer into AER and injecting it at some point of the AER structure. This is necessary for testing and debugging complex AER systems. On the other hand, using a commercial personal computer means depending on software tools and operating systems that can make the system slower and less robust. This paper addresses the problem of connecting several AER-based chips to compose a powerful processing system. The problem was discussed at the Neuromorphic Engineering Workshop of 2006. The platform is based on an embedded computer, a powerful FPGA and serial links, which make the system faster and stand-alone (independent of a PC). A new platform is presented that allows up to eight AER-based chips to be connected to a Spartan 3 4000 FPGA. The FPGA is responsible for the Address-Event-based network communication and, at the same time, for mapping and transforming the address space of the traffic to implement pre-processing.
An MMU microprocessor (Intel XScale 400 MHz Gumstix Connex computer) is also connected to the FPGA to allow the platform to implement event-based algorithms that interact with the AER system, such as control algorithms, network connectivity, USB support, etc. The LVDS transceiver allows a bandwidth of up to 1.32 Gbps, around 66 mega events per second (Mevps).
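The two link figures quoted above can be related by simple arithmetic. The sketch below does this under stated assumptions: the abstract does not give the event word size, so the bits-per-event value is only what the two numbers imply (it may include framing or protocol overhead), and the function names are illustrative.

```python
LINK_BANDWIDTH_BPS = 1_320_000_000  # 1.32 Gbps, from the abstract
EVENT_RATE_EVPS = 66_000_000        # about 66 Mevps, from the abstract


def implied_bits_per_event(bandwidth_bps, events_per_second):
    """Bits of link capacity consumed per event, if the link is saturated."""
    return bandwidth_bps / events_per_second


def max_event_rate(bandwidth_bps, bits_per_event):
    """Upper bound on events per second for a given serialized event size."""
    return bandwidth_bps / bits_per_event


bits = implied_bits_per_event(LINK_BANDWIDTH_BPS, EVENT_RATE_EVPS)
print(f"implied link cost: {bits:.1f} bits per event")

# Hypothetical what-if: a wider 32-bit event word on the same link.
print(f"32-bit events: {max_event_rate(LINK_BANDWIDTH_BPS, 32) / 1e6:.2f} Mevps")
```

Because the channel is shared, this per-link ceiling bounds the combined event rate of all neurons that send over it, which is why highly active neurons consume a larger share of the bandwidth.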

    GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation

    In this paper, we describe the architecture and performance of the GRAPE-6 system, a massively-parallel special-purpose computer for astrophysical N-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed in 1995 and achieved a theoretical peak speed of 1.08 Tflops. As was the case with GRAPE-4, the primary application of GRAPE-6 is the simulation of collisional systems, though it can also be used for collisionless systems. The main differences between GRAPE-4 and GRAPE-6 are that (a) the processor chip of GRAPE-6 integrates 6 force-calculation pipelines, compared to the single pipeline of GRAPE-4 (which needed 3 clock cycles to calculate one interaction), (b) the clock speed is increased from 32 to 90 MHz, and (c) the total number of processor chips is increased from 1728 to 2048. These improvements resulted in a peak speed of 64 Tflops. We also discuss the design of the successor of GRAPE-6. Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.
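The three hardware changes listed above can be checked against the reported peak speeds with a little arithmetic. The sketch below assumes that each GRAPE-6 pipeline completes one interaction per clock cycle. The abstract states only the 3-cycle figure for GRAPE-4, so this is an inference, and the function name is illustrative.

```python
def interactions_per_second(chips, pipelines_per_chip, clock_hz, cycles_per_interaction):
    """Peak pairwise-interaction rate of the whole machine."""
    return chips * pipelines_per_chip * clock_hz / cycles_per_interaction


# Figures from the abstract; cycles_per_interaction=1 for GRAPE-6 is an assumption.
grape4_rate = interactions_per_second(1728, 1, 32_000_000, 3)
grape6_rate = interactions_per_second(2048, 6, 90_000_000, 1)

hardware_speedup = grape6_rate / grape4_rate
reported_speedup = 64 / 1.08  # ratio of the quoted peak Tflops figures

print(f"GRAPE-4: {grape4_rate:.4e} interactions/s")
print(f"GRAPE-6: {grape6_rate:.4e} interactions/s")
print(f"hardware speedup: {hardware_speedup:.1f}x, reported: {reported_speedup:.1f}x")

# Operations per interaction implied by the quoted GRAPE-6 peak speed.
print(f"implied flops per interaction: {64e12 / grape6_rate:.1f}")
```

Under that assumption the interaction rate grows by a factor of 60, which is close to the ratio of the two quoted peak speeds (about 59). The implied operation count per interaction is a derived figure, not one stated in the abstract.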