254 research outputs found
Recommended from our members
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoCâs) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoCâs, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoCâs, therefore, enable a modular and extensible system composition for an âobject-orientâ design style.
The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoCâs are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called âmonitoring networkâ, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories â a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoCâs for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement
CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS
Technological advances in microelectronics envisioned through Mooreâs law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives.
Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved.
In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerator for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General purpose Arm/RISCV co-processors provide adequate bootstrapping and system-housekeeping and a high-speed interface fabric facilitates Input/Output to main memory
Predicting power scalability in a reconfigurable platform
This thesis focuses on the evolution of digital hardware systems. A reconfigurable platform is proposed and analysed based on thin-body, fully-depleted silicon-on-insulator Schottky-barrier transistors with metal gates and silicide source/drain (TBFDSBSOI). These offer the potential for simplified processing that will allow them to reach ultimate nanoscale gate dimensions. Technology CAD was used to show that the threshold voltage in TBFDSBSOI devices will be controllable by gate potentials that scale down with the channel dimensions while remaining within appropriate gate reliability limits. SPICE simulations determined that the magnitude of the threshold shift predicted by TCAD software would be sufficient to control the logic configuration of a simple, regular array of these TBFDSBSOI transistors as well as to constrain its overall subthreshold power growth. Using these devices, a reconfigurable platform is proposed based on a regular 6-input, 6-output NOR LUT block in which the logic and configuration functions of the array are mapped onto separate gates of the double-gate device. A new analytic model of the relationship between power (P), area (A) and performance (T) has been developed based on a simple VLSI complexity metric of the form ATÏ = constant. As Ï defines the performance âreturnâ gained as a result of an increase in area, it also represents a bound on the architectural options available in power-scalable digital systems. This analytic model was used to determine that simple computing functions mapped to the reconfigurable platform will exhibit continuous power-area-performance scaling behavior. A number of simple arithmetic circuits were mapped to the array and their delay and subthreshold leakage analysed over a representative range of supply and threshold voltages, thus determining a worse-case range for the device/circuit-level parameters of the model. Finally, an architectural simulation was built in VHDL-AMS. The frequency scaling described by Ï, combined with the device/circuit-level parameters predicts the overall power and performance scaling of parallel architectures mapped to the array
Space Communications: Theory and Applications. Volume 3: Information Processing and Advanced Techniques. A Bibliography, 1958 - 1963
Annotated bibliography on information processing and advanced communication techniques - theory and applications of space communication
Precision Pointing Control System (PPCS) system design and analysis
The precision pointing control system (PPCS) is an integrated system for precision attitude determination and orientation of gimbaled experiment platforms. The PPCS concept configures the system to perform orientation of up to six independent gimbaled experiment platforms to design goal accuracy of 0.001 degrees, and to operate in conjunction with a three-axis stabilized earth-oriented spacecraft in orbits ranging from low altitude (200-2500 n.m., sun synchronous) to 24 hour geosynchronous, with a design goal life of 3 to 5 years. The system comprises two complementary functions: (1) attitude determination where the attitude of a defined set of body-fixed reference axes is determined relative to a known set of reference axes fixed in inertial space; and (2) pointing control where gimbal orientation is controlled, open-loop (without use of payload error/feedback) with respect to a defined set of body-fixed reference axes to produce pointing to a desired target
Statically-analyzed stream monitoring for cyber-physical Systems
Cyber-physical systems are digital systems interacting with the physical world. Even though this induces an inherent complexity, they are responsible for safety-critical tasks like governing nuclear power plants or controlling autonomous vehicles. To preserve trust into the safety of such systems, this thesis presents a runtime verification approach designed to generate trustworthy monitors from a formal specification. These monitors are responsible for observing the cyber-physical system during runtime and ensuring its safety. As underlying language, I present the asynchronous real-time specification language RTLola. It contains primitives for arithmetic properties and grants precise control over the timing of the monitor. With this, it enables specifiers to express properties relevant to cyber-physical systems. The thesis further presents a static analysis that identifies inconsistencies in the specification and provides insights into the dynamic behavior of the monitor. As a result, the resource consumption of the monitor becomes predictable. The generation of the monitor produces either a hardware description synthesizable onto programmable hardware, or Rust code with verification annotation. These annotations allow for proving the correctness of the monitor with respect to the semantics of RTLola. Last, I present the construction of a conservative hybrid model of the underlying system using information extracted from the specification. This model enables further verification steps.Cyber-physische Systeme sind digitale Systeme, die mit der physischen Welt interagieren. Obwohl das zu einer inhĂ€renten KomplexitĂ€t fĂŒhrt, sind sie verantwortlich fĂŒr sicherheitskritische Aufgaben wie der Steuerung von Kernkraftwerken oder autonomen Fahrzeugen. Umdas Vertrauen in deren Sicherheit zu wahren, prĂ€sentiert diese Doktorarbeit einen Ansatz zur Laufzeitverifikation, konzipiert, um vertrauenswĂŒrdige Monitore aus einer formalen Spezifikation zu generieren. Diese Monitore sind dafĂŒr verantwortlich, das cyber-physische System zur Laufzeit zu ĂŒberwachen und dessen Sicherheit zu gewĂ€hrleisten. Als zugrundeliegende Sprache prĂ€sentiere ich die asynchrone Echtzeit-Spezifikationssprache RTLola. Sie enthĂ€lt Primitiven fĂŒr arithmetische Eigenschaften und gewĂ€hrt prĂ€zise Kontrolle ĂŒber das Timing des Monitors. Damit wird es Spezifizierenden ermöglicht Eigenschaften auszudrĂŒcken, die fĂŒr Cyber-physische Systeme relevant sind. Weiterhin prĂ€sentiert diese Doktorarbeit eine statische Analyse, die Unstimmigkeiten in der Spezifikation identifiziert und Einblicke in das dynamische Verhalten des Monitors liefert. Aufgrund dessen wird der Ressourcenverbrauch des Monitors vorhersehbar. Die Generierung des Monitors erzeugt entweder eine Hardwarebeschreibung, die auf programmierbarer Hardware synthetisiert werden kann, oder Rust Code mit Verifikationsannotationen. Diese Annotationen erlauben es, die Korrektheit des Monitors bezogen auf die Semantik von RTLola zu beweisen. AbschlieĂend prĂ€sentiere ich die Konstruktion von einem konservativen hybriden Modell des zugrundeliegenden Systems anhand von Informationen, die aus der Spezifikation gewonnen wurden. Dieses Modell ermöglicht weitere Verifikationsschritte
Integrated Application of Active Controls (IAAC) technology to an advanced subsonic transport project: Current and advanced act control system definition study. Volume 2: Appendices
The current status of the Active Controls Technology (ACT) for the advanced subsonic transport project is investigated through analysis of the systems technical data. Control systems technologies under examination include computerized reliability analysis, pitch axis fly by wire actuator, flaperon actuation system design trade study, control law synthesis and analysis, flutter mode control and gust load alleviation analysis, and implementation of alternative ACT systems. Extensive analysis of the computer techniques involved in each system is included
SpiNNaker - A Spiking Neural Network Architecture
20 years in conception and 15 in construction, the SpiNNaker project has delivered the worldâs largest neuromorphic computing platform incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks of the scale of a mouse brain in biological real time. This machine, hosted at the University of Manchester in the UK, is freely available under the auspices of the EU Flagship Human Brain Project. This book tells the story of the origins of the machine, its development and its deployment, and the immense software development effort that has gone into making it openly available and accessible to researchers and students the world over. It also presents exemplar applications from âTalkâ, a SpiNNaker-controlled robotic exhibit at the Manchester Art Gallery as part of âThe Imitation Gameâ, a set of works commissioned in 2016 in honour of Alan Turing, through to a way to solve hard computing problems using stochastic neural networks. The book concludes with a look to the future, and the SpiNNaker-2 machine which is yet to come
- âŠ