21 research outputs found
Neural networks-on-chip for hybrid bio-electronic systems
PhD ThesisBy modelling the brains computation we can further our understanding
of its function and develop novel treatments for neurological disorders. The
brain is incredibly powerful and energy e cient, but its computation does
not t well with the traditional computer architecture developed over the
previous 70 years. Therefore, there is growing research focus in developing
alternative computing technologies to enhance our neural modelling capability,
with the expectation that the technology in itself will also bene t from
increased awareness of neural computational paradigms.
This thesis focuses upon developing a methodology to study the design
of neural computing systems, with an emphasis on studying systems suitable
for biomedical experiments. The methodology allows for the design to be
optimized according to the application. For example, di erent case studies
highlight how to reduce energy consumption, reduce silicon area, or to
increase network throughput.
High performance processing cores are presented for both Hodgkin-Huxley
and Izhikevich neurons incorporating novel design features. Further, a complete
energy/area model for a neural-network-on-chip is derived, which is
used in two exemplar case-studies: a cortical neural circuit to benchmark
typical system performance, illustrating how a 65,000 neuron network could
be processed in real-time within a 100mW power budget; and a scalable highperformance
processing platform for a cerebellar neural prosthesis. From
these case-studies, the contribution of network granularity towards optimal
neural-network-on-chip performance is explored
MOCAST 2021
The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include:Analog/RF and mixed signal circuits;Digital circuits and systems design;Nonlinear circuits and systems;Device and circuit modeling;High-performance embedded systems;Systems and applications;Sensors and systems;Machine learning and AI applications;Communication; Network systems;Power management;Imagers, MEMS, medical, and displays;Radiation front ends (nuclear and space application);Education in circuits, systems, and communications
Interconnects architectures for many-core era using surface-wave communication
PhD ThesisNetworks-on-chip (NoCs) is a communication paradigm that has
emerged aiming to address on-chip communication challenges and
to satisfy interconnection demands for chip-multiprocessors (CMPs).
Nonetheless, there is continuous demand for even higher computational
power, which is leading to a relentless downscaling of CMOS
technology to enable the integration of many-cores. However, technology
downscaling is in favour of the gate nodes over wires in terms
of latency and power consumption. Consequently, this has led to the
era of many-core processors where power consumption and performance
are governed by inter-core communications rather than core
computation. Therefore, NoCs need to evolve from being merely metalbased
implementations which threaten to be a performance and power
bottleneck for many-core efficiency and scalability.
To overcome such intensified inter-core communication challenges,
this thesis proposes a novel interconnect technology: the surface-wave
interconnect (SWI). This new RF-based on-chip interconnect has notable
characteristics compared to cutting-edge on-chip interconnects
in terms of CMOS compatibility, high speed signal propagation, low
power dissipation, and massive signal fan-out. Nonetheless, the realization
of the SWI requires investigations at different levels of abstraction,
such as the device integration and RF engineering levels. The aim
of this thesis is to address the networking and system level challenges
and highlight the potential of this interconnect. This should
encourage further research at other levels of abstraction. Two specific
system-level challenges crucial in future many-core systems are tackled
in this study, which are cross-the-chip global communication and
one-to-many communication.
This thesis makes four major contributions towards this aim. The
first is reducing the NoC average-hop count, which would otherwise
increase packet-latency exponentially, by proposing a novel hybrid
interconnect architecture. This hybrid architecture can not only utilize
both regular metal-wire and SWI, but also exploits merits of
both bus and NoC architectures in terms of connectivity compared to
other general-purpose on-chip interconnect architectures. The second
contribution addresses global communication issues by developing
a distance-based weighted-round-robin arbitration (DWA) algorithm.
This technique prioritizes global communication to be send via SWI
short-cuts, which offer more efficient power dissipation and faster
across-the-chip signal propagation. Results obtained using a cycleaccurate
simulator demonstrate the effectiveness of the proposed
system architecture in terms of significant power reduction, considervii
able average delay reduction and higher throughput compared to a
regular NoC. The third contribution is in handling multicast communications,
which are normally associated with traffic overload, hotspots
and deadlocks and therefore increase, by an order of magnitude the
power consumption and latency. This has been achieved by proposing
a novel routing and centralized arbitration schemes that exploits
the SWI0s remarkable fan-out features. The evaluation demonstrates
drastic improvements in the effectiveness of the proposed architecture
in terms of power consumption ( 2-10x) and performance ( 22x) but
with negligible hardware overheads ( 2%). The fourth contribution is
to further explore multicast contention handling in a flexible decentralized
manner, where original techniques such as stretch-multicast
and ID-tagging flow control have been developed. A comparison of
these techniques shows that the decentralized approach is superior
to the centralized approach with low traffic loads, while the latter
outperforms the former near and after NoC saturation
End-to-End Benchmarking of Chiplet-Based In-Memory Computing
In-memory computing (IMC)-based hardware reduces latency and energy consumption for compute-intensive machine learning (ML) applications. Several SRAM/RRAM-based IMC hardware architectures to accelerate ML applications have been proposed in the literature. However, crossbar-based IMC hardware poses several design challenges. We first discuss the different ML algorithms recently adopted in the literature. We then discuss the hardware implications of ML algorithms. Next, we elucidate the need for IMC architecture and the different components within a conventional IMC architecture. After that, we introduce the need for 2.5D or chiplet-based architectures. We then discuss the different benchmarking simulators proposed for monolithic IMC architectures. Finally, we describe an end-to-end chiplet-based IMC benchmarking simulator, SIAM
Recommended from our members
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style.
The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement
Architecting a One-to-many Traffic-Aware and Secure Millimeter-Wave Wireless Network-in-Package Interconnect for Multichip Systems
With the aggressive scaling of device geometries, the yield of complex Multi Core Single Chip(MCSC) systems with many cores will decrease due to the higher probability of manufacturing defects especially, in dies with a large area. Disintegration of large System-on-Chips(SoCs) into smaller chips called chiplets has shown to improve the yield and cost of complex systems. Therefore, platform-based computing modules such as embedded systems and micro-servers have already adopted Multi Core Multi Chip (MCMC) architectures overMCSC architectures. Due to the scaling of memory intensive parallel applications in such systems, data is more likely to be shared among various cores residing in different chips resulting in a significant increase in chip-to-chip traffic, especially one-to-many traffic. This one-to-many traffic is originated mainly to maintain cache-coherence between many cores residing in multiple chips. Besides, one-to-many traffics are also exploited by many parallel programming models, system-level synchronization mechanisms, and control signals. How-ever, state-of-the-art Network-on-Chip (NoC)-based wired interconnection architectures do not provide enough support as they handle such one-to-many traffic as multiple unicast trafficusing a multi-hop MCMC communication fabric. As a result, even a small portion of such one-to-many traffic can significantly reduce system performance as traditional NoC-basedinterconnect cannot mask the high latency and energy consumption caused by chip-to-chipwired I/Os. Moreover, with the increase in memory intensive applications and scaling of MCMC systems, traditional NoC-based wired interconnects fail to provide a scalable inter-connection solution required to support the increased cache-coherence and synchronization generated one-to-many traffic in future MCMC-based High-Performance Computing (HPC) nodes. Therefore, these computation and memory intensive MCMC systems need an energy-efficient, low latency, and scalable one-to-many (broadcast/multicast) traffic-aware interconnection infrastructure to ensure high-performance.
Research in recent years has shown that Wireless Network-in-Package (WiNiP) architectures with CMOS compatible Millimeter-Wave (mm-wave) transceivers can provide a scalable, low latency, and energy-efficient interconnect solution for on and off-chip communication. In this dissertation, a one-to-many traffic-aware WiNiP interconnection architecture with a starvation-free hybrid Medium Access Control (MAC), an asymmetric topology, and a novel flow control has been proposed. The different components of the proposed architecture are individually one-to-many traffic-aware and as a system, they collaborate with each other to provide required support for one-to-many traffic communication in a MCMC environment. It has been shown that such interconnection architecture can reduce energy consumption and average packet latency by 46.96% and 47.08% respectively for MCMC systems.
Despite providing performance enhancements, wireless channel, being an unguided medium, is vulnerable to various security attacks such as jamming induced Denial-of-Service (DoS), eavesdropping, and spoofing. Further, to minimize the time-to-market and design costs, modern SoCs often use Third Party IPs (3PIPs) from untrusted organizations. An adversary either at the foundry or at the 3PIP design house can introduce a malicious circuitry, to jeopardize an SoC. Such malicious circuitry is known as a Hardware Trojan (HT). An HTplanted in the WiNiP from a vulnerable design or manufacturing process can compromise a Wireless Interface (WI) to enable illegitimate transmission through the infected WI resulting in a potential DoS attack for other WIs in the MCMC system. Moreover, HTs can be used for various other malicious purposes, including battery exhaustion, functionality subversion, and information leakage. This information when leaked to a malicious external attackercan reveals important information regarding the application suites running on the system, thereby compromising the user profile. To address persistent jamming-based DoS attack in WiNiP, in this dissertation, a secure WiNiP interconnection architecture for MCMC systems has been proposed that re-uses the one-to-many traffic-aware MAC and existing Design for Testability (DFT) hardware along with Machine Learning (ML) approach. Furthermore, a novel Simulated Annealing (SA)-based routing obfuscation mechanism was also proposed toprotect against an HT-assisted novel traffic analysis attack. Simulation results show that,the ML classifiers can achieve an accuracy of 99.87% for DoS attack detection while SA-basedrouting obfuscation could reduce application detection accuracy to only 15% for HT-assistedtraffic analysis attack and hence, secure the WiNiP fabric from age-old and emerging attacks