1,208 research outputs found
Parallel convolution processing using an integrated photonic tensor core
With the proliferation of ultra-high-speed mobile networks and
internet-connected devices, along with the rise of artificial intelligence, the
world is generating exponentially increasing amounts of data - data that needs
to be processed in a fast, efficient and smart way. These developments are
pushing the limits of existing computing paradigms, and highly parallelized,
fast and scalable hardware concepts are becoming progressively more important.
Here, we demonstrate a computationally specific integrated photonic tensor core -
the optical analog of an ASIC - capable of operating at tera-multiply-accumulate
per second (TMAC/s) speeds. The photonic core achieves parallelized photonic
in-memory computing using phase-change memory arrays and photonic chip-based
optical frequency combs (soliton microcombs). The computation is reduced to
measuring the optical transmission of reconfigurable and non-resonant passive
components and can operate at a bandwidth exceeding 14 GHz, limited only by the
speed of the modulators and photodetectors. Given recent advances in hybrid
integration of soliton microcombs at microwave line rates, ultra-low loss
silicon nitride waveguides, and high speed on-chip detectors and modulators,
our approach provides a path towards full CMOS wafer-scale integration of the
photonic tensor core. While we focus on convolution processing, more generally
our results indicate the major potential of integrated photonics for parallel,
fast, and efficient computational hardware in demanding AI applications such as
autonomous driving, live video processing, and next generation cloud computing
services.
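The primitive this abstract names, the multiply-accumulate (MAC) operation, is what the photonic tensor core parallelizes. As a point of reference only, here is a minimal plain-Python sketch of 2D convolution expressed as chains of MACs; the function name and toy data are illustrative, not part of the paper:

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution; each output pixel is one dot product
    (a chain of MACs) between the kernel and an image patch."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]  # one MAC
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

A 3x3 kernel over a 250,000-pixel image already requires millions of such MACs per frame, which is why hardware that executes them in parallel matters.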
Building a photonic tensor core unit with an electronic interface for convolution processing
With huge amounts of data being generated every second, the demand for parallelized, high-speed, and efficient computing power is rising rapidly, pushing the limits of existing computing paradigms. In this context, photonic computing hardware is a promising alternative to conventional electronics, with the prospect of high speed and remarkable power efficiency when accelerating multiply-accumulate (MAC) operations. Moreover, optical computing enables massive parallelism over its electronic counterparts through wavelength division multiplexing. This work involves the design and fabrication of an integrated photonic tensor core (PTC) capable of performing 60 million MAC operations per second. Optical computing hardware makes use of multiple electro-optic and digital-analog converters; this work therefore also involves the design and characterisation of a dedicated electronic interface to feed data to the PTC. To demonstrate the application potential, we perform convolution processing on 2D images in the optical domain with the newly developed hardware.
Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors
Photonic Microring Resonator (MRR) based hardware accelerators have been
shown to provide disruptive speedup and energy-efficiency improvements for
processing deep Convolutional Neural Networks (CNNs). However, previous
MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with
mixed-sized tensors. One example of such CNNs is depthwise separable CNNs.
Performing inferences of CNNs with mixed-sized tensors on such inflexible
accelerators often leads to low hardware utilization, which diminishes the
achievable performance and energy efficiency from the accelerators. In this
paper, we present a novel way of introducing reconfigurability in the MRR-based
CNN accelerators, to enable dynamic maximization of the size compatibility
between the accelerator hardware components and the CNN tensors that are
processed using the hardware components. We classify the state-of-the-art
MRR-based CNN accelerators from prior works into two categories, based on the
layout and relative placements of the utilized hardware components in the
accelerators. We then use our method to introduce reconfigurability in
accelerators from these two classes, to consequently improve their parallelism,
the flexibility of efficiently mapping tensors of different sizes, speed, and
overall energy efficiency. We evaluate our reconfigurable accelerators against
three prior works under an area-proportionate outlook (equal hardware area for
all accelerators). Our evaluation for the inference of four modern CNNs
indicates that our designed reconfigurable CNN accelerators provide
improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W,
compared to an MRR-based accelerator from prior work. Comment: Paper accepted at CASES (ESWEEK) 202
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs
Accelerating a CNN inference task involves convolution operations that are
typically transformed into vector-dot-product (VDP) operations. Several
photonic microring resonators (MRRs) based hardware architectures have been
proposed to accelerate integer-quantized CNNs with remarkably higher throughput
and energy efficiency compared to their electronic counterparts. However, the
existing photonic MRR-based analog accelerators exhibit a very strong trade-off
between the achievable input/weight precision and VDP operation size, which
severely restricts their achievable VDP operation size for the quantized
input/weight precision of 4 bits and higher. The restricted VDP operation size
ultimately suppresses computing throughput to severely diminish the achievable
performance benefits. To address this shortcoming, we present for the first
time a merger of stochastic computing and MRR-based CNN accelerators. To
leverage the innate precision flexibility of stochastic computing, we invent an
MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a
cascaded manner using dense wavelength division multiplexing, to forge a novel
Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA
achieves significantly high throughput and energy efficiency for accelerating
inferences of high-precision quantized CNNs. Our evaluation for the inference
of four modern CNNs at 8-bit input/weight precision indicates that SCONNA
provides improvements of up to 66.5x, 90x, and 91x in frames-per-second (FPS),
FPS/W and FPS/W/mm2, respectively, on average over two photonic MRR-based
analog CNN accelerators from prior work, with Top-1 accuracy drop of only up to
0.4% for large CNNs and up to 1.5% for small CNNs. We developed a
transaction-level, event-driven, Python-based simulator for the evaluation of
SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git). Comment: To appear at IPDPS 202
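For readers unfamiliar with the stochastic computing that SCONNA builds on, a minimal software sketch follows: values in [0, 1] are encoded as random bitstreams, and multiplication reduces to a bitwise AND. The optical stochastic multiplier realizes this operation optically; everything below is an illustrative assumption, not the paper's design:

```python
import random

def to_bitstream(value, n, rng):
    """Unipolar stochastic encoding: a length-n bitstream whose
    fraction of 1s approximates a value in [0, 1]."""
    return [1 if rng.random() < value else 0 for _ in range(n)]

def sc_multiply(a, b, n=10_000, seed=0):
    """Multiply two values in [0, 1] by ANDing independent bitstreams;
    the mean of the ANDed stream approximates a * b."""
    rng = random.Random(seed)
    sa = to_bitstream(a, n, rng)
    sb = to_bitstream(b, n, rng)
    return sum(x & y for x, y in zip(sa, sb)) / n

print(sc_multiply(0.5, 0.8))  # close to 0.4, within stochastic noise
```

Precision scales with stream length rather than with analog signal resolution, which is the flexibility the abstract leverages against the precision/VDP-size trade-off.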
Parallel convolutional processing using an integrated photonic tensor core
This is the author accepted manuscript. The final version is available from Nature Research via the DOI in this record. Data availability:
All data used in this study are available from the corresponding author upon reasonable request. With the proliferation of ultrahigh-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence (AI)1, the world is generating exponentially increasing amounts of data that need to be processed in a fast and efficient way. Highly parallelized, fast and scalable hardware is therefore becoming progressively more important2. Here we demonstrate a computationally specific integrated photonic hardware accelerator (tensor core) that is capable of operating at speeds of trillions of multiply-accumulate operations per second (10^12 MAC operations per second or tera-MACs per second). The tensor core can be considered as the optical analogue of an application-specific integrated circuit (ASIC). It achieves parallelized photonic in-memory computing using phase-change-material memory arrays and photonic chip-based optical frequency combs (soliton microcombs3). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 gigahertz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates3,4,5, ultralow-loss silicon nitride waveguides6,7, and high-speed on-chip detectors and modulators, our approach provides a path towards full complementary metal–oxide–semiconductor (CMOS) wafer-scale integration of the photonic tensor core.
Although we focus on convolutional processing, more generally our results indicate the potential of integrated photonics for parallel, fast, and efficient computational hardware in data-heavy AI applications such as autonomous driving, live video processing, and next-generation cloud computing services. Funding: Engineering and Physical Sciences Research Council (EPSRC); Deutsche Forschungsgemeinschaft (DFG); Air Force Office of Scientific Research; European Research Council (ERC); European Union Horizon 2020; Studienstiftung des deutschen Volkes.
Higher-dimensional processing using a photonic tensor core with continuous-time data
This is the final version. Available from Nature Research via the DOI in this record. Data availability: The data that support the findings of this study are available from the corresponding author upon request. The ECG dataset analysed in this study is available from the open-source ‘Sudden Cardiac Death Holter Database’ via PhysioNet at https://doi.org/10.13026/C2W306. A sustainability report related to this article is available at https://nanoeng.materials.ox.ac.uk/sustainability. Code availability: The code used in the present work is available from the corresponding author upon request. New developments in hardware-based ‘accelerators’ range from electronic tensor cores and memristor-based arrays to photonic implementations. The goal of these approaches is to handle the exponentially growing computational load of machine learning, which currently requires the doubling of hardware capability approximately every 3.5 months. One solution is increasing the data dimensionality that is processable by such hardware. Although two-dimensional data processing by multiplexing space and wavelength has been previously reported, the use of three-dimensional processing has not yet been implemented in hardware. In this paper, we introduce the radio-frequency modulation of photonic signals to increase parallelization, adding an additional dimension to the data alongside spatially distributed non-volatile memories and wavelength multiplexing. We leverage higher-dimensional processing to configure such a system to an architecture compatible with edge computing frameworks. Our system achieves a parallelism of 100, two orders higher than implementations using only the spatial and wavelength degrees of freedom. 
We demonstrate this by performing a synchronous convolution of 100 clinical electrocardiogram signals from patients with cardiovascular diseases, and constructing a convolutional neural network capable of identifying patients at risk of sudden death with 93.5% accuracy. Funding: European Union’s Horizon 2020; European Union’s Innovation Council Pathfinder programme; Singapore A*STAR International Fellowship.
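The synchronous 100-channel convolution described above can be pictured, in software terms, as mapping one kernel over many signals; the hypothetical sketch below runs the channels one after another, whereas the photonic hardware processes them simultaneously across space, wavelength, and RF frequency:

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution of a single channel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def parallel_conv1d(signals, kernel):
    """Apply the same kernel to every channel. Here the channels run
    sequentially; the photonic system evaluates all of them at once."""
    return [conv1d(s, kernel) for s in signals]

# Toy stand-ins for ECG channels (illustrative values only).
signals = [[1, 2, 3, 4], [0, 1, 0, 1]]
print(parallel_conv1d(signals, [1, -1]))  # [[-1, -1, -1], [-1, 1, -1]]
```

The parallelism of 100 reported in the abstract corresponds to evaluating 100 such channel convolutions in a single pass.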
In-memory photonic dot-product engine with electrically programmable weight banks
Electronically reprogrammable photonic circuits based on phase-change chalcogenides present an avenue to resolve the von Neumann bottleneck; however, implementation of such hybrid photonic–electronic processing has not achieved computational success. Here, we achieve this milestone by demonstrating an in-memory photonic–electronic dot-product engine, one that decouples electronic programming of phase-change materials (PCMs) from photonic computation. Specifically, we develop non-volatile, electronically reprogrammable PCM memory cells with a record-high 4-bit weight encoding, the lowest energy consumption per unit modulation depth (1.7 nJ/dB) for the Erase operation (crystallization), and a high switching contrast (158.5%), using non-resonant silicon-on-insulator waveguide microheater devices. This enables us to perform parallel multiplications for image processing with a superior contrast-to-noise ratio (≥87.36) that leads to enhanced computing accuracy (standard deviation σ ≤ 0.007). An in-memory hybrid computing system is developed in hardware for convolutional processing, recognizing images from the MNIST database with inference accuracies of 86% and 87%.
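The 4-bit weight encoding reported above limits each weight to 16 discrete levels. A hedged software analogy of a dot product against such quantized weights follows; all names and values are illustrative, and no attempt is made to model the PCM physics:

```python
def quantize_4bit(w, w_max=1.0):
    """Snap a weight in [0, w_max] to one of 16 evenly spaced levels,
    mimicking a 4-bit weight encoding in spirit only (the real cells
    encode optical transmission levels in a PCM device)."""
    level = round(w / w_max * 15)   # integer level 0..15
    return level / 15 * w_max

def dot_4bit(xs, ws, w_max=1.0):
    """Dot product computed against 4-bit-quantized weights."""
    return sum(x * quantize_4bit(w, w_max) for x, w in zip(xs, ws))

print(dot_4bit([1.0, 2.0, 3.0], [0.5, 0.25, 1.0]))
```

The quantization step is the digital counterpart of the programming precision the abstract reports; the computing accuracy (σ ≤ 0.007) then bounds how faithfully the analog hardware reproduces each quantized product.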
DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator
The wide adoption and significant computing resource consumption of
attention-based Transformers, e.g., Vision Transformer and large language
models, have driven the demands for efficient hardware accelerators. While
electronic accelerators have been commonly used, there is a growing interest in
exploring photonics as an alternative technology due to its high energy
efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have
demonstrated promising results for convolutional neural network (CNN) workloads
that only require weight-static linear operations. However, they fail to
efficiently support Transformer architectures with attention operations due to
the lack of ability to process dynamic full-range tensor multiplication. In
this work, we propose a customized high-performance and energy-efficient
photonic Transformer accelerator, DOTA. To overcome the fundamental limitation
of existing ONNs, we introduce a novel photonic tensor core, consisting of a
crossbar array of interference-based optical vector dot-product engines, that
supports highly-parallel, dynamic, and full-range matrix-matrix multiplication.
Our comprehensive evaluation demonstrates that DOTA achieves a >4x energy and a
>10x latency reduction compared to prior photonic accelerators, and delivers
over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared
to the electronic Transformer accelerator. Our work highlights the immense
potential of photonic computing for efficient hardware accelerators,
particularly for advanced machine learning workloads.Comment: The short version is accepted by Next-Gen AI System Workshop at MLSys
202
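The "dynamic full-range tensor multiplication" that attention requires, and that weight-static ONNs cannot map, is visible in a minimal scaled dot-product attention sketch: both matrix-matrix products take runtime activations as operands, so neither can be burned into a static weight bank. This plain-Python version is purely illustrative:

```python
import math

def matmul(A, B):
    """Plain matrix-matrix product; in a Transformer both operands can
    be runtime activations rather than fixed weights."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Both matmuls here are dynamic: Q, K, and V all depend on the input."""
    d = len(Q[0])
    Kt = [list(col) for col in zip(*K)]  # transpose of K
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(Q, Kt)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
print(attention(Q, K, V))
```

A weight-static photonic engine can realize matmul(X, W) for a fixed W, but not matmul(Q, Kt) where both sides change every inference, which is the gap DOTA's crossbar of dot-product engines addresses.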
A compact butterfly-style silicon photonic-electronic neural chip for hardware-efficient deep learning
The optical neural network (ONN) is a promising hardware platform for
next-generation neurocomputing due to its high parallelism, low latency, and
low energy consumption. Previous ONN architectures are mainly designed for
general matrix multiplication (GEMM), leading to unnecessarily large area cost
and high control complexity. Here, we move beyond classical GEMM-based ONNs and
propose an optical subspace neural network (OSNN) architecture, which trades
the universality of weight representation for lower optical component usage,
area cost, and energy consumption. We devise a butterfly-style
photonic-electronic neural chip to implement our OSNN with up to 7x fewer
trainable optical components compared to GEMM-based ONNs. Additionally, a
hardware-aware training framework is provided to minimize the required device
programming precision, lessen the chip area, and boost the noise robustness. We
experimentally demonstrate the utility of our neural chip in practical image
recognition tasks, showing that a measured accuracy of 94.16% can be achieved
in hand-written digit recognition tasks with 3-bit weight programming
precision. Comment: 17 pages, 5 figures
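The butterfly-style structure named in the title trades weight universality for roughly (n/2)·log2(n) parameters instead of n². A software sketch of one such transform, under the assumption of FFT-like index pairing and 2x2 rotations (not necessarily the chip's exact parameterization), is:

```python
import math

def butterfly_transform(x, angles):
    """Apply log2(n) butterfly stages to a length-n vector (n a power
    of two). Stage s pairs indices differing in bit s and mixes each
    pair with a 2x2 rotation, so the transform needs only
    (n/2)*log2(n) angles instead of n*n dense weights."""
    n = len(x)
    x = list(x)
    stages = int(math.log2(n))
    assert len(angles) == stages and all(len(a) == n // 2 for a in angles)
    for s in range(stages):
        stride = 1 << s
        k = 0
        for i in range(n):
            if i & stride:
                continue  # visit each pair once, from its lower index
            j = i | stride
            c, w = math.cos(angles[s][k]), math.sin(angles[s][k])
            k += 1
            x[i], x[j] = c * x[i] - w * x[j], w * x[i] + c * x[j]
    return x

# With all angles zero every rotation is the identity:
print(butterfly_transform([1, 2, 3, 4], [[0, 0], [0, 0]]))  # [1.0, 2.0, 3.0, 4.0]
```

Restricting the weights to such a product of sparse rotation stages is the "subspace" trade-off the abstract describes: fewer trainable optical components, at the cost of not representing arbitrary matrices.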