49 research outputs found
Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines
This paper presents a novel framework for designing support vector machines
(SVMs), which does not impose restriction on the SVM kernel to be
positive-definite and allows the user to define memory constraint in terms of
fixed template vectors. This makes the framework scalable and enables its
implementation for low-power, high-density and memory constrained embedded
application. An efficient hardware implementation of the same is also
discussed, which utilizes novel low power memtransistor based cross-bar
architecture, and is robust to device mismatch and randomness. We used
memtransistor measurement data, and showed that the designed SVMs can achieve
classification accuracy comparable to traditional SVMs on both synthetic and
real-world benchmark datasets. This framework would be beneficial for design of
SVM based wake-up systems for internet of things (IoTs) and edge devices where
memtransistors can be used to optimize system's energy-efficiency and perform
in-memory matrix-vector multiplication (MVM).Comment: 4 pages, 5 figures, MWSCAS 201
Analogue neuromorphic systems.
This thesis addresses a new area of science and technology, that of neuromorphic
systems, namely the problems and prospects of analogue neuromorphic systems. The
subject is subdivided into three chapters.
Chapter 1 is an introduction. It formulates the oncoming problem of the creation
of highly computationally costly systems of nonlinear information processing (such as
artificial neural networks and artificial intelligence systems). It shows that an analogue
technology could make a vital contribution to the creation such systems. The basic principles
of creation of analogue neuromorphic systems are formulated. The importance
will be emphasised of the principle of orthogonality for future highly efficient complex
information processing systems.
Chapter 2 reviews the basics of neural and neuromorphic systems and informs on
the present situation in this field of research, including both experimental and theoretical
knowledge gained up-to-date. The chapter provides the necessary background for
correct interpretation of the results reported in Chapter 3 and for a realistic decision on
the direction for future work.
Chapter 3 describes my own experimental and computational results within the
framework of the subject, obtained at De Montfort University. These include: the
building of (i) Analogue Polynomial Approximator/lnterpolatoriExtrapolator, (ii) Synthesiser
of orthogonal functions, (iii) analogue real-time video filter (performing the
homomorphic filtration), (iv) Adaptive polynomial compensator of geometrical distortions
of CRT- monitors, (v) analogue parallel-learning neural network (backpropagation
algorithm).
Thus, this thesis makes a dual contribution to the chosen field: it summarises the
present knowledge on the possibility of utilising analogue technology in up-to-date and
future computational systems, and it reports new results within the framework of the
subject. The main conclusion is that due to its promising power characteristics, small
sizes and high tolerance to degradation, the analogue neuromorphic systems will playa
more and more important role in future computational systems (in particular in systems
of artificial intelligence)
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems
Integrated Photonic Tensor Processing Unit for a Matrix Multiply: a Review
The explosion of artificial intelligence and machine-learning algorithms,
connected to the exponential growth of the exchanged data, is driving a search
for novel application-specific hardware accelerators. Among the many, the
photonics field appears to be in the perfect spotlight for this global data
explosion, thanks to its almost infinite bandwidth capacity associated with
limited energy consumption. In this review, we will overview the major
advantages that photonics has over electronics for hardware accelerators,
followed by a comparison between the major architectures implemented on
Photonics Integrated Circuits (PIC) for both the linear and nonlinear parts of
Neural Networks. By the end, we will highlight the main driving forces for the
next generation of photonic accelerators, as well as the main limits that must
be overcome
Algorithm and Architecture Co-design for High-performance Digital Signal Processing.
CMOS scaling has been the driving force behind the revolution of digital signal processing (DSP) systems, but scaling is slowing down and the CMOS device is approaching its fundamental scaling limit. At the same time, DSP algorithms are continuing to evolve, so there is a growing gap between the increasing complexities of the algorithms and what is practically implementable. The gap can be bridged by exploring the synergy between algorithm and hardware design, using the so-called co-design techniques.
In this thesis, algorithm and architecture co-design techniques are applied to X-ray computed tomography (CT) image reconstruction. Analysis of fixed-point quantization and CT geometry identifies an optimal word length and a mismatch between the object and projection grids. A water-filling buffer is designed to resolve the grid mismatch, and is combined with parallel fixed-point arithmetic units to improve the throughput. The analysis eventually leads to an out-of-order scheduling architecture that reduces the off-chip memory access by three orders of magnitude.
The co-design techniques are further applied to the design of neural networks for sparse coding. Analysis of the neuron spiking dynamics leads to the optimal tuning of network size, spiking rate, and update step size to keep the spiking sparse. The resulting sparsity enables a bus-ring architecture to achieve both high throughput and scalability. A 65nm CMOS chip implementing the architecture demonstrates feature extraction at a throughput of 1.24G pixel/s at 1.0V and 310MHz. The error tolerance of sparse coding can be exploited to enhance the energy efficiency.
As a natural next step after the sparse coding chip, a neural-inspired inference module (IM) is designed for object recognition. The object recognition chip consists of an IM based on sparse coding and an event-driven classifier. A learning co-processor is integrated on chip to enable on-chip learning. The throughput and energy efficiency are further improved using architectural techniques including sub-dividing the IM and classifier into modules and optimal pipelining. The result is a 65nm CMOS chip that performs sparse coding at 10.16G pixel/s at 1.0V and 635MHz.
The co-design techniques can be applied to the design of other advanced DSP algorithms for emerging applications.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113344/1/jungkook_1.pd