Accelerating Pattern Matching in Neuromorphic Text Recognition System Using Intel Xeon Phi Coprocessor
Neuromorphic computing systems refer to computing architectures inspired by the working mechanism of the human brain. The rapidly falling cost and increasing performance of state-of-the-art computing hardware allow large-scale implementation of machine intelligence models with neuromorphic architectures and open the opportunity for new applications. One such computing platform is the Intel Xeon Phi coprocessor, which delivers over a teraFLOP of computing power with 61 integrated processing cores. How to efficiently harness such computing power to achieve real-time decision making and cognition is one of the key design considerations. This work presents an optimized implementation of the Brain-State-in-a-Box (BSB) neural network model on the Xeon Phi coprocessor for pattern matching in the context of intelligent text recognition of noisy document images. From a scalability standpoint on a High Performance Computing (HPC) platform, we show that efficient workload partitioning and resource management can double the performance of this many-core architecture for neuromorphic applications.
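The abstract does not spell out the BSB recall dynamics. One standard form of the update is x(t+1) = S(alpha * A * x(t) + lambda * x(t)), where S is the piecewise-linear squashing function that keeps the state inside the hypercube [-1, 1]^n; a minimal NumPy sketch follows, with all parameter names and defaults illustrative rather than taken from the paper.

```python
import numpy as np

def bsb_recall(A, x0, alpha=0.5, lam=1.0, max_iters=100, tol=1e-6):
    """Iterate the BSB update until the state settles at (or near) a
    vertex of the hypercube [-1, 1]^n, i.e. a stored pattern.

    A     : (n, n) weight matrix trained on the pattern set
    x0    : initial state, e.g. a noisy character bitmap in [-1, 1]^n
    alpha : feedback gain; lam : decay gain (illustrative defaults)
    """
    x = x0.copy()
    for _ in range(max_iters):
        # Linear update followed by the piecewise-linear squash S(.)
        x_next = np.clip(alpha * (A @ x) + lam * x, -1.0, 1.0)
        if np.linalg.norm(x_next - x) < tol:
            break
        x = x_next
    return x
```

In a text-recognition pipeline, many such recalls (for example, one per candidate pattern, often ranked by convergence speed) are mutually independent, which is why workload partitioning across the coprocessor's 61 cores dominates the achievable throughput.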
The Potential of the Intel Xeon Phi for Supervised Deep Learning
Supervised learning of Convolutional Neural Networks (CNNs), also known as
supervised Deep Learning, is a computationally demanding process. To find the
most suitable parameters of a network for a given application, numerous
training sessions are required. Therefore, reducing the training time per
session is essential to fully utilize CNNs in practice. While numerous research
groups have addressed the training of CNNs using GPUs, so far not much
attention has been paid to the Intel Xeon Phi coprocessor. In this paper we
investigate empirically and theoretically the potential of the Intel Xeon Phi
for supervised learning of CNNs. We design and implement a parallelization
scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the
coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the
MNIST dataset of handwritten digits for various thread counts and CNN
architectures. Results show a 103.5x speedup when training our large network
for 15 epochs using 244 threads, compared to one thread on the coprocessor.
Moreover, we develop a performance model and use it to assess our
implementation and answer what-if questions.
Comment: The 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Aug. 24-26, 2015, New York, US
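The abstract names CHAOS but does not detail it here. Asynchronous, Hogwild-style sharing of one weight vector across threads that each process a disjoint slice of the training examples is a common realization of such thread-parallel training, sketched below with all function and parameter names illustrative; CPython threads do not run numerical code in parallel, so this conveys only the partitioning scheme, not the performance (which on the coprocessor comes from its 244 hardware threads and SIMD units).

```python
import numpy as np
from threading import Thread

def parallel_epoch(images, labels, weights, grad_fn, n_threads=4, lr=0.01):
    """One training epoch with thread-level data parallelism: each
    thread processes a disjoint slice of the examples and applies
    unsynchronized, in-place updates to the shared weight vector
    (Hogwild-style asynchronous SGD)."""
    def worker(indices):
        for i in indices:
            g = grad_fn(weights, images[i], labels[i])
            weights[:] -= lr * g  # shared in-place update, no locking
    chunks = np.array_split(np.arange(len(images)), n_threads)
    threads = [Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights
```

As a sanity check on the reported numbers, a 103.5x speedup on 244 threads corresponds to a per-thread parallel efficiency of roughly 42% relative to a single coprocessor thread.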
Neuromorphic Learning Systems for Supervised and Unsupervised Applications
The advancements in high performance computing (HPC) have enabled the large-scale implementation of neuromorphic learning models and pushed research on computational intelligence into a new era. These bio-inspired models are constructed from unified building blocks, i.e., neurons, and have revealed potential for learning complex information. Two major challenges remain in neuromorphic computing. First, sophisticated structuring methods are needed to determine the connectivity of the neurons in order to model various problems accurately. Second, the models need to adapt to non-traditional architectures for improved computation speed and energy efficiency. In this thesis, we address these two problems and apply our techniques to different cognitive applications.
This thesis first presents the self-structured confabulation network for anomaly detection. Among machine learning applications, unsupervised detection of anomalous streams is especially challenging because it requires both detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of multicore systems while maintaining high sensitivity and specificity to anomalies is an urgent research need. We present AnRAD (Anomaly Recognition And Detection), a bio-inspired detection framework that performs probabilistic inferences. We leverage the mutual information between the features and develop a self-structuring procedure that learns a succinct confabulation network from the unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base from the data streams. Compared to several existing anomaly detection methods, the proposed approach provides competitive detection accuracy as well as insight for reasoning about the decision making. Furthermore, we exploit the massively parallel structure of the AnRAD framework. Our implementations of the recall algorithms on the graphics processing unit (GPU) and the Xeon Phi co-processor both obtain substantial speedups over the sequential implementation on a general-purpose processor (GPP). The implementation enables real-time service to concurrent data streams with diversified contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle abnormal-behavior detection, the framework is able to monitor up to 16000 vehicles and their interactions in real time with a single commodity co-processor, and uses less than 0.2 ms for each testing subject.
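The self-structuring procedure is described here only by its key ingredient, mutual-information screening between features over unlabeled data. A minimal sketch of that ingredient might look as follows; the function name, threshold, and the use of scikit-learn are assumptions, and the actual AnRAD procedure is more involved.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_screen(samples, threshold=0.05):
    """Keep a connection between two features when their empirical
    mutual information exceeds a threshold.

    samples: (n_samples, n_features) array of discrete feature values;
    the threshold value is illustrative, not from the thesis."""
    n_features = samples.shape[1]
    edges = []
    for i in range(n_features):
        for j in range(i + 1, n_features):
            mi = mutual_info_score(samples[:, i], samples[:, j])
            if mi > threshold:
                edges.append((i, j, mi))
    return edges
```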
While adapting our streaming anomaly detection model to mobile devices or unmanned systems, the key challenge is to deliver the required performance under stringent power constraints. To address the paradox between performance and power consumption, brain-inspired hardware, such as the IBM Neurosynaptic System, has been developed to enable low-power implementations of neural models. As a follow-up to the AnRAD framework, we propose porting the detection network to the TrueNorth architecture. Implementing inference-based anomaly detection on a neurosynaptic processor is not straightforward due to hardware limitations. A design flow and the supporting component library are developed to flexibly map the learned detection networks to the neurosynaptic cores. Instead of the popular rate code, a burst code is adopted in the design, which represents a numerical value using the phase of a burst of spike trains. This not only reduces the hardware complexity but also increases the result's accuracy. A Corelet library, NeoInfer-TN, is implemented for basic operations in burst code, and two-phase pipelines are constructed from the library components. The design can be configured for different tradeoffs among detection accuracy, hardware resource consumption, throughput, and energy. We evaluate the system using network intrusion detection data streams. The results show a higher detection rate than several conventional approaches and real-time performance, with only 50 mW power consumption. Overall, it achieves 10^8 operations per joule.
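Burst/phase coding is easy to illustrate in software: the numeric value determines when, within a fixed tick window, a burst of spikes begins, so a single burst carries a multi-bit value that a rate code would need many more spikes (or a longer window) to convey. The toy model below is not taken from NeoInfer-TN; the onset convention, window length, and function names are assumptions, and the value is recovered only up to the window's quantization.

```python
def burst_encode(value, window=16):
    """Encode a value in [0, 1] as the onset phase of a spike burst
    within a fixed tick window; larger values fire earlier (assumed
    convention)."""
    start = round((1.0 - value) * (window - 1))
    return [1 if t >= start else 0 for t in range(window)]

def burst_decode(spikes):
    """Recover the value from the burst onset tick; an empty window
    decodes to 0."""
    window = len(spikes)
    start = spikes.index(1) if 1 in spikes else window - 1
    return 1.0 - start / (window - 1)
```

For example, with a 16-tick window, burst_decode(burst_encode(0.5)) returns about 0.47; widening the window tightens the quantization.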
In addition to the modeling and implementation of unsupervised anomaly detection, we also investigate a supervised learning model based on neural networks and deep fragment embedding and apply it to text-image retrieval. The study aims at bridging the gap between image and natural language, and improves bidirectional retrieval performance across the modalities. Unlike existing works that target single sentences densely describing the image objects, we elevate the topic to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages: the first stage learns the relevancy of the text fragments, and the second stage uses the filtered output from the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, from which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated with both a synthetic dataset and a real-world dataset collected from a picture-news website. The results show up to 50% ranking performance improvement over the comparison models.
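The base alignment objective in the fragment-embedding line of work this builds on scores an image-text pair by crediting each text fragment with its best-matching image fragment. A toy NumPy version is sketched below; the thesis's model additionally filters text-fragment relevancy in a first stage before this kind of matching, which the sketch omits.

```python
import numpy as np

def alignment_score(img_frags, txt_frags):
    """Fragment-level matching score in the spirit of deep fragment
    embedding: each text fragment is matched to its best image
    fragment and the (thresholded) similarities are summed.

    img_frags: (m, d) embedded image fragments (e.g. CNN region codes)
    txt_frags: (n, d) embedded text fragments
    """
    sims = txt_frags @ img_frags.T        # (n, m) pairwise similarities
    return np.maximum(sims, 0.0).max(axis=1).sum()
```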
Efficient Implementation of Stochastic Inference on Heterogeneous Clusters and Spiking Neural Networks
Neuromorphic computing refers to brain-inspired algorithms and architectures. This paradigm of computing can solve complex problems that were not possible with traditional computing methods, because such implementations learn to identify the required features and classify them based on their training, akin to how brains function. This task involves performing computation on large quantities of data. With this inspiration, a comprehensive multi-pronged approach is employed to study and efficiently implement neuromorphic inference models: on heterogeneous clusters of traditional von Neumann architectures, and by developing spiking neural networks (SNNs) for native, ultra-low-power implementation. In this regard, an extendable high-performance computing (HPC) framework and optimizations are proposed for heterogeneous clusters to modularize complex neuromorphic applications in a distributed manner. To achieve the best possible throughput and load balancing for such modularized architectures, a set of algorithms is proposed to suggest the optimal mapping of the different modules, as an asynchronous pipeline, onto the available cluster resources while considering the complex data dependencies between stages. SNNs, on the other hand, are more biologically plausible and can achieve ultra-low-power implementation due to their sparse spike-based communication, which is possible with emerging non-von Neumann computing platforms. As significant progress in this direction, spiking neuron models capable of distributed online learning are proposed. A high-performance SNN simulator (SpNSim) is developed for simulation of large-scale mixed neuron-model networks. An accompanying digital hardware neuron RTL is also proposed for efficient real-time implementation of SNNs capable of online learning. Finally, a methodology for mapping a probabilistic graphical model to an off-the-shelf neurosynaptic processor (IBM TrueNorth) as a stochastic SNN is presented, with ultra-low power consumption.
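The mapping algorithms themselves are not given in the abstract, but the objective they optimize is standard: an asynchronous pipeline runs at the rate of its slowest stage-device pairing, so a mapping seeks to minimize that bottleneck. The greedy heuristic below illustrates only this objective; it is not the thesis's algorithm (which also models data dependencies), and all names are assumptions.

```python
def map_pipeline(stage_costs, device_speeds):
    """Toy greedy mapping of pipeline stages to devices: pair the
    costliest stages with the fastest devices, then report the
    bottleneck-limited throughput.

    stage_costs  : work per item for each pipeline stage
    device_speeds: relative speed of each device (at least one per stage)
    """
    stages = sorted(range(len(stage_costs)),
                    key=lambda s: stage_costs[s], reverse=True)
    devices = sorted(range(len(device_speeds)),
                     key=lambda d: device_speeds[d], reverse=True)
    assignment = dict(zip(stages, devices))
    bottleneck = max(stage_costs[s] / device_speeds[d]
                     for s, d in assignment.items())
    return assignment, 1.0 / bottleneck  # stage->device map, throughput
```

For example, map_pipeline([4.0, 1.0, 2.0], [2.0, 1.0, 1.0]) puts the heaviest stage on the fastest device and reports a throughput of 0.5 items per unit time, set by the slowest pairing.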
Analog Photonics Computing for Information Processing, Inference and Optimisation
This review presents an overview of the current state-of-the-art in photonics
computing, which leverages photons, photons coupled with matter, and
optics-related technologies for effective and efficient computational purposes.
It covers the history and development of photonics computing and modern
analogue computing platforms and architectures, focusing on optimization tasks
and neural network implementations. The authors examine special-purpose
optimizers, mathematical descriptions of photonics optimizers, and their
various interconnections. Disparate applications are discussed, including
direct encoding, logistics, finance, phase retrieval, machine learning, neural
networks, probabilistic graphical models, and image processing, among many
others. The main directions of technological advancement and associated
challenges in photonics computing are explored, along with an assessment of its
efficiency. Finally, the paper discusses prospects and the field of optical
quantum computing, providing insights into the potential applications of this
technology.
Comment: Invited submission by Journal of Advanced Quantum Technologies; accepted version 5/06/202
Design and Implementation of Hardware Accelerators for Neural Processing Applications
The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called the Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical, and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. As per the suggestions of the Doctoral Committee, an image recognition system using ARN has been implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given
functionality, e.g., multiplication, several copies of serial modules can be
realized within the same area as a parallel module. The advantage of using serial modules compared to parallel modules under area constraints has been discussed. One module often useful in ANNs is the multi-operand adder. One problem in its implementation is estimating the number of carry bits required when the number of operands changes. A theorem to calculate the exact number of carry bits required for a multi-operand addition has been presented in the thesis, which alleviates this problem; a small sketch of the underlying arithmetic follows below. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for the large numbers of additions typically seen in DNN configurations.
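The theorem itself is not reproduced in the abstract, but the quantity it pins down can be computed directly from the worst case: k unsigned n-bit operands sum to at most k(2^n - 1), so the exact result width is the bit length of that number. A small self-checking sketch (function names ours):

```python
def result_bits(k, n):
    """Minimum result width for the sum of k unsigned n-bit operands:
    the worst-case sum is k * (2**n - 1), and its bit length is the
    exact number of bits the adder must produce."""
    return (k * (2**n - 1)).bit_length()

def extra_carry_bits(k, n):
    """Carry bits needed beyond the operand width n; bounded above by
    ceil(log2(k)) but sometimes smaller, e.g. k=5, n=2 needs only 2."""
    return result_bits(k, n) - n

# Brute-force check that the width is exact for small sizes:
# it holds the worst-case sum, and one bit fewer would not.
for n in range(1, 5):
    for k in range(2, 17):
        assert k * (2**n - 1) < 2**result_bits(k, n)
        assert k * (2**n - 1) >= 2**(result_bits(k, n) - 1)
```

Knowing the exact width per operand count is what lets the serial, modular adder stages be pipelined with low reconfiguration overhead.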
GPU Computing for Cognitive Robotics
This thesis presents the first investigation of the impact of GPU
computing on cognitive robotics by providing a series of novel experiments in
the area of action and language acquisition in humanoid robots and computer
vision. Cognitive robotics is concerned with endowing robots with high-level
cognitive capabilities to enable the achievement of complex goals in complex
environments. Reaching the ultimate goal of developing cognitive robots will
require tremendous amounts of computational power, which was until recently provided mostly by standard CPUs. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient for high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has evolved the GPU into a highly parallel, multithreaded, many-core processor with extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by most cognitive robotics models,
as they tend to be inherently parallel. Various interesting and insightful cognitive models have been developed to address important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and application have not advanced much in recent years. The experimental tasks, as well as the scale of these models, are often minimised to avoid excessive training times that grow exponentially with the number of neurons and the amount of training data. This impedes further progress and the development of complex neurocontrollers that would take cognitive robotics research a step closer to the ultimate goal of creating intelligent
machines. This thesis presents several cases where the application of GPU computing to cognitive robotics algorithms resulted in the development of large-scale neurocontrollers of previously unseen complexity, enabling the novel experiments described herein.
European Commission Seventh Framework Programme
Understanding Quantum Technologies 2022
Understanding Quantum Technologies 2022 is a creative-commons ebook that
provides a unique 360-degree overview of quantum technologies from science and
technology to geopolitical and societal issues. It covers quantum physics
history, quantum physics 101, gate-based quantum computing, quantum computing
engineering (including quantum error corrections and quantum computing
energetics), quantum computing hardware (all qubit types, including quantum
annealing and quantum simulation paradigms, history, science, research,
implementation and vendors), quantum enabling technologies (cryogenics, control
electronics, photonics, components fabs, raw materials), quantum computing
algorithms, software development tools and use cases, unconventional computing
(potential alternatives to quantum and classical computing), quantum
telecommunications and cryptography, quantum sensing, quantum technologies
around the world, quantum technologies societal impact and even quantum fake
sciences. The main audience comprises computer science engineers, developers and IT
specialists as well as quantum scientists and students who want to acquire a
global view of how quantum technologies work, and particularly quantum
computing. This version is an extensive update to the 2021 edition published in
October 2021.
Comment: 1132 pages, 920 figures, Letter format