A Review of Modularization Techniques in Artificial Neural Networks
Artificial neural networks (ANNs) have achieved significant success in
tackling classical and modern machine learning problems. As learning problems
grow in scale and complexity, and expand into multi-disciplinary territory, a
more modular approach for scaling ANNs will be needed. Modular neural networks
(MNNs) are neural networks that embody the concepts and principles of
modularity. MNNs adopt a large number of different techniques for achieving
modularization. Previous surveys of modularization techniques offer little
systematic analysis of MNNs, focusing mostly on empirical comparisons and
lacking an extensive taxonomical framework. In this review, we
aim to establish a solid taxonomy that captures the essential properties and
relationships of the different variants of MNNs. Based on an investigation of
the different levels at which modularization techniques act, we attempt to
provide a universal and systematic framework for theorists studying MNNs. Along
the way, we emphasise the strengths and weaknesses of the different
modularization approaches in order to highlight good practices for neural
network practitioners.
Comment: Artif Intell Rev (2019).
ScriptNet: Neural Static Analysis for Malicious JavaScript Detection
Malicious scripts are an important computer infection threat vector in the
wild. For web-scale processing, static analysis offers substantial computing
efficiencies. We propose the ScriptNet system for neural malicious JavaScript
detection which is based on static analysis. We use the Convoluted Partitioning
of Long Sequences (CPoLS) model, which processes JavaScript files as byte
sequences. Lower layers capture the sequential nature of these byte sequences
while higher layers classify the resulting embedding as malicious or benign.
Unlike previously proposed solutions, our model variants are trained in an
end-to-end fashion allowing discriminative training even for the sequential
processing layers. Evaluating this model on a large corpus of 212,408
JavaScript files indicates that the best performing CPoLS model offers a 97.20%
true positive rate (TPR) for the first 60K byte subsequence at a false positive
rate (FPR) of 0.50%. The best performing CPoLS model significantly outperforms
several baseline models.
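To make the byte-sequence architecture concrete, here is a minimal PyTorch sketch in the spirit of CPoLS: a per-byte embedding, a strided convolution that partitions the long sequence into chunks, an LSTM over the chunk features, and a classification head. The layer sizes, kernel and stride choices, and pooling are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ByteSequenceClassifier(nn.Module):
    def __init__(self, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(256, embed_dim)  # one vector per byte value
        # strided convolution partitions the long byte sequence into chunks
        self.conv = nn.Conv1d(embed_dim, hidden_dim, kernel_size=16, stride=16)
        # lower sequential layer over the chunk features
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # malicious/benign score

    def forward(self, byte_ids):  # byte_ids: (batch, seq_len) ints in [0, 255]
        x = self.embed(byte_ids).transpose(1, 2)      # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, n_chunks, hidden_dim)
        out, _ = self.lstm(x)
        pooled = out.max(dim=1).values                # temporal max-pool over chunk states
        return torch.sigmoid(self.head(pooled)).squeeze(-1)

model = ByteSequenceClassifier()
scores = model(torch.randint(0, 256, (4, 60_000)))  # e.g., the first 60K bytes of each file
```

Because every layer is differentiable, the whole stack, including the sequential chunk-processing layers, can be trained end-to-end with a standard binary cross-entropy loss.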
Dynamics Estimation Using Recurrent Neural Network
There is plenty of research going on in the field of robotics. One of the most
important tasks is the dynamic estimation of the response during motion. A main
application of this research topic is the task of pouring, which is performed
daily and is commonly used while cooking. We present an approach to estimating
the response to a sequence of manipulation actions. We experiment with the
pouring motion, where the response is the change in the amount of water in the
pouring cup. The pouring motion is represented by the rotation angle and the
amount of water by its weight. We use recurrent neural networks to build a
model trained on sequences representing 1307 trials of pouring. The model gives
good results on unseen test data that does not differ much from the training
data in terms of the dimensions of the cups used for pouring and receiving; the
loss on this test data is 4.5920. The model does not generalize well when the
test set uses cup dimensions very different from those in the training data.
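As a hedged sketch of this setup (the layer sizes, loss, and optimizer are illustrative assumptions, not the paper's exact model), an LSTM can map the sequence of rotation angles to the corresponding sequence of water weights:

```python
import torch
import torch.nn as nn

class PouringRNN(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)  # predicted water weight at each step

    def forward(self, angles):  # angles: (batch, T, 1) rotation angles
        h, _ = self.lstm(angles)
        return self.out(h)      # (batch, T, 1) predicted weights

model = PouringRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

angles = torch.rand(32, 100, 1)   # dummy batch: 32 pouring trials, 100 time steps
weights = torch.rand(32, 100, 1)  # corresponding water weights
loss = nn.functional.mse_loss(model(angles), weights)
loss.backward()
optimizer.step()
```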
PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Tensor methods have gained increasing attention from various applications,
including machine learning, quantum chemistry, healthcare analytics, social
network analysis, data mining, and signal processing, to name a few. Sparse
tensors and their algorithms become critical to further improve the performance
of these methods and enhance the interpretability of their output. This work
presents a sparse tensor algorithm benchmark suite (PASTA) for single- and
multi-core CPUs. To the best of our knowledge, this is the first benchmark
suite for the sparse tensor world. PASTA aims at: 1) helping application users
evaluate different computer systems using its representative computational
workloads; and 2) providing insights for better utilizing existing computer
architectures and systems, as well as inspiration for future designs. This
benchmark suite is publicly released at https://gitlab.com/tensorworld/pasta.
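To make "sparse tensors and their algorithms" concrete, the sketch below shows a third-order tensor in COO format and a sparse tensor-times-vector (TTV) contraction, one representative class of workload such a suite might cover. The actual kernels and data layouts PASTA ships are defined in its repository; this is only an illustration.

```python
import numpy as np

# COO storage for a 3 x 3 x 2 sparse tensor: parallel index arrays plus values
i = np.array([0, 0, 1, 2])            # mode-0 indices
j = np.array([1, 2, 0, 2])            # mode-1 indices
k = np.array([0, 1, 1, 0])            # mode-2 indices
vals = np.array([1.0, 2.0, 3.0, 4.0])

def ttv_mode2(i, j, k, vals, v, shape):
    """Contract the last mode with vector v: Y[a, b] = sum_c X[a, b, c] * v[c]."""
    out = np.zeros(shape[:2])
    np.add.at(out, (i, j), vals * v[k])  # scatter-add handles repeated (a, b) pairs
    return out

v = np.array([0.5, 2.0])
print(ttv_mode2(i, j, k, vals, v, shape=(3, 3, 2)))
```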
Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures
Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to
attain good accuracy in various machine learning tasks. A DNN's architecture
(i.e., its hyper-parameters) broadly determines the DNN's accuracy and
performance, and is often confidential. Attacking a DNN in the cloud to obtain
its architecture can potentially provide major commercial value. Further,
attaining a DNN's architecture facilitates other, existing DNN attacks.
This paper presents Cache Telepathy: a fast and accurate mechanism to steal a
DNN's architecture using the cache side channel. Our attack is based on the
insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix
Multiply), and that DNN architecture parameters determine the number of GEMM
calls and the dimensions of the matrices used in the GEMM functions. Such
information can be leaked through the cache side channel.
This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs
running OpenBLAS and Intel MKL libraries. Our attack is effective in helping
obtain the architectures by very substantially reducing the search space of
target DNN architectures. For example, for VGG using OpenBLAS, it reduces the
search space from a very large number of possible architectures to just 16.
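The core insight is that layer hyper-parameters fix the GEMM shapes. As a hedged illustration (assuming a standard im2col lowering rather than the paper's exact model of OpenBLAS/MKL tiling), the function below maps a convolution layer to the dimensions of the matrix multiply it induces:

```python
def conv_gemm_dims(c_in, c_out, kernel, h_out, w_out):
    """im2col lowering: (c_out x c_in*k*k) @ (c_in*k*k x h_out*w_out)."""
    m = c_out                    # one row per output channel
    k = c_in * kernel * kernel   # shared inner dimension
    n = h_out * w_out            # one column per output pixel
    return m, n, k

# e.g., a 3x3, 256-channel convolution on a 56x56 feature map:
print(conv_gemm_dims(c_in=256, c_out=256, kernel=3, h_out=56, w_out=56))
# -> (256, 3136, 2304)
```

Observing (m, n, k) through the cache side channel therefore constrains (c_in, c_out, kernel, h_out, w_out) to a small candidate set, which is what shrinks the architecture search space.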
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Due to recent advances in digital technologies, and availability of credible
data, an area of artificial intelligence, deep learning, has emerged, and has
demonstrated its ability and effectiveness in solving complex learning problems
not possible before. In particular, convolutional neural networks (CNNs) have
demonstrated their effectiveness in image detection and recognition
applications. However, they require intensive CPU operations and memory
bandwidth that make general CPUs fail to achieve desired performance levels.
Consequently, hardware accelerators that use application specific integrated
circuits (ASICs), field programmable gate arrays (FPGAs), and graphic
processing units (GPUs) have been employed to improve the throughput of CNNs.
More precisely, FPGAs have been recently adopted for accelerating the
implementation of deep learning networks due to their ability to maximize
parallelism as well as due to their energy efficiency. In this paper, we review
recent existing techniques for accelerating deep learning networks on FPGAs. We
highlight the key features employed by the various techniques for improving the
acceleration performance. In addition, we provide recommendations for enhancing
the utilization of FPGAs for CNNs acceleration. The techniques investigated in
this paper represent the recent trends in FPGA-based accelerators of deep
learning networks. Thus, this review is expected to direct the future advances
on efficient hardware accelerators and to be useful for deep learning
researchers.
Comment: This article has been accepted for publication in IEEE Access
(December 2018).
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters
Deep Neural Networks (DNNs) have revolutionized numerous applications, but
the demand for ever more performance remains unabated. Scaling DNN computations
to larger clusters is generally done by distributing tasks in batch mode using
methods such as distributed synchronous SGD. Among the issues with this
approach is that to make the distributed cluster work with high utilization,
the workload distributed to each node must be large, which implies nontrivial
growth in the SGD mini-batch size.
In this paper, we propose a framework called FPDeep, which uses a hybrid of
model and layer parallelism to configure distributed reconfigurable clusters to
train DNNs. This approach has numerous benefits. First, the design does not
suffer from batch size growth. Second, novel workload and weight partitioning
leads to balanced loads of both across the nodes. Third, the entire system is a
fine-grained pipeline. This leads to high parallelism and utilization and also
minimizes the time features need to be cached while waiting for
back-propagation. As a result, storage demand is reduced to the point where
only on-chip memory is used for the convolution layers. We evaluate FPDeep with
the Alexnet, VGG-16, and VGG-19 benchmarks. Experimental results show that
FPDeep has good scalability to a large number of FPGAs, with the limiting
factor being the FPGA-to-FPGA bandwidth. With 6 transceivers per FPGA, FPDeep
shows linear scaling up to 83 FPGAs. Energy efficiency is evaluated in terms of
GOPs/J. FPDeep provides, on average, 6.36x higher energy efficiency than
comparable GPU servers.
Comment: Accepted by IEEE Transactions on Computers (TC).
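As a hedged sketch of the load-balancing idea behind layer-parallel pipelining (FPDeep's real partitioner also splits individual layers across devices in its hybrid scheme; the cost figures here are made up), one can split a chain of layers into contiguous pipeline stages so each FPGA carries roughly equal work:

```python
def partition_layers(costs, n_stages):
    """Greedily split per-layer costs into n_stages contiguous pipeline stages."""
    target = sum(costs) / n_stages     # ideal work per stage
    stages, current, acc = [], [], 0.0
    for layer, cost in enumerate(costs):
        current.append(layer)
        acc += cost
        if acc >= target and len(stages) < n_stages - 1:
            stages.append(current)     # close this stage and start the next
            current, acc = [], 0.0
    stages.append(current)
    return stages

# e.g., 8 conv layers with uneven compute costs mapped onto 3 FPGAs
print(partition_layers([4, 1, 2, 6, 3, 2, 5, 1], n_stages=3))
# -> [[0, 1, 2, 3], [4, 5, 6], [7]]
```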
Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC
Convolutional Neural Networks (CNN) have been widely deployed in diverse
application domains. There has been significant progress in accelerating both
their training and inference using high-performance GPUs, FPGAs, and custom
ASICs for datacenter-scale environments. The recent proliferation of mobile and
IoT devices have necessitated real-time, energy-efficient deep neural network
inference on embedded-class, resource-constrained platforms. In this context,
we present Synergy, an automated, hardware-software co-designed, pipelined,
high-throughput CNN inference framework on embedded heterogeneous
system-on-chip (SoC) architectures (Xilinx Zynq). Synergy leverages, through
multi-threading, all the available on-chip resources, which include the
dual-core ARM processor along with the FPGA and the NEON SIMD engines as
accelerators. Moreover, Synergy provides a unified abstraction of the
heterogeneous accelerators (FPGA and NEON) and can adapt to different network
configurations at runtime, without changing the underlying hardware accelerator
architecture, by balancing the workload across accelerators through
work-stealing. Synergy achieves a 7.3X speedup, averaged across seven CNN
models, over a well-optimized software-only solution, and demonstrates
substantially better throughput and energy efficiency compared to contemporary
CNN implementations on the same SoC architecture.
Comment: 34 pages, submitted to ACM Transactions on Embedded Computing Systems
(TECS).
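As a hedged illustration of the work-stealing idea (tile granularity and the real FPGA/NEON dispatch are abstracted away; the queue sizes are made up), each accelerator drains its own queue of convolution tiles and steals from a busy peer when idle:

```python
from collections import deque

queues = {"fpga": deque(range(0, 12)),   # tile ids initially assigned to the FPGA
          "neon": deque(range(12, 16))}  # tiles assigned to the NEON SIMD engine

def next_tile(worker):
    if queues[worker]:
        return queues[worker].popleft()  # take own work first
    for other, q in queues.items():
        if other != worker and q:
            return q.pop()               # steal from the tail of a busy peer
    return None

done = {"fpga": [], "neon": []}
while any(queues.values()):
    for worker in queues:                # round-robin stands in for real concurrency
        tile = next_tile(worker)
        if tile is not None:
            done[worker].append(tile)
print(done)                              # both workers finish with no idle time
```

Because the queues rebalance at runtime, the same hardware accelerator design serves different network configurations, matching the unified abstraction described above.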
Learning to Flip Successive Cancellation Decoding of Polar Codes with LSTM Networks
The key to successive cancellation (SC) flip decoding of polar codes is to
accurately identify the first error bit. The optimal flipping strategy is
considered difficult due to the lack of an analytical solution. Alternatively, we
propose a deep learning aided SC flip algorithm. Specifically, before each SC
decoding attempt, a long short-term memory (LSTM) network is exploited to
either (i) locate the first error bit, or (ii) undo a previous `wrong' flip. In
each SC attempt, the sequence of log likelihood ratios (LLRs) derived in the
previous SC attempt is exploited to decide which action to take. Accordingly, a
two-stage training method of the LSTM network is proposed, i.e., learn to
locate first error bits in the first stage, and then to undo `wrong' flips in
the second stage. Simulation results show that the proposed approach identifies
error bits more accurately and achieves better performance than the
state-of-the-art SC flip algorithms.
Comment: 5 pages, 7 figures.
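A hedged sketch of such a decision network follows: an LSTM reads the LLR sequence from the previous SC attempt and scores the candidate actions, flipping bit i or undoing the last flip. The dimensions and the action encoding are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FlipLSTM(nn.Module):
    def __init__(self, code_len, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, code_len + 1)  # N flip positions + 1 "undo" action

    def forward(self, llrs):      # llrs: (batch, N, 1) from the previous SC attempt
        _, (h, _) = self.lstm(llrs)
        return self.head(h[-1])   # logits over the N+1 actions

net = FlipLSTM(code_len=64)
llrs = torch.randn(8, 64, 1)
action = net(llrs).argmax(dim=-1)  # bit index to flip, or 64 = undo the last flip
```

The two-stage training described above would first fit the flip-position targets and then fine-tune on examples where the correct action is the undo.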
Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction
Taxi demand prediction is an important building block to enabling intelligent
transportation systems in a smart city. An accurate prediction model can help
the city pre-allocate resources to meet travel demand and to reduce empty taxis
on streets, which waste energy and worsen traffic congestion. With the
increasing popularity of taxi requesting services such as Uber and Didi Chuxing
(in China), we are able to collect large-scale taxi demand data continuously.
How to utilize such big data to improve the demand prediction is an interesting
and critical real-world problem. Traditional demand prediction methods mostly
rely on time series forecasting techniques, which fail to model the complex
non-linear spatial and temporal relations. Recent advances in deep learning
have shown superior performance on traditionally challenging tasks such as
image classification by learning the complex features and correlations from
large-scale data. This breakthrough has inspired researchers to explore deep
learning techniques on traffic prediction problems. However, existing methods
on traffic prediction have only considered spatial relation (e.g., using CNN)
or temporal relation (e.g., using LSTM) independently. We propose a Deep
Multi-View Spatial-Temporal Network (DMVST-Net) framework to model both spatial
and temporal relations. Specifically, our proposed model consists of three
views: temporal view (modeling correlations between future demand values with
near time points via LSTM), spatial view (modeling local spatial correlation
via local CNN), and semantic view (modeling correlations among regions sharing
similar temporal patterns). Experiments on large-scale real taxi demand data
demonstrate the effectiveness of our approach over state-of-the-art methods.
Comment: AAAI 2018 paper.
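As a hedged sketch of the three-view fusion (all sizes are illustrative, a learned region embedding stands in for the semantic view, and this is not the paper's exact DMVST-Net configuration), a local CNN encodes each interval's spatial neighborhood, an LSTM summarizes the resulting sequence, and a joint head predicts the next-interval demand:

```python
import torch
import torch.nn as nn

class ThreeViewDemandNet(nn.Module):
    def __init__(self, n_regions, hid=32):
        super().__init__()
        self.spatial = nn.Sequential(                         # spatial view: local CNN
            nn.Conv2d(1, hid, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal = nn.LSTM(hid, hid, batch_first=True)   # temporal view
        self.semantic = nn.Embedding(n_regions, hid)          # stand-in for the semantic view
        self.head = nn.Linear(2 * hid, 1)

    def forward(self, demand_maps, region_id):
        # demand_maps: (batch, T, 1, 7, 7) local demand patches for the last T intervals
        b, t = demand_maps.shape[:2]
        feats = self.spatial(demand_maps.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.temporal(feats)                      # summary of the recent history
        z = torch.cat([h[-1], self.semantic(region_id)], dim=-1)
        return self.head(z)                                   # predicted next-interval demand

net = ThreeViewDemandNet(n_regions=200)
pred = net(torch.rand(4, 8, 1, 7, 7), torch.tensor([3, 17, 42, 99]))
```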