Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks
Based on transfer learning, we design a bird species identification model
that uses the VGG16 model (pretrained on ImageNet) for feature extraction,
followed by a classifier consisting of two fully-connected hidden layers and
a Softmax layer. We compare the performance of the proposed model with the
original VGG16 model. The results show that the former trains more
efficiently but has a lower mean average precision (MAP). To improve the MAP
of the proposed model, we investigate result-fusion modes to form a
multi-channel identification model; the best MAP reaches 0.9998. The number
of model parameters is 13,110, only 0.0082% of that of the VGG16 model. The
required sample size is also reduced. Comment: 11 pages, 11 figures
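As a rough illustration of the architecture described above, the sketch below
builds a frozen VGG16 feature extractor with a small two-hidden-layer
classifier head in PyTorch; the hidden-layer width, pooling choice, and weight
identifier are assumptions for illustration, not the authors' exact
configuration.

    import torch
    import torch.nn as nn
    from torchvision import models

    class BirdClassifier(nn.Module):
        """Frozen VGG16 backbone + small trainable head (illustrative sketch)."""
        def __init__(self, num_species: int, hidden: int = 32):
            super().__init__()
            vgg = models.vgg16(weights="IMAGENET1K_V1")  # ImageNet-pretrained backbone
            self.backbone = vgg.features
            for p in self.backbone.parameters():
                p.requires_grad = False                  # only the head is trained
            self.pool = nn.AdaptiveAvgPool2d(1)          # 512-d global feature
            self.head = nn.Sequential(                   # two FC hidden layers + output
                nn.Linear(512, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_species),          # Softmax applied by the loss
            )

        def forward(self, x):
            with torch.no_grad():
                x = self.pool(self.backbone(x)).flatten(1)
            return self.head(x)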
Network Decoupling: From Regular to Depthwise Separable Convolutions
Depthwise separable convolution has shown great efficiency in network design,
but requires a time-consuming training procedure with the full training set
available. This paper first analyzes the mathematical relationship between
regular convolutions and depthwise separable convolutions, and proves that
the former can be approximated by the latter in closed form. We show that
depthwise separable convolutions are the principal components of regular
convolutions. We then propose network decoupling (ND), a training-free
method to accelerate convolutional neural networks (CNNs) by transferring
pre-trained CNN models into the MobileNet-like depthwise separable convolution
structure, with a promising speedup yet negligible accuracy loss. We further
verify through experiments that the proposed method is orthogonal to other
training-free methods like channel decomposition, spatial decomposition, etc.
Combining the proposed method with them will bring even larger CNN speedup. For
instance, ND itself achieves about 2X speedup for the widely used VGG16, and
combined with other methods, it reaches 3.7X speedup with graceful accuracy
degradation. We demonstrate that ND is widely applicable to classification
networks like ResNet and to object detection networks like SSD300.
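A minimal sketch of the closed-form idea: each input-channel slice of a
regular convolution kernel is a C_out x (k*k) matrix whose truncated SVD
yields a spatial (depthwise) factor and a 1x1 (pointwise) factor, so the
regular convolution is approximated by a sum of depthwise separable branches.
This illustrates the decomposition only; it is not the authors'
implementation.

    import torch

    def decouple_conv(W: torch.Tensor, rank: int):
        """Approximate a regular conv kernel W (C_out, C_in, k, k) by `rank`
        depthwise-separable branches (depthwise k x k, then pointwise 1 x 1)."""
        C_out, C_in, k, _ = W.shape
        dw = torch.zeros(rank, C_in, k, k)   # per-branch depthwise kernels
        pw = torch.zeros(rank, C_out, C_in)  # per-branch pointwise weights
        for c in range(C_in):
            U, S, Vh = torch.linalg.svd(W[:, c].reshape(C_out, k * k),
                                        full_matrices=False)
            for r in range(rank):            # requires rank <= min(C_out, k*k)
                dw[r, c] = Vh[r].reshape(k, k)
                pw[r, :, c] = U[:, r] * S[r]
        return dw, pw

Summing, over branches, a depthwise convolution with dw[r] followed by a 1x1
convolution with pw[r] reproduces the original convolution exactly once rank
reaches k*k (for k*k <= C_out), and approximates it well at lower rank.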
Per-pixel Classification Rebar Exposures in Bridge Eye-inspection
Efficient inspection and accurate diagnosis are required for civil
infrastructures that have passed 50 years since completion. Especially in
municipalities, the shortage of technical staff and budget constraints on
repair expenses have become a critical problem. If damage can be detected
automatically, per pixel, from inspection records, in addition to the 5-step
judgment and countermeasure classification of visual inspection, then
countermeasure information can be provided more flexibly: whether repair is
needed and how large the exposed damage of interest is. Damage in an
inspection photo is often sparse; unless the photo is zoomed in around the
damage, the region containing the detection target covers at most about 1%
of the image. Generally speaking, rebar exposure occurs frequently, and
there are many opportunities to judge repair measures. In this paper, we
propose three transfer-learning-based damage detection methods that enable
semantic segmentation of low-resolution images using damage photos from
human visual inspection. We also train a deep convolutional network from
scratch, with preprocessing that generates random crops with rotations. We
show the results of applying these methods to 208 rebar-exposure images from
106 real-world bridges. Finally, future tasks in damage detection modeling
are discussed. Comment: 4 pages, 3 figures
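A brief sketch of the "random crops with rotations" preprocessing mentioned
above, using torchvision transforms; the rotation range, crop size, and flip
are illustrative assumptions rather than the paper's exact settings. For
semantic segmentation, the same geometric transform would also be applied to
the per-pixel label mask.

    from torchvision import transforms

    # Rotate, then crop a random patch, so the network sees many views of the
    # small damaged region in an otherwise sparse inspection photo.
    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),
        transforms.RandomCrop(224, pad_if_needed=True),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])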
PoTrojan: powerful neural-level trojan designs in deep learning models
With the popularity of deep learning (DL), artificial intelligence (AI) has
been applied in many areas of human life. Neural network or artificial neural
network (NN), the main technique behind DL, has been extensively studied to
facilitate computer vision and natural language recognition. However, the more
we rely on information technology, the more vulnerable we are. That is,
malicious NNs could pose a huge threat in the so-called coming AI era. In this
paper, for the first time in the literature, we propose a novel approach to
design and insert powerful neural-level trojans or PoTrojan in pre-trained NN
models. Most of the time, PoTrojans remain inactive, not affecting the normal
functions of their host NN models; they are triggered only under very rare
conditions. Once activated, however, PoTrojans can cause the host NN models
to malfunction, producing false predictions or classifications, which is a
significant threat to human society in the AI era. We explain the principles
of PoTrojans and the ease of designing and inserting them in pre-trained
deep learning models. PoTrojans do not modify the existing architecture or
parameters of the pre-trained models and require no re-training; hence, the
proposed method is very efficient. Comment: 7 pages, 6 figures
SNN: Stacked Neural Networks
It has been proven that transfer learning provides an easy way to achieve
state-of-the-art accuracies on several vision tasks by training a simple
classifier on top of features obtained from pre-trained neural networks. The
goal of this work is to generate better features for transfer learning from
multiple publicly available pre-trained neural networks. To this end, we
propose a novel architecture called Stacked Neural Networks which leverages the
fast training time of transfer learning while simultaneously being much more
accurate. We show that using a stacked NN architecture can result in up to 8%
improvements in accuracy over state-of-the-art techniques using only one
pre-trained network for transfer learning. A second aim of this work is to
make network fine-tuning retain the generalizability of the base network to
unseen tasks. To this end, we propose a new technique called "joint
fine-tuning" that gives accuracies comparable to fine-tuning the same
network individually over two datasets. We also show that a jointly
fine-tuned network generalizes better to unseen tasks than a network
fine-tuned on a single task. Comment: 8 pages
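To make the stacking idea concrete, the sketch below concatenates
penultimate-layer features from two publicly available pretrained networks
and trains only a small classifier on top; the choice of backbones and the
10-class head are assumptions for illustration.

    import torch
    from torchvision import models

    resnet = models.resnet18(weights="IMAGENET1K_V1")
    resnet.fc = torch.nn.Identity()           # expose 512-d penultimate features
    vgg = models.vgg16(weights="IMAGENET1K_V1")
    vgg.classifier = vgg.classifier[:-1]      # expose 4096-d penultimate features
    resnet.eval(); vgg.eval()

    def stacked_features(x: torch.Tensor) -> torch.Tensor:
        """Concatenate frozen features from both backbones (no fine-tuning)."""
        with torch.no_grad():
            return torch.cat([resnet(x), vgg(x)], dim=1)  # 4608-d joint feature

    classifier = torch.nn.Linear(512 + 4096, 10)  # only this layer is trained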
Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
As Machine Learning (ML) gets applied to security-critical or sensitive
domains, there is a growing need for integrity and privacy for outsourced ML
computations. A pragmatic solution comes from Trusted Execution Environments
(TEEs), which use hardware and software protections to isolate sensitive
computations from the untrusted software stack. However, these isolation
guarantees come at a price in performance, compared to untrusted alternatives.
This paper initiates the study of high performance execution of Deep Neural
Networks (DNNs) in TEEs by efficiently partitioning DNN computations between
trusted and untrusted devices. Building upon an efficient outsourcing scheme
for matrix multiplication, we propose Slalom, a framework that securely
delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX
or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate
Slalom by running DNNs in an Intel SGX enclave, which selectively delegates
work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet
variants) we obtain 6x to 20x increases in throughput for verifiable inference,
and 4x to 11x for verifiable and private inference. Comment: Accepted as an
oral presentation at ICLR 2019; OpenReview available at
https://openreview.net/forum?id=rJVorjCcK
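The kind of integrity check that makes such outsourcing cheap to verify can
be illustrated with Freivalds' algorithm, which tests a claimed matrix
product using only a few matrix-vector multiplications; this is a generic
sketch of the verification idea, not Slalom's exact protocol (which also
blinds the inputs for privacy).

    import numpy as np

    def freivalds_check(A, B, C, trials: int = 10) -> bool:
        """Probabilistically verify that C == A @ B without recomputing the
        full product; each trial costs three matrix-vector products."""
        n = B.shape[1]
        for _ in range(trials):
            r = np.random.randint(0, 2, size=(n, 1))   # random 0/1 vector
            if not np.allclose(A @ (B @ r), C @ r):
                return False                           # caught an incorrect result
        return True                                    # correct with high probability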
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web
Recently, attempts have been made to collect millions of videos to train CNN
models for action recognition in videos. However, curating such large-scale
video datasets requires immense human labor, and training CNNs on millions of
videos demands huge computational resources. In contrast, collecting action
images from the Web is much easier and training on images requires much less
computation. In addition, labeled web images tend to contain discriminative
action poses, which highlight discriminative portions of a video's temporal
progression. We explore the question of whether we can utilize web action
images to train better CNN models for action recognition in videos. We collect
23.8K manually filtered images from the Web that depict the 101 actions in the
UCF101 action video dataset. We show that by utilizing web action images along
with videos in training, significant performance boosts of CNN models can be
achieved. We then investigate the scalability of the process by leveraging
crawled web images (unfiltered) for UCF101 and ActivityNet. We replace 16.2M
video frames with 393K unfiltered images and obtain comparable performance.
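A minimal sketch of mixing web action images with sampled video frames into
a single training set using torchvision datasets; the directory paths and
image size are placeholder assumptions.

    from torch.utils.data import ConcatDataset, DataLoader
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    web_images   = datasets.ImageFolder("data/web_action_images", transform=tf)
    video_frames = datasets.ImageFolder("data/ucf101_frames", transform=tf)
    # One loader draws from both sources, so web images and video frames are
    # mixed within each training batch.
    train_loader = DataLoader(ConcatDataset([web_images, video_frames]),
                              batch_size=64, shuffle=True)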
Face-MagNet: Magnifying Feature Maps to Detect Small Faces
In this paper, we introduce the Face Magnifier Network (Face-MagNet), a face
detector based on the Faster-RCNN framework which enables the flow of
discriminative information of small scale faces to the classifier without any
skip or residual connections. To achieve this, Face-MagNet deploys a set of
ConvTranspose, also known as deconvolution, layers in the Region Proposal
Network (RPN) and another set before the Region of Interest (RoI) pooling layer
to facilitate detection of finer faces. In addition, we also design, train, and
evaluate three other well-tuned architectures that represent the conventional
solutions to the scale problem: context pooling, skip connections, and scale
partitioning. Each of these three networks achieves comparable results to the
state-of-the-art face detectors. With extensive experiments, we show that
Face-MagNet based on a VGG16 architecture achieves better results than the
recently proposed ResNet101-based HR method on the task of face detection
on the WIDER dataset, and achieves results on the hard set similar to those
of our other method, SSH. Comment: Accepted in WACV1
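The magnification step can be sketched as a ConvTranspose (deconvolution)
layer that upsamples a coarse VGG16 conv5 feature map by 2x before it reaches
the proposal and RoI pooling stages; the channel counts, kernel size, and
stride below are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Upsample a 512-channel conv5 feature map by a factor of 2 so small faces
    # retain more spatial detail for the RPN and RoI pooling.
    magnify = nn.ConvTranspose2d(in_channels=512, out_channels=512,
                                 kernel_size=4, stride=2, padding=1)

    conv5 = torch.randn(1, 512, 20, 20)   # stand-in for a VGG16 conv5 map
    print(magnify(conv5).shape)           # torch.Size([1, 512, 40, 40])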
Understanding Deep Neural Networks Using Topological Data Analysis
Deep neural networks (DNNs) are black-box algorithms. They are trained with
a gradient-descent backpropagation technique that adjusts the weights in
each layer with the sole goal of minimizing training error. Hence, the
resulting weights cannot be directly explained. Using Topological Data
Analysis (TDA), we can gain insight into how the neural network is
"thinking", specifically by analyzing the activation values of validation
images as they pass through each layer. Comment: 13 pages, 14 figures
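One way to carry out such an analysis is to collect a layer's activation
vectors for a set of validation images and compute persistence diagrams over
them, for example with the ripser package; this is a generic TDA sketch under
those assumptions, not the paper's pipeline.

    import numpy as np
    from ripser import ripser   # pip install ripser (assumed available)

    def layer_persistence(activations: np.ndarray):
        """Persistence diagrams (H0 and H1) for one layer's activations,
        one row per validation image."""
        return ripser(activations, maxdim=1)["dgms"]

    # Example: 200 validation images, 64-d activations from some layer.
    diagrams = layer_persistence(np.random.rand(200, 64))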
PipeDream: Fast and Efficient Pipeline Parallel DNN Training
PipeDream is a Deep Neural Network (DNN) training system for GPUs that
parallelizes computation by pipelining execution across multiple machines. Its
pipeline parallel computing model avoids the slowdowns faced by data-parallel
training when large models and/or limited network bandwidth induce high
communication-to-computation ratios. PipeDream reduces communication by up to
95% for large DNNs relative to data-parallel training, and allows perfect
overlap of communication and computation. PipeDream keeps all available GPUs
productive by systematically partitioning DNN layers among them to balance work
and minimize communication, versions model parameters for backward pass
correctness, and schedules the forward and backward passes of different inputs
in round-robin fashion to optimize "time to target accuracy". Experiments with
five different DNNs on two different clusters show that PipeDream is up to 5x
faster in time-to-accuracy than data-parallel training.
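A toy sketch of the partitioning idea: split a model into two stages placed
on two devices and stream micro-batches through them; real PipeDream
additionally versions weights and interleaves forward and backward passes in
round-robin order, which this sequential sketch omits.

    import torch
    import torch.nn as nn

    # Two pipeline stages; in practice these live on different GPUs
    # (e.g. cuda:0 and cuda:1) so they can run concurrently.
    dev0, dev1 = torch.device("cpu"), torch.device("cpu")
    stage0 = nn.Sequential(nn.Linear(784, 256), nn.ReLU()).to(dev0)
    stage1 = nn.Sequential(nn.Linear(256, 10)).to(dev1)

    def pipelined_forward(batch: torch.Tensor, num_microbatches: int = 4):
        """Feed micro-batches through the partitioned stages; with stages on
        separate devices, stage0 can start the next micro-batch while stage1
        finishes the current one."""
        outputs = []
        for mb in batch.chunk(num_microbatches):
            h = stage0(mb.to(dev0))
            outputs.append(stage1(h.to(dev1)))
        return torch.cat(outputs)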
