20,060 research outputs found

    Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks

    Full text link
    Based on the transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, then a classifier consisting of two fully-connected hidden layers and a Softmax layer is attached. We compare the performance of the proposed model with the original VGG16 model. The results show that the former has higher train efficiency, but lower mean average precisions(MAP). To improve the MAP of the proposed model, we investigate the result fusion mode to form multi-channel identification model, the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of the VGG16 model. Also, the size demand of sample is decreased.Comment: 11 pages,11 figure

    Network Decoupling: From Regular to Depthwise Separable Convolutions

    Full text link
    Depthwise separable convolution has shown great efficiency in network design, but requires time-consuming training procedure with full training-set available. This paper first analyzes the mathematical relationship between regular convolutions and depthwise separable convolutions, and proves that the former one could be approximated with the latter one in closed form. We show depthwise separable convolutions are principal components of regular convolutions. And then we propose network decoupling (ND), a training-free method to accelerate convolutional neural networks (CNNs) by transferring pre-trained CNN models into the MobileNet-like depthwise separable convolution structure, with a promising speedup yet negligible accuracy loss. We further verify through experiments that the proposed method is orthogonal to other training-free methods like channel decomposition, spatial decomposition, etc. Combining the proposed method with them will bring even larger CNN speedup. For instance, ND itself achieves about 2X speedup for the widely used VGG16, and combined with other methods, it reaches 3.7X speedup with graceful accuracy degradation. We demonstrate that ND is widely applicable to classification networks like ResNet, and object detection network like SSD300

    Per-pixel Classification Rebar Exposures in Bridge Eye-inspection

    Full text link
    Efficient inspection and accurate diagnosis are required for civil infrastructures with 50 years since completion. Especially in municipalities, the shortage of technical staff and budget constraints on repair expenses have become a critical problem. If we can detect damaged photos automatically per-pixels from the record of the inspection record in addition to the 5-step judgment and countermeasure classification of eye-inspection vision, then it is possible that countermeasure information can be provided more flexibly, whether we need to repair and how large the expose of damage interest. A piece of damage photo is often sparse as long as it is not zoomed around damage, exactly the range where the detection target is photographed, is at most only 1%. Generally speaking, rebar exposure is frequently occurred, and there are many opportunities to judge repair measure. In this paper, we propose three damage detection methods of transfer learning which enables semantic segmentation in an image with low pixels using damaged photos of human eye-inspection. Also, we tried to create a deep convolutional network from scratch with the preprocessing that random crops with rotations are generated. In fact, we show the results applied this method using the 208 rebar exposed images on the 106 real-world bridges. Finally, future tasks of damage detection modeling are mentioned.Comment: 4 pages, 3 figure

    PoTrojan: powerful neural-level trojan designs in deep learning models

    Full text link
    With the popularity of deep learning (DL), artificial intelligence (AI) has been applied in many areas of human life. Neural network or artificial neural network (NN), the main technique behind DL, has been extensively studied to facilitate computer vision and natural language recognition. However, the more we rely on information technology, the more vulnerable we are. That is, malicious NNs could bring huge threat in the so-called coming AI era. In this paper, for the first time in the literature, we propose a novel approach to design and insert powerful neural-level trojans or PoTrojan in pre-trained NN models. Most of the time, PoTrojans remain inactive, not affecting the normal functions of their host NN models. PoTrojans could only be triggered in very rare conditions. Once activated, however, the PoTrojans could cause the host NN models to malfunction, either falsely predicting or classifying, which is a significant threat to human society of the AI era. We would explain the principles of PoTrojans and the easiness of designing and inserting them in pre-trained deep learning models. PoTrojans doesn't modify the existing architecture or parameters of the pre-trained models, without re-training. Hence, the proposed method is very efficient.Comment: 7 pages, 6 figure

    SNN: Stacked Neural Networks

    Full text link
    It has been proven that transfer learning provides an easy way to achieve state-of-the-art accuracies on several vision tasks by training a simple classifier on top of features obtained from pre-trained neural networks. The goal of this work is to generate better features for transfer learning from multiple publicly available pre-trained neural networks. To this end, we propose a novel architecture called Stacked Neural Networks which leverages the fast training time of transfer learning while simultaneously being much more accurate. We show that using a stacked NN architecture can result in up to 8% improvements in accuracy over state-of-the-art techniques using only one pre-trained network for transfer learning. A second aim of this work is to make network fine- tuning retain the generalizability of the base network to unseen tasks. To this end, we propose a new technique called "joint fine-tuning" that is able to give accuracies comparable to finetuning the same network individually over two datasets. We also show that a jointly finetuned network generalizes better to unseen tasks when compared to a network finetuned over a single task.Comment: 8page

    Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

    Full text link
    As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by running DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet variants) we obtain 6x to 20x increases in throughput for verifiable inference, and 4x to 11x for verifiable and private inference.Comment: Accepted as an oral presentation at ICLR 2019. OpenReview available at https://openreview.net/forum?id=rJVorjCcK

    Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web

    Full text link
    Recently, attempts have been made to collect millions of videos to train CNN models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier and training on images requires much less computation. In addition, labeled web images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. We explore the question of whether we can utilize web action images to train better CNN models for action recognition in videos. We collect 23.8K manually filtered images from the Web that depict the 101 actions in the UCF101 action video dataset. We show that by utilizing web action images along with videos in training, significant performance boosts of CNN models can be achieved. We then investigate the scalability of the process by leveraging crawled web images (unfiltered) for UCF101 and ActivityNet. We replace 16.2M video frames by 393K unfiltered images and get comparable performance

    Face-MagNet: Magnifying Feature Maps to Detect Small Faces

    Full text link
    In this paper, we introduce the Face Magnifier Network (Face-MageNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small scale faces to the classifier without any skip or residual connections. To achieve this, Face-MagNet deploys a set of ConvTranspose, also known as deconvolution, layers in the Region Proposal Network (RPN) and another set before the Region of Interest (RoI) pooling layer to facilitate detection of finer faces. In addition, we also design, train, and evaluate three other well-tuned architectures that represent the conventional solutions to the scale problem: context pooling, skip connections, and scale partitioning. Each of these three networks achieves comparable results to the state-of-the-art face detectors. With extensive experiments, we show that Face-MagNet based on a VGG16 architecture achieves better results than the recently proposed ResNet101-based HR method on the task of face detection on WIDER dataset and also achieves similar results on the hard set as our other method SSH.Comment: Accepted in WACV1

    Understanding Deep Neural Networks Using Topological Data Analysis

    Full text link
    Deep neural networks (DNN) are black box algorithms. They are trained using a gradient descent back propagation technique which trains weights in each layer for the sole goal of minimizing training error. Hence, the resulting weights cannot be directly explained. Using Topological Data Analysis (TDA) we can get an insight on how the neural network is thinking, specifically by analyzing the activation values of validation images as they pass through each layer.Comment: 13 pages, 14 figure

    PipeDream: Fast and Efficient Pipeline Parallel DNN Training

    Full text link
    PipeDream is a Deep Neural Network(DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline parallel computing model avoids the slowdowns faced by data-parallel training when large models and/or limited network bandwidth induce high communication-to-computation ratios. PipeDream reduces communication by up to 95% for large DNNs relative to data-parallel training, and allows perfect overlap of communication and computation. PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication, versions model parameters for backward pass correctness, and schedules the forward and backward passes of different inputs in round-robin fashion to optimize "time to target accuracy". Experiments with five different DNNs on two different clusters show that PipeDream is up to 5x faster in time-to-accuracy compared to data-parallel training
    corecore