
    Practical Block-wise Neural Network Architecture Generation

    Full text link
    Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN which automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent trained to sequentially choose component layers. We then stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it yields results competitive with hand-crafted state-of-the-art networks on image classification; the best network generated by BlockQNN achieves a 3.54% top-1 error rate on CIFAR-10, beating all existing auto-generated networks; (2) it offers a tremendous reduction of the search space in designing networks, requiring only 3 days with 32 GPUs; and (3) it has strong generalizability: the network built on CIFAR also performs well on the larger-scale ImageNet dataset. Comment: Accepted to CVPR 2018
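
    As a rough illustration of the epsilon-greedy Q-learning loop the abstract describes, the following minimal Python sketch trains an agent to pick layers one at a time; the layer vocabulary, reward function, and hyperparameters are hypothetical placeholders, not BlockQNN's actual implementation:

        import random
        from collections import defaultdict

        # Hypothetical layer vocabulary; BlockQNN's real action space also
        # encodes kernel sizes and predecessor connections.
        ACTIONS = ["conv3x3", "conv5x5", "maxpool", "identity", "terminal"]
        MAX_LAYERS = 5

        def search(reward_fn, episodes=200, eps=1.0, alpha=0.1, gamma=1.0):
            """Q-learning over sequences of layer choices (a minimal sketch)."""
            Q = defaultdict(float)  # Q[(state, action)] -> estimated value
            for _ in range(episodes):
                state, block = (), []
                while True:
                    if random.random() < eps:                       # explore
                        action = random.choice(ACTIONS)
                    else:                                           # exploit
                        action = max(ACTIONS, key=lambda a: Q[(state, a)])
                    block.append(action)
                    if action == "terminal" or len(block) == MAX_LAYERS:
                        # Reward arrives only at the end, e.g. the validation
                        # accuracy of the network built from this block.
                        r = reward_fn(block)
                        Q[(state, action)] += alpha * (r - Q[(state, action)])
                        break
                    nxt = tuple(block)
                    best = max(Q[(nxt, a)] for a in ACTIONS)
                    Q[(state, action)] += alpha * (gamma * best - Q[(state, action)])
                    state = nxt
                eps = max(0.1, eps * 0.99)  # anneal exploration over episodes
            return Q

    An early-stop strategy like the paper's would make reward_fn cheap by training each candidate network for only a few epochs.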

    Automatically Evolving CNN Architectures Based on Blocks

    Full text link
    The performance of Convolutional Neural Networks (CNNs) relies heavily on their architectures. Designing a CNN with promising performance requires extensive expertise in both CNNs and the investigated problem, which not every user interested in CNNs or the problem domain possesses. In this paper, we propose to automatically evolve CNN architectures by using a genetic algorithm based on ResNet blocks and DenseNet blocks. The proposed algorithm is completely automatic in designing CNN architectures; in particular, neither pre-processing before it starts nor post-processing of the designed CNN is needed. Furthermore, the proposed algorithm does not require users to have domain knowledge of CNNs, the investigated problem, or even genetic algorithms. The proposed algorithm is evaluated on CIFAR10 and CIFAR100 against 18 state-of-the-art peer competitors. Experimental results show that it outperforms hand-crafted state-of-the-art CNNs and CNNs designed by fully automatic peer competitors in terms of classification accuracy, and achieves classification accuracy competitive with semi-automatic peer competitors. In addition, the proposed algorithm consumes much less time than most peer competitors in finding the best CNN architectures.
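
    A minimal sketch of such a block-based evolutionary loop is shown below; the gene encoding, operators, and truncation selection are illustrative assumptions, not the authors' exact algorithm:

        import random

        # Hypothetical gene encoding: a genome is a list of block descriptors.
        BLOCK_TYPES = ["resnet_block", "densenet_block", "pooling"]

        def random_genome(max_len=8):
            return [random.choice(BLOCK_TYPES) for _ in range(random.randint(2, max_len))]

        def crossover(a, b):
            """One-point crossover on variable-length genomes."""
            ca, cb = random.randrange(1, len(a)), random.randrange(1, len(b))
            return a[:ca] + b[cb:]

        def mutate(g, rate=0.2):
            return [random.choice(BLOCK_TYPES) if random.random() < rate else x for x in g]

        def evolve(fitness_fn, pop_size=20, generations=10):
            pop = [random_genome() for _ in range(pop_size)]
            for _ in range(generations):
                scored = sorted(pop, key=fitness_fn, reverse=True)
                parents = scored[: pop_size // 2]           # truncation selection
                children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                            for _ in range(pop_size - len(parents))]
                pop = parents + children
            return max(pop, key=fitness_fn)

    In practice, fitness_fn would decode the genome into a CNN, train it briefly, and return validation accuracy; that evaluation dominates the runtime.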

    Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA

    Full text link
    Deep convolutional neural network (CNN) based solutions are the current state-of-the-art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance to the adoption of CNNs for IoT applications. Recent research highlights that CNNs contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNNs are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNNs. Moreover, the transition to lower bit-widths opens new avenues for performance optimizations and model improvement. In this paper, we present an automatic flow from trained TensorFlow models to an FPGA system-on-chip implementation of binarized CNNs. This flow involves quantization of model parameters and activations, generation of the network and model in embedded C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through an implementation of binarized "YOLOV2" on the low-cost, low-power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant benefits in terms of model size and inference speed on the FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour. Comment: 7 pages, 9 figures. Accepted and presented at the MLPCD workshop, NIPS 2017 (Long Beach, California)
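
    The arithmetic trick that makes binary CNNs a good fit for FPGA logic can be shown in a few lines of NumPy: with weights and activations constrained to {-1, +1}, a dot product collapses to XOR plus popcount. This is a sketch of the general technique, not this paper's specific flow:

        import numpy as np

        def binarize(x):
            """Sign binarization to {-1, +1}, as commonly used in binary CNNs."""
            return np.where(x >= 0, 1, -1).astype(np.int8)

        def binary_dot(a_bits, w_bits):
            # With {-1,+1} encoded as bits {0,1}, a length-n dot product equals
            # n - 2 * popcount(a XOR w); on an FPGA this is pure bitwise logic.
            n = a_bits.size
            xor = np.bitwise_xor(a_bits, w_bits)
            return n - 2 * int(xor.sum())

        # Demo: the XOR-popcount form matches the ordinary {-1,+1} dot product.
        a = binarize(np.random.randn(64))
        w = binarize(np.random.randn(64))
        a_bits = (a > 0).astype(np.uint8)
        w_bits = (w > 0).astype(np.uint8)
        assert binary_dot(a_bits, w_bits) == int(np.dot(a, w))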

    Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation

    Full text link
    Deep convolutional neural networks have achieved remarkable progress on a variety of medical image computing tasks. A common problem when applying supervised deep learning methods to medical images is the lack of labeled data, which is very expensive and time-consuming to collect. In this paper, we present a novel semi-supervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss for labeled inputs only and a regularization loss for both labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different regularizations. Aiming at the semi-supervised segmentation problem, we introduce a transformation-consistent scheme, including rotation and flipping, in our self-ensembling model to enhance the regularization effect for pixel-level predictions. We have extensively validated the proposed semi-supervised method on three typical yet challenging medical image segmentation tasks: (i) skin lesion segmentation from dermoscopy images on the International Skin Imaging Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus images on the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver segmentation from volumetric CT scans on the Liver Tumor Segmentation Challenge (LiTS) dataset. Compared to state-of-the-art methods, our proposed method shows superior segmentation performance on challenging 2D/3D medical images, demonstrating the effectiveness of our semi-supervised method for medical image segmentation. Comment: Accepted at IEEE Transactions on Neural Networks and Learning Systems
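
    A minimal PyTorch sketch of the transformation-consistent regularization idea: the network's pixel-wise predictions should commute with rotation of the input. For brevity both branches share one network here, whereas the paper's self-ensembling model would use a teacher branch; names and the choice of MSE are illustrative:

        import torch
        import torch.nn.functional as F

        def transformation_consistency_loss(model, x):
            """Penalize disagreement between rotate(model(x)) and model(rotate(x))."""
            k = int(torch.randint(1, 4, (1,)))            # random 90-degree rotation
            pred = model(x)                                # (N, C, H, W) logits
            pred_rot = torch.rot90(pred, k, dims=(2, 3))   # transform the prediction
            rot_pred = model(torch.rot90(x, k, dims=(2, 3)))  # predict transformed input
            return F.mse_loss(torch.softmax(rot_pred, dim=1),
                              torch.softmax(pred_rot, dim=1).detach())

    The total objective would then be the supervised loss on labeled pixels plus a ramped-up weight times this term over both labeled and unlabeled data, as the abstract describes.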

    Music Genre Classification with Paralleling Recurrent Convolutional Neural Network

    Full text link
    Deep learning has demonstrated its effectiveness and efficiency in music genre classification. However, existing approaches still have several shortcomings which impair the performance of this classification task. In this paper, we propose a hybrid architecture which consists of parallel CNN and Bi-RNN blocks, which focus on extracting spatial features and temporal frame orderings, respectively. The two outputs are then fused into one powerful representation of the musical signal and fed into a softmax function for classification. The parallel design helps guarantee that the extracted features are robust enough to represent music. Moreover, the experiments show that our proposed architecture improves music genre classification performance and that the additional Bi-RNN block is a useful supplement to the CNN.
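
    The described fusion of a CNN branch (spatial features) with a Bi-RNN branch (temporal order) over a spectrogram might look roughly like this in PyTorch; the layer sizes and the choice of a GRU are illustrative assumptions, not the paper's configuration:

        import torch
        import torch.nn as nn

        class ParallelCRNN(nn.Module):
            """Parallel CNN and Bi-GRU branches over a mel-spectrogram,
            fused by concatenation before a softmax classifier."""
            def __init__(self, n_mels=96, n_classes=10):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),            # -> (N, 32, 1, 1)
                )
                self.rnn = nn.GRU(n_mels, 64, bidirectional=True, batch_first=True)
                self.fc = nn.Linear(32 + 128, n_classes)

            def forward(self, spec):                    # spec: (N, 1, n_mels, time)
                c = self.cnn(spec).flatten(1)           # spatial features, (N, 32)
                seq = spec.squeeze(1).transpose(1, 2)   # (N, time, n_mels)
                _, h = self.rnn(seq)                    # h: (2, N, 64)
                r = torch.cat([h[0], h[1]], dim=1)      # temporal features, (N, 128)
                return self.fc(torch.cat([c, r], dim=1))  # logits; softmax in the loss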

    Densely Connected Convolutional Networks for Speech Recognition

    Full text link
    This paper presents our latest investigation of Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition. DenseNets are very deep, compact convolutional neural networks which have demonstrated remarkable improvements over state-of-the-art results on several computer vision data sets. Our experimental results show that DenseNets can be used for AM, significantly outperforming other neural-based models such as DNNs, CNNs, and VGGs. Furthermore, results on Wall Street Journal revealed that with only half of the training data, DenseNet was able to outperform other models trained on the full data set by a large margin. Comment: 5 pages, 3 figures, the 13th ITG conference on Speech Communication
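
    For reference, the dense connectivity pattern that makes DenseNets deep yet compact is only a few lines of PyTorch; the growth rate and depth below are illustrative, not the acoustic-model configuration used in the paper:

        import torch
        import torch.nn as nn

        class DenseBlock(nn.Module):
            """Minimal dense block: each layer receives the concatenation of
            all preceding feature maps (the DenseNet connectivity pattern)."""
            def __init__(self, in_ch, growth=12, n_layers=4):
                super().__init__()
                self.layers = nn.ModuleList(
                    nn.Sequential(
                        nn.BatchNorm2d(in_ch + i * growth), nn.ReLU(),
                        nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False),
                    )
                    for i in range(n_layers)
                )

            def forward(self, x):
                feats = [x]
                for layer in self.layers:
                    feats.append(layer(torch.cat(feats, dim=1)))
                return torch.cat(feats, dim=1)  # channels: in_ch + n_layers * growth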

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at a comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication in the ACM Computing Surveys (CSUR) journal, 2018

    Semi-Supervised Segmentation of Salt Bodies in Seismic Images using an Ensemble of Convolutional Neural Networks

    Full text link
    Seismic image analysis plays a crucial role in a wide range of industrial applications and has been receiving significant attention. One of the essential challenges of seismic imaging is detecting subsurface salt structure, which is indispensable for identification of hydrocarbon reservoirs and drill path planning. Unfortunately, the exact identification of large salt deposits is notoriously difficult, and professional seismic imaging often requires expert human interpretation of salt bodies. Convolutional neural networks (CNNs) have been successfully applied in many fields, and several attempts have been made in the field of seismic imaging. But the high cost of manual annotations by geophysics experts and the scarcity of publicly available labeled datasets hinder the performance of existing CNN-based methods. In this work, we propose a semi-supervised method for segmentation (delineation) of salt bodies in seismic images which utilizes unlabeled data for multi-round self-training. To reduce error amplification during self-training, we propose a scheme which uses an ensemble of CNNs. We show that our approach outperforms the state of the art on the TGS Salt Identification Challenge dataset and ranked first among the 3234 competing methods. Comment: Accepted at GCPR 2019. Source code: https://github.com/ybabakhin/kaggle_salt_bes_phalan
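
    A schematic Python sketch of multi-round ensemble self-training follows; train_fn, the confidence rule, and the threshold are hypothetical, and the authors' pipeline operates on pixel masks rather than the scalar labels used here for brevity:

        from statistics import mean

        def self_train(labeled, unlabeled, train_fn, rounds=2, n_models=3, threshold=0.9):
            """Multi-round self-training with an ensemble of models (a sketch).

            train_fn(dataset, seed) must return a model whose .predict(x) yields
            a probability in [0, 1]; all names here are illustrative.
            """
            train_set = list(labeled)
            models = []
            for _ in range(rounds):
                # Retrain the ensemble from scratch each round with fresh seeds.
                models = [train_fn(train_set, seed=i) for i in range(n_models)]
                pseudo = []
                for x in unlabeled:
                    p = mean(m.predict(x) for m in models)
                    # Averaging over the ensemble damps error amplification:
                    # pseudo-label only inputs the models agree on confidently.
                    if p > threshold or p < 1 - threshold:
                        pseudo.append((x, round(p)))
                train_set = list(labeled) + pseudo
            return models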

    FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations

    Full text link
    Neural network-based methods for image processing are becoming widely used in practical applications. Modern neural networks are computationally expensive and require specialized hardware, such as graphics processing units. Since such hardware is not always available in real-life applications, there is a compelling need for the design of neural networks for mobile devices. Mobile neural networks typically have a reduced number of parameters and require a relatively small number of arithmetic operations. However, they are usually still executed at the software level and use floating-point calculations. The use of mobile networks without further optimization may not provide sufficient performance when high processing speed is required, for example, in real-time video processing (30 frames per second). In this study, we suggest optimizations to speed up computations in order to efficiently use already trained neural networks on a mobile device. Specifically, we propose an approach for speeding up neural networks by moving computation from software to hardware and by using fixed-point calculations instead of floating-point. We propose a number of methods for neural network architecture design to improve performance with fixed-point calculations. We also show an example of how an existing dataset can be modified and adapted for the recognition task at hand. Finally, we present the design and the implementation of a field-programmable gate array (FPGA) based device to solve the practical problem of real-time handwritten digit classification from a mobile camera video feed.
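
    The fixed-point arithmetic the paper advocates can be sketched in NumPy: quantize to a signed Qm.n format, accumulate in integers, and shift once at the end. The format and bit-widths here are illustrative assumptions, not the paper's choices:

        import numpy as np

        def to_fixed_point(x, frac_bits=8, total_bits=16):
            """Quantize floats to signed fixed-point with frac_bits fraction bits."""
            scale = 1 << frac_bits
            lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
            return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

        def fixed_point_dot(a_q, w_q, frac_bits=8):
            # Integer multiply-accumulate, then one shift to drop the extra
            # fraction bits; this maps onto FPGA DSP blocks with no FPU.
            acc = np.dot(a_q.astype(np.int64), w_q.astype(np.int64))
            return acc >> frac_bits  # result is back in the same Q format

        # Demo: the fixed-point result approximates the float dot product.
        a = np.random.randn(128).astype(np.float32)
        w = np.random.randn(128).astype(np.float32)
        approx = fixed_point_dot(to_fixed_point(a), to_fixed_point(w)) / float(1 << 8)
        print(abs(approx - float(np.dot(a, w))))  # small quantization error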

    ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

    Full text link
    Convolutional neural networks (CNNs) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into convolution layers by adding squeeze-and-excitation modules. In addition, we propose a simple scaling method that scales the width of ContextNet, achieving a good trade-off between computation and accuracy. We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with an LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets. This compares to the previous best published system at 2.0%/4.6% with an LM and 3.9%/11.3% with 20M parameters. The superiority of the proposed ContextNet model is also verified on a much larger internal dataset. Comment: Submitted to Interspeech 2020
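
    The squeeze-and-excitation mechanism that gives ContextNet its global context can be sketched in PyTorch as a channel-gating module over 1-D convolutional features; the reduction ratio and shapes are illustrative:

        import torch
        import torch.nn as nn

        class SqueezeExcite1d(nn.Module):
            """Squeeze-and-excitation over a 1-D feature map: global average
            pooling gathers sequence-wide context, which then gates each
            channel so every frame sees information from the whole utterance."""
            def __init__(self, channels, reduction=8):
                super().__init__()
                self.fc = nn.Sequential(
                    nn.Linear(channels, channels // reduction), nn.ReLU(),
                    nn.Linear(channels // reduction, channels), nn.Sigmoid(),
                )

            def forward(self, x):             # x: (N, C, T)
                context = x.mean(dim=2)       # squeeze: per-channel global context, (N, C)
                gate = self.fc(context)       # excitation: channel weights in (0, 1)
                return x * gate.unsqueeze(2)  # broadcast the gate over time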