1,042 research outputs found

    A Convolutional Neural Network model based on Neutrosophy for Noisy Speech Recognition

    Full text link
    Convolutional neural networks are sensitive to unknown noisy condition in the test phase and so their performance degrades for the noisy data classification task including noisy speech recognition. In this research, a new convolutional neural network (CNN) model with data uncertainty handling; referred as NCNN (Neutrosophic Convolutional Neural Network); is proposed for classification task. Here, speech signals are used as input data and their noise is modeled as uncertainty. In this task, using speech spectrogram, a definition of uncertainty is proposed in neutrosophic (NS) domain. Uncertainty is computed for each Time-frequency point of speech spectrogram as like a pixel. Therefore, uncertainty matrix with the same size of spectrogram is created in NS domain. In the next step, a two parallel paths CNN classification model is proposed. Speech spectrogram is used as input of the first path and uncertainty matrix for the second path. The outputs of two paths are combined to compute the final output of the classifier. To show the effectiveness of the proposed method, it has been compared with conventional CNN on the isolated words of Aurora2 dataset. The proposed method achieves the average accuracy of 85.96 in noisy train data. It is more robust against Car, Airport and Subway noises with accuracies 90, 88 and 81 in test sets A, B and C, respectively. Results show that the proposed method outperforms conventional CNN with the improvement of 6, 5 and 2 percentage in test set A, test set B and test sets C, respectively. It means that the proposed method is more robust against noisy data and handle these data effectively.Comment: International conference on Pattern Recognition and Image Analysis (IPRIA 2019

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

    Full text link
    Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from object, scene to abstract concept; 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, a novel two-branch deep neural network architecture is proposed which comprises a very deep main network branch and a companion feature fusion network branch designed for fusing the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. For tackling the second issue, we introduce a label quantity prediction auxiliary task to the main label prediction task to explicitly estimate the optimal label number for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets and the results show that our method significantly outperforms the state-of-the-art.Comment: Submited to IEEE TI

    CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery

    Full text link
    Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications such as urban planning, traffic control, searching and rescuing, etc. However, state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to the object appearance differences in remote sensing images in term of sparse texture, low contrast, arbitrary orientations, large scale variations, etc. This paper presents a novel object detection network (CAD-Net) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene-level) and the local neighboring objects or features (at object-level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection datasets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation codes will be made publicly available for facilitating future researches
    • …
    corecore