    Augmented Kernel Matrix vs Classifier Fusion for Object Recognition

    Kernel and Classifier Level Fusion for Image Classification.

    Automatic understanding of visual information is one of the main requirements for a complete artificial intelligence system and an essential component of autonomous robots. State-of-the-art image recognition approaches are based on different local descriptors, each capturing some properties of the image, such as intensity, color and texture. Each set of local descriptors is represented by a codebook and gives rise to a separate feature channel. For classification, the feature channels are combined using multiple kernel learning (MKL), early fusion or classifier-level fusion approaches. Due to the importance of complementary information in fusion techniques, there is an increasing demand for diverse feature channels. The first part of the thesis focuses on ways to encode information from images that is complementary to state-of-the-art local features. To address this issue, we present a novel image representation which can encode the structure of an object, and propose three descriptors based on this representation. In state-of-the-art recognition systems the kernels are often computed independently of each other and thus may be highly informative yet redundant. Proper selection and fusion of the kernels is therefore crucial to maximize performance and to address efficiency issues in visual recognition applications. We address this issue in the second part of the thesis, where we propose novel techniques to fuse feature channels for object and pattern recognition. We present an extensive evaluation of the fusion methods on four object recognition datasets and achieve state-of-the-art results on all of them. We also present results on four bioinformatics datasets to demonstrate that the proposed fusion methods work for a variety of pattern recognition problems, provided that multiple feature channels are available.
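    The kernel-level fusion described above can be illustrated with a minimal sketch: per-channel Gram matrices combined as a non-negative weighted sum, which is the basic building block of MKL. All names, data and weights here are illustrative, not the thesis's actual method.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix of an RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(d2, 0.0, None))

def fuse_kernels(kernels, weights):
    """Weighted sum of base kernels; non-negative weights keep the
    result a valid (positive semi-definite) kernel, as MKL requires."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0)
    return sum(w * K for w, K in zip(weights, kernels))

# Two hypothetical feature channels for the same five images.
rng = np.random.default_rng(0)
X_color = rng.normal(size=(5, 8))     # e.g. a color-histogram channel
X_texture = rng.normal(size=(5, 16))  # e.g. a texture channel

K_color = rbf_kernel(X_color, gamma=0.1)
K_texture = rbf_kernel(X_texture, gamma=0.05)
K_fused = fuse_kernels([K_color, K_texture], weights=[0.6, 0.4])
```

    The fused matrix can be handed to any kernelized classifier (e.g. an SVM); learning the weights themselves, rather than fixing them, is what distinguishes MKL from simple kernel averaging.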

    A Novel Approach to Face Recognition using Image Segmentation based on SPCA-KNN Method

    In this paper we propose a novel method for face recognition using a hybrid SPCA-KNN (SIFT-PCA-KNN) approach. The proposed method consists of three parts. The first part preprocesses face images using a graph-based algorithm and the SIFT (Scale Invariant Feature Transform) descriptor; the graph-based topology is used for matching two face images. In the second part, eigenvalues and eigenvectors are extracted from each input face image. The goal is to extract the important information from the face data and represent it as a set of new orthogonal variables called principal components. In the final part, a nearest-neighbor classifier is designed to classify the face images based on the SPCA-KNN algorithm. The algorithm has been tested on 100 different subjects (15 images per class). The experimental results show that the proposed method has a positive effect on overall face recognition performance and outperforms the other examined methods.
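    The PCA and nearest-neighbor stages of such a pipeline can be sketched as follows: project feature vectors onto the top eigenvectors of the covariance matrix, then classify a query by majority vote among its nearest training samples. This is a generic PCA-plus-KNN sketch on synthetic data, not the paper's SPCA-KNN implementation, and all names are assumptions.

```python
import numpy as np

def pca_fit(X, n_components):
    """Principal components via eigendecomposition of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest
    return mean, eigvecs[:, order]

def pca_project(X, mean, components):
    return (X - mean) @ components

def knn_predict(train_proj, train_labels, query, k=1):
    """k-nearest-neighbor classification in the projected space."""
    dists = np.linalg.norm(train_proj - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy "face" vectors: two well-separated classes of 50-dimensional features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 50)),
               rng.normal(5.0, 0.1, size=(10, 50))])
y = np.array([0] * 10 + [1] * 10)

mean, comps = pca_fit(X, n_components=5)
proj = pca_project(X, mean, comps)
query = pca_project(rng.normal(5.0, 0.1, size=(1, 50)), mean, comps)[0]
pred = knn_predict(proj, y, query, k=3)   # query drawn from class 1
```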

    ModDrop: adaptive multi-modal gesture recognition

    Full text link
    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature through experiments on the same dataset augmented with audio.
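    The channel-dropping idea can be sketched in a few lines: during training, each modality's features are zeroed out with some probability (while guaranteeing at least one survives), forcing the fused network to cope with missing channels. This is a simplified illustration under assumed names and probabilities, not the ModDrop implementation itself.

```python
import numpy as np

def mod_drop(modalities, drop_prob, rng):
    """Randomly zero out whole modality channels during training,
    keeping at least one modality active (a ModDrop-style scheme)."""
    n = len(modalities)
    keep = rng.random(n) >= drop_prob
    if not keep.any():                      # never drop every channel
        keep[rng.integers(n)] = True
    return [m if k else np.zeros_like(m) for m, k in zip(modalities, keep)]

rng = np.random.default_rng(2)
# Per-sample features from three hypothetical channels (video, depth, audio).
video = rng.normal(size=16)
depth = rng.normal(size=16)
audio = rng.normal(size=8)

dropped = mod_drop([video, depth, audio], drop_prob=0.5, rng=rng)
fused_input = np.concatenate(dropped)       # fed to the shared fusion layers
```

    At test time the same network can then be fed whichever subset of modalities happens to be available, with the missing ones zeroed.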

    On Designing Deep Learning Approaches for Classification of Football Jersey Images in the Wild

    Internet shopping has spread widely, including into social networking: someone may want to buy a shirt or accessories seen in a random picture or a streaming video. This thesis addresses the problem of automatically classifying football jerseys in the wild, assuming the object has already been detected. A dataset of 7,840 jersey images, JerseyXIV, is created, containing images of 14 categories of various football jersey types (Home and Alternate) belonging to 10 teams of the 2015 Big 12 Conference football season. The quality of images varies in terms of pose, standoff distance, level of occlusion and illumination. Due to copyright restrictions on certain images, unaltered original images with appropriate credits can be provided upon request. While various conventional and deep learning based classification approaches were empirically designed, optimized and tested, the highest single-network accuracy, 92.61%, was achieved by a train-time fused Convolutional Neural Network (CNN) architecture, CNN-F. The final solution combines three different CNNs through score-level average fusion, achieving 96.90% test accuracy. To test the trained CNN models on a larger, application-oriented scale, a video dataset is created, which adds a higher rate of occlusion and elements of transmission noise. It consists of 14 videos, one per class, totaling 3,584 frames, of which 2,188 contain the object of interest. With manual detection, score-level average fusion achieved the highest classification accuracy of 81.31%. In addition, three Image Quality Assessment techniques were tested to assess the drop in accuracy of the average-fusion method on the video dataset. Applying the Natural Image Quality Evaluator (NIQE) index by Bovik et al. with a threshold of 0.40 to input images improved the test accuracy of the average-fusion model on the video dataset to 86.36% by removing low-quality frames before they reach the CNNs. The thesis concludes that the recommended solution for classification combines data augmentation and fusion of networks, while for applying trained models to videos, an image quality metric aids performance at the cost of discarding some input data.
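    The score-level average fusion used above can be sketched generically: average the per-class probability vectors produced by several models, then take the argmax. The three "CNN" outputs below are random softmax vectors standing in for real network scores; everything here is illustrative.

```python
import numpy as np

def average_fusion(score_matrices):
    """Score-level average fusion: mean of per-model class-probability
    matrices, then argmax over classes for each sample."""
    stacked = np.stack(score_matrices)   # (n_models, n_samples, n_classes)
    avg = stacked.mean(axis=0)
    return avg.argmax(axis=1), avg

# Softmax outputs of three hypothetical CNNs over 14 jersey classes
# for two frames (each row sums to 1).
rng = np.random.default_rng(3)
scores = []
for _ in range(3):
    logits = rng.normal(size=(2, 14))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    scores.append(e / e.sum(axis=1, keepdims=True))

preds, avg = average_fusion(scores)
```

    Averaging probabilities (rather than, say, taking a majority vote over hard labels) lets a confident model outweigh uncertain ones, which is one reason score-level fusion often beats its best individual member.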