143 research outputs found

    Insights from Classifying Visual Concepts with Multiple Kernel Learning

    Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques make it possible to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often observed to be outperformed by an unweighted sum kernel. The contribution of this paper is twofold: we apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks within computer vision, and we provide insights on the benefits and limits of non-sparse MKL, comparing it against its direct competitors, the sum kernel SVM and the sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. About to be submitted to PLoS ONE. Comment: 18 pages, 8 tables, 4 figures, format deviating from PLoS ONE submission format requirements for aesthetic reasons.
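    The linear kernel combination this abstract refers to can be sketched as follows. This is only an illustration, not the paper's method: the Gaussian base kernels, the function names and the fixed weights are assumptions, and no weights are learned here. Uniform weights give the (averaged) unweighted sum kernel; rescaling the weights to unit p-norm mirrors the constraint used in 1-norm (sparse) versus p > 1 (non-sparse) MKL.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed base kernel)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def normalize_weights(beta, p):
    """Rescale non-negative weights so that their p-norm equals 1."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be non-negative")
    return beta / np.linalg.norm(beta, ord=p)

def combine_kernels(kernels, weights=None):
    """Return sum_m weights[m] * kernels[m]; uniform weights if none are given."""
    kernels = np.asarray(kernels, dtype=float)        # shape (M, n, n)
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, kernels, axes=1)     # shape (n, n)

# Toy data: three base kernels that differ only in bandwidth.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
base = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]

K_sum = combine_kernels(base)                                       # unweighted sum kernel
K_mix = combine_kernels(base, normalize_weights([3.0, 1.0, 0.0], p=1))  # sparse mixture
```

    Either combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.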

    Kernel and Classifier Level Fusion for Image Classification.

    Automatic understanding of visual information is one of the main requirements for a complete artificial intelligence system and an essential component of autonomous robots. State-of-the-art image recognition approaches are based on different local descriptors, each capturing some properties of the image such as intensity, color and texture. Each set of local descriptors is represented by a codebook and gives rise to a separate feature channel. For classification, the feature channels are combined using multiple kernel learning (MKL), early fusion or classifier level fusion approaches. Due to the importance of complementary information in fusion techniques, there is an increasing demand for diverse feature channels. The first part of the thesis focuses on ways to encode information from images that is complementary to the state-of-the-art local features. To address this issue, we present a novel image representation which can encode the structure of an object and propose three descriptors based on this representation. In state-of-the-art recognition systems the kernels are often computed independently of each other and thus may be highly informative yet redundant. Proper selection and fusion of the kernels is therefore crucial to maximize performance and to address the efficiency issues in visual recognition applications. We address this issue in the second part of the thesis, where we propose novel techniques to fuse feature channels for object and pattern recognition. We present an extensive evaluation of the fusion methods on four object recognition datasets and achieve state-of-the-art results on all of them. We also present results on four bioinformatics datasets to demonstrate that the proposed fusion methods work for a variety of pattern recognition problems, provided that multiple feature channels are available.

    Neural Generalization of Multiple Kernel Learning

    Multiple Kernel Learning (MKL) is a conventional way to learn the kernel function in kernel-based methods. MKL algorithms enhance the performance of kernel methods. However, these methods have a lower complexity than deep learning models and are inferior to them in terms of recognition accuracy. Deep learning models can learn complex functions by applying nonlinear transformations to data through several layers. In this paper, we show that a typical MKL algorithm can be interpreted as a one-layer neural network with linear activation functions. Based on this interpretation, we propose a Neural Generalization of Multiple Kernel Learning (NGMKL), which extends the conventional multiple kernel learning framework to a multi-layer neural network with nonlinear activation functions. Our experiments on several benchmarks show that the proposed method increases the complexity of MKL models and leads to higher recognition accuracy.
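    The interpretation described in this abstract can be illustrated with a small sketch. It is not the NGMKL architecture, which the abstract does not specify: the Gaussian base kernels, the ReLU hidden layer and all names below are hypothetical. The linear unit reproduces the usual MKL combination of base-kernel values, and the two-layer variant only shows where a nonlinearity could enter; its output is not guaranteed to be a valid kernel.

```python
import numpy as np

def base_kernel_values(x, z, gammas):
    """Vector of M Gaussian base-kernel values k_m(x, z), one per bandwidth."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    d2 = float(np.sum(diff ** 2))
    return np.exp(-np.asarray(gammas, dtype=float) * d2)

def linear_mkl_unit(k, beta):
    """One unit with linear activation: sum_m beta_m * k_m (conventional MKL)."""
    return float(np.dot(beta, k))

def two_layer_combiner(k, W1, b1, w2):
    """Hypothetical multi-layer variant: ReLU hidden layer over the base kernels."""
    hidden = np.maximum(W1 @ k + b1, 0.0)
    return float(np.dot(w2, hidden))

gammas = [0.1, 1.0]
beta = np.array([0.5, 0.5])
k = base_kernel_values([0.0, 0.0], [1.0, 0.0], gammas)

linear_value = linear_mkl_unit(k, beta)
# With identity hidden weights and zero bias the ReLU leaves the non-negative
# kernel values unchanged, so the two-layer form reduces to the linear unit.
reduced_value = two_layer_combiner(k, np.eye(2), np.zeros(2), beta)
```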

    Augmented Kernel Matrix vs Classifier Fusion for Object Recognition


    Anomaly detection & object classification using multi-spectral LiDAR and sonar

    In this thesis, we present the theory of high-dimensional signal approximation of multi-frequency signals. We also present both linear and non-linear compressive sensing (CS) algorithms that generate encoded representations of time-correlated single photon counting (TCSPC) light detection and ranging (LiDAR) data, side-scan sonar (SSS) and synthetic aperture sonar (SAS). The main contributions of this thesis are summarised as follows:
    1. Research is carried out studying full-waveform (FW) LiDARs, in particular TCSPC data capture, storage and processing.
    2. FW-LiDARs are capable of capturing large quantities of photon-counting data in real time. However, the real-time processing of the raw LiDAR waveforms has not been widely exploited. This thesis answers some of the fundamental questions:
       • Can semantic information be extracted and encoded from raw multi-spectral FW-LiDAR signals?
       • Can these encoded representations then be used for object segmentation and classification?
    3. Research is carried out into signal approximation and compressive sensing techniques, their limitations and their application domains.
    4. Research is also carried out in 3D point cloud processing, combining geometric features with material spectra (spectral-depth representation), for object segmentation and classification.
    5. Extensive experiments have been carried out with publicly available datasets, e.g. the Washington RGB Image and Depth (RGB-D) dataset [108], the YaleB face dataset¹ [110], real-world multi-frequency aerial laser scans (ALS)² and an underwater multi-frequency (16 wavelengths) TCSPC dataset collected using custom-built targets especially for this thesis.
    6. The multi-spectral measurements were made underwater on targets with different shapes and materials. A novel spectral-depth representation is presented with strong discrimination characteristics on target signatures. Several custom-made and realistically scaled exemplars with known and unknown targets have been investigated using a multi-spectral single photon counting LiDAR system.
    7. In this work, we also present a new approach to peak modelling and classification for waveform-enabled LiDAR systems. Not all existing approaches perform peak modelling and classification simultaneously in real time. This was tested on both simulated waveform-enabled LiDAR data and real ALS data².
    This PhD also led to an industrial secondment at Carbomap, Edinburgh, where some of the waveform modelling algorithms were implemented in C++ and CUDA for Nvidia TX1 boards for real-time performance.
    ¹ http://vision.ucsd.edu/~leekc/ExtYaleDatabase/
    ² This dataset was captured in collaboration with Carbomap Ltd., Edinburgh, UK. The data was collected during one of the trials in Austria using commercial off-the-shelf (COTS) sensors.

    Visual vocabularies for category-level object recognition

    This thesis focuses on the study of visual vocabularies for category-level object recognition. Specifically, we present novel approaches for building visual codebooks. Our aim is not just to obtain more discriminative and more compact visual codebooks, but to bridge the gap between visual features and semantic concepts. A novel approach for obtaining class-representative visual words is presented. It is based on a maximisation procedure, i.e. the Cluster Precision Maximisation (CPM), of a novel cluster precision criterion, and on an adaptive threshold refinement scheme for agglomerative clustering algorithms based on correlation clustering techniques. The objective is to increase the vocabulary compactness while at the same time improving the recognition rate and further increasing the representativeness of the visual words. Moreover, we describe a novel clustering-aggregation-based approach for building efficient and semantic visual vocabularies. It consists of a novel framework for incorporating neighboring appearances of local descriptors into the vocabulary construction, and a rigorous approach for adding meaningful spatial coherency among the local features into the visual codebooks. We also propose an efficient high-dimensional data clustering algorithm, the Fast Reciprocal Nearest Neighbours (Fast-RNN). Our approach, which is a sped-up version of the standard RNN algorithm, is based on the projection search paradigm. Finally, we release a new database of images called Image Collection of Annotated Real-world Objects (ICARO), which is especially designed for evaluating category-level object recognition systems. An exhaustive comparison of ICARO with other well-known datasets used within the same context is carried out. We also propose a benchmark for both object classification and detection.
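    For readers unfamiliar with the term, a reciprocal nearest neighbour pair is a pair of points that are each other's nearest neighbour; such pairs are natural merge candidates in agglomerative clustering. The sketch below finds them by brute force over all pairwise distances. It is not the Fast-RNN algorithm of the thesis, whose projection search is not described in the abstract, and the function name and toy data are illustrative assumptions.

```python
import numpy as np

def reciprocal_nearest_neighbours(X):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbour
    nn = np.argmin(d2, axis=1)                       # nearest neighbour of each point
    return [(i, int(j)) for i, j in enumerate(nn) if i < j and nn[j] == i]

# Toy 1-D descriptors: two tight pairs and one isolated point.
points = np.array([[0.0], [0.1], [5.0], [5.2], [10.0]])
pairs = reciprocal_nearest_neighbours(points)
```

    The brute-force distance matrix costs quadratic time and memory in the number of descriptors, which is the kind of cost a faster search scheme would aim to avoid.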
