Insights from Classifying Visual Concepts with Multiple Kernel Learning
Combining information from various image features has become a standard
technique in concept recognition tasks. However, the optimal way of fusing the
resulting kernel functions is usually unknown in practical applications.
Multiple kernel learning (MKL) techniques make it possible to determine an optimal linear
combination of such similarity matrices. Classical approaches to MKL promote
sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often
observed to be outperformed by an unweighted sum kernel. The contribution of
this paper is twofold: We apply a recently developed non-sparse MKL variant to
state-of-the-art concept recognition tasks within computer vision. We provide
insights on benefits and limits of non-sparse MKL and compare it against its
direct competitors, the sum kernel SVM and the sparse MKL. We report empirical
results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo
Annotation challenge data sets. About to be submitted to PLoS ONE. Comment: 18 pages, 8 tables, 4 figures; format deviates from the PLoS ONE
submission format requirements for aesthetic reasons.
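The linear kernel combination described in this abstract can be illustrated with a minimal sketch. It assumes the non-sparse variant constrains the kernel weights by an l_p norm with p > 1, which the abstract does not state explicitly. The base kernels, the sample points, and all function names are hypothetical, and the weights are fixed by hand here, whereas an MKL solver would learn them jointly with the SVM.

```python
import math

def linear_kernel(xs):
    """Gram matrix of the plain dot-product kernel."""
    return [[sum(u * v for u, v in zip(a, b)) for b in xs] for a in xs]

def rbf_kernel(xs, gamma):
    """Gram matrix of the Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return [[math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))
             for b in xs] for a in xs]

def normalize_weights(betas, p):
    """Scale non-negative weights so that their l_p norm equals 1."""
    norm = sum(b ** p for b in betas) ** (1.0 / p)
    return [b / norm for b in betas]

def combine_kernels(kernels, betas):
    """Weighted sum of Gram matrices: K = sum_m beta_m * K_m."""
    n = len(kernels[0])
    return [[sum(b * k[i][j] for b, k in zip(betas, kernels))
             for j in range(n)] for i in range(n)]

# Toy data standing in for image features (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
kernels = [linear_kernel(points), rbf_kernel(points, gamma=0.5)]

# Unweighted sum kernel: every base kernel gets weight 1.
sum_kernel = combine_kernels(kernels, [1.0, 1.0])

# A sparse mixture (1-norm constraint) may switch kernels off entirely.
sparse_betas = normalize_weights([1.0, 0.0], p=1)

# A non-sparse mixture (here p = 2) keeps every kernel active.
nonsparse_betas = normalize_weights([1.0, 1.0], p=2)
mixed_kernel = combine_kernels(kernels, nonsparse_betas)
```

Any of the combined matrices could then be handed to a kernel classifier that accepts a precomputed Gram matrix.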
Kernel and Classifier Level Fusion for Image Classification
Automatic understanding of visual information is one of the main requirements for a complete artificial intelligence system and an essential component of autonomous robots. State-of-the-art image recognition approaches are based on different local descriptors, each capturing some properties of the image such as intensity, color and texture. Each set of local descriptors is represented by a codebook and gives rise to a separate feature channel. For classification, the feature channels are combined by using multiple kernel learning (MKL), early fusion or classifier level fusion approaches. Due to the importance of complementary information in fusion techniques, there is an increasing demand for diverse feature channels. The first part of the thesis focuses on ways to encode information from images that is complementary to the state-of-the-art local features. To address this issue, we present a novel image representation which can encode the structure of an object and propose three descriptors based on this representation. In state-of-the-art recognition systems, the kernels are often computed independently of each other and thus may be highly informative yet redundant. Proper selection and fusion of the kernels is, therefore, crucial to maximize the performance and to address the efficiency issues in visual recognition applications. We address this issue in the second part of the thesis, where we propose novel techniques to fuse feature channels for object and pattern recognition. We present an extensive evaluation of the fusion methods on four object recognition datasets and achieve state-of-the-art results on all of them. We also present results on four bioinformatics datasets to demonstrate that the proposed fusion methods work for a variety of pattern recognition problems, provided that we have multiple feature channels.
Neural Generalization of Multiple Kernel Learning
Multiple Kernel Learning is a conventional way to learn the kernel function
in kernel-based methods. MKL algorithms enhance the performance of kernel
methods. However, these methods have a lower complexity compared to deep
learning models and are inferior to these models in terms of recognition
accuracy. Deep learning models can learn complex functions by applying
nonlinear transformations to data through several layers. In this paper, we
show that a typical MKL algorithm can be interpreted as a one-layer neural
network with linear activation functions. By this interpretation, we propose a
Neural Generalization of Multiple Kernel Learning (NGMKL), which extends the
conventional multiple kernel learning framework to a multi-layer neural network
with nonlinear activation functions. Our experiments on several benchmarks show
that the proposed method increases the model complexity of MKL algorithms and leads to
higher recognition accuracy.
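The interpretation in this abstract can be sketched as follows: a weighted sum of base kernel values is the output of a single linear neuron whose inputs are those kernel values, and stacking a nonlinear hidden layer on the same inputs gives a multi-layer similarity. This is only an illustration under assumptions. The base kernels, the tanh activation, the weights, and all function names are hypothetical and are not taken from the NGMKL paper, and the two-layer output is not guaranteed to be a positive semidefinite kernel.

```python
import math

def base_kernel_values(x, y):
    """Evaluate a few base kernels on one pair of points (illustrative choice)."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return [dot, (1.0 + dot) ** 2, math.exp(-0.5 * sq_dist)]

def linear_layer(inputs, weights):
    """Apply a weight matrix (list of rows) to an input vector, without bias."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def mkl_as_network(x, y, betas):
    """One linear neuron: sum_m beta_m * k_m(x, y)."""
    return linear_layer(base_kernel_values(x, y), [betas])[0]

def two_layer_similarity(x, y, hidden_weights, output_weights):
    """Hidden tanh layer over the base kernel values, then a linear output."""
    pre_activations = linear_layer(base_kernel_values(x, y), hidden_weights)
    hidden = [math.tanh(h) for h in pre_activations]
    return linear_layer(hidden, [output_weights])[0]

# Hand-picked example inputs and weights (hypothetical values).
x, y = (1.0, 0.0), (0.5, 1.0)
betas = [0.5, 0.25, 0.25]
hidden_weights = [[0.3, 0.1, 0.5], [0.2, -0.1, 0.4]]
output_weights = [0.6, 0.4]

single_layer_value = mkl_as_network(x, y, betas)
multi_layer_value = two_layer_similarity(x, y, hidden_weights, output_weights)
```

In a trained model the weights would be learned from data rather than fixed by hand as they are here.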
Anomaly detection & object classification using multi-spectral LiDAR and sonar
In this thesis, we present the theory of high-dimensional signal approximation of multifrequency signals. We also present both linear and non-linear compressive sensing (CS) algorithms that generate encoded representations of time-correlated single photon counting (TCSPC) light detection and ranging (LiDAR) data, side-scan sonar (SSS) and synthetic aperture sonar (SAS). The main contributions of this thesis are summarised as follows:
1. Research is carried out studying full-waveform (FW) LiDARs, in particular, the TCSPC data, capture, storage and processing.
2. FW-LiDARs are capable of capturing large quantities of photon-counting data in real-time. However, the real-time processing of the raw LiDAR waveforms has not been widely exploited. This thesis answers some of the fundamental questions:
• can semantic information be extracted and encoded from raw multi-spectral FW-LiDAR signals?
• can these encoded representations then be used for object segmentation and classification?
3. Research is carried out into signal approximation and compressive sensing techniques, their limitations and application domains.
4. Research is also carried out in 3D point cloud processing, combining geometric features with material spectra (spectral-depth representation), for object segmentation and classification.
5. Extensive experiments have been carried out with publicly available datasets, e.g. the Washington RGB Image and Depth (RGB-D) dataset [108], YaleB face dataset¹ [110], real-world multi-frequency aerial laser scans (ALS)² and an underwater multifrequency (16 wavelengths) TCSPC dataset collected using custom-built targets especially for this thesis.
6. The multi-spectral measurements were made underwater on targets with different shapes and materials. A novel spectral-depth representation is presented with strong discrimination characteristics on target signatures. Several custom-made and realistically scaled exemplars with known and unknown targets have been investigated using a multi-spectral single photon counting LiDAR system.
7. In this work, we also present a new approach to peak modelling and classification for waveform enabled LiDAR systems. Not all existing approaches perform peak modelling and classification simultaneously in real-time. This was tested on both simulated waveform enabled LiDAR data and real ALS data².
This PhD also led to an industrial secondment at Carbomap, Edinburgh, where some of the waveform modelling algorithms were implemented in C++ and CUDA for Nvidia TX1 boards for real-time performance.
¹ http://vision.ucsd.edu/~leekc/ExtYaleDatabase/
² This dataset was captured in collaboration with Carbomap Ltd. Edinburgh, UK. The data was collected during one of the trials in Austria using commercial-off-the-shelf (COTS) sensors.
Visual vocabularies for category-level object recognition
This thesis focuses on the study of visual vocabularies for category-level object recognition. Specifically, we present novel approaches for building visual codebooks. Our aim is not just to obtain more discriminative and more compact visual codebooks, but to bridge the gap between visual features and semantic concepts. A novel approach for obtaining class representative visual words is presented. It is based on a maximisation procedure, namely Cluster Precision Maximisation (CPM), of a novel cluster precision criterion, and on an adaptive threshold refinement scheme for agglomerative clustering algorithms based on correlation clustering techniques. The objective is to increase the vocabulary compactness while at the same time improving the recognition rate and further increasing the representativeness of the visual words. Moreover, we describe a novel clustering aggregation based approach for building efficient and semantic visual vocabularies. It consists of a novel framework for incorporating neighboring appearances of local descriptors into the vocabulary construction, and a rigorous approach for adding meaningful spatial coherency among the local features into the visual codebooks. We also propose an efficient high-dimensional data clustering algorithm, the Fast Reciprocal Nearest Neighbours (Fast-RNN). Our approach, which is a sped-up version of the standard RNN algorithm, is based on the projection search paradigm. Finally, we release a new database of images called Image Collection of Annotated Real-world Objects (ICARO), which is especially designed for evaluating category-level object recognition systems. An exhaustive comparison of ICARO with other well-known datasets used within the same context is carried out. We also propose a benchmark for both object classification and detection.
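The reciprocal nearest neighbour idea behind RNN-style agglomerative clustering can be illustrated with a brute-force sketch: two points form a reciprocal pair when each is the other's nearest neighbour, and such pairs are natural candidates for merging. This is only an illustration under assumptions. It does not reproduce the projection search that Fast-RNN uses for speed, and the function names and sample points are hypothetical.

```python
def nearest_neighbour(points, i):
    """Index of the point closest to points[i] (squared Euclidean distance)."""
    best_j, best_d = None, float("inf")
    for j, q in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[i], q))
        if d < best_d:
            best_j, best_d = j, d
    return best_j

def reciprocal_nn_pairs(points):
    """All pairs (i, j) with i < j that are each other's nearest neighbour."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]

# Two tight pairs plus one isolated point (hypothetical descriptors).
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
pairs = reciprocal_nn_pairs(points)
```

In this example the isolated point has a nearest neighbour, but that neighbour prefers a different point, so the isolated point is not part of any reciprocal pair.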