Learning midlevel image features for natural scene and texture classification
This paper deals with coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the representation space for faster computation. The proposed approach is tested in the context of texture classification (111 classes) as well as natural scene classification (11 categories, 2037 images). Using a common protocol, other commonly used descriptors achieve at most 47.7% accuracy on average, while our method achieves up to 63.8%. We show that this advantage does not depend on the size of the signature and demonstrate the efficiency of the proposed criterion to select ICA filters and reduce the dimensionality of the representation.
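The pipeline described above can be sketched in a few lines: learn ICA "coding filters" from image patches, then build a global signature from the maximally active filter per patch. This is a minimal illustration, not the paper's implementation; the patch size, filter count, and random data are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Illustrative sketch: learn ICA coding filters from image patches and build
# a global signature from maximal filter responses. Random "patches" stand in
# for real 8x8 natural-image patches; 32 filters is an arbitrary choice.
rng = np.random.default_rng(0)
patches = rng.standard_normal((1000, 64))      # 1000 flattened 8x8 patches

ica = FastICA(n_components=32, random_state=0, max_iter=500)
ica.fit(patches)
filters = ica.components_                      # one coding filter per row

def global_signature(image_patches, filters):
    """Histogram of the maximally active filter over all patches,
    normalized to a global image signature."""
    responses = image_patches @ filters.T      # (n_patches, n_filters)
    winners = np.abs(responses).argmax(axis=1)
    hist = np.bincount(winners, minlength=filters.shape[0])
    return hist / hist.sum()

sig = global_signature(patches, filters)
print(sig.shape)  # (32,)
```

A local signature, as the abstract notes, would additionally record where in the image each maximal response occurred.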
Customizing kernel functions for SVM-based hyperspectral image classification
Previous research applying kernel methods such as support vector machines (SVMs) to hyperspectral image classification has achieved performance competitive with the best available algorithms. However, few efforts have been made to extend SVMs to cover the specific requirements of hyperspectral image classification, for example, by building tailor-made kernels. Observation of real-life spectral imagery from the AVIRIS hyperspectral sensor shows that the useful information for classification is not equally distributed across bands, which provides potential to enhance the SVM's performance through exploring different kernel functions. Spectrally weighted kernels are, therefore, proposed, and a set of particular weights is chosen by either optimizing an estimate of generalization error or evaluating each band's utility level. To assess the effectiveness of the proposed method, experiments are carried out on the publicly available 92AV3C dataset collected by the 220-band AVIRIS hyperspectral sensor. Results indicate that the method is generally effective in improving performance: spectral weighting based on learning weights by gradient descent is found to be slightly better than an alternative method based on estimating the "relevance" between band information and ground truth.
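The core idea of a spectrally weighted kernel can be sketched as an RBF kernel whose per-band distances are scaled by learned weights. The sketch below is an illustration under assumed synthetic data and hand-set weights, not the paper's formulation or its gradient-descent weight learning.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for hyperspectral pixels: 20 "bands", with only band 0
# carrying the class signal (an assumption for illustration).
rng = np.random.default_rng(1)
n_bands = 20
X = rng.standard_normal((100, n_bands))
y = (X[:, 0] + 0.1 * rng.standard_normal(100) > 0).astype(int)

w = np.ones(n_bands)
w[0] = 5.0  # up-weight the informative band (placeholder; learned in the paper)

def weighted_rbf(A, B, weights, gamma=0.1):
    """Spectrally weighted RBF: k(a,b) = exp(-gamma * sum_i (w_i*(a_i-b_i))^2)."""
    Aw, Bw = A * weights, B * weights
    d2 = ((Aw[:, None, :] - Bw[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# scikit-learn accepts a callable kernel computing the Gram matrix.
clf = SVC(kernel=lambda A, B: weighted_rbf(A, B, w))
clf.fit(X, y)
print(clf.score(X, y))
```

With uniform weights the kernel reduces to the standard RBF; the weighting lets informative bands dominate the distance computation.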
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable: they should provide easy-to-interpret rationales for their behavior so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolutional network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data.
In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and its control outputs (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.
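The visual-attention mechanism described in Chapter 3 can be sketched with plain arrays: a softmax map over spatial features weights each region's contribution to the steering prediction. The shapes and random weights below are purely illustrative assumptions, not the dissertation's actual network.

```python
import numpy as np

# Minimal sketch of spatial visual attention for steering prediction.
# A 10x10 grid of 16-dim features stands in for a CNN feature map.
rng = np.random.default_rng(2)
features = rng.standard_normal((10, 10, 16))

w_attn = rng.standard_normal(16)               # attention scoring weights (assumed)
scores = features @ w_attn                     # (10, 10) per-region relevance
attn = np.exp(scores - scores.max())
attn /= attn.sum()                             # softmax attention map over regions

# Attention-weighted context vector feeds the controller head.
context = (features * attn[..., None]).sum(axis=(0, 1))
w_out = rng.standard_normal(16)
steering = float(np.tanh(context @ w_out))     # steering angle squashed to (-1, 1)

print(attn.sum(), steering)
```

The attention map `attn` is exactly what gets visualized as highlighted image regions; the causal filtering step then tests which high-attention regions actually change the output when removed.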
3D Object Recognition Based On Constrained 2D Views
The aim of the present work was to build a novel 3D object recognition system capable of classifying man-made and natural objects based on single 2D views. The approach to this problem has been motivated by recent theories on biological vision and multiresolution analysis. The project's objectives were the implementation of a system that is able to deal with simple 3D scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing the proposed recognition system to operate in a practically acceptable time frame.

The developed system takes further the work on automatic classification of marine phytoplanktons carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses the main theoretical issues that prompted the fundamental system design options. The principles and the implementation of the coarse data channels used in the system are described. A new multiresolution representation of 2D views is presented, which provides the classifier module of the system with coarse-coded descriptions of the scale-space distribution of potentially interesting features. A multiresolution analysis-based mechanism is proposed, which directs the system's attention towards potentially salient features. Unsupervised similarity-based feature grouping is introduced, which is used in the coarse data channels to yield feature signatures that are not spatially coherent and provide the classifier module with salient descriptions of object views. A simple texture descriptor is described, which is based on properties of a special wavelet transform.

The system has been tested on computer-generated and natural image data sets, in conditions where the inter-object similarity was monitored and quantitatively assessed by human subjects, or where the analysed objects were very similar and their discrimination constituted a difficult task even for human experts. The validity of the above-described approaches has been proven. The studies conducted with various statistical and artificial neural network-based classifiers have shown that the system is able to perform well in all of the above-mentioned situations. These investigations also made it possible to extend and generalise a number of important conclusions drawn during previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour of pattern recognition systems based on multiple coarse data channels and various classifier architectures.

The system possesses the ability to deal with difficult field-collected images of objects, and the techniques employed by its component modules make possible its extension to the domain of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability in the field of marine biota classification.
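A wavelet-based texture descriptor of the kind the abstract mentions can be sketched as per-level detail-band energies. The Haar decomposition and three-level depth below are assumptions for illustration; the thesis uses its own "special wavelet transform".

```python
import numpy as np

def haar_level(img):
    """One level of a 2D Haar-style transform: approximation plus
    horizontal, vertical, and diagonal detail bands."""
    a = (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 4
    h = (img[0::2, 0::2] - img[0::2, 1::2] + img[1::2, 0::2] - img[1::2, 1::2]) / 4
    v = (img[0::2, 0::2] + img[0::2, 1::2] - img[1::2, 0::2] - img[1::2, 1::2]) / 4
    d = (img[0::2, 0::2] - img[0::2, 1::2] - img[1::2, 0::2] + img[1::2, 1::2]) / 4
    return a, (h, v, d)

def texture_descriptor(img, levels=3):
    """Mean absolute detail energy per band and level: a coarse,
    scale-space texture signature (3 bands x `levels` levels)."""
    feats = []
    a = img
    for _ in range(levels):
        a, details = haar_level(a)
        feats.extend(np.abs(b).mean() for b in details)
    return np.array(feats)

rng = np.random.default_rng(3)
img = rng.standard_normal((64, 64))
print(texture_descriptor(img).shape)  # (9,)
```

Such a signature is deliberately coarse-coded: it discards spatial layout and keeps only how energy is distributed across scales and orientations.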
Acoustic-Phonetic Features for the Automatic Classification of Stop Consonants
In this paper, the acoustic–phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content, and new features are proposed. A statistically guided, knowledge-based, acoustic–phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses a new auditory-based front-end processing and incorporates new algorithms for the extraction and manipulation of the acoustic–phonetic features that proved to be rich in their information content. Recognition experiments are performed using hard-decision algorithms on stops extracted from the TIMIT database: continuous speech of 60 speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place-of-articulation detection, and 86% for the overall classification of stops.
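A hard-decision rule of the general kind the abstract describes can be illustrated with voice onset time (VOT), a classic acoustic-phonetic cue for stop voicing. The 25 ms threshold below is a common textbook rule of thumb for English, not the paper's actual algorithm or feature set.

```python
# Illustrative hard-decision voicing rule on voice onset time (VOT).
# Assumed threshold: voiced stops (/b d g/) tend to have short VOT in
# English, voiceless stops (/p t k/) long VOT.
def classify_voicing(vot_ms):
    """Return a voicing decision from VOT in milliseconds."""
    return "voiced" if vot_ms < 25.0 else "voiceless"

print(classify_voicing(10.0))   # voiced
print(classify_voicing(60.0))   # voiceless
```

The paper's system combines many such cues with statistically chosen decision boundaries rather than a single fixed threshold.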
Operator State Estimation for Adaptive Aiding in Uninhabited Combat Air Vehicles
This research demonstrated the first closed-loop implementation of adaptive automation using operator functional state in an operationally relevant environment. In the Uninhabited Combat Air Vehicle (UCAV) environment, operators can become cognitively overloaded and their performance may decrease during mission-critical events. This research demonstrates an unprecedented closed-loop system, one that adaptively aids UCAV operators based on their cognitive functional state. A series of experiments were conducted to 1) determine the best classifiers for estimating operator functional state, 2) determine if physiological measures can be used to develop multiple cognitive models based on information-processing demands and task type, 3) determine the salient psychophysiological measures in operator functional state, and 4) demonstrate the benefits of intelligent adaptive aiding using operator functional state. Aiding the operator improved performance and increased mission effectiveness by 67%.
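The closed loop described above can be sketched as: classify workload from psychophysiological features, then trigger aiding on a high-workload estimate. The synthetic features and the choice of linear discriminant analysis below are assumptions; the research compared several classifiers.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-ins for psychophysiological features (e.g. EEG band power,
# heart rate, blink rate) under low vs. high workload.
rng = np.random.default_rng(4)
low  = rng.normal(0.0, 1.0, (50, 3))
high = rng.normal(1.5, 1.0, (50, 3))
X = np.vstack([low, high])
y = np.array([0] * 50 + [1] * 50)   # 0 = low workload, 1 = high workload

clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.score(X, y))

def adaptive_aiding(predicted_state):
    """Closed-loop policy: a high-workload estimate triggers aiding,
    e.g. offloading tasks to automation."""
    return "engage aiding" if predicted_state == 1 else "no aiding"

print(adaptive_aiding(int(clf.predict(X[-1:])[0])))
```

In the actual system this decision runs continuously on streaming physiological data rather than on a static dataset.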
VISUAL SALIENCY ANALYSIS, PREDICTION, AND VISUALIZATION: A DEEP LEARNING PERSPECTIVE
In recent years, great success has been achieved in the prediction of human eye fixations. Several studies have employed deep learning to reach high prediction accuracy. These studies rely on deep networks pre-trained for object classification, exploiting them either in a transfer-learning setting or by using the pre-trained weights to initialize a saliency model. The use of such pre-trained networks is due to the relatively small human-fixation datasets available for training a deep learning model. Another, less studied, problem is that the amount of computation such deep learning models require demands expensive hardware. In this dissertation, two approaches are proposed to tackle the above-mentioned problems. The first approach, codenamed DeepFeat, incorporates the deep features of convolutional neural networks pre-trained for object and scene classification. It is the first approach to use deep features without further learning. Performance of the DeepFeat model is extensively evaluated over a variety of datasets using a variety of implementations. The second approach is a deep learning saliency model, codenamed ClassNet. Two main differences separate ClassNet from other deep learning saliency models: it is the only deep learning saliency model that learns its weights from scratch, and it treats the prediction of human fixations as a classification problem, while other deep learning saliency models treat it as a regression problem or as a classification of a regression problem.
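The learning-free idea behind DeepFeat can be sketched as fusing normalized activation maps from object- and scene-classification networks into a saliency map. The random arrays below stand in for real pre-trained activations; the fusion rule shown is an illustrative assumption, not the dissertation's exact combination scheme.

```python
import numpy as np

# Stand-ins for convolutional activations (64 channels on a 14x14 grid)
# from networks pre-trained on object and scene classification.
rng = np.random.default_rng(5)
object_maps = rng.random((64, 14, 14))
scene_maps  = rng.random((64, 14, 14))

def normalize(m):
    """Min-max normalize a map to roughly [0, 1]."""
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

# Sum channels within each stream, normalize, then fuse the two streams —
# no weights are learned at any point.
saliency = normalize(normalize(object_maps.sum(0)) + normalize(scene_maps.sum(0)))
print(saliency.shape)
```

The appeal of this scheme is exactly what the abstract emphasizes: it sidesteps both the small fixation datasets and the training cost of learned saliency models.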