Advances in Hyperspectral Image Classification: Earth monitoring with statistical learning methods
Hyperspectral images show similar statistical properties to natural grayscale
or color photographic images. However, the classification of hyperspectral
images is more challenging because of the very high dimensionality of the
pixels and the small number of labeled examples typically available for
learning. These peculiarities lead to particular signal processing problems,
mainly characterized by indetermination and complex manifolds. The framework of
statistical learning has gained popularity in the last decade. New methods have
been presented to account for the spatial homogeneity of images, to include
user's interaction via active learning, to take advantage of the manifold
structure with semisupervised learning, to extract and encode invariances, or
to adapt classifiers and image representations to unseen yet similar scenes.
This tutorial reviews the main advances in hyperspectral remote sensing image
classification through illustrative examples. Comment: IEEE Signal Processing Magazine, 201
Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations
The success of deep convolutional architectures is often attributed in part
to their ability to learn multiscale and invariant representations of natural
signals. However, a precise study of these properties and how they affect
learning guarantees is still missing. In this paper, we consider deep
convolutional representations of signals; we study their invariance to
translations and to more general groups of transformations, their stability to
the action of diffeomorphisms, and their ability to preserve signal
information. This analysis is carried out by introducing a multilayer kernel based
on convolutional kernel networks and by studying the geometry induced by the
kernel mapping. We then characterize the corresponding reproducing kernel
Hilbert space (RKHS), showing that it contains a large class of convolutional
neural networks with homogeneous activation functions. This analysis allows us
to separate data representation from learning, and to provide a canonical
measure of model complexity, the RKHS norm, which controls both stability and
generalization of any learned model. In addition to models in the constructed
RKHS, our stability analysis also applies to convolutional networks with
generic activations such as rectified linear units, and we discuss its
relationship with recent generalization bounds based on spectral norms.
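As a rough illustration of why an RKHS norm can control stability: the inequality below is the standard consequence of the reproducing property and the Cauchy-Schwarz inequality, not a result quoted from the paper, and the symbols $f$, $\Phi$, and $\mathcal{H}$ are notation assumed here for a model, the kernel mapping, and the RKHS.

```latex
\[
  f(x) = \langle f, \Phi(x) \rangle_{\mathcal{H}}
  \quad\Longrightarrow\quad
  |f(x) - f(x')| \;\le\; \|f\|_{\mathcal{H}} \, \|\Phi(x) - \Phi(x')\|_{\mathcal{H}} .
\]
```

Under this reading, any bound on how far the kernel mapping moves when a signal is deformed carries over to every model in the RKHS, scaled by that model's norm.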
Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
Deep cross-modal learning has demonstrated excellent performance in cross-modal multimedia retrieval, where the aim is to learn joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning in which the temporal structures of different data modalities, such as audio and lyrics, are taken into account. Motivated by the inherently temporal structure of music, we aim to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is used as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch: i) We propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned by considering temporal structures. ii) For feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structures of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
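The abstract names canonical correlation analysis as the objective that ties the two branches together. Below is a minimal sketch of plain linear CCA on toy data, assuming NumPy. It is not the authors' deep architecture: the function name `cca_correlations`, the ridge term `reg`, and the synthetic `audio` and `lyrics` features are hypothetical stand-ins for the outputs of the two network branches.

```python
import numpy as np


def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between two views X (n, dx) and Y (n, dy).

    Hypothetical helper: linear CCA with a small ridge term `reg`
    added to each covariance for numerical stability.
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized within-view covariances and the cross-covariance.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten both views with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T

    # The singular values of T are the canonical correlations (descending).
    return np.linalg.svd(T, compute_uv=False)


# Toy data (assumption): both views are noisy linear maps of 3 shared factors.
rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 3))
audio = shared @ rng.standard_normal((3, 5)) + 0.1 * rng.standard_normal((500, 5))
lyrics = shared @ rng.standard_normal((3, 4)) + 0.1 * rng.standard_normal((500, 4))

corr = cca_correlations(audio, lyrics)
print(corr)  # one correlation per canonical direction, largest first
```

A deep variant would, roughly, train both branches to maximize the sum of these correlations; the sketch only evaluates them for fixed features.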