Elastic Functional Coding of Riemannian Trajectories
Visual observations of dynamic phenomena, such as human actions, are often
represented as sequences of smoothly-varying features. In cases where the
feature spaces can be structured as Riemannian manifolds, the corresponding
representations become trajectories on manifolds. Analysis of these
trajectories is challenging due to the non-linearity of the underlying spaces
and the high dimensionality of the trajectories. In vision problems, given the
nature of the physical systems involved, these phenomena are better
characterized on a
low-dimensional manifold compared to the space of Riemannian trajectories. For
instance, in data involving human action analysis, if one does not impose the
physical constraints of the human body, the resulting representation space
will contain highly redundant features. Learning an effective, low-dimensional
embedding for action representations will have a huge impact in the areas of
search and retrieval, visualization, learning, and recognition. The difficulty
lies in the inherent non-linearity of the domain and the temporal variability of
actions that can distort any traditional metric between trajectories. To
overcome these issues, we use the framework based on transported square-root
velocity fields (TSRVF); this framework has several desirable properties,
including a rate-invariant metric and vector space representations. We propose
to learn an embedding such that each action trajectory is mapped to a single
point in a low-dimensional Euclidean space, and the trajectories that differ
only in temporal rates map to the same point. We utilize the TSRVF
representation, and accompanying statistical summaries of Riemannian
trajectories, to extend existing coding methods such as PCA, KSVD and Label
Consistent KSVD to Riemannian trajectories, or more generally to Riemannian
functions.
Comment: Under major revision at IEEE T-PAMI, 201
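On a general manifold the TSRVF requires parallel transport, but in the Euclidean special case it reduces to the classical square-root velocity function q(t) = v(t) / sqrt(||v(t)||), the representation on which the elastic, rate-invariant metric is computed. A minimal sketch of that special case (the function name and discretisation are illustrative, not from the paper):

```python
import numpy as np

def srvf(x):
    """Square-root velocity function of a discretised curve x (T x d).

    Euclidean special case of the TSRVF: q(t) = v(t) / sqrt(||v(t)||),
    with v the velocity. Reparameterising the curve in time changes v,
    but acts on q by a simple group action, which is what makes the
    elastic metric rate-invariant.
    """
    v = np.gradient(x, axis=0)                         # finite-difference velocity
    speed = np.maximum(np.linalg.norm(v, axis=1), 1e-12)  # guard against zero speed
    return v / np.sqrt(speed)[:, None]

# straight-line trajectory in R^2, traversed at constant rate
x = np.stack([np.linspace(0, 1, 50), np.linspace(0, 1, 50)], axis=1)
q = srvf(x)
```

For a constant-rate curve the speed is constant, so the norm of q is constant along the trajectory.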
Graph based manifold regularized deep neural networks for automatic speech recognition
Deep neural networks (DNNs) have been successfully applied to a wide variety
of acoustic modeling tasks in recent years. These include the applications of
DNNs either in a discriminative feature extraction or in a hybrid acoustic
modeling scenario. Despite the rapid progress in this area, a number of
challenges remain in training DNNs. This paper presents an effective way of
training DNNs using a manifold learning based regularization framework. In this
framework, the parameters of the network are optimized to preserve underlying
manifold based relationships between speech feature vectors while minimizing a
measure of loss between network outputs and targets. This is achieved by
incorporating manifold based locality constraints in the objective criterion of
DNNs. Empirical evidence is provided to demonstrate that training a network
with manifold constraints preserves structural compactness in the hidden layers
of the network. Manifold regularization is applied to train bottleneck DNNs for
feature extraction in hidden Markov model (HMM) based speech recognition. The
experiments in this work are conducted on the Aurora-2 spoken digits and the
Aurora-4 read news large vocabulary continuous speech recognition tasks. The
performance is measured in terms of word error rate (WER) on these tasks. It is
shown that the manifold regularized DNNs result in up to a 37% reduction in WER
relative to standard DNNs.
Comment: 12 pages including citations, 2 figures
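The regularization described above can be sketched as a graph-based locality penalty added to the training objective: neighbouring speech frames should stay close in the hidden layers. A toy illustration of the penalty term alone, with an assumed affinity matrix (numpy only, no DNN training loop):

```python
import numpy as np

def manifold_penalty(hidden, weights):
    """Graph-based locality penalty on hidden activations.

    hidden:  (n, d) hidden-layer outputs for a batch
    weights: (n, n) affinity matrix; w_ij > 0 for neighbouring
             speech frames, 0 otherwise
    Returns sum_ij w_ij * ||h_i - h_j||^2, which a manifold-regularised
    DNN adds (suitably scaled) to its classification loss.
    """
    sq = np.sum(hidden ** 2, axis=1)
    # pairwise squared Euclidean distances via the expansion trick
    d2 = sq[:, None] + sq[None, :] - 2.0 * hidden @ hidden.T
    return float(np.sum(weights * d2))

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
w = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], float)   # two neighbour pairs: (0,1) and (2,3)
penalty = manifold_penalty(h, w)
```

With a symmetric affinity matrix each pair is counted twice, so the penalty equals twice the sum of squared distances between neighbour pairs; equivalently it is 2·tr(HᵀLH) with the graph Laplacian L = D − W.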
Learning with Inadequate and Incorrect Supervision
In practice, we often face the dilemma that the labeled data at hand are
inadequate to train a reliable classifier and, more seriously, some of these
labeled data may be mistakenly labeled due to various human factors.
Therefore, this paper proposes a novel semi-supervised learning paradigm that
can handle both label insufficiency and label inaccuracy. To address label
insufficiency, we use a graph to bridge the data points so that the label
information can be propagated from the scarce labeled examples to unlabeled
examples along the graph edges. To address label inaccuracy, Graph Trend
Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out
the initial noisy labels. GTF penalizes the l_0 norm of the label difference
between connected examples in the graph and exhibits better local adaptivity
than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the
correct labels by emphasizing the leading eigenvectors of the Laplacian matrix
associated with small eigenvalues, as these eigenvectors reflect real label
smoothness and carry rich class separation cues. We term our algorithm
`Semi-supervised learning under Inadequate and Incorrect Supervision' (SIIS).
Thorough experimental results on image classification, text categorization, and
speech recognition demonstrate that our SIIS is effective in label error
correction, leading to performance superior to state-of-the-art methods in
the presence of label noise and label scarcity.
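The graph-based propagation step for label insufficiency can be sketched with a standard normalized-affinity iteration; the GTF and SEP steps for label inaccuracy are not reproduced here, and `alpha` and the toy chain graph are illustrative assumptions:

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, iters=100):
    """Basic graph label propagation (the 'label insufficiency' step).

    W: (n, n) symmetric affinity matrix over all examples
    Y: (n, c) one-hot rows for labelled examples, zero rows otherwise
    Iterates F <- alpha * S @ F + (1 - alpha) * Y with the normalised
    affinity S = D^{-1/2} W D^{-1/2}, spreading labels along graph edges.
    """
    d = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(d, d))
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1)

# toy chain graph 0-1-2-3: two labelled endpoints, two unlabelled middle nodes
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
Y = np.zeros((4, 2))
Y[0, 0] = 1.0   # node 0 labelled class 0
Y[3, 1] = 1.0   # node 3 labelled class 1
labels = propagate_labels(W, Y)
```

Each unlabelled middle node inherits the class of its nearer labelled endpoint.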
ROSA: Robust Salient Object Detection against Adversarial Attacks
Recently, salient object detection has witnessed remarkable improvement owing
to deep convolutional neural networks, which can harvest powerful features
from images. In particular, state-of-the-art salient object detection methods
enjoy high accuracy and efficiency from fully convolutional network (FCN) based
frameworks which are trained from end to end and predict pixel-wise labels.
However, such frameworks suffer from adversarial attacks, which confuse neural
networks via adding quasi-imperceptible noises to input images without changing
the ground truth annotated by human subjects. To our knowledge, this paper is
the first one that mounts successful adversarial attacks on salient object
detection models and verifies that adversarial samples are effective on a wide
range of existing methods. Furthermore, this paper proposes a novel end-to-end
trainable framework to enhance the robustness for arbitrary FCN-based salient
object detection models against adversarial attacks. The proposed framework
adopts a novel idea that first introduces generic noise to destroy
adversarial perturbations, and then learns to predict saliency maps for input
images with the introduced noise. Specifically, our proposed method consists of
a segment-wise shielding component, which preserves boundaries and destroys
delicate adversarial noise patterns, and a context-aware restoration component,
which refines saliency maps through global contrast modeling. Experimental
results suggest that our proposed framework significantly improves the
performance of state-of-the-art models on a series of datasets.
Comment: To be published in IEEE Transactions on Cybernetics
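The segment-wise shielding idea can be approximated by shuffling pixels within each segment: delicate, spatially structured adversarial noise is destroyed while segment boundaries (and hence coarse object shape) survive. A simplified numpy sketch; the paper operates on superpixel segments, but any integer label map works here:

```python
import numpy as np

def segment_shuffle(img, seg, rng):
    """Shuffle pixel positions within each segment of an image.

    img: (H, W, C) float image
    seg: (H, W) integer segment-label map (e.g. superpixels)
    Pixels are permuted only among positions sharing a segment label,
    so boundaries stay put while fine noise patterns are scrambled.
    """
    out = img.copy()
    flat_in = img.reshape(-1, img.shape[-1])
    flat_out = out.reshape(-1, img.shape[-1])      # view into `out`
    for s in np.unique(seg):
        idx = np.flatnonzero(seg.ravel() == s)
        flat_out[idx] = flat_in[rng.permutation(idx)]
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
# four 4x4 blocks standing in for superpixel segments
seg = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 4, axis=0), 4, axis=1)
shielded = segment_shuffle(img, seg, rng)
```

Because pixels only move within their segment, per-segment statistics (e.g. mean colour) are preserved exactly.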
Robust Sparse Coding via Self-Paced Learning
Sparse coding (SC) is attracting increasing attention due to its extensive
theoretical grounding and its excellent performance in many signal
processing applications. However, most existing sparse coding algorithms are
nonconvex and are thus prone to becoming stuck in bad local minima,
especially when there are outliers and noisy data. To enhance the learning
robustness, in this paper, we propose a unified framework named Self-Paced
Sparse Coding (SPSC), which gradually includes matrix elements in SC learning
from easy to complex. We also generalize the self-paced learning schema to
different levels of dynamic selection on samples, features, and elements
respectively. Experimental results on real-world data demonstrate the efficacy
of the proposed algorithms.
Comment: submitted to AAAI201
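The self-paced schema the paper builds on can be sketched generically: alternate between selecting elements whose current loss falls below an age parameter and refitting on the selection, then grow the parameter so harder elements enter later. A toy version on plain least squares rather than full sparse coding (all names and constants are illustrative):

```python
import numpy as np

def self_paced_fit(X, y, lam0=0.5, growth=1.5, rounds=5):
    """Generic self-paced learning loop (easy-to-complex selection).

    Samples whose squared loss under the current model is below the age
    parameter `lam` are included (hard 0/1 self-paced weights), the model
    is refit on them, and `lam` grows so harder samples enter later.
    """
    w = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start on all data
    lam = lam0
    v = np.ones(X.shape[0], bool)
    for _ in range(rounds):
        loss = (X @ w - y) ** 2
        v = loss < lam                          # select the "easy" samples
        if v.sum() >= X.shape[1]:               # enough samples to refit
            w = np.linalg.lstsq(X[v], y[v], rcond=None)[0]
        lam *= growth                           # admit harder samples next round
    return w, v

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=60)
y[:5] += 8.0                                    # gross outliers
w, kept = self_paced_fit(X, y)
```

Because the outliers never fall below the age parameter, they are excluded from every refit and the recovered weights stay close to the true model.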
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in Journal of the Acoustical Society of America, 27 Nov. 201
An Exploration of Mimic Architectures for Residual Network Based Spectral Mapping
Spectral mapping uses a deep neural network (DNN) to map directly from noisy
speech to clean speech. Our previous study found that the performance of
spectral mapping improves greatly when using helpful cues from an acoustic
model trained on clean speech. The mapper network learns to mimic the input
favored by the spectral classifier and cleans the features accordingly. In this
study, we explore two innovations: we replace a DNN-based spectral mapper
with a residual network that is more attuned to the goal of predicting clean
speech. We also examine how integrating long term context in the mimic
criterion (via wide-residual biLSTM networks) affects the performance of
spectral mapping compared to DNNs. Our goal is to derive a model that can be
used as a preprocessor for any recognition system; the features derived from
our model are passed through the standard Kaldi ASR pipeline and achieve a WER
of 9.3%, which is the lowest recorded word error rate for the CHiME-2 dataset
using only feature adaptation.
Comment: Published in the IEEE 2018 Workshop on Spoken Language Technology (SLT 2018)
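The mimic criterion can be sketched as a two-term objective: a fidelity term matching the clean spectrum directly, plus a term matching a frozen clean-speech classifier's responses to the mapped and clean inputs. A linear toy stand-in; the paper's networks are residual / wide-residual BLSTMs, and everything here is illustrative:

```python
import numpy as np

def mimic_loss(mapped, clean, classifier, alpha=1.0):
    """Spectral-mapping objective with a mimic term.

    `classifier` is a frozen acoustic model trained on clean speech;
    besides matching the clean spectrum (fidelity), the mapper is
    penalised when the classifier's responses to its output differ
    from its responses to clean speech (mimic).
    """
    fidelity = np.mean((mapped - clean) ** 2)
    mimic = np.mean((classifier(mapped) - classifier(clean)) ** 2)
    return fidelity + alpha * mimic

W = np.array([[1.0, 0.5], [-0.5, 1.0]])   # frozen toy "acoustic model"
classifier = lambda s: np.tanh(s @ W)

clean = np.array([[0.2, -0.1], [0.4, 0.3]])
mapped = clean + 0.1                        # imperfect mapper output
loss = mimic_loss(mapped, clean, classifier)
```

The mimic term steers the mapper toward outputs the classifier treats like clean speech, not merely outputs close in the spectral domain.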
State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Machine learning promises methods that generalize well from finite labeled
data. However, the brittleness of existing neural net approaches is revealed by
notable failures, such as the existence of adversarial examples that are
misclassified despite being nearly identical to a training example, or the
inability of recurrent sequence-processing nets to stay on track without
teacher forcing. We introduce a method, which we refer to as 'state
reification', that involves modeling the distribution of hidden states over the
training data and then projecting hidden states observed during testing toward
this distribution. Our intuition is that if the network can remain in a
familiar manifold of hidden space, subsequent layers of the net should be well
trained to respond appropriately. We show that this state-reification method
helps neural nets to generalize better, especially when labeled data are
sparse, and also helps overcome the challenge of achieving robust
generalization with adversarial training.
Comment: ICML 2019 [full oral]. arXiv admin note: text overlap with arXiv:1805.0839
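One simple way to instantiate state reification is to model the training hidden states as a Gaussian and nudge test-time states along the gradient of its log-density, pulling them back toward the familiar region of hidden space. This is a sketch of the core idea, not the paper's exact mechanism (which explores richer models such as attractor networks):

```python
import numpy as np

def reify(h, mu, cov, step):
    """Move a hidden state toward the training-state distribution.

    Under a Gaussian N(mu, cov) fit to training hidden states, the
    gradient of the log-density at h is cov^{-1} (mu - h); a small
    step along it projects an off-distribution state back toward
    the familiar manifold.
    """
    return h + step * np.linalg.solve(cov, mu - h)

# "training" hidden states cluster around (1, 1)
rng = np.random.default_rng(2)
H = rng.normal(loc=1.0, scale=0.1, size=(500, 2))
mu, cov = H.mean(axis=0), np.cov(H.T)

h_off = np.array([3.0, -2.0])               # off-manifold test-time state
h_new = reify(h_off, mu, cov, step=0.5 * np.min(np.diag(cov)))
```

The step size is scaled by the covariance so the update contracts toward the training cluster rather than overshooting it.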
Speech Recognition Front End Without Information Loss
Speech representation and modelling in high-dimensional spaces of acoustic
waveforms, or a linear transformation thereof, is investigated with the aim of
improving the robustness of automatic speech recognition to additive noise. The
motivation behind this approach is twofold: (i) the information in acoustic
waveforms that is usually removed in the process of extracting low-dimensional
features might aid robust recognition by virtue of structured redundancy
analogous to channel coding, (ii) linear feature domains allow for exact noise
adaptation, as opposed to representations that involve non-linear processing
which makes noise adaptation challenging. Thus, we develop a generative
framework for phoneme modelling in high-dimensional linear feature domains, and
use it in phoneme classification and recognition tasks. Results show that
classification and recognition in this framework perform better than analogous
PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional
and MFCC features at the likelihood level performs uniformly better than either
of the individual representations across all noise levels.
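The exact noise adaptation argument is easy to verify in the Gaussian case: in a linear feature domain with independent additive noise, y = x + n, so the means and covariances of a phoneme model simply add. A quick numpy check against Monte Carlo samples (toy parameters, not from the paper):

```python
import numpy as np

def adapt_gaussian(mu_x, cov_x, mu_n, cov_n):
    """Exact noise adaptation in a linear feature domain.

    For y = x + n with x and n independent Gaussians, the adapted
    model is Gaussian with added means and covariances. Non-linear
    features such as MFCCs do not admit this closed form.
    """
    return mu_x + mu_n, cov_x + cov_n

rng = np.random.default_rng(3)
mu_x, cov_x = np.array([1.0, 2.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu_n, cov_n = np.zeros(2), 0.5 * np.eye(2)

mu_y, cov_y = adapt_gaussian(mu_x, cov_x, mu_n, cov_n)

# Monte Carlo check: samples of x + n should match the adapted model
x = rng.multivariate_normal(mu_x, cov_x, size=50000)
n = rng.multivariate_normal(mu_n, cov_n, size=50000)
y = x + n
```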
Reconstruction of sequential data with density models
We introduce the problem of reconstructing a sequence of multidimensional
real vectors where some of the data are missing. This problem contains
regression and mapping inversion as particular cases where the pattern of
missing data is independent of the sequence index. The problem is hard because
it involves possibly multivalued mappings at each vector in the sequence, where
the missing variables can take more than one value given the present variables;
and the set of missing variables can vary from one vector to the next. To solve
this problem, we propose an algorithm based on two redundancy assumptions:
vector redundancy (the data live in a low-dimensional manifold), so that the
present variables constrain the missing ones; and sequence redundancy (e.g.
continuity), so that consecutive vectors constrain each other. We capture the
low-dimensional nature of the data in a probabilistic way with a joint density
model, here the generative topographic mapping, which results in a Gaussian
mixture. Candidate reconstructions at each vector are obtained as all the modes
of the conditional distribution of missing variables given present variables.
The reconstructed sequence is obtained by minimising a global constraint, here
the sequence length, by dynamic programming. We present experimental results
for a toy problem and for inverse kinematics of a robot arm.
Comment: 30 pages, 9 figures. Original manuscript dated January 27, 2004 and not updated since. Current author's email address: [email protected]
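The final dynamic-programming step can be sketched once per-step candidates are given: choose one candidate per time step so that the total sequence length (the sum of consecutive distances) is minimal. A minimal numpy version with hand-made candidates standing in for the modes of the GMM conditional p(missing | present):

```python
import numpy as np

def dp_reconstruct(candidates):
    """Pick one candidate per time step minimising total sequence length.

    candidates[t] is the list of candidate reconstructions at step t;
    dynamic programming minimises the sum of consecutive Euclidean
    distances, i.e. the global continuity constraint.
    """
    T = len(candidates)
    cost = [np.zeros(len(candidates[0]))]
    back = []
    for t in range(1, T):
        prev = np.asarray(candidates[t - 1])
        cur = np.asarray(candidates[t])
        # d[i, j] = distance from prev candidate j to cur candidate i
        d = np.linalg.norm(cur[:, None, :] - prev[None, :, :], axis=-1)
        total = d + cost[-1][None, :]
        back.append(total.argmin(axis=1))
        cost.append(total.min(axis=1))
    # backtrack the cheapest path through the candidate lattice
    path = [int(cost[-1].argmin())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [candidates[t][path[t]] for t in range(T)]

# two candidate branches per step; the smooth branch at x=0 should win
cands = [[np.array([0.0, 0.1 * t]), np.array([(-1) ** t * 5.0, 0.0])]
         for t in range(3)]
seq = dp_reconstruct(cands)
```

The jumpy branch incurs large consecutive distances, so the DP selects the smooth branch at every step.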