Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
Feature Selection via L1-Penalized Squared-Loss Mutual Information
Feature selection is a technique to screen out less important features. Many
existing supervised feature selection algorithms use redundancy and relevancy
as the main criteria to select features. However, feature interaction,
potentially a key characteristic in real-world problems, has not received much
attention. As an attempt to take feature interaction into account, we propose
L1-LSMI, an L1-regularization based algorithm that maximizes a squared-loss
variant of mutual information between selected features and outputs. Numerical
results show that L1-LSMI performs well in handling redundancy, detecting
non-linear dependency, and considering feature interaction. Comment: 25 pages.
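As a rough illustration of the screening idea in this abstract, the sketch below scores each feature by an L1-penalized dependence measure and keeps only features whose penalized score stays positive. The squared Pearson correlation used here is only a toy surrogate for the paper's squared-loss mutual information estimator, and the penalty weight `lam` is a hypothetical parameter:

```python
import numpy as np

def squared_dependence(x, y):
    # Toy surrogate for squared-loss mutual information: the squared Pearson
    # correlation between one feature and the output. (The paper instead uses
    # a density-ratio based LSMI estimator that can detect non-linear
    # dependency; this linear stand-in is for illustration only.)
    c = np.corrcoef(x, y)[0, 1]
    return c ** 2

def l1_feature_scores(X, y, lam=0.1):
    # Score each feature by its dependence on y, soft-thresholded by an
    # L1-style penalty lam; features with score 0 are screened out.
    scores = np.array([squared_dependence(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.maximum(scores - lam, 0.0)

rng = np.random.default_rng(0)
n = 500
relevant = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([relevant, noise])
y = relevant + 0.1 * rng.normal(size=n)  # output depends on feature 0 only

w = l1_feature_scores(X, y, lam=0.1)
print(w[0] > 0, w[1] == 0)  # relevant feature kept, noise feature screened out
```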
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
A method for statistical parametric speech synthesis incorporating generative
adversarial networks (GANs) is proposed. Although powerful deep neural network
(DNN) techniques can be applied to synthesize speech waveforms artificially,
the synthetic speech quality is low compared with that of natural speech. One
of the issues causing the quality degradation is an over-smoothing effect often
observed in the generated speech parameters. A GAN introduced in this paper
consists of two neural networks: a discriminator to distinguish natural and
generated samples, and a generator to deceive the discriminator. In the
proposed framework incorporating the GANs, the discriminator is trained to
distinguish natural and generated speech parameters, while the acoustic models
are trained to minimize the weighted sum of the conventional minimum generation
loss and an adversarial loss for deceiving the discriminator. Since the
objective of the GANs is to minimize the divergence (i.e., distribution
difference) between the natural and generated speech parameters, the proposed
method effectively alleviates the over-smoothing effect on the generated speech
parameters. We evaluated the effectiveness for text-to-speech and voice
conversion, and found that the proposed method can generate more natural
spectral parameters than the conventional minimum generation error
training algorithm regardless of its hyper-parameter settings. Furthermore, we
investigated the effect of the divergence of various GANs, and found that a
Wasserstein GAN minimizing the Earth-Mover's distance works the best in terms
of improving synthetic speech quality. Comment: Preprint manuscript of IEEE/ACM Transactions on Audio, Speech and
Language Processing.
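The weighted-sum training objective this abstract describes, a conventional minimum generation error term plus a weighted adversarial term for deceiving the discriminator, can be sketched as follows. The function names, the MSE form of the generation loss, the log-loss form of the adversarial term, and the weight `w_adv` are common stand-ins for illustration, not the paper's exact formulation:

```python
import numpy as np

def mge_loss(generated, natural):
    # Conventional minimum generation error: mean squared error between
    # generated and natural speech parameters.
    return np.mean((generated - natural) ** 2)

def adversarial_loss(disc_scores_on_generated):
    # Adversarial term: the acoustic model is rewarded when the discriminator
    # scores its output close to 1 ("natural").
    eps = 1e-12
    return -np.mean(np.log(disc_scores_on_generated + eps))

def acoustic_model_loss(generated, natural, disc_scores, w_adv=1.0):
    # Weighted sum described in the abstract: MGE loss + w_adv * adversarial loss.
    return mge_loss(generated, natural) + w_adv * adversarial_loss(disc_scores)
```

With a perfect reconstruction and a fully fooled discriminator the total loss goes to zero; a sceptical discriminator (scores near 0.5) pushes the adversarial term up, which is what counteracts the over-smoothing of a pure MGE objective.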
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for machine
learning practitioners to categorise a real problem and look up a possible
solution accordingly.
Approximated Infomax Early Stopping: Revisiting Gaussian RBMs on Natural Images
We pursue an early stopping technique that helps Gaussian Restricted
Boltzmann Machines (GRBMs) to gain good natural image representations in terms
of overcompleteness and data fitting. GRBMs are widely considered as an
unsuitable model for natural images because they gain non-overcomplete
representations which include uniform filters that do not represent useful
image features. We have recently found that, contrary to this common
perspective, GRBMs first gain and subsequently lose useful filters during
their training.
We attribute this phenomenon to a tradeoff between overcompleteness of GRBM
representations and data fitting. To gain GRBM representations that are
overcomplete and fit data well, we propose a measure for GRBM representation
quality, approximated mutual information, and an early stopping technique based
on this measure. The proposed method boosts performance of classifiers trained
on GRBM representations. Comment: 9 pages with 1 page appendix.
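The early-stopping idea in this abstract amounts to monitoring a representation-quality measure during training and halting (or rolling back) once it stops improving. The sketch below is a generic version of that loop; the paper's actual measure is an approximated mutual information, for which the `quality_measure` callback here is merely a stand-in, and the `patience` parameter is a hypothetical detail:

```python
def early_stop_training(train_step, quality_measure, max_epochs=100, patience=5):
    # Generic quality-based early stopping: run training, evaluate the quality
    # measure after each epoch, and stop once it has failed to improve for
    # `patience` consecutive epochs. Returns the best epoch and its score.
    best_q, best_epoch, stale = float("-inf"), 0, 0
    for epoch in range(max_epochs):
        train_step(epoch)
        q = quality_measure(epoch)
        if q > best_q:
            best_q, best_epoch, stale = q, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_q

# Toy quality curve that rises and then falls, mimicking the gain-then-lose
# behaviour of GRBM filters described in the abstract (peak at epoch 10).
quality = lambda epoch: -(epoch - 10) ** 2
best_epoch, best_q = early_stop_training(lambda e: None, quality, max_epochs=50)
print(best_epoch)  # 10
```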
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review
Pattern analysis often requires a pre-processing stage for extracting or
selecting features in order to help the classification, prediction, or
clustering stage discriminate or represent the data in a better way. The reason
for this requirement is that the raw data are complex and difficult to process
without extracting or selecting appropriate features beforehand. This paper
reviews theory and motivation of different common methods of feature selection
and extraction and introduces some of their applications. Some numerical
implementations are also shown for these methods. Finally, the methods in
feature selection and extraction are compared. Comment: 14 pages, 1 figure, 2 tables, survey (literature review) paper.
Dimensionality Reduction on SPD Manifolds: The Emergence of Geometry-Aware Methods
Representing images and videos with Symmetric Positive Definite (SPD)
matrices, and considering the Riemannian geometry of the resulting space, has
been shown to yield high discriminative power in many visual recognition tasks.
Unfortunately, computation on the Riemannian manifold of SPD matrices,
especially of high-dimensional ones, comes at a high cost that limits the
applicability of existing techniques. In this paper, we introduce algorithms
able to handle high-dimensional SPD matrices by constructing a
lower-dimensional SPD manifold. To this end, we propose to model the mapping
from the high-dimensional SPD manifold to the low-dimensional one with an
orthonormal projection. This lets us formulate dimensionality reduction as the
problem of finding a projection that yields a low-dimensional manifold either
with maximum discriminative power in the supervised scenario, or with maximum
variance of the data in the unsupervised one. We show that learning can be
expressed as an optimization problem on a Grassmann manifold and discuss fast
solutions for special cases. Our evaluation on several classification tasks
evidences that our approach leads to a significant accuracy gain over
state-of-the-art methods. Comment: arXiv admin note: text overlap with arXiv:1407.112
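The core mapping in this abstract, taking a high-dimensional SPD matrix to a lower-dimensional SPD manifold through an orthonormal projection, can be sketched as follows. Here `W` is a random orthonormal matrix for illustration only; the paper learns the projection on a Grassmann manifold to maximize discriminative power or variance:

```python
import numpy as np

def random_orthonormal(n, m, seed=0):
    # An orthonormal projection W (n x m, with W^T W = I_m), obtained here by
    # QR decomposition of a random matrix. The paper optimizes W instead of
    # sampling it; this is just to demonstrate the mapping.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(n, m)))
    return q

def project_spd(M, W):
    # Map a high-dimensional SPD matrix M to a lower-dimensional one.
    # W^T M W stays symmetric positive definite when W has full column rank.
    return W.T @ M @ W

n, m = 10, 3
A = np.random.default_rng(1).normal(size=(n, n))
M = A @ A.T + n * np.eye(n)        # a high-dimensional SPD matrix
W = random_orthonormal(n, m)
M_low = project_spd(M, W)
print(M_low.shape)                             # (3, 3)
print(np.all(np.linalg.eigvalsh(M_low) > 0))   # True: still SPD
```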
Online Semi-Supervised Learning with Deep Hybrid Boltzmann Machines and Denoising Autoencoders
Two novel deep hybrid architectures, the Deep Hybrid Boltzmann Machine and
the Deep Hybrid Denoising Auto-encoder, are proposed for handling
semi-supervised learning problems. The models combine experts that model
relevant distributions at different levels of abstraction to improve overall
predictive performance on discriminative tasks. Theoretical motivations and
algorithms for joint learning for each are presented. We apply the new models
to the domain of data-streams in work towards life-long learning. The proposed
architectures show improved performance compared to a pseudo-labeled, drop-out
rectifier network.
DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding
Human face exhibits an inherent hierarchy in its representations (i.e.,
holistic facial expressions can be encoded via a set of facial action units
(AUs) and their intensity). Variational (deep) auto-encoders (VAE) have shown
great results in unsupervised extraction of hierarchical latent representations
from large amounts of image data, while being robust to noise and other
undesired artifacts. Potentially, this makes VAEs a suitable approach for
learning facial features for AU intensity estimation. Yet, most existing
VAE-based methods apply classifiers learned separately from the encoded
features. By contrast, the non-parametric (probabilistic) approaches, such as
Gaussian Processes (GPs), typically outperform their parametric counterparts,
but cannot deal easily with large amounts of data. To this end, we propose a
novel VAE semi-parametric modeling framework, named DeepCoder, which combines
the modeling power of parametric (convolutional) and nonparametric (ordinal
GPs) VAEs, for joint learning of (1) latent representations at multiple levels
in a task hierarchy, and (2) classification of multiple ordinal outputs. We
show on benchmark datasets for AU intensity estimation that the proposed
DeepCoder outperforms the state-of-the-art approaches, and related VAEs and
deep learning models. Comment: ICCV 2017 - accepted.
Realizing Petabyte Scale Acoustic Modeling
Large scale machine learning (ML) systems such as the Alexa automatic speech
recognition (ASR) system continue to improve with increasing amounts of
manually transcribed training data. Instead of scaling manual transcription to
impractical levels, we utilize semi-supervised learning (SSL) to learn acoustic
models (AM) from the vast firehose of untranscribed audio data. Learning an AM
from 1 Million hours of audio presents unique ML and system design challenges.
We present the design and evaluation of a highly scalable and resource
efficient SSL system for AM. Employing the student/teacher learning paradigm,
we focus on the student learning subsystem: a scalable and robust data pipeline
that generates features and targets from raw audio, and an efficient model
pipeline, including the distributed trainer, that builds a student model. Our
evaluations show that, even without extensive hyper-parameter tuning, we obtain
relative accuracy improvements in the 10 to 20% range, with higher gains in
noisier conditions. The end-to-end processing time of this SSL system was 12
days, and several components in this system can trivially scale linearly with
more compute resources. Comment: 2156-3357 \copyright 2019 IEEE. Personal use is permitted, but
republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for
more information.
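The student/teacher paradigm this abstract builds on can be sketched as follows, assuming a standard soft-target setup: a trained teacher scores untranscribed audio and the student is trained against the teacher's posteriors. The function names and the temperature smoothing are illustrative details, not taken from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    # Numerically stable softmax with optional temperature T.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def teacher_targets(teacher_logits, temperature=2.0):
    # The teacher's (temperature-smoothed) posteriors over acoustic states
    # become the student's training targets, in place of manual transcripts.
    return softmax(teacher_logits, T=temperature)

def student_loss(student_logits, soft_targets):
    # Cross-entropy of the student's posteriors against the teacher's targets.
    p = softmax(student_logits)
    return -np.mean(np.sum(soft_targets * np.log(p + 1e-12), axis=-1))

teacher_logits = np.array([[4.0, 0.0, 0.0]])
targets = teacher_targets(teacher_logits)
matched = student_loss(np.array([[4.0, 0.0, 0.0]]), targets)
mismatched = student_loss(np.array([[0.0, 4.0, 0.0]]), targets)
print(matched < mismatched)  # True: agreeing with the teacher lowers the loss
```

At petabyte scale the interesting work is in the surrounding pipeline (feature/target generation from raw audio and the distributed trainer), but the learning signal reduces to this teacher-supervised loss.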