1,860 research outputs found
PLDA-Based Diarization of Telephone Conversations
This paper investigates the application of probabilistic linear
discriminant analysis (PLDA) to speaker diarization of telephone conversations.
We introduce a variational Bayes (VB) approach to inference under a PLDA
model of segmental i-vectors in speaker diarization. A deterministic
annealing (DA) algorithm is applied to avoid locally optimal solutions
in the VB iterations. We compare our proposed system with a well-known system that
applies k-means clustering to principal component analysis (PCA) coefficients
of segmental i-vectors. We used summed-channel telephone data from the National
Institute of Standards and Technology (NIST) 2008 Speaker Recognition
Evaluation (SRE) as the test set to evaluate the performance of the
proposed system. We achieve about 20% relative improvement in Diarization Error
Rate (DER) compared to the baseline system.
Stochastic Annealing for Variational Inference
We empirically evaluate a stochastic annealing strategy for Bayesian
posterior optimization with variational inference. Variational inference is a
deterministic approach to approximate posterior inference in Bayesian models in
which a typically non-convex objective function is locally optimized over the
parameters of the approximating distribution. We investigate an annealing
method for optimizing this objective with the aim of finding a better local
optimal solution and compare with deterministic annealing methods and no
annealing. We show that stochastic annealing can provide clear improvement on
the GMM and HMM, while performance on LDA tends to favor deterministic
annealing methods.
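
To make the idea of deterministic annealing concrete, the following is a minimal sketch of a tempered E-step in plain EM for a one-dimensional Gaussian mixture. It is not the variational objective or the stochastic annealing scheme evaluated in the abstract above. The function name, the temperature schedule, the fixed unit variance, and the equal mixture weights are all assumptions made for illustration.

```python
import numpy as np

def annealed_em_means(x, k=2, temps=(4.0, 2.0, 1.0), iters_per_temp=25, seed=0):
    """Deterministic-annealing EM for the means of a 1-D Gaussian mixture.

    Assumptions (illustrative only): unit variance and equal weights for
    every component, so only the means are estimated.
    """
    rng = np.random.default_rng(seed)
    # Initialise the means at k distinct data points.
    means = rng.choice(x, size=k, replace=False).astype(float)
    for temp in temps:
        beta = 1.0 / temp  # inverse temperature; beta = 1 is ordinary EM
        for _ in range(iters_per_temp):
            # Tempered E-step: responsibilities proportional to N(x | mean, 1)^beta.
            log_r = -0.5 * beta * (x[:, None] - means[None, :]) ** 2
            log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(log_r)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted average of the data.
            means = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return np.sort(means)

# Synthetic data: two unit-variance clusters centred at -3 and +3.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
fitted_means = annealed_em_means(data)
```

At high temperature the responsibilities are nearly uniform, which smooths the objective; the schedule then cools to a temperature of 1, where the update coincides with the usual EM step.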
A General Method for Amortizing Variational Filtering
We introduce the variational filtering EM algorithm, a simple,
general-purpose method for performing variational inference in dynamical latent
variable models using information from only past and present variables, i.e.
filtering. The algorithm is derived from the variational objective in the
filtering setting and consists of an optimization procedure at each time step.
By performing each inference optimization procedure with an iterative amortized
inference model, we obtain a computationally efficient implementation of the
algorithm, which we call amortized variational filtering. We present
experiments demonstrating that this general-purpose method improves performance
across several deep dynamical latent variable models.
Comment: Advances in Neural Information Processing Systems (NIPS) 201
A Bayesian Model for Generative Transition-based Dependency Parsing
We propose a simple, scalable, fully generative model for transition-based
dependency parsing with high accuracy. The model, parameterized by Hierarchical
Pitman-Yor Processes, overcomes the limitations of previous generative models
by allowing fast and accurate inference. We propose an efficient decoding
algorithm based on particle filtering that can adapt the beam size to the
uncertainty in the model while jointly predicting POS tags and parse trees. The
UAS of the parser is on par with that of a greedy discriminative baseline. As a
language model, it obtains better perplexity than an n-gram model by performing
semi-supervised learning over a large unlabelled corpus. We show that the model
is able to generate locally and syntactically coherent sentences, opening the
door to further applications in language generation.
Comment: Depling 201
A State-Space Approach to Dynamic Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) has been actively investigated and
used in a wide range of problems in the past decade. Considerable
attention has been given to developing NMF algorithms suited to modeling
time series with strong temporal dependencies. In this paper, we propose a
novel state-space approach to perform dynamic NMF (D-NMF). In the proposed
probabilistic framework, the NMF coefficients act as the state variables and
their dynamics are modeled using a multi-lag nonnegative vector autoregressive
(N-VAR) model within the process equation. We use expectation maximization and
propose a maximum-likelihood estimation framework to estimate the basis matrix
and the N-VAR model parameters. Interestingly, the N-VAR model parameters are
obtained by simply applying NMF. Moreover, we derive a maximum a posteriori
estimate of the state variables (i.e., the NMF coefficients) that is based on a
prediction step and an update step, similarly to the Kalman filter. We
illustrate the benefits of the proposed approach using different numerical
simulations where D-NMF significantly outperforms its static counterpart.
Experimental results for three different applications show that the proposed
approach outperforms two state-of-the-art NMF approaches that exploit temporal
dependencies, namely a nonnegative hidden Markov model and a frame stacking
approach, while it requires less memory and computational power.
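
For reference, the static counterpart mentioned in the abstract above can be sketched with the standard multiplicative updates for the squared-error NMF objective. This is only the static baseline: it contains none of the state-space or N-VAR dynamics of D-NMF. The function name, the iteration count, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(v, rank, iters=200, seed=0, eps=1e-12):
    """Static NMF, V ~ W H, via multiplicative updates (squared-error objective)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = v.shape
    w = rng.random((n_rows, rank)) + eps  # basis matrix
    h = rng.random((rank, n_cols)) + eps  # coefficients (activations)
    errors = []
    for _ in range(iters):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
        errors.append(np.linalg.norm(v - w @ h))  # Frobenius reconstruction error
    return w, h, errors

# Synthetic nonnegative data of exact rank 3 (hypothetical example).
rng = np.random.default_rng(0)
v = rng.random((20, 3)) @ rng.random((3, 50))
w, h, errors = nmf_multiplicative(v, rank=3)
```

Each column of `h` is fitted independently of its neighbours here, which is the limitation that temporal models such as D-NMF are designed to address.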
Noise Benefits in Expectation-Maximization Algorithms
This dissertation shows that careful injection of noise into sample data can
substantially speed up Expectation-Maximization algorithms.
Expectation-Maximization algorithms are a class of iterative algorithms for
extracting maximum likelihood estimates from corrupted or incomplete data. The
convergence speed-up is an example of a noise benefit or "stochastic resonance"
in statistical signal processing. The dissertation presents derivations of
sufficient conditions for such noise-benefits and demonstrates the speed-up in
some ubiquitous signal-processing algorithms. These algorithms include
parameter estimation for mixture models, the k-means clustering algorithm,
the Baum-Welch algorithm for training hidden Markov models, and backpropagation
for training feedforward artificial neural networks. This dissertation also
analyses the effects of data and model corruption on the more general Bayesian
inference estimation framework. The main finding is a theorem guaranteeing that
uniform approximators for Bayesian model functions produce uniform
approximators for the posterior pdf via Bayes theorem. This result also applies
to hierarchical and multidimensional Bayesian models.
Comment: A Dissertation Presented to The Faculty of The USC Graduate School
University of Southern California In Partial Fulfillment of the Requirements
for the Degree Doctor of Philosophy (Electrical Engineering) August 2013.
(252 pages, 45 figures), Online:
http://digitallibrary.usc.edu/cdm/ref/collection/p15799coll3/id/29434
Particle Filtering for PLCA model with Application to Music Transcription
Automatic Music Transcription (AMT) consists in automatically estimating the
notes in an audio recording, through three attributes: onset time, duration and
pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular
for this task. PLCA is a spectrogram factorization method, able to model a
magnitude spectrogram as a linear combination of spectral vectors from a
dictionary. Such methods use the Expectation-Maximization (EM) algorithm to
estimate the parameters of the acoustic model. This algorithm has
well-known inherent shortcomings (local convergence, initialization dependency),
which limit the application of EM-based systems to AMT, particularly with
regard to the mathematical form and number of priors. To overcome such limits,
we propose in this paper to employ a different estimation framework based on
Particle Filtering (PF), which consists in sampling the posterior distribution
over larger parameter ranges. This framework proves to be more robust in
parameter estimation, more flexible and unifying in the integration of prior
knowledge in the system. Note-level transcription accuracies of 61.8 and
59.5 were achieved on evaluation sound datasets of two different
instrument repertoires, including the classical piano (from MAPS dataset) and
the marovany zither, and direct comparisons to previous PLCA-based approaches
are provided. Steps for further development are also outlined.
Deep Rewiring: Training very sparse deep networks
Neuromorphic hardware tends to pose limits on the connectivity of deep
networks that one can run on it. Generic hardware and software
implementations of deep learning also run more efficiently for sparse networks.
Several methods exist for pruning connections of a neural network after it has been
trained without connectivity constraints. We present an algorithm, DEEP R, that
enables us to directly train a sparsely connected neural network. DEEP R
automatically rewires the network during supervised training so that
connections are placed where they are most needed for the task, while their total
number remains strictly bounded at all times. We demonstrate that DEEP R can be used
to train very sparse feedforward and recurrent neural networks on standard
benchmark tasks with just a minor loss in performance. DEEP R is based on a
rigorous theoretical foundation that views rewiring as stochastic sampling of
network configurations from a posterior.
Comment: Accepted for publication at ICLR 2018. 10 pages (12 with references,
24 with appendix), 4 Figures in the main text. Reviews are available at:
https://openreview.net/forum?id=BJ_wN01C- . This recent version contains
minor corrections in the appendi
Vehicular Edge Computing via Deep Reinforcement Learning
Smart vehicles form the Vehicle of Internet, which can execute various
intelligent services. Although the computation capability of a vehicle is
limited, multiple types of edge computing nodes provide heterogeneous resources for
vehicular services. When offloading a complicated service to a vehicular
edge computing node, the decision should consider numerous factors.
Existing work on offloading decisions mostly formulates the decision as a resource scheduling
problem with a single or multiple objective functions and some constraints, and
explores customized heuristic algorithms. However, offloading multiple
data-dependent tasks in a service is a difficult decision, as an optimal solution
must understand the resource requirement, the access network, the user
mobility, and importantly the data dependency. Inspired by recent advances in
machine learning, we propose a knowledge driven (KD) service offloading
decision framework for Vehicle of Internet, which provides the optimal policy
directly from the environment. We formulate the offloading decision for
multiple tasks in a service as a long-term planning problem, and explore
recent deep reinforcement learning to obtain the optimal solution. It considers
the future data dependency of the following tasks when making a decision for the
current task from the learned offloading knowledge. Moreover, the framework
supports pre-training at the powerful edge computing node and continual
online learning when the vehicular service is executed, so that it can adapt
to environment changes and learn policies that are sensible in hindsight. The
simulation results show that the KD service offloading decision converges quickly,
adapts to different conditions, and outperforms the greedy offloading decision
algorithm.
Comment: Preliminary report of ongoing wor
A Truncated EM Approach for Spike-and-Slab Sparse Coding
We study inference and learning based on a sparse coding model with a
`spike-and-slab' prior. As in standard sparse coding, the model used assumes
independent latent sources that linearly combine to generate data points.
However, instead of using a standard sparse prior such as a Laplace
distribution, we study the application of a more flexible `spike-and-slab'
distribution which models the absence or presence of a source's contribution
independently of its strength if it contributes. We investigate two approaches
to optimize the parameters of spike-and-slab sparse coding: a novel truncated
EM approach and, for comparison, an approach based on standard factored
variational distributions. The truncated approach can be regarded as a
variational approach with truncated posteriors as variational distributions. In
applications to source separation we find that both approaches improve the
state-of-the-art in a number of standard benchmarks, which argues for the use
of `spike-and-slab' priors for the corresponding data domains. Furthermore, we
find that the truncated EM approach improves on the standard factored approach
in source separation tasks, which hints at biases introduced by assuming
posterior independence in the factored variational approach. Likewise, on a
standard benchmark for image denoising, we find that the truncated EM approach
improves on the factored variational approach. While the performance of the
factored approach saturates with increasing numbers of hidden dimensions, the
performance of the truncated approach improves the state-of-the-art for higher
noise levels.
Comment: To appear in JMLR (2014
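
The generative side of the model described in the abstract above can be sketched as follows: a binary "spike" decides whether each source is present, an independent Gaussian "slab" sets its strength, and the active sources combine linearly to produce a data point. The parameter names, the Gaussian form of the slab and of the observation noise, and all default values are assumptions made for illustration; the abstract does not give the paper's exact parameterization.

```python
import numpy as np

def sample_spike_and_slab(weights, n_samples, pi=0.2, slab_mean=0.0,
                          slab_std=1.0, noise_std=0.1, seed=0):
    """Draw data from a linear model with spike-and-slab latent sources.

    weights: (n_observed, n_sources) mixing matrix (hypothetical name).
    pi:      prior probability that a source is active (the spike).
    """
    rng = np.random.default_rng(seed)
    n_observed, n_sources = weights.shape
    # Spike: presence or absence of each source, independent of its strength.
    spikes = rng.random((n_samples, n_sources)) < pi
    # Slab: continuous strength of each source if it contributes.
    slabs = rng.normal(slab_mean, slab_std, size=(n_samples, n_sources))
    sources = spikes * slabs
    # Linear combination of sources plus Gaussian observation noise.
    noise = rng.normal(0.0, noise_std, size=(n_samples, n_observed))
    data = sources @ weights.T + noise
    return data, sources

weights = np.random.default_rng(1).normal(size=(8, 5))
data, sources = sample_spike_and_slab(weights, n_samples=2000)
```

Because the spike and the slab are drawn separately, sparsity (how often a source is active) and amplitude (how strong it is when active) can be set independently, which a single Laplace prior cannot do.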