Search CORE

1,803 research outputs found

A sticky HDP-HMM with application to speaker diarization

Author: Fox Emily B.
Jordan Michael I.
Sudderth Erik B.
Willsky Alan S.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011
Field of study

We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS395 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

DSpace@MIT

ScholarlyCommons@Penn

Deep Learning for Single Image Super-Resolution: A Brief Review

Author: Liao Q
Tian Y
Wang W
Xue J-H
Yang W
Zhang X
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/07/2019
Field of study

Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high-resolution (HR) output from one of its low-resolution (LR) versions. To solve the SISR problem, recently powerful deep learning algorithms have been employed and achieved the state-of-the-art performance. In this survey, we review representative deep learning-based SISR methods, and group them into two categories according to their major contributions to two essential aspects of SISR: the exploration of efficient neural network architectures for SISR, and the development of effective optimization objectives for deep SISR learning. For each category, a baseline is firstly established and several critical limitations of the baseline are summarized. Then representative works on overcoming these limitations are presented based on their original contents as well as our critical understandings and analyses, and relevant comparisons are conducted from a variety of perspectives. Finally we conclude this review with some vital current challenges and future trends in SISR leveraging deep learning algorithms.Comment: Accepted by IEEE Transactions on Multimedia (TMM

arXiv.org e-Print Archive

UCL Discovery

Regularized adaptive long autoregressive spectral analysis

Author: Desodt G.
Giovannelli J. -F.
Idier J.
Muller D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

This paper is devoted to adaptive long autoregressive spectral analysis when (i) very few data are available, (ii) information does exist beforehand concerning the spectral smoothness and time continuity of the analyzed signals. The contribution is founded on two papers by Kitagawa and Gersch. The first one deals with spectral smoothness, in the regularization framework, while the second one is devoted to time continuity, in the Kalman formalism. The present paper proposes an original synthesis of the two contributions: a new regularized criterion is introduced that takes both information into account. The criterion is efficiently optimized by a Kalman smoother. One of the major features of the method is that it is entirely unsupervised: the problem of automatically adjusting the hyperparameters that balance data-based versus prior-based information is solved by maximum likelihood. The improvement is quantified in the field of meteorological radar

arXiv.org e-Print Archive

CiteSeerX

Crossref

Nonparametric Guidance of Autoencoder Representations using Label Information

Author: Adams Ryan Prescott
Larochelle Hugo
Snoek Jasper
Publication venue
Publication date: 13/02/2014
Field of study

While unsupervised learning has long been useful for density modeling, exploratory data analysis and visualization, it has become increasingly important for discovering features that will later be used for discriminative tasks. Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. One particularly effective way to perform such unsupervised learning has been to use autoencoder neural networks, which find latent representations that are constrained but nevertheless informative for reconstruction. However, pure unsupervised learning with autoencoders can find representations that may or may not be useful for the ultimate discriminative task. It is a continuing challenge to guide the training of an autoencoder so that it finds features which will be useful for predicting labels. Similarly, we often have a priori information regarding what statistical variation will be irrelevant to the ultimate discriminative task, and we would like to be able to use this for guidance as well. Although a typical strategy would be to include a parametric discriminative model as part of the autoencoder training, here we propose a nonparametric approach that uses a Gaussian process to guide the representation. By using a nonparametric model, we can ensure that a useful discriminative function exists for a given set of features, without explicitly instantiating it. We demonstrate the superiority of this guidance mechanism on four data sets, including a real-world application to rehabilitation research. We also show how our proposed approach can learn to explicitly ignore statistically significant covariate information that is label-irrelevant, by evaluating on the small NORB image recognition problem in which pose and lighting labels are available.Engineering and Applied Science

CiteSeerX

Harvard University - DASH

Recommended from our members

Composing Deep Learning and Bayesian Nonparametric Methods

Author: Zhang Aonan
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

Recent progress in Bayesian methods largely focus on non-conjugate models featured with extensive use of black-box functions: continuous functions implemented with neural networks. Using deep neural networks, Bayesian models can reasonably fit big data while at the same time capturing model uncertainty. This thesis targets at a more challenging problem: how do we model general random objects, including discrete ones, using random functions? Our conclusion is: many (discrete) random objects are in nature a composition of Poisson processes and random functions}. Thus, all discreteness is handled through the Poisson process while random functions captures the rest complexities of the object. Thus the title: composing deep learning and Bayesian nonparametric methods. This conclusion is not a conjecture. In spacial cases such as latent feature models , we can prove this claim by working on infinite dimensional spaces, and that is how Bayesian nonparametric kicks in. Moreover, we will assume some regularity assumptions on random objects such as exchangeability. Then the representations will show up magically using representation theorems. We will see this two times throughout this thesis. One may ask: when a random object is too simple, such as a non-negative random vector in the case of latent feature models, how can we exploit exchangeability? The answer is to aggregate infinite random objects and map them altogether onto an infinite dimensional space. And then assume exchangeability on the infinite dimensional space. We demonstrate two examples of latent feature models by (1) concatenating them as an infinite sequence (Section 2,3) and (2) stacking them as a 2d array (Section 4). Besides, we will see that Bayesian nonparametric methods are useful to model discrete patterns in time series data. We will showcase two examples: (1) using variance Gamma processes to model change points (Section 5), and (2) using Chinese restaurant processes to model speech with switching speakers (Section 6). We also aware that the inference problem can be non-trivial in popular Bayesian nonparametric models. In Section 7, we find a novel solution of online inference for the popular HDP-HMM model

Columbia University Academic Commons