Modelling function-valued processes with complex structure
PhD Thesis
Existing approaches to functional principal component analysis (FPCA) usually rely
on nonparametric estimation of the covariance structure. When function-valued processes
are observed on a multidimensional domain, the nonparametric estimation suffers from
the curse of dimensionality, forcing FPCA methods to make restrictive assumptions such
as covariance separability.
In this thesis, we discuss a general Bayesian framework for modelling function-valued
processes using a Gaussian process (GP) prior, enabling us to handle nonseparable and/or nonstationary covariance structures. The nonstationarity is introduced by a
convolution-based approach through a varying kernel, whose parameters vary along the
input space and are estimated via a local empirical Bayesian method. For the varying
anisotropy matrix, we propose to use a spherical parametrisation, leading to unconstrained
and interpretable parameters and allowing for interaction between coordinate directions in
the covariance function. The unconstrained nature allows the parameters to be modelled
as a nonparametric function of time, spatial location and even additional covariates.
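As a concrete illustration (not the thesis's exact construction), a spherical parametrisation can map unconstrained real parameters to a positive-definite anisotropy matrix through its Cholesky factor. The sketch below shows the 2x2 case with log-radii and a single angle; the function name and parameter choices are hypothetical:

```python
import numpy as np

def anisotropy_from_spherical(log_r, theta):
    """Build a 2x2 positive-definite anisotropy matrix A = L L^T from
    unconstrained parameters: log-radii of the Cholesky rows plus one angle.
    (Illustrative sketch; the thesis's exact parametrisation may differ.)"""
    r1, r2 = np.exp(log_r)                     # radii > 0 via exp
    L = np.array([[r1, 0.0],
                  [r2 * np.cos(theta), r2 * np.sin(theta)]])
    return L @ L.T

A = anisotropy_from_spherical(log_r=np.array([0.0, -0.5]), theta=1.0)
# A is symmetric positive definite for any parameter values (sin(theta) != 0),
# so each parameter can itself be modelled as a free function of the inputs.
print(np.linalg.eigvalsh(A))
```

Because the parameters live in an unconstrained space, each can in turn be treated as a smooth nonparametric function of time, location or additional covariates, as the abstract describes.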
In the spirit of FPCA, the Bayesian framework can decompose the function-valued
processes using the eigenvalues and eigensurfaces calculated from the estimated covariance
structure. A finite number of the eigensurfaces can be used to extract some of the most
important information contained in data with complex covariance structure.
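A minimal numpy sketch of this decomposition step, assuming the covariance has already been estimated on a flattened 2-D grid (a toy squared-exponential surrogate stands in for the GP-based estimate):

```python
import numpy as np

# Toy estimated covariance of a process on a 10x10 grid, flattened to a
# (100, 100) matrix C; a squared-exponential surrogate for the GP estimate.
g = np.linspace(0.0, 1.0, 10)
X, Y = np.meshgrid(g, g)
pts = np.column_stack([X.ravel(), Y.ravel()])            # 100 grid points
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
C = np.exp(-d2 / (2 * 0.3 ** 2))

# Eigendecomposition: eigenvalues w, eigensurfaces as (flattened) columns of V.
w, V = np.linalg.eigh(C)
w, V = w[::-1], V[:, ::-1]                               # descending order

# Keep the K leading eigensurfaces explaining 99% of the total variation.
K = int(np.searchsorted(np.cumsum(w) / w.sum(), 0.99)) + 1
eigensurfaces = V[:, :K].T.reshape(K, 10, 10)
print(K, eigensurfaces.shape)
```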
We also extend the methods to handle multivariate function-valued processes. The
estimated covariance structure is shown to be important for analysing joint variation in
the data and is further used in our proposed multiple functional partial least squares
regression model. We show that the interaction between the scalar response variable and
function-valued covariates can be explained by fewer terms than in a regression model
which uses multivariate functional principal components.
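The thesis's multiple functional PLS model is not reproduced here, but a bare-bones PLS1 (NIPALS) sketch for a scalar response on discretised functional covariates shows the basic mechanics; all data are synthetic:

```python
import numpy as np

def pls1(X, y, n_components):
    """Bare-bones PLS1 (NIPALS) for a scalar response y on discretised
    functional covariates stacked column-wise in X. Illustrative only; the
    thesis's model additionally exploits the estimated covariance structure."""
    Xk, yk = X.copy(), y.astype(float).copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                 # weight vector
        t = Xk @ w                             # score
        tt = t @ t
        p = Xk.T @ t / tt                      # X loading
        q = (yk @ t) / tt                      # y loading
        Xk -= np.outer(t, p)                   # deflate
        yk -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, q)     # regression coefficients

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))              # 50 curves, 30 evaluation points
beta = np.sin(np.linspace(0, np.pi, 30))
y = X @ beta + 0.01 * rng.standard_normal(50)
B = pls1(X - X.mean(0), y - y.mean(), n_components=5)
print(np.corrcoef((X - X.mean(0)) @ B, y - y.mean())[0, 1])  # close to 1
```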
Simulation studies and applications to real data show that our proposed approaches
provide new insights into the data and excellent prediction results.
Theoretical Analysis of Nonparametric Filament Estimation
This paper provides a rigorous study of the nonparametric estimation of
filaments, or ridge lines, of a probability density. Points on the filament
are considered as local extrema of the density when traversing its support
along the integral curve driven by the vector field of second eigenvectors
of the Hessian of the density. We `parametrize' points on the filaments by such
integral curves, and thus both the estimation of integral curves and of
filaments will be considered via a plug-in method using kernel density
estimation. We establish rates of convergence and asymptotic distribution
results for the estimation of both the integral curves and the filaments. The
main theoretical result establishes the asymptotic distribution of the uniform
deviation of the estimated filament from its theoretical counterpart. This
result utilizes the extreme value behavior of non-stationary Gaussian processes
indexed by manifolds.
Comment: 55 pages, 1 figure
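A rough plug-in sketch of this idea in 2-D, assuming a Gaussian kernel density estimate with analytic gradient and Hessian; the bandwidth, step size and sign convention for the "second" eigenvector are illustrative choices, not the paper's:

```python
import numpy as np

def kde_hessian(x, data, h):
    """Gaussian KDE value, gradient and Hessian at a 2-D point x."""
    u = (x - data) / h                                   # (n, 2)
    k = np.exp(-0.5 * (u ** 2).sum(1)) / (2 * np.pi * h ** 2)
    f = k.mean()
    grad = -(k[:, None] * u).mean(0) / h
    hess = (np.einsum('n,ni,nj->ij', k, u, u) / len(data)
            - f * np.eye(2)) / h ** 2
    return f, grad, hess

def filament_point(x0, data, h, step=0.02, n_steps=100):
    """Follow the integral curve driven by the eigenvector of the smallest
    Hessian eigenvalue (sign chosen to ascend the density); the density
    maximiser along the curve approximates a point on the filament."""
    x = np.asarray(x0, float)
    path, dens = [x], []
    for _ in range(n_steps):
        f, g, H = kde_hessian(x, data, h)
        v = np.linalg.eigh(H)[1][:, 0]                   # ascending eigenvalues
        if v @ g < 0:
            v = -v                                       # move uphill in density
        dens.append(f)
        x = x + step * v
        path.append(x)
    return path[int(np.argmax(dens))]

# Toy data concentrated along the x-axis; the filament is roughly y = 0.
rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, 400)
data = np.column_stack([t, 0.1 * rng.standard_normal(400)])
pt = filament_point([0.2, 0.15], data, h=0.2)
print(pt)
```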
Mini-Workshop: Semiparametric Modelling of Multivariate Economic Time Series With Changing Dynamics
Modelling multivariate time series of possibly high dimension calls for appropriate dimension-reduction, e.g. by some factor modelling, additive modelling, or some simplified parametric structure for the dynamics (i.e. the serial dependence) of the time series. This workshop aimed to bring together experts in this field in order to discuss recent methodology for multivariate time series dynamics which change over time, either by an abrupt switch between two (or more) different regimes or by evolving smoothly over time. The emphasis has been on mathematical methods for semiparametric modelling and estimation, where "semiparametric" is to be understood in a rather broad sense: parametric models where the parameters are themselves nonparametric functions (of time), regime-switching nonparametric
models with a parametric specification of the transition mechanism, and the like. An ultimate goal of applying these models to economic and financial time series is prediction. Another emphasis has been on comparing Bayesian with frequentist approaches, and on covering both theoretical aspects of estimation, such as consistency and efficiency, and computational aspects.
Challenges in Statistical Theory: Complex Data Structures and Algorithmic Optimization
Technological developments have created a constant incoming stream of complex new data structures that need analysis. Modern statistics therefore means mathematically sophisticated new statistical theory that generates or supports innovative data-analytic methodologies for complex data structures. Inherent in many of these methodologies are challenging numerical optimization methods. The workshop brought together experts from mathematical statistics as well as statisticians involved in serious modern applications and computing. Its primary goal was to advance the mathematical and methodological underpinnings of modern statistics for complex data. Particular focus was given to the advancement of theory and methods under non-stationarity and complex dependence structures, including (multivariate) financial time series, scientific data analysis in neurosciences and biophysics, estimation under shape constraints, and high-dimensional discrimination/classification.
Geometric Sparsity in High Dimension
While typically complex and high-dimensional, modern data sets often have a concise underlying structure. This thesis explores the sparsity inherent in the geometric structure of many high-dimensional data sets.
Constructing an efficient parametrization of a large data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach, guided by geometry, consists in recovering a local parametrization (a chart) using the local tangent plane. In practice, the data are noisy and the estimation of a low-dimensional tangent plane in high dimension becomes ill posed. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noise-free samples from a linear subspace. To process noisy data, PCA must be applied locally, at a scale small enough such that the manifold is approximately linear, but at a scale large enough such that structure may be discerned from noise.
We present an approach that uses the geometry of the data to guide our definition of locality, discovering the optimal balance of this noise-curvature trade-off. Using eigenspace perturbation theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals the optimal scale for local tangent plane recovery. Additionally, we are able to accurately and efficiently estimate the curvature of the local neighborhood, and we introduce a geometric uncertainty principle quantifying the limits of noise-curvature perturbation for tangent plane recovery. An algorithm for partitioning a noisy data set is then studied, yielding an appropriate scale for practical tangent plane estimation.
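A simplified sketch of the scale-selection idea: local PCA is run over a range of neighbourhood radii, and the scale is picked by a crude relative eigen-gap criterion rather than the thesis's perturbation-theoretic bound; data and thresholds are synthetic:

```python
import numpy as np

def local_tangent(data, x0, radii, d=1):
    """Local PCA over a range of radii; return (gap, radius, basis) for the
    scale with the largest relative eigen-gap (a crude stand-in for the
    perturbation bound optimised in the thesis)."""
    best = None
    for r in radii:
        nbhd = data[np.linalg.norm(data - x0, axis=1) < r]
        if len(nbhd) <= d + 1:
            continue                                     # too few points
        vals, vecs = np.linalg.eigh(np.cov(nbhd.T))      # ascending eigenvalues
        gap = (vals[-d] - vals[-d - 1]) / (vals[-1] + 1e-12)
        if best is None or gap > best[0]:
            best = (gap, r, vecs[:, -d:])                # top-d eigenvectors
    return best

# Noisy samples from a circle: a 1-D manifold embedded in the plane.
rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 2000)
data = np.column_stack([np.cos(theta), np.sin(theta)])
data += 0.02 * rng.standard_normal(data.shape)
gap, scale, T = local_tangent(data, data[0], radii=np.linspace(0.05, 1.0, 20))
print(scale, T.ravel())   # T is roughly tangent to the circle at data[0]
```

Too small a radius leaves the PCA noise-dominated; too large a radius lets curvature bend the neighbourhood away from linearity, which is the noise-curvature trade-off described above.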
Next, we study the interaction of sparsity, scale, and noise from a signal decomposition perspective. Empirical Mode Decomposition is a time-frequency analysis tool for nonstationary data that adaptively defines modes based on the intrinsic frequency scales of a signal. A novel understanding of the scales at which noise corrupts the otherwise sparse frequency decomposition is presented. The thesis concludes with a discussion of future work, including applications to image processing and the continued development of sparse representation from a geometric perspective.
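A single Empirical Mode Decomposition sifting pass can be sketched as follows; real implementations use cubic-spline envelopes and iterate to convergence, whereas this illustration uses linear interpolation between local extrema:

```python
import numpy as np

def sift_once(x):
    """One EMD sifting pass: subtract the mean of the upper and lower
    envelopes, obtained here by linear interpolation between local extrema
    (standard EMD uses cubic splines; this is a minimal illustration)."""
    idx = np.arange(len(x))
    maxima = idx[1:-1][(x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])]
    minima = idx[1:-1][(x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])]
    upper = np.interp(idx, maxima, x[maxima])    # upper envelope
    lower = np.interp(idx, minima, x[minima])    # lower envelope
    return x - (upper + lower) / 2

t = np.linspace(0.0, 1.0, 1000)
signal = np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 3 * t)
imf1 = sift_once(signal)   # repeated sifting isolates the fast 30 Hz mode
```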
A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems
We address the numerical solution of infinite-dimensional inverse problems in
the framework of Bayesian inference. In the Part I companion to this paper
(arXiv.org:1308.1313), we considered the linearized infinite-dimensional
inverse problem. Here in Part II, we relax the linearization assumption and
consider the fully nonlinear infinite-dimensional inverse problem using a
Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of
sampling high-dimensional pdfs arising from Bayesian inverse problems governed
by PDEs, we build on the stochastic Newton MCMC method. This method exploits
problem structure by taking as a proposal density a local Gaussian
approximation of the posterior pdf, whose construction is made tractable by
invoking a low-rank approximation of its data misfit component of the Hessian.
Here we introduce an approximation of the stochastic Newton proposal in which
we compute the low-rank-based Hessian at just the MAP point, and then reuse
this Hessian at each MCMC step. We compare the performance of the proposed
method to the original stochastic Newton MCMC method and to an independence
sampler. The comparison of the three methods is conducted on a synthetic ice
sheet inverse problem. For this problem, the stochastic Newton MCMC method with
a MAP-based Hessian converges at least as rapidly as the original stochastic
Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian
at each step. On the other hand, it is more expensive per sample than the
independence sampler; however, its convergence is significantly more rapid, and
thus overall it is much cheaper. Finally, we present extensive analysis and
interpretation of the posterior distribution, and classify directions in
parameter space based on the extent to which they are informed by the prior or
the observations.
Comment: 31 pages
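A toy version of the MAP-based variant on a two-dimensional, mildly non-Gaussian posterior (standing in for the PDE-governed problem): the Hessian of the negative log-posterior is computed once at the MAP and reused for every Gaussian proposal, with a Metropolis-Hastings correction. All densities here are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_post(x):                       # toy non-Gaussian log-posterior
    return -0.5 * (x[0] ** 2 + 4 * x[1] ** 2) - 0.1 * x[0] ** 4

def grad(x):                           # gradient of log_post
    return np.array([-x[0] - 0.4 * x[0] ** 3, -4 * x[1]])

# MAP point is the origin; fix the Hessian of -log_post there and reuse it.
H = np.array([[1.0, 0.0], [0.0, 4.0]])
Hinv = np.linalg.inv(H)
L = np.linalg.cholesky(Hinv)

def log_q(y, x):
    """Log-density (up to a constant) of the Gaussian proposal centred at
    the Newton step from x, with fixed covariance Hinv."""
    r = y - (x + Hinv @ grad(x))
    return -0.5 * r @ H @ r

x = np.array([2.0, 2.0])
samples, accepted = [], 0
for _ in range(5000):
    y = x + Hinv @ grad(x) + L @ rng.standard_normal(2)   # propose
    log_alpha = log_post(y) + log_q(x, y) - log_post(x) - log_q(y, x)
    if np.log(rng.uniform()) < log_alpha:                 # MH accept/reject
        x, accepted = y, accepted + 1
    samples.append(x)
samples = np.array(samples)
print(accepted / 5000, samples.mean(0))
```

Because the Hessian is factored once, each step costs only a matrix-vector product, which is the cheapness the abstract attributes to the MAP-based variant.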
Non-Negative Matrix Factorization Based Algorithms to Cluster Frequency Basis Functions for Monaural Sound Source Separation.
Monophonic sound source separation (SSS) refers to a process that separates out audio signals produced from the individual sound sources in a given acoustic mixture, when the mixture signal is recorded using one microphone or is directly recorded onto one reproduction channel. Many audio applications such as pitch modification and automatic music transcription would benefit from the availability of segregated sound sources from the mixture of audio signals for further processing. Recently, non-negative matrix factorization (NMF) has found application in monaural audio source separation due to its ability to factorize audio spectrograms into additive part-based basis functions, where the parts typically correspond to individual notes or chords in music. An advantage of NMF is that there can be a single basis function for each note played by a given instrument, thereby capturing changes in timbre with pitch for each instrument or source. However, these basis functions need to be clustered to their respective sources for the reconstruction of the individual source signals. Many clustering methods have been proposed to map the separated signals into sources with considerable success. Recently, to avoid the need for clustering, Shifted NMF (SNMF) was proposed, which assumes that the timbre of a note is constant for all the pitches produced by an instrument. SNMF has two drawbacks. Firstly, the assumption that the timbre of the notes played by an instrument remains constant is not true in general. Secondly, the SNMF method uses the Constant Q transform (CQT), and the lack of a true inverse of the CQT compromises the separation quality of the reconstructed signal. The principal aim of this thesis is to attempt to solve the problem of clustering NMF basis functions. Our first major contribution is the use of SNMF as a method of clustering the basis functions obtained via standard NMF.
The proposed SNMF clustering method aims to cluster the frequency basis functions obtained via standard NMF to their respective sources by making use of shift invariance in a log-frequency domain. Further, a minor contribution is made by improving the separation performance of the standard SNMF algorithm (here used directly to separate sources) through the use of an improved inverse CQT. Here, the standard SNMF algorithm finds shift invariance in a CQ spectrogram, which contains the frequency basis functions, obtained directly from the spectrogram of the audio mixture. Our next contribution is an improvement in the SNMF clustering algorithm through the incorporation of the CQT matrix inside the SNMF model in order to avoid the need for an inverse CQT to reconstruct the clustered NMF basis functions. Another major contribution deals with the incorporation of a constraint called group sparsity (GS) into the SNMF clustering algorithm at two stages to improve clustering. The effect of the GS is evaluated on various SNMF clustering algorithms proposed in this thesis. Finally, we have introduced a new family of masks to reconstruct the original signal from the clustered basis functions and compared their performance to the generalized Wiener filter masks using three different factorisation-based separation algorithms. We show that better separation performance can be achieved by using the proposed family of masks.
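The NMF step itself can be sketched with standard Lee-Seung multiplicative updates on a toy magnitude spectrogram; the clustering of basis functions to sources, which is the thesis's actual focus, is not shown:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for V ~ W H (Euclidean cost).
    Columns of W are frequency basis functions; rows of H their activations."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.uniform(0.1, 1.0, (F, rank))
    H = rng.uniform(0.1, 1.0, (rank, T))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis functions
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
F, T = 40, 60
b1 = np.exp(-0.5 * ((np.arange(F) - 10) / 2.0) ** 2)   # note at bin 10
b2 = np.exp(-0.5 * ((np.arange(F) - 25) / 2.0) ** 2)   # note at bin 25
act1 = (np.arange(T) < 30).astype(float)
act2 = (np.arange(T) >= 30).astype(float)
V = np.outer(b1, act1) + np.outer(b2, act2)

W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)   # small relative reconstruction error
```

The multiplicative form keeps `W` and `H` non-negative throughout, which is what lets each column of `W` be read as an additive, part-based basis function.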