Multitaper estimation on arbitrary domains
Multitaper estimators have enjoyed significant success in estimating spectral
densities from finite samples using as tapers Slepian functions defined on the
acquisition domain. Unfortunately, the numerical calculation of these Slepian
tapers is only tractable for certain symmetric domains, such as rectangles or
disks. In addition, no performance bounds are currently available for the mean
squared error of the spectral density estimate. This situation is inadequate
for applications such as cryo-electron microscopy, where noise models must be
estimated from irregular domains with small sample sizes. We show that the
multitaper estimator only depends on the linear space spanned by the tapers. As
a result, Slepian tapers may be replaced by proxy tapers spanning the same
subspace (validating the common practice of using partially converged solutions
to the Slepian eigenproblem as tapers). These proxies may consequently be
calculated using standard numerical algorithms for block diagonalization. We
also prove a set of performance bounds for multitaper estimators on arbitrary
domains. The method is demonstrated on synthetic and experimental datasets from
cryo-electron microscopy, where it reduces mean squared error by a factor of
two or more compared to traditional methods.
Comment: 28 pages, 11 figures
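The key observation that the multitaper estimator depends only on the subspace spanned by the tapers is easy to check numerically. The following sketch (NumPy; a randomly generated orthonormal family stands in for true Slepian tapers, and all sizes are arbitrary) forms the usual average of tapered periodograms and verifies that rotating the tapers by an orthogonal matrix, which preserves their span, leaves the estimate unchanged.

```python
import numpy as np

def multitaper_psd(x, tapers):
    """Average of tapered periodograms; tapers has shape (K, N)."""
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return spectra.mean(axis=0)

rng = np.random.default_rng(0)
N, K = 256, 5
x = rng.standard_normal(N)

# Random orthonormal tapers stand in for Slepian tapers in this toy check.
tapers = np.linalg.qr(rng.standard_normal((N, K)))[0].T   # shape (K, N)

# Rotating the tapers by any orthogonal matrix preserves their span,
# and therefore leaves the multitaper estimate unchanged.
Q = np.linalg.qr(rng.standard_normal((K, K)))[0]
assert np.allclose(multitaper_psd(x, tapers), multitaper_psd(x, Q @ tapers))
```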
Capturing the zero: a new class of zero-augmented distributions and multiplicative error processes
We propose a novel approach to model serially dependent positive-valued variables which realize a non-trivial proportion of zero outcomes. This is a typical phenomenon in financial time series observed at high frequencies, such as cumulated trading volumes or the time between potentially simultaneously occurring market events. We introduce a flexible point-mass mixture distribution and develop a semiparametric specification test explicitly tailored for such distributions. Moreover, we propose a new type of multiplicative error model (MEM) based on a zero-augmented distribution, which incorporates an autoregressive binary choice component and thus captures the (potentially different) dynamics of both zero occurrences and strictly positive realizations. Applying the proposed model to high-frequency cumulated trading volumes of liquid NYSE stocks, we show that the model captures both the dynamic and distributional properties of the data very well and is able to correctly predict future distributions.
Keywords: High-frequency Data, Point-mass Mixture, Multiplicative Error Model, Excess Zeros, Semiparametric Specification Test, Market Microstructure
JEL Classification: C22, C25, C14, C16, C5
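To make the model structure concrete, the sketch below simulates a zero-augmented MEM in which an autoregressive logit governs the zero occurrences and a MEM(1,1) recursion with unit-mean Gamma errors governs the strictly positive part; this is an illustration under assumed parameter values, not the authors' specification or estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

# Hypothetical parameters, for illustration only.
gamma0, gamma1 = -1.5, 2.0           # autoregressive logit for zero outcomes
omega, alpha, beta = 0.1, 0.2, 0.7   # MEM(1,1) recursion for the positive part

x = np.zeros(T)
mu, z_prev = 1.0, 0.0
for t in range(1, T):
    # Probability of an exact zero, driven by the lagged zero indicator.
    p_zero = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * z_prev)))
    z_prev = float(rng.random() < p_zero)
    # Conditional mean of the multiplicative error component.
    mu = omega + alpha * x[t - 1] + beta * mu
    # Zero outcome, or mu scaled by a unit-mean Gamma error.
    x[t] = 0.0 if z_prev else mu * rng.gamma(shape=2.0, scale=0.5)

print("share of zeros:", (x == 0.0).mean())
```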
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample
complexity, however, has received relatively less attention, especially in the
setting where the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the third regime applies to exa-scale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where the matrix of pairwise and partial correlations
among the variables is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
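The sample-starved regime can be illustrated with a toy experiment (assumed sizes and threshold, not the paper's framework): draw n = 20 samples of p = 2000 mutually independent variables and screen the sample correlation matrix for large entries. Any pairs flagged are spurious, which is exactly the kind of error a sample-complexity analysis must control when the dimension grows while the sample size stays fixed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 2000   # sample-starved: far fewer samples than variables

# Independent variables: every large sample correlation found below is
# spurious, an artifact of the small, fixed sample size.
X = rng.standard_normal((n, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / n                     # p x p sample correlation matrix

# Screen the upper triangle for pairs with large absolute correlation.
rho = 0.8
iu = np.triu_indices(p, k=1)
hits = np.count_nonzero(np.abs(R[iu]) > rho)
print(f"pairs with |corr| > {rho}: {hits} of {iu[0].size}")
```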