Joint asymptotics for semi-nonparametric regression models with partially linear structure
We consider a joint asymptotic framework for studying semi-nonparametric
regression models where (finite-dimensional) Euclidean parameters and
(infinite-dimensional) functional parameters are both of interest. The class of
models under consideration share a partially linear structure and are estimated in
two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first
show that the Euclidean estimator and (pointwise) functional estimator, which
are re-scaled at different rates, jointly converge to a zero-mean Gaussian
vector. This weak convergence result reveals a surprising joint asymptotics
phenomenon: these two estimators are asymptotically independent. A major goal
of this paper is to gain first-hand insights into the above phenomenon.
Moreover, a likelihood ratio test is proposed for a set of joint local
hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9
(1938) 60-62; Ann. Statist. 29 (2001) 153-193] is unveiled. A novel technical
tool, called a joint Bahadur representation, is developed for studying these
joint asymptotics results.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
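For orientation, the classical Wilks result that the abstract's "new version" refers back to can be written as below. This is the textbook parametric statement, not the paper's joint semi-nonparametric result; the symbols $\ell_n$ (log-likelihood), $\Theta_0 \subset \Theta$ (null and full parameter spaces), and $k$ are notation assumed here rather than taken from the abstract.

```latex
% Classical Wilks theorem (textbook form; notation is assumed, not from the paper)
-2 \log \Lambda_n
  \;=\; 2\Bigl(\sup_{\theta \in \Theta} \ell_n(\theta)
              - \sup_{\theta \in \Theta_0} \ell_n(\theta)\Bigr)
  \;\xrightarrow{d}\; \chi^2_k
  \quad \text{under } H_0 : \theta \in \Theta_0,
\qquad k = \dim \Theta - \dim \Theta_0 .
```

The limit holds under the usual regularity conditions and does not depend on nuisance parameters, which is the feature usually called the Wilks phenomenon.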
Fusing Censored Dependent Data for Distributed Detection
In this paper, we consider a distributed detection problem for a censoring
sensor network where each sensor's communication rate is significantly reduced
by transmitting only "informative" observations to the Fusion Center (FC), and
censoring those deemed "uninformative". While the independence of data from
censoring sensors is often assumed in previous research, we explore spatial
dependence among observations. Our focus is on designing the fusion rule under
the Neyman-Pearson (NP) framework that takes into account the spatial
dependence among observations. Two transmission scenarios are considered: one
where uncensored observations are transmitted directly to the FC, and a second
where they are first quantized and then transmitted to further improve
transmission efficiency. A copula-based Generalized Likelihood Ratio Test (GLRT)
for censored data is proposed with both continuous and discrete messages
received at the FC corresponding to different transmission strategies. We
address the computational issues of the copula-based GLRTs involving
multidimensional integrals by presenting more efficient fusion rules, based on
the key idea of injecting controlled noise at the FC before fusion. Although
the signal-to-noise ratio (SNR) is reduced by introducing controlled noise at
the receiver, simulation results demonstrate that the resulting noise-aided
fusion approach performs very close to the exact copula-based GLRTs. By
exploiting the spatial dependence, copula-based GLRTs and their noise-aided
counterparts greatly improve detection performance compared with the fusion
rule under the independence assumption.
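As a point of comparison, the baseline "fusion rule under independence assumption" can be sketched for a toy setting. The sketch below is only an illustration under assumptions that are not in the abstract: uncensored observations, a Gaussian mean-shift model (H0: N(0, sigma^2) versus H1: N(mu, sigma^2)), and a hypothetical function name `fuse_independent_gaussian`. It does not implement censoring, quantization, copulas, or the noise-aided rule.

```python
import math
from statistics import NormalDist


def fuse_independent_gaussian(observations, mu, sigma, alpha):
    """Neyman-Pearson fusion for i.i.d. Gaussian mean-shift data (toy model).

    H0: x_i ~ N(0, sigma^2)   vs   H1: x_i ~ N(mu, sigma^2), independent sensors.
    Returns (decide_h1, llr, threshold) for false-alarm level alpha.
    """
    n = len(observations)
    # Independence lets the joint log-likelihood ratio factor into a sum.
    llr = sum(mu * x / sigma**2 - mu**2 / (2 * sigma**2) for x in observations)
    # Under H0 the summed LLR is Gaussian with the mean and std below,
    # so the level-alpha threshold is available in closed form.
    mean_h0 = -n * mu**2 / (2 * sigma**2)
    std_h0 = math.sqrt(n) * abs(mu) / sigma
    threshold = mean_h0 + std_h0 * NormalDist().inv_cdf(1 - alpha)
    return llr > threshold, llr, threshold


if __name__ == "__main__":
    decision, llr, thr = fuse_independent_gaussian([1.0, 1.0, 1.0, 1.0], 1.0, 1.0, 0.05)
    print(decision, llr, thr)
```

When observations are spatially dependent, the product form used here no longer matches the joint density, which is the gap the copula-based GLRT in the abstract is meant to close.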
Telling Cause from Effect using MDL-based Local and Global Regression
We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables $X$ and $Y$ from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer that $X$ causes $Y$ if it is shorter to describe $Y$ as a function of $X$
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm; thorough empirical evaluation on both synthetic and
real-world data shows that it outperforms the state of the art by a wide
margin.
Comment: 10 pages, To appear in ICDM1
String and Membrane Gaussian Processes
In this paper we introduce a novel framework for making exact nonparametric
Bayesian inference on latent functions that is particularly suitable for Big
Data tasks. Firstly, we introduce a class of stochastic processes we refer to
as string Gaussian processes (string GPs), which are not to be mistaken for
Gaussian processes operating on text. We construct string GPs so that their
finite-dimensional marginals exhibit suitable local conditional independence
structures, which allow for scalable, distributed, and flexible nonparametric
Bayesian inference, without resorting to approximations, and while ensuring
some mild global regularity constraints. Furthermore, string GP priors
naturally cope with heterogeneous input data, and the gradient of the learned
latent function is readily available for explanatory analysis. Secondly, we
provide some theoretical results relating our approach to the standard GP
paradigm. In particular, we prove that some string GPs are Gaussian processes,
which provides a complementary global perspective on our framework. Finally, we
derive a scalable and distributed MCMC scheme for supervised learning tasks
under string GP priors. The proposed MCMC scheme has computational time
complexity $\mathcal{O}(N)$ and memory requirement $\mathcal{O}(dN)$, where $N$
is the data size and $d$ the dimension of the input space. We illustrate the
efficacy of the proposed approach on several synthetic and real-world datasets,
including a dataset with millions input points and attributes.
Comment: To appear in the Journal of Machine Learning Research (JMLR), Volume
1
Brownian distance covariance
Distance correlation is a new class of multivariate dependence coefficients
applicable to random vectors of arbitrary and not necessarily equal dimension.
Distance covariance and distance correlation are analogous to product-moment
covariance and correlation, but generalize and extend these classical bivariate
measures of dependence. Distance correlation characterizes independence: it is
zero if and only if the random vectors are independent. The notion of
covariance with respect to a stochastic process is introduced, and it is shown
that population distance covariance coincides with the covariance with respect
to Brownian motion; thus, both can be called Brownian distance covariance. In
the bivariate case, Brownian covariance is the natural extension of
product-moment covariance, as we obtain Pearson product-moment covariance by
replacing the Brownian motion in the definition with identity. The
corresponding statistic has an elegantly simple computing formula. Advantages
of applying Brownian covariance and correlation vs the classical Pearson
covariance and correlation are discussed and illustrated.
Comment: This paper is discussed in: [arXiv:0912.3295], [arXiv:1010.0822],
[arXiv:1010.0825], [arXiv:1010.0828], [arXiv:1010.0836], [arXiv:1010.0838],
[arXiv:1010.0839]. Rejoinder at [arXiv:1010.0844]. Published at
http://dx.doi.org/10.1214/09-AOAS312 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org
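The "elegantly simple computing formula" mentioned in the abstract can be illustrated with a minimal sketch of the sample distance correlation: double-center each pairwise Euclidean distance matrix, then average the elementwise products. The function names below are hypothetical, and this is the plain O(n^2) V-statistic version written for clarity, not a tuned or bias-corrected implementation.

```python
import numpy as np


def _double_centered_distances(z):
    """Pairwise Euclidean distance matrix with row, column, and grand means removed."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n points in dimension 1
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two samples with the same number of rows."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample has zero distance variance
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 101)
    print(distance_correlation(x, 2 * x + 1))  # exact linear dependence
    print(distance_correlation(x, x ** 2))     # nonlinear dependence
    print(np.corrcoef(x, x ** 2)[0, 1])        # Pearson correlation for comparison
```

On the symmetric grid used here, the Pearson correlation between `x` and `x ** 2` vanishes while the distance correlation stays clearly positive, which is the kind of contrast with classical correlation that the abstract describes. The two samples may have different dimensions as long as they share the same number of rows.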