
    Joint asymptotics for semi-nonparametric regression models with partially linear structure

    We consider a joint asymptotic framework for studying semi-nonparametric regression models where (finite-dimensional) Euclidean parameters and (infinite-dimensional) functional parameters are both of interest. The class of models under consideration shares a partially linear structure and is estimated in two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first show that the Euclidean estimator and the (pointwise) functional estimator, which are re-scaled at different rates, jointly converge to a zero-mean Gaussian vector. This weak convergence result reveals a surprising joint asymptotics phenomenon: the two estimators are asymptotically independent. A major goal of this paper is to gain first-hand insights into this phenomenon. Moreover, a likelihood ratio test is proposed for a set of joint local hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9 (1938) 60-62; Ann. Statist. 29 (2001) 153-193] is unveiled. A novel technical tool, called a joint Bahadur representation, is developed for studying these joint asymptotics results. Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
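    As a rough illustration of the joint limit described above, the following LaTeX sketch writes the result for a generic partially linear model; the notation, rates, and limiting covariance shown here are illustrative assumptions, not the paper's exact statement.

```latex
% Illustrative sketch only: a generic partially linear model and the kind of
% joint limit described in the abstract; symbols and rates are assumptions.
\[
  Y_i = X_i^{\top}\beta_0 + f_0(Z_i) + \epsilon_i, \qquad i = 1,\dots,n,
\]
\[
  \begin{pmatrix}
    \sqrt{n}\,(\hat{\beta} - \beta_0) \\
    \sqrt{n h}\,\bigl(\hat{f}(z_0) - f_0(z_0)\bigr)
  \end{pmatrix}
  \;\Rightarrow\;
  N\!\left(0,\;
  \begin{pmatrix}
    \Sigma_{\beta} & 0 \\
    0 & \sigma_f^2(z_0)
  \end{pmatrix}\right),
\]
% the zero off-diagonal block is what "asymptotic independence" of the
% Euclidean and pointwise functional estimators amounts to in the limit.
```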

    Fusing Censored Dependent Data for Distributed Detection

    In this paper, we consider a distributed detection problem for a censoring sensor network where each sensor's communication rate is significantly reduced by transmitting only "informative" observations to the Fusion Center (FC) and censoring those deemed "uninformative". While the independence of data from censoring sensors is often assumed in previous research, we explore spatial dependence among observations. Our focus is on designing a fusion rule under the Neyman-Pearson (NP) framework that takes the spatial dependence among observations into account. Two transmission scenarios are considered: one in which uncensored observations are transmitted directly to the FC, and a second in which they are first quantized and then transmitted to further improve transmission efficiency. A copula-based Generalized Likelihood Ratio Test (GLRT) for censored data is proposed for both transmission strategies, i.e., for continuous and for discrete messages received at the FC. We address the computational issues of the copula-based GLRTs, which involve multidimensional integrals, by presenting more efficient fusion rules based on the key idea of injecting controlled noise at the FC before fusion. Although the signal-to-noise ratio (SNR) is reduced by introducing controlled noise at the receiver, simulation results demonstrate that the resulting noise-aided fusion approach performs very close to the exact copula-based GLRTs. By exploiting the spatial dependence, the copula-based GLRTs and their noise-aided counterparts greatly improve detection performance compared with the fusion rule derived under the independence assumption.
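    To make the censor-then-fuse pipeline concrete, here is a hedged Python sketch, not the paper's fusion rule: the marginals, the equicorrelated Gaussian copula with fixed parameters, the local censoring thresholds, and the decision to score only the received observations are all simplifying assumptions. The exact copula-based GLRT estimates unknown parameters and integrates over the censoring regions of the silent sensors, which is the source of the multidimensional integrals mentioned above.

```python
# Hedged sketch of the censor-then-fuse idea -- NOT the paper's exact rule.
# Each sensor transmits only when its local log-likelihood ratio is
# "informative"; the FC fuses the received values with a Gaussian-copula
# likelihood ratio over those values alone.
import numpy as np
from scipy import stats

def local_censoring(x, lower=-0.5, upper=0.5):
    """Send x only if the local LLR (H1: N(1,1) vs H0: N(0,1)) is outside the band."""
    llr = stats.norm.logpdf(x, loc=1.0) - stats.norm.logpdf(x, loc=0.0)
    return x if (llr < lower or llr > upper) else None   # None = censored

def gaussian_copula_logpdf(u, rho):
    """Log-density of an equicorrelated Gaussian copula at uniforms u."""
    z = stats.norm.ppf(u)
    d = len(z)
    sigma = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)
    return (stats.multivariate_normal.logpdf(z, mean=np.zeros(d), cov=sigma)
            - stats.norm.logpdf(z).sum())

def fusion_statistic(received, rho1=0.6, rho0=0.6):
    """Copula-based log-LR computed over the uncensored observations only."""
    x = np.array([r for r in received if r is not None])
    if x.size == 0:
        return 0.0
    marginal = (stats.norm.logpdf(x, loc=1.0) - stats.norm.logpdf(x, loc=0.0)).sum()
    u1, u0 = stats.norm.cdf(x, loc=1.0), stats.norm.cdf(x, loc=0.0)
    return marginal + gaussian_copula_logpdf(u1, rho1) - gaussian_copula_logpdf(u0, rho0)

# Five spatially dependent sensors observing under H1.
rng = np.random.default_rng(0)
cov = np.full((5, 5), 0.5) + 0.5 * np.eye(5)
obs = rng.multivariate_normal(np.ones(5), cov)
received = [local_censoring(x) for x in obs]
print("fusion log-LR:", fusion_statistic(received))
```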

    Telling Cause from Effect using MDL-based Local and Global Regression

    We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve, since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information-theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a compression scheme to encode local and global functional relations using MDL-based regression. We infer that X causes Y when it is shorter to describe Y as a function of X than the other way around. In addition, we introduce Slope, an efficient linear-time algorithm which, as we show through thorough empirical evaluation on both synthetic and real-world data, outperforms the state of the art by a wide margin. Comment: 10 pages, to appear in ICDM1
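    The decision rule can be made concrete with a much-simplified sketch: score each direction by total description length and pick the shorter one. The polynomial model class, the Gaussian residual code, and the crude parameter cost below are stand-ins chosen here for illustration, not Slope's actual encoding of local and global functional relations.

```python
# Much-simplified, hypothetical MDL-style direction test -- NOT the actual
# Slope algorithm. Each direction is scored as
#   bits(cause) + bits(model) + bits(residuals of effect given cause),
# using a polynomial model class and a Gaussian code; the shorter total wins.
import numpy as np

def gaussian_bits(values):
    """Code length (bits) of values under a Gaussian fitted to them."""
    n = len(values)
    var = values.var() + 1e-12
    return 0.5 * n * (np.log(2.0 * np.pi * var) + 1.0) / np.log(2.0)

def conditional_bits(cause, effect, max_degree=3):
    """Bits to describe 'effect' as a polynomial function of 'cause'."""
    best = np.inf
    for deg in range(1, max_degree + 1):
        coef = np.polyfit(cause, effect, deg)
        resid = effect - np.polyval(coef, cause)
        model_bits = 0.5 * np.log2(len(cause)) * (deg + 1)  # crude parameter cost
        best = min(best, model_bits + gaussian_bits(resid))
    return best

def infer_direction(x, y):
    bits_x_to_y = gaussian_bits(x - x.mean()) + conditional_bits(x, y)
    bits_y_to_x = gaussian_bits(y - y.mean()) + conditional_bits(y, x)
    return "X -> Y" if bits_x_to_y < bits_y_to_x else "Y -> X"

# Example: Y is a noisy monotone function of X, so X -> Y should be shorter.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, 500)
y = np.tanh(x) + 0.1 * rng.standard_normal(500)
print(infer_direction(x, y))   # typically prints "X -> Y"
```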

    String and Membrane Gaussian Processes

    In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs), which are not to be mistaken for Gaussian processes operating on text. We construct string GPs so that their finite-dimensional marginals exhibit suitable local conditional independence structures, which allow for scalable, distributed, and flexible nonparametric Bayesian inference, without resorting to approximations, and while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity O(N) and memory requirement O(dN), where N is the data size and d the dimension of the input space. We illustrate the efficacy of the proposed approach on several synthetic and real-world datasets, including a dataset with 6 million input points and 8 attributes. Comment: To appear in the Journal of Machine Learning Research (JMLR), Volume 1
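    As a loose illustration of why local conditional independence buys scalability, the Python sketch below builds a sample path segment by segment, each new segment conditioned only on the boundary value it shares with the previous one. The kernel, the partition, and the simple Markov-style chaining are assumptions made here purely for illustration; this is not the paper's string GP construction or its MCMC scheme.

```python
# Loose illustration of sequential, locally-conditioned sampling of a latent
# function; NOT the string GP construction from the paper.
import numpy as np

def rbf(a, b, ell=0.3, var=1.0):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def sample_segment(xs, x_bound, f_bound, rng):
    """Sample GP values at xs conditioned only on the value at the left boundary."""
    K_xx = rbf(xs, xs)
    K_xb = rbf(xs, x_bound)
    K_bb = rbf(x_bound, x_bound) + 1e-8 * np.eye(1)
    mean = K_xb @ np.linalg.solve(K_bb, f_bound)
    cov = K_xx - K_xb @ np.linalg.solve(K_bb, K_xb.T)
    return rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(xs)))

rng = np.random.default_rng(2)
boundaries = np.linspace(0.0, 3.0, 4)            # three "strings" on [0, 3]
x_left, f_left = np.array([boundaries[0]]), np.array([0.0])
path_x, path_f = [x_left], [f_left]
for left, right in zip(boundaries[:-1], boundaries[1:]):
    xs = np.linspace(left, right, 50)[1:]        # interior points + right boundary
    fs = sample_segment(xs, x_left, f_left, rng)
    path_x.append(xs); path_f.append(fs)
    x_left, f_left = xs[-1:], fs[-1:]            # next segment sees only the boundary
x, f = np.concatenate(path_x), np.concatenate(path_f)
print(x.shape, f.shape)                           # one sample path, built locally
```

    Because each segment only ever conditions on a single shared boundary point, the work grows linearly with the number of segments rather than cubically with the total number of points, which is the flavor of scalability the abstract attributes to the local conditional independence structure.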

    Brownian distance covariance

    Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but generalize and extend these classical bivariate measures of dependence. Distance correlation characterizes independence: it is zero if and only if the random vectors are independent. The notion of covariance with respect to a stochastic process is introduced, and it is shown that population distance covariance coincides with the covariance with respect to Brownian motion; thus, both can be called Brownian distance covariance. In the bivariate case, Brownian covariance is the natural extension of product-moment covariance, as we obtain Pearson product-moment covariance by replacing the Brownian motion in the definition with the identity function. The corresponding statistic has an elegantly simple computing formula. Advantages of applying Brownian covariance and correlation versus the classical Pearson covariance and correlation are discussed and illustrated. Comment: This paper is discussed in [arXiv:0912.3295], [arXiv:1010.0822], [arXiv:1010.0825], [arXiv:1010.0828], [arXiv:1010.0836], [arXiv:1010.0838], [arXiv:1010.0839]; rejoinder at [arXiv:1010.0844]. Published at http://dx.doi.org/10.1214/09-AOAS312 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
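    The "elegantly simple computing formula" for the sample statistic is the usual double-centering recipe, sketched below in Python with variable names chosen here: double-center each sample's pairwise distance matrix, average the elementwise product to get the squared distance covariance, and normalize to get the distance correlation.

```python
# Sketch of the sample distance covariance / correlation via double-centered
# pairwise distance matrices (the standard V-statistic form).
import numpy as np

def _centered_distances(x):
    """Double-centered Euclidean distance matrix of a sample (n rows)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation; zero in population iff x and y are independent."""
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

# Nonlinear, uncorrelated-but-dependent example: Pearson r is near 0, dCor is not.
rng = np.random.default_rng(3)
x = rng.standard_normal(2000)
y = x ** 2 + 0.1 * rng.standard_normal(2000)
print("Pearson r:", np.corrcoef(x, y)[0, 1])
print("distance correlation:", distance_correlation(x, y))
```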