2,277 research outputs found

    Characteristic Kernels and Infinitely Divisible Distributions

    We connect shift-invariant characteristic kernels to infinitely divisible distributions on $\mathbb{R}^{d}$. Characteristic kernels play an important role in machine learning applications because their kernel means can distinguish any two probability measures. The contribution of this paper is twofold. First, we show, using the Lévy-Khintchine formula, that any shift-invariant kernel given by a bounded, continuous, and symmetric probability density function (pdf) of an infinitely divisible distribution on $\mathbb{R}^{d}$ is characteristic. We also present some closure properties of such characteristic kernels under addition, pointwise product, and convolution. Second, in developing various kernel mean algorithms, it is fundamental to compute the following values: (i) kernel mean values $m_P(x)$, $x \in \mathcal{X}$, and (ii) kernel mean RKHS inner products $\left\langle m_P, m_Q \right\rangle_{\mathcal{H}}$, for probability measures $P, Q$. If $P$, $Q$, and the kernel $k$ are Gaussian, then computations (i) and (ii) result in Gaussian pdfs, which are tractable. We generalize this Gaussian combination to more general cases in the class of infinitely divisible distributions. We then introduce a "conjugate" kernel and a "convolution trick", so that (i) and (ii) above have the same pdf form, with the expectation that computation is tractable at least in some cases. As specific instances, we explore $\alpha$-stable distributions and a rich class of generalized hyperbolic distributions, which includes the Laplace, Cauchy, and Student-t distributions.
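The Gaussian case mentioned in the abstract can be written out in closed form. The following is a minimal one-dimensional sketch, not the paper's implementation: it assumes the kernel is the Gaussian pdf $k(x, y) = \mathcal{N}(x - y; 0, \sigma^2)$ and that $P = \mathcal{N}(\mu_P, s_P^2)$ and $Q = \mathcal{N}(\mu_Q, s_Q^2)$. Because a convolution of Gaussians is Gaussian, $m_P(x) = \mathcal{N}(x; \mu_P, s_P^2 + \sigma^2)$ and $\langle m_P, m_Q \rangle_{\mathcal{H}} = \mathcal{N}(\mu_P - \mu_Q; 0, s_P^2 + s_Q^2 + \sigma^2)$. All function and parameter names below are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def kernel(x, y, kernel_var):
    """Shift-invariant kernel given by a zero-mean Gaussian pdf (an assumption)."""
    return gaussian_pdf(x - y, 0.0, kernel_var)


def kernel_mean(x, mean_p, var_p, kernel_var):
    """(i) m_P(x) = E_{y ~ P}[k(x, y)]; Gaussian convolution adds the variances."""
    return gaussian_pdf(x, mean_p, var_p + kernel_var)


def kernel_mean_inner_product(mean_p, var_p, mean_q, var_q, kernel_var):
    """(ii) <m_P, m_Q>_H = E_{x ~ P, y ~ Q}[k(x, y)], again a Gaussian pdf value."""
    return gaussian_pdf(mean_p - mean_q, 0.0, var_p + var_q + kernel_var)
```

Both quantities are evaluations of a single Gaussian pdf, which is the "same pdf form" property the abstract aims to extend to other infinitely divisible families.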

    Towards a Learning Theory of Cause-Effect Inference

    We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i, l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data, we build a causal inference rule in two steps. First, we featurize each $S_i$ using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables.
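The two-step rule described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes a Gaussian kernel approximated with random Fourier features, and it swaps in a plain ridge-regression classifier for the second step. The helper names (`make_rff`, `embed_sample`, `fit_ridge_classifier`, `predict`) and the synthetic toy data are hypothetical.

```python
import numpy as np


def make_rff(dim, n_features, bandwidth, rng):
    """Draw random Fourier feature parameters approximating a Gaussian kernel."""
    W = rng.normal(scale=1.0 / bandwidth, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def embed_sample(sample, W, b):
    """Step 1: empirical kernel mean embedding of one sample S_i, shape (n_points, dim)."""
    feats = np.sqrt(2.0 / W.shape[1]) * np.cos(sample @ W + b)
    return feats.mean(axis=0)  # average the feature map over the sample


def fit_ridge_classifier(embeddings, labels, reg=1e-3):
    """Step 2 (assumed classifier): ridge regression onto +/-1 targets."""
    targets = 2.0 * labels - 1.0
    A = embeddings.T @ embeddings + reg * np.eye(embeddings.shape[1])
    return np.linalg.solve(A, embeddings.T @ targets)


def predict(embeddings, weights):
    """Return 1 for 'X -> Y' and 0 for 'X <- Y' under the toy labeling below."""
    return (embeddings @ weights > 0).astype(int)


# Synthetic toy collection {(S_i, l_i)}: effect = tanh(2 * cause) + noise.
rng = np.random.default_rng(0)
W, b = make_rff(dim=2, n_features=100, bandwidth=1.0, rng=rng)
samples, labels = [], []
for i in range(40):
    cause = rng.normal(size=200)
    effect = np.tanh(2.0 * cause) + 0.1 * rng.normal(size=200)
    label = i % 2  # 1: first column causes second; 0: the reverse
    columns = [cause, effect] if label == 1 else [effect, cause]
    samples.append(np.column_stack(columns))
    labels.append(label)
labels = np.array(labels)

embeddings = np.stack([embed_sample(s, W, b) for s in samples])
weights = fit_ridge_classifier(embeddings, labels)
predictions = predict(embeddings, weights)
```

Because each sample is reduced to the mean of its feature vectors, the embedding does not depend on the order of the points in $S_i$, and samples of different sizes map to vectors of the same length.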