
Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in R^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(L) of a set of lines or Δ-flats L: the least r such that there is a ball of radius r intersecting every flat in L. Known algorithms for finding the minimum enclosing ball of a point set (or for clustering by several balls) do not easily extend to higher-dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states that for any family L of Δ-dimensional convex sets in a Hilbert space, there exist Δ + 2 sets L' ⊆ L such that r(L) ≤ 2r(L'). Based upon this we present an algorithm that computes a (1 + ε)-core set L' ⊆ L, |L'| = O(Δ^4/ε), such that the ball centered at a point c with radius (1 + ε)r(L') intersects every element of L. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ = 1), the (expected) running time of the algorithm can be improved to O(nd poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
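The following sketch makes the quantity r(L) concrete. It assumes that each flat is given by an anchor point and a list of spanning directions. Under that assumption, the distance from a center c to a flat is the length of the residual left after orthogonal projection, and r(L) = min over c of max over F in L of dist(c, F).

Each distance is a convex, 1-Lipschitz function of c, so a plain subgradient method can approximate r(L). This is a generic convex-optimization baseline swapped in for illustration. It is not the paper's core-set algorithm and carries none of its guarantees. The names `Flat`, `covering_radius`, `ball_meets_all` and `approx_min_intersection_radius` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sub(u: Vector, v: Vector) -> Vector:
    return [a - b for a, b in zip(u, v)]


def add_scaled(u: Vector, v: Vector, s: float) -> Vector:
    """Return u + s * v."""
    return [a + s * b for a, b in zip(u, v)]


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


@dataclass
class Flat:
    """Hypothetical affine flat {point + span(directions)}; a point is a 0-flat, a line a 1-flat."""
    point: Vector
    directions: List[Vector] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Gram-Schmidt: replace the directions by an orthonormal basis of their span.
        basis: List[Vector] = []
        for d in self.directions:
            w = list(d)
            for u in basis:
                w = add_scaled(w, u, -dot(w, u))
            n = norm(w)
            if n > 1e-12:
                basis.append([x / n for x in w])
        self.directions = basis

    def project(self, c: Vector) -> Vector:
        """Orthogonal projection of c onto the flat."""
        offset = sub(c, self.point)
        q = list(self.point)
        for u in self.directions:
            q = add_scaled(q, u, dot(offset, u))
        return q

    def distance(self, c: Vector) -> float:
        return norm(sub(c, self.project(c)))


def covering_radius(c: Vector, flats: List[Flat]) -> float:
    """Smallest radius of a ball centered at c that meets every flat."""
    return max(f.distance(c) for f in flats)


def ball_meets_all(c: Vector, r: float, flats: List[Flat], slack: float = 1e-9) -> bool:
    """True if the ball of radius r centered at c intersects every flat."""
    return all(f.distance(c) <= r + slack for f in flats)


def approx_min_intersection_radius(flats: List[Flat], iters: int = 20000):
    """Subgradient descent on c -> max_F dist(c, F); returns (best center, best radius)."""
    dim = len(flats[0].point)
    # Start from the mean of the anchor points (an arbitrary, assumed choice).
    c = [sum(f.point[i] for f in flats) / len(flats) for i in range(dim)]
    step0 = covering_radius(c, flats)  # initial step length, scaled to the problem
    best_c, best_r = list(c), step0
    for k in range(iters):
        far = max(flats, key=lambda f: f.distance(c))
        q = far.project(c)
        gap = norm(sub(c, q))
        if gap == 0.0:  # c lies on every flat, so r(L) = 0
            return c, 0.0
        # (c - q) / gap is a unit subgradient; move distance step0/sqrt(k+1) toward q.
        c = add_scaled(c, sub(q, c), step0 / math.sqrt(k + 1) / gap)
        r = covering_radius(c, flats)
        if r < best_r:
            best_c, best_r = list(c), r
    return best_c, best_r
```

As a check, take the three lines x = 0, y = 0 and x + y = 2. They cut out a triangle, and every center has maximum distance at least the inradius 2 - √2 ≈ 0.586, attained at the incenter. The radius returned by the sketch should therefore approach that value from above. The subgradient method converges only at rate O(1/√k), which is one reason structured approaches such as the core sets above are attractive.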
