
    Indexing the Earth Mover's Distance Using Normal Distributions

    Querying uncertain data sets (represented as probability distributions) presents many challenges due to the large amount of data involved and the difficulty of comparing uncertainty between distributions. The Earth Mover's Distance (EMD) is increasingly used to compare uncertain data because it captures the differences between two distributions effectively. Computing the EMD entails solving the transportation problem, which is computationally intensive. In this paper, we propose a new lower bound to the EMD and an index structure that significantly improve the performance of EMD-based K-nearest neighbor (K-NN) queries on uncertain databases. The lower bound approximates the EMD on a projection vector: each distribution is projected onto a vector and approximated by a normal distribution together with an accompanying error term. We then represent each normal as a point in a Hough-transformed space and use the concept of stochastic dominance to implement an efficient index structure in that space. We show that our method significantly decreases K-NN query time on uncertain databases. The index structure also scales well with database cardinality and is well suited to heterogeneous data sets, helping to keep EMD-based queries tractable as uncertain data sets become larger and more complex. Comment: VLDB201
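
    The projection step in the abstract above can be illustrated with a minimal sketch. It is not the paper's lower bound (which also involves a normal approximation and an error term); it only shows the well-known fact that, for a Euclidean ground distance and equal total mass, the 1-D EMD of two distributions projected onto a unit vector cannot exceed their full EMD. The function names `emd_1d` and `projected_lower_bound` are hypothetical and chosen for illustration.

```python
import numpy as np


def emd_1d(values_p, weights_p, values_q, weights_q):
    """1-D EMD between two discrete distributions of equal total mass.

    Uses the closed form for one dimension: the integral of |CDF_P - CDF_Q|.
    """
    xs = np.concatenate([np.asarray(values_p, float), np.asarray(values_q, float)])
    # Signed mass at each location: +mass for P, -mass for Q.
    signed = np.concatenate([np.asarray(weights_p, float), -np.asarray(weights_q, float)])
    order = np.argsort(xs)
    xs, signed = xs[order], signed[order]
    cdf_diff = np.cumsum(signed)[:-1]   # CDF_P - CDF_Q between consecutive points
    gaps = np.diff(xs)                  # width of each interval
    return float(np.sum(np.abs(cdf_diff) * gaps))


def projected_lower_bound(points_p, weights_p, points_q, weights_q, direction):
    """Lower bound on the Euclidean-ground-distance EMD via projection.

    Assumption (illustrative, not from the paper): both distributions are
    finite weighted point sets with the same total mass.
    """
    u = np.asarray(direction, float)
    u = u / np.linalg.norm(u)           # unit vector, so projection is 1-Lipschitz
    proj_p = np.asarray(points_p, float) @ u
    proj_q = np.asarray(points_q, float) @ u
    return emd_1d(proj_p, weights_p, proj_q, weights_q)


# Two unit point masses at (0, 0) and (3, 4); their true EMD is the distance 5.
p_pts, q_pts, w = np.array([[0.0, 0.0]]), np.array([[3.0, 4.0]]), np.array([1.0])
bound_x = projected_lower_bound(p_pts, w, q_pts, w, [1.0, 0.0])
bound_best = projected_lower_bound(p_pts, w, q_pts, w, [3.0, 4.0])
```

    In this toy case the bound along the x-axis is 3, while projecting along the direction joining the two points recovers the full distance of 5, which shows why the choice of projection vector matters for how tight such a bound is.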

    Combining information from independent sources through confidence distributions

    This paper develops new methodology, together with related theories, for combining information from independent studies through confidence distributions. A formal definition of a confidence distribution and its asymptotic counterpart (i.e., asymptotic confidence distribution) are given and illustrated in the context of combining information. Two general combination methods are developed: the first along the lines of combining p-values, with some notable differences in regard to optimality of Bahadur type efficiency; the second by multiplying and normalizing confidence densities. The latter approach is inspired by the common approach of multiplying likelihood functions for combining parametric information. The paper also develops adaptive combining methods, with supporting asymptotic theory which should be of practical interest. The key point of the adaptive development is that the methods attempt to combine only the correct information, downweighting or excluding studies containing little or wrong information about the true parameter of interest. The combination methodologies are illustrated in simulated and real data examples with a variety of applications. Comment: Published at http://dx.doi.org/10.1214/009053604000001084 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
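
    The second combination method mentioned above (multiplying and normalizing confidence densities) can be sketched numerically. The example below is only an illustration under stated assumptions, not the paper's procedure: each study is assumed to report a normal confidence density for the same scalar parameter, the densities are evaluated on a uniform grid, and the helper names `normal_confidence_density` and `combine_confidence_densities` are hypothetical.

```python
import numpy as np


def normal_confidence_density(theta, estimate, std_error):
    """Normal confidence density centred on a study's point estimate (assumed form)."""
    z = (theta - estimate) / std_error
    return np.exp(-0.5 * z ** 2) / (std_error * np.sqrt(2.0 * np.pi))


def combine_confidence_densities(theta_grid, densities):
    """Multiply the densities pointwise and renormalize to unit area.

    Assumes theta_grid is uniformly spaced and wide enough to hold
    essentially all of the mass of the product.
    """
    product = np.prod(np.vstack(densities), axis=0)
    step = theta_grid[1] - theta_grid[0]
    return product / (np.sum(product) * step)   # Riemann-sum normalization


# Two hypothetical studies with estimates 1.0 and 3.0 and unit standard errors.
theta = np.linspace(-10.0, 14.0, 4801)
study_a = normal_confidence_density(theta, estimate=1.0, std_error=1.0)
study_b = normal_confidence_density(theta, estimate=3.0, std_error=1.0)
combined = combine_confidence_densities(theta, [study_a, study_b])

step = theta[1] - theta[0]
combined_mean = float(np.sum(theta * combined) * step)
combined_sd = float(np.sqrt(np.sum((theta - combined_mean) ** 2 * combined) * step))
```

    For normal densities the product is again normal with the inverse-variance weighted mean, so here the combined density is centred near 2.0 with standard deviation near 1/sqrt(2), narrower than either input. This mirrors the effect of multiplying likelihood functions that the abstract cites as the inspiration.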

    Confidence distribution (CD) -- distribution estimator of a parameter

    The notion of confidence distribution (CD), an entirely frequentist concept, is in essence a Neymanian interpretation of Fisher's Fiducial distribution. It contains information related to every kind of frequentist inference. In this article, a CD is viewed as a distribution estimator of a parameter. This leads naturally to consideration of the information contained in CD, comparison of CDs and optimal CDs, and connection of the CD concept to the (profile) likelihood function. A formal development of a multiparameter CD is also presented. Comment: Published at http://dx.doi.org/10.1214/074921707000000102 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org
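
    As a concrete illustration of a CD acting as a distribution estimator, consider the textbook case of a normal mean with known standard deviation, where the CD is the normal distribution centred at the sample mean with standard deviation sigma/sqrt(n). The sketch below assumes that setting; the function name `normal_mean_cd` and the numbers are hypothetical and not taken from the article.

```python
from statistics import NormalDist


def normal_mean_cd(sample_mean, sigma, n):
    """CD for a normal mean with known sigma: N(sample_mean, sigma^2 / n)."""
    return NormalDist(mu=sample_mean, sigma=sigma / n ** 0.5)


# Hypothetical data summary: mean 10.0 from n = 16 observations, known sigma = 2.0.
cd = normal_mean_cd(sample_mean=10.0, sigma=2.0, n=16)

# Quantiles of the CD give confidence limits; this pair is a 95% interval.
lower, upper = cd.inv_cdf(0.025), cd.inv_cdf(0.975)

# The CD evaluated at a hypothesised value gives a one-sided p-value
# for testing "mean <= 9.0" against "mean > 9.0".
p_value_one_sided = cd.cdf(9.0)

# The median of the CD serves as a point estimate.
point_estimate = cd.inv_cdf(0.5)
```

    A single object thus yields intervals, p-values and a point estimate, which is the sense in which a CD carries information for several kinds of frequentist inference.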

    Fast Spinning Pulsars as Probes of Massive Black Holes' Gravity

    Dwarf galaxies and globular clusters may contain intermediate mass black holes ($10^{3}$ to $10^{5}$ solar masses) in their cores. Estimates of ~$10^{3}$ neutron stars in the central parsec of the Galaxy and similar numbers in small elliptical galaxies and globular clusters, along with an estimated high probability of ms-pulsar formation in those environments, have led many workers to propose the use of ms-pulsar timing to measure the mass and spin of intermediate mass black holes. Models of pulsar motion around a rotating black hole generally assume geodesic motion of a "test" particle in the Kerr metric. These approaches account for well-known effects like de Sitter precession and the Lense-Thirring effect, but they do not account for the non-linear effect of the pulsar's stress-energy tensor on the space-time metric. Here we model the motion of a pulsar near a black hole with the Mathisson-Papapetrou-Dixon (MPD) equations. Numerical integration of the MPD equations for black holes of mass $2 \times 10^{6}$, $10^{5}$ and $10^{3}$ solar masses shows that the pulsar will not remain in an orbital plane, with motion vertical to the plane being largest relative to the orbit's radial dimensions for the lower mass black holes. The pulsar's out-of-plane motion will lead to timing variations that are up to ~10 microseconds different from those predicted by planar orbit models. Such variations might be detectable in long-term observations of millisecond pulsars. If pulsar signals are used to measure the mass and spin of intermediate mass black holes on the basis of dynamical models of the received pulsar signal, then the out-of-plane motion of the pulsar should be part of that model. Comment: Accepted by MNRAS March 27, 201