    On the Failure of the Bootstrap for Matching Estimators

    Matching estimators are widely used for the evaluation of programs or treatments. Often researchers use bootstrapping methods for inference. However, no formal justification for the use of the bootstrap has been provided. Here we show that the bootstrap is in general not valid, even in the simple case with a single continuous covariate when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias. Due to the extreme non-smoothness of nearest neighbor matching, the standard conditions for the bootstrap are not satisfied, leading the bootstrap variance to diverge from the actual variance. Simulations confirm the difference between actual and nominal coverage rates for bootstrap confidence intervals predicted by the theoretical calculations. To our knowledge, this is the first example of a root-N consistent and asymptotically normal estimator for which the bootstrap fails to work.
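    The procedure this abstract calls into question can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single scalar covariate, one nearest neighbor matched with replacement, an effect on the treated, and a naive bootstrap that resamples treated and control units separately. The names `att_nn_match` and `naive_bootstrap_variance` are hypothetical.

```python
import random


def att_nn_match(treated, controls):
    """Average treated-minus-matched-control outcome.

    treated, controls: lists of (x, y) pairs with a scalar covariate x.
    Each treated unit is matched, with replacement, to the control
    whose covariate value is closest (an assumption of this sketch).
    """
    diffs = []
    for x_t, y_t in treated:
        _, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


def naive_bootstrap_variance(treated, controls, n_boot=200, seed=0):
    """Variance of the estimator across naive bootstrap resamples.

    Treated and control units are resampled separately, which is an
    assumption here. The abstract argues that this kind of variance
    estimate is in general not valid for matching estimators.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t_star = [rng.choice(treated) for _ in treated]
        c_star = [rng.choice(controls) for _ in controls]
        estimates.append(att_nn_match(t_star, c_star))
    mean = sum(estimates) / n_boot
    return sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
```

    The sketch only shows where the non-smoothness enters: the `min` over controls changes discontinuously as units are duplicated or dropped in a resample.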

    Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning

    This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and preceding sentence as context. The statistical and neural-network methods perform best on this particular problem, and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems. Comment: 10 pages

    Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

    Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large aperture telescopes has led to ever more ambitious surveys of the sky. Data volumes of entire surveys a decade ago can now be acquired in a single night, and real-time analysis is often desired. Thus, modern astronomy requires big data know-how; in particular, it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing with label and measurement noise. We argue that this makes astronomy a great domain for computer science research, as it pushes the boundaries of data analysis. In the following, we will present this exciting application area for data scientists. We will focus on exemplary results, discuss main challenges, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications.

    Seen and unseen tidal caustics in the Andromeda galaxy

    Indirect detection of high-energy particles from dark matter interactions is a promising avenue for learning more about dark matter, but is hampered by the frequent coincidence of high-energy astrophysical sources of such particles with putative high-density regions of dark matter. We calculate the boost factor and gamma-ray flux from dark matter associated with two shell-like caustics of luminous tidal debris recently discovered around the Andromeda galaxy, under the assumption that dark matter is its own supersymmetric antiparticle. These shell features could be a good candidate for indirect detection of dark matter via gamma rays because they are located far from the primary confusion sources at the galaxy's center, and because the shapes of the shells indicate that most of the mass has piled up near apocenter. Using a numerical estimator specifically calibrated to estimate densities in N-body representations with sharp features and a previously determined N-body model of the shells, we find that the largest boost factors do occur in the shells but are only a few percent. We also find that the gamma-ray flux is an order of magnitude too low to be detected with Fermi for likely dark matter parameters, and about 2 orders of magnitude less than the signal that would have come from the dwarf galaxy that produces the shells in the N-body model. We further show that the radial density profiles and relative radial spacing of the shells, in either dark or luminous matter, are relatively insensitive to the details of the potential of the host galaxy but depend in a predictable way on the velocity dispersion of the progenitor galaxy. Comment: ApJ accepted

    Predictive mean matching imputation in survey sampling

    Predictive mean matching imputation is popular for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the predictive mean matching estimator of the population mean. For variance estimation, the conventional bootstrap inference for matching estimators with fixed matches has been shown to be invalid due to the nonsmooth nature of the matching estimator. We propose asymptotically valid replication variance estimation. The key strategy is to construct replicates of the estimator directly based on linear terms, instead of individual records of variables. Extension to nearest neighbor imputation is also discussed. A simulation study confirms that the new procedure provides valid variance estimation. Comment: 20 pages, 0 figures, 1 table
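    As a rough illustration of the point estimator discussed in this abstract, predictive mean matching can be sketched as below. This is not the article's procedure: it assumes a single covariate, a least-squares line fitted on respondents, one donor per nonrespondent, and no survey weights. The names `fit_line`, `pmm_impute`, and `pmm_mean` are hypothetical, and the replication variance estimator the article proposes is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one covariate."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys):
    """Fill missing outcomes (None) with a donor's observed outcome.

    The donor is the respondent whose predicted mean is closest to the
    predicted mean of the nonrespondent.
    """
    respondents = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line(
        [x for x, _ in respondents], [y for _, y in respondents]
    )

    def predict(x):
        return intercept + slope * x

    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            target = predict(x)
            _, donor_y = min(
                respondents, key=lambda r: abs(predict(r[0]) - target)
            )
            completed.append(donor_y)
        else:
            completed.append(y)
    return completed


def pmm_mean(xs, ys):
    """Unweighted mean of the completed data (no survey weights assumed)."""
    completed = pmm_impute(xs, ys)
    return sum(completed) / len(completed)
```

    For example, `pmm_mean([1, 2, 3, 4, 2.1], [1.0, 2.5, 2.9, 4.2, None])` imputes the missing value from the respondent at `x = 2`. With a single covariate and a nonzero slope, matching on the predicted mean coincides with matching on `x` itself; the two differ once the model has several covariates.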