On the Failure of the Bootstrap for Matching Estimators
Matching estimators are widely used for the evaluation of programs or treatments. Often researchers use bootstrapping methods for inference. However, no formal justification for the use of the bootstrap has been provided. Here we show that the bootstrap is in general not valid, even in the simple case with a single continuous covariate when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias. Due to the extreme non-smoothness of nearest neighbor matching, the standard conditions for the bootstrap are not satisfied, leading the bootstrap variance to diverge from the actual variance. Simulations confirm the difference between actual and nominal coverage rates for bootstrap confidence intervals predicted by the theoretical calculations. To our knowledge, this is the first example of a root-N consistent and asymptotically normal estimator for which the bootstrap fails to work.
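To make the procedure under discussion concrete, here is a minimal sketch of a single-nearest-neighbor matching estimator on one covariate, together with the naive resampling bootstrap that the abstract argues is in general not valid. The function names, the focus on the effect for the treated, and matching with replacement are assumptions made for illustration; they are not taken from the paper.

```python
import random
import statistics


def att_nn_match(treated, controls):
    """Average effect on the treated via 1-nearest-neighbor matching.

    treated, controls: lists of (x, y) pairs with a single covariate x.
    Matching is with replacement (an assumption of this sketch).
    """
    effects = []
    for x_t, y_t in treated:
        # Pick the control unit closest in covariate distance.
        _, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        effects.append(y_t - y_c)
    return statistics.fmean(effects)


def bootstrap_se(treated, controls, n_boot=200, seed=0):
    """Naive bootstrap standard error: resample each group, re-match.

    This is the kind of procedure the abstract reports as invalid for
    matching estimators; it is shown only to illustrate what is meant.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(controls, k=len(controls))
        estimates.append(att_nn_match(t_star, c_star))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    treated = [(0.1, 2.0), (0.9, 3.0)]
    controls = [(0.0, 1.0), (1.0, 1.5), (0.5, 9.0)]
    print(att_nn_match(treated, controls))
    print(bootstrap_se(treated, controls))
```

Because the match for each treated unit can jump discontinuously when the resampled control set changes, the estimator is not a smooth function of the data, which is the property the abstract identifies as the source of the failure.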
Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
This paper describes an experimental comparison of seven different learning
algorithms on the problem of learning to disambiguate the meaning of a word
from context. The algorithms tested include statistical, neural-network,
decision-tree, rule-based, and case-based classification techniques. The
specific problem tested involves disambiguating six senses of the word ``line''
using the words in the current and preceding sentence as context. The
statistical and neural-network methods perform the best on this particular
problem and we discuss a potential reason for this observed difference. We also
discuss the role of bias in machine learning and its importance in explaining
performance differences observed on specific problems.
Comment: 10 page
Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy
Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes of entire surveys a decade ago can now be
acquired in a single night and real-time analysis is often desired. Thus,
modern astronomy requires big data know-how; in particular, it demands highly
efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.
Seen and unseen tidal caustics in the Andromeda galaxy
Indirect detection of high-energy particles from dark matter interactions is
a promising avenue for learning more about dark matter, but is hampered by the
frequent coincidence of high-energy astrophysical sources of such particles
with putative high-density regions of dark matter. We calculate the boost
factor and gamma-ray flux from dark matter associated with two shell-like
caustics of luminous tidal debris recently discovered around the Andromeda
galaxy, under the assumption that dark matter is its own supersymmetric
antiparticle. These shell features could be a good candidate for indirect
detection of dark matter via gamma rays because they are located far from the
primary confusion sources at the galaxy's center, and because the shapes of the
shells indicate that most of the mass has piled up near apocenter. Using a
numerical estimator specifically calibrated to estimate densities in N-body
representations with sharp features and a previously determined N-body model of
the shells, we find that the largest boost factors do occur in the shells but
are only a few percent. We also find that the gamma-ray flux is an order of
magnitude too low to be detected with Fermi for likely dark matter parameters,
and about 2 orders of magnitude less than the signal that would have come from
the dwarf galaxy that produces the shells in the N-body model. We further show
that the radial density profiles and relative radial spacing of the shells, in
either dark or luminous matter, are relatively insensitive to the details of the
potential of the host galaxy but depend in a predictable way on the velocity
dispersion of the progenitor galaxy.
Comment: ApJ accepte
Predictive mean matching imputation in survey sampling
Predictive mean matching imputation is popular for handling item nonresponse
in survey sampling. In this article, we study the asymptotic properties of the
predictive mean matching estimator of the population mean. For variance
estimation, the conventional bootstrap inference for matching estimators with
fixed matches has been shown to be invalid due to the nonsmooth nature of
the matching estimator. We propose asymptotically valid replication variance
estimation. The key strategy is to construct replicates of the estimator
directly based on linear terms, instead of individual records of variables.
Extension to nearest neighbor imputation is also discussed. A simulation study
confirms that the new procedure provides valid variance estimation.
Comment: 20 pages, 0 figure, 1 tabl
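As a rough illustration of the point estimator described above, the following sketch imputes each missing outcome with the observed outcome of the respondent whose predicted mean is closest, then averages the completed data. It assumes a single covariate, a simple linear working model, and equal weights for all units; the function name `pmm_mean` is hypothetical. Survey design weights and the replication variance estimator proposed in the abstract are not shown.

```python
import statistics


def pmm_mean(x, y):
    """Predictive mean matching estimate of the mean of y.

    x: list of covariate values for all sampled units.
    y: list of outcomes, with None marking item nonresponse.
    Unweighted sketch; a survey application would use design weights.
    """
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xr = [xi for xi, _ in respondents]
    yr = [yi for _, yi in respondents]

    # Fit y = intercept + slope * x by least squares on respondents.
    x_bar, y_bar = statistics.fmean(xr), statistics.fmean(yr)
    sxx = sum((xi - x_bar) ** 2 for xi in xr)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in respondents)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    def predict(value):
        return intercept + slope * value

    completed = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Donor: respondent with the nearest predicted mean.
            _, yi = min(
                respondents,
                key=lambda r: abs(predict(r[0]) - predict(xi)),
            )
        completed.append(yi)
    return statistics.fmean(completed)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 1.1]
    y = [0.0, 2.0, 4.0, 6.0, None]
    print(pmm_mean(x, y))
```

The imputed value is always an observed outcome from a donor rather than the model prediction itself, which is what distinguishes predictive mean matching from regression imputation.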