The Limits of Post-Selection Generalization
While statistics and machine learning offer numerous methods for ensuring
generalization, these methods often fail in the presence of adaptivity---the
common practice in which the choice of analysis depends on previous
interactions with the same dataset. A recent line of work has introduced
powerful, general-purpose algorithms that ensure post hoc generalization (also
called robust or post-selection generalization), which says that, given the
output of the algorithm, it is hard to find any statistic for which the data
differs significantly from the population it came from.
In this work we show several limitations on the power of algorithms
satisfying post hoc generalization. First, we show a tight lower bound on the
error of any algorithm that satisfies post hoc generalization and answers
adaptively chosen statistical queries, showing a strong barrier to progress in
post-selection data analysis. Second, we show that post hoc generalization is
not closed under composition, despite many examples of such algorithms
exhibiting strong composition properties.
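The danger of adaptivity described above can be seen in a toy simulation: answer many honest statistical queries on one dataset, then let the analyst choose one new query based on those answers. The adaptive query has true population mean zero, yet its empirical value is far from zero. (The setup below, i.i.d. ±1 coordinates and a sign-weighted adaptive query, is a standard illustrative construction, not the paper's lower-bound instance.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 400  # n samples, d non-adaptive statistical queries

# Population: each coordinate is a fair +/-1 coin, so every true mean is 0.
X = rng.choice([-1.0, 1.0], size=(n, d))

# Honest (non-adaptive) answers: each empirical mean is within sampling
# error of its true value 0.
means = X.mean(axis=0)

# Adaptive query, chosen AFTER seeing the answers: weight each coordinate
# by the sign of its empirical mean.  Its true population mean is still 0.
signs = np.sign(means)
adaptive_value = (X @ signs).mean() / np.sqrt(d)

# Every honest answer stays small, but the adaptive query's empirical
# value is far from 0 -- the dataset has been overfit by reuse.
```

Each individual query generalizes fine; it is the composition with a data-dependent choice that fails, which is exactly the regime the paper's lower bounds address.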
Robust Online Hamiltonian Learning
In this work we combine two distinct machine learning methodologies,
sequential Monte Carlo and Bayesian experimental design, and apply them to the
problem of inferring the dynamical parameters of a quantum system. We design
the algorithm with practicality in mind by including parameters that control
trade-offs between the requirements on computational and experimental
resources. The algorithm can be implemented online (during experimental data
collection), avoiding the need for storage and post-processing. Most
importantly, our algorithm is capable of learning Hamiltonian parameters even
when the parameters change from experiment to experiment, and also when
additional noise processes are present and unknown. The algorithm also
numerically estimates the Cramer-Rao lower bound, certifying its own
performance.

Comment: 24 pages, 12 figures; to appear in New Journal of Physics.
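A minimal sketch of the sequential Monte Carlo idea for a one-parameter Hamiltonian: a qubit precessing at an unknown frequency omega yields outcome 1 with probability sin²(ωt/2), and a particle filter reweights a cloud of hypotheses after each simulated measurement. The model, measurement times, and prior below are illustrative assumptions; the paper's algorithm additionally handles resampling, adaptive experiment design, and noise.

```python
import numpy as np

rng = np.random.default_rng(1)
true_omega = 0.7            # unknown precession frequency (illustrative units)
n_particles = 2000

# Prior over omega: uniform on [0, 1], represented by equally weighted particles.
particles = rng.uniform(0.0, 1.0, n_particles)
weights = np.full(n_particles, 1.0 / n_particles)

for k in range(1, 201):
    t = float(k)                               # evolution time of experiment k
    p1 = np.sin(true_omega * t / 2.0) ** 2     # Born-rule outcome probability
    outcome = rng.random() < p1                # simulated measurement result

    # Bayes update: reweight every particle by its likelihood of the outcome,
    # then renormalize so the weights remain a probability distribution.
    p1_particles = np.sin(particles * t / 2.0) ** 2
    lik = p1_particles if outcome else 1.0 - p1_particles
    weights = weights * lik
    weights /= weights.sum()

omega_est = float(np.dot(weights, particles))  # posterior-mean estimate
```

This runs online: each measurement triggers one reweighting pass, so no raw data needs to be stored, which is the practicality point the abstract emphasizes.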
Likelihood Adaptively Modified Penalties
A new family of penalty functions, adaptive to likelihood, is introduced for
model selection in general regression models. It arises naturally through
assuming certain types of prior distribution on the regression parameters. To
study stability properties of the penalized maximum likelihood estimator, two
types of asymptotic stability are defined. Theoretical properties, including
the parameter estimation consistency, model selection consistency, and
asymptotic stability, are established under suitable regularity conditions. An
efficient coordinate-descent algorithm is proposed. Simulation results and real
data analysis show that the proposed method has competitive performance in
comparison with existing ones.

Comment: 42 pages, 4 figures.
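The abstract does not specify the penalty family, so as a stand-in, here is a coordinate-descent sketch for the most familiar penalized regression, the lasso: each coordinate update has a closed-form soft-thresholding solution, which is what makes coordinate descent efficient for this class of penalized likelihood problems. All data and tuning values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]      # sparse truth: 3 active coefficients
y = X @ beta_true + 0.5 * rng.normal(size=n)

def soft_threshold(z, gamma):
    """Closed-form minimizer of the one-dimensional lasso subproblem."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                       # residual for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]     # remove coordinate j from the fit
            rho = X[:, j] @ r          # partial correlation with residual
            beta[j] = soft_threshold(rho, n * lam) / col_sq[j]
            r -= X[:, j] * beta[j]     # restore it with the updated value
    return beta

beta_hat = lasso_cd(X, y, lam=0.1)
```

Keeping the residual vector in sync with each coordinate update makes every sweep O(np), the standard design choice behind efficient coordinate-descent solvers.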
Semiparametric Robust Estimation of Truncated and Censored Regression Models
Many estimation methods for truncated and censored regression models, such as maximum likelihood and symmetrically censored least squares (SCLS), are sensitive to outliers and data contamination, as we document. We therefore propose a semiparametric general trimmed estimator (GTE) of truncated and censored regression, which is highly robust but relatively imprecise. To improve its performance, we also propose data-adaptive and one-step trimmed estimators. We derive the robust and asymptotic properties of all proposed estimators and show that the one-step estimators (e.g., one-step SCLS) are as robust as GTE and are asymptotically equivalent to the original estimator (e.g., SCLS). The finite-sample properties of existing and proposed estimators are studied by means of Monte Carlo simulations.

Keywords: asymptotic normality; censored regression; one-step estimation; robust estimation; trimming; truncated regression
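To illustrate why trimming buys robustness, here is a simplified sketch (ordinary least squares with residual-based trimming on contaminated data, not the paper's GTE for truncated or censored responses): fit a pilot estimate, discard the observations with the largest absolute residuals, and refit on the rest.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # clean model: intercept 1, slope 2
out = rng.choice(n, size=30, replace=False)
y[out] += 25.0                           # contaminate 10% with gross outliers

def ols(x, y):
    """Least-squares fit of intercept and slope."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def trimmed_ols(x, y, trim=0.15):
    """Pilot fit, drop the worst-fitting fraction, then refit."""
    coef = ols(x, y)                     # pilot fit on the full sample
    resid = np.abs(y - coef[0] - coef[1] * x)
    keep = resid <= np.quantile(resid, 1.0 - trim)
    return ols(x[keep], y[keep])

naive = ols(x, y)          # intercept dragged upward by the contamination
robust = trimmed_ols(x, y)
```

The trimmed refit recovers the clean-model coefficients even though the pilot fit itself is biased, since the gross outliers still produce the largest pilot residuals; the one-step estimators in the paper exploit a similar pilot-then-correct structure.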
Improving population-specific allele frequency estimates by adapting supplemental data: an empirical Bayes approach
Estimation of the allele frequency at genetic markers is a key ingredient in
biological and biomedical research, such as studies of human genetic variation
or of the genetic etiology of heritable traits. As genetic data becomes
increasingly available, investigators face a dilemma: when should data from
other studies and population subgroups be pooled with the primary data? Pooling
additional samples will generally reduce the variance of the frequency
estimates; however, used inappropriately, pooled estimates can be severely
biased due to population stratification. Because of this potential bias, most
investigators avoid pooling, even for samples with the same ethnic background
and residing on the same continent. Here, we propose an empirical Bayes
approach for estimating allele frequencies of single nucleotide polymorphisms.
This procedure adaptively incorporates genotypes from related samples, so that
more similar samples have a greater influence on the estimates. In every
example we have considered, our estimator achieves a mean squared error (MSE)
that is smaller than either pooling or not, and sometimes substantially
improves over both extremes. The bias introduced is small, as is shown by a
simulation study that is carefully matched to a real data example. Our method
is particularly useful when small groups of individuals are genotyped at a
large number of markers, a situation we are likely to encounter in a
genome-wide association study.

Comment: Published at http://dx.doi.org/10.1214/07-AOAS121 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
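The adaptive-pooling idea can be sketched with a beta-binomial shrinkage estimator: shrink each marker's primary-sample frequency toward the supplemental-sample frequency, with a prior strength M (in pseudo-counts) fit from the marker-wide discrepancy between the two samples. The sample sizes, stratification level, and method-of-moments step below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
n_markers = 500
n_pri, n_sup = 40, 400   # chromosomes genotyped per marker (assumed sizes)

# True frequencies: the supplemental population resembles the primary one
# but is mildly stratified (illustrative noise level).
p_pri = rng.uniform(0.05, 0.95, n_markers)
p_sup = np.clip(p_pri + rng.normal(0.0, 0.06, n_markers), 0.01, 0.99)

x = rng.binomial(n_pri, p_pri)           # primary allele counts
z = rng.binomial(n_sup, p_sup)           # supplemental allele counts

p_hat = x / n_pri                        # primary-only estimate
p_pool = (x + z) / (n_pri + n_sup)       # naive pooled estimate

# Empirical Bayes: centre a Beta prior at the supplemental frequency and
# fit its strength M by a method-of-moments step comparing the observed
# cross-sample discrepancy with pure binomial sampling noise.
d2 = np.mean((x / n_pri - z / n_sup) ** 2)
samp = np.mean(p_hat * (1 - p_hat)) * (1 / n_pri + 1 / n_sup)
tau2 = max(d2 - samp, 1e-6)              # between-population variance
M = np.mean(p_hat * (1 - p_hat)) / tau2  # prior strength in pseudo-counts
p_eb = (x + M * (z / n_sup)) / (n_pri + M)

def mse(est):
    return float(np.mean((est - p_pri) ** 2))
```

When the populations are close, tau2 is small, M is large, and the estimator leans on the supplemental data; when they diverge, M shrinks and pooling is automatically discounted, which is the adaptivity the abstract claims.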