Dark Quest. I. Fast and Accurate Emulation of Halo Clustering Statistics and Its Application to Galaxy Clustering
We perform an ensemble of N-body simulations for 101 flat wCDM
cosmological models sampled based on a maximin-distance Sliced Latin
Hypercube Design. Using the halo catalogs extracted at multiple
redshifts, we develop Dark Emulator, which enables
fast and accurate computations of the halo mass function, halo-matter
cross-correlation, and halo auto-correlation as a function of halo masses,
redshift, separations and cosmological models, based on the Principal Component
Analysis and the Gaussian Process Regression for the large-dimensional input
and output data vector. We assess the performance of the emulator using a
validation set of N-body simulations that are not used in training the
emulator. We show that, for typical halos hosting CMASS galaxies in the Sloan
Digital Sky Survey, the emulator predicts the halo-matter cross-correlation,
relevant for galaxy-galaxy weak lensing, and the halo auto-correlation,
relevant for galaxy clustering, at percent-level accuracy. We give several
demonstrations of the emulator. It
can be used to study properties of halo mass density profiles such as the
mass-concentration relation and splashback radius for different cosmologies.
The emulator outputs can be combined with an analytical prescription of the
halo-galaxy connection, such as the halo occupation distribution at the
equation level, instead of using mock catalogs, to make accurate predictions
of galaxy clustering statistics such as galaxy-galaxy weak lensing and the
projected correlation function for any model within the wCDM cosmologies, in
a few CPU seconds.
Comment: 46 pages, 47 figures; version accepted for publication in Ap
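The emulation scheme the abstract describes — compress each simulation output vector with PCA, then interpolate the retained coefficients across cosmological parameters with Gaussian process regression — can be sketched in a few lines. Everything below (the parameter counts, the toy "correlation function" target, the kernel length scale) is an illustrative assumption, not the Dark Emulator implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 101 "cosmologies" (5 parameters each) mapped to a
# 50-bin "correlation function" curve.  All shapes, the target function,
# and the kernel length scale are illustrative assumptions.
n_train, n_param, n_bins = 101, 5, 50
X = rng.uniform(-1.0, 1.0, size=(n_train, n_param))
r = np.linspace(0.1, 5.0, n_bins)
Y = np.exp(-np.outer(X[:, 0] + 2.0, r)) + 0.1 * X[:, 1:2] * np.sin(r)

# Step 1: PCA compresses each 50-bin output vector to a few coefficients.
mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
n_pc = 4
coeffs = (Y - mean) @ Vt[:n_pc].T              # (n_train, n_pc)

# Step 2: Gaussian-process regression (squared-exponential kernel)
# interpolates each PCA coefficient across the parameter space.
def rbf(A, B, ell=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

K = rbf(X, X) + 1e-8 * np.eye(n_train)         # small jitter for stability
alpha = np.linalg.solve(K, coeffs)

def emulate(x_new):
    """Predict full curves for new parameter vectors in milliseconds."""
    k_star = rbf(np.atleast_2d(x_new), X)
    return mean + (k_star @ alpha) @ Vt[:n_pc]

pred = emulate(X[:3])
```

Once trained, a call to `emulate` costs a kernel evaluation and two small matrix products, which is what makes CPU-seconds predictions of clustering statistics plausible.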
Stratification Trees for Adaptive Randomization in Randomized Controlled Trials
This paper proposes an adaptive randomization procedure for two-stage
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad-hoc choices of stratification.
We conclude by applying our method to the study in Karlan and Wood (2017),
where we estimate stratification trees using the first wave of their
experiment.
New statistical method identifies cytokines that distinguish stool microbiomes
Regressing an outcome or dependent variable onto a set of input or independent variables allows the analyst to measure associations between the two, so that changes in the outcome can be described and predicted by changes in the inputs. While classical statistics offers many ways of doing this when the dependent variable has certain properties (e.g., a scalar, a survival time, a count), little progress has been made on regression where the dependent variable consists of microbiome taxa counts without imposing extremely strict conditions on the data. In this paper, we propose and apply a new regression model combining the Dirichlet-multinomial distribution with recursive partitioning, providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine and microbiome taxa count data; it is applicable to any combination of microbiome taxa counts and metadata, is fit automatically, and is intuitively interpretable. The model can be applied to any microbiome or other compositional data, and software (R package HMP) is available through the R CRAN website.
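The two ingredients of such a model — the Dirichlet-multinomial log-likelihood for overdispersed taxa counts, and a recursive-partitioning split search that maximizes it — can be sketched as follows. This is a toy sketch, not the DM-RPart/HMP implementation; the concentration-fitting rule and the data are invented:

```python
import numpy as np
from math import lgamma

def dm_loglik(counts, alpha):
    """Dirichlet-multinomial log-likelihood of taxa-count rows `counts`
    under concentration vector `alpha`."""
    alpha = np.asarray(alpha, float)
    a0 = alpha.sum()
    ll = 0.0
    for row in np.asarray(counts):
        n = row.sum()
        ll += lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
        for c, a in zip(row, alpha):
            ll += lgamma(c + a) - lgamma(a) - lgamma(c + 1)
    return ll

def best_binary_split(counts, covariate):
    """One recursive-partitioning step: pick the covariate threshold that
    maximizes the summed DM log-likelihood of the two child nodes, fitting
    each child's alpha with a crude mean-proportion guess (illustrative)."""
    def fit_alpha(c):
        p = c.sum(axis=0) / c.sum()
        return np.maximum(p, 1e-6) * 10.0
    best = (None, -np.inf)
    for t in np.unique(covariate)[:-1]:
        left, right = counts[covariate <= t], counts[covariate > t]
        score = dm_loglik(left, fit_alpha(left)) + dm_loglik(right, fit_alpha(right))
        if score > best[1]:
            best = (t, score)
    return best

# Toy data: composition shifts where the covariate crosses 1.
taxa = np.array([[9, 1]] * 20 + [[1, 9]] * 10)
cov = np.array([0] * 10 + [1] * 10 + [2] * 10)
thr, score = best_binary_split(taxa, cov)   # split lands at the shift, thr == 1
```

Growing a full tree would apply `best_binary_split` recursively to each child node, which is the partitioning half of the DM-RPart idea.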
The composite absolute penalties family for grouped and hierarchical variable selection
Extracting useful information from high-dimensional data is an important
focus of today's statistical research and practice. Penalized loss function
minimization has been shown to be effective for this task both theoretically
and empirically. With the virtues of both regularization and sparsity, the
L1-penalized squared error minimization method Lasso has been popular in
regression models and beyond. In this paper, we combine different norms,
including L1, to form an intelligent penalty in order to add side information
to the fitting of a regression or classification model to obtain reasonable
estimates. Specifically, we introduce the Composite Absolute Penalties (CAP)
family, which allows given grouping and hierarchical relationships between the
predictors to be expressed. CAP penalties are built by defining groups and
combining the properties of norm penalties at the across-group and within-group
levels. Grouped selection occurs for nonoverlapping groups. Hierarchical
variable selection is reached by defining groups with particular overlapping
patterns. We propose using the BLASSO and cross-validation to compute CAP
estimates in general. For a subfamily of CAP estimates involving only the L1
and L-infinity norms, we introduce the iCAP algorithm to trace the entire
regularization path for the grouped selection problem. Within this subfamily,
unbiased estimates of the degrees of freedom (df) are derived so that the
regularization parameter is selected without cross-validation. CAP is shown to
improve on the predictive performance of the LASSO in a series of simulated
experiments, including cases with p >> n and possibly mis-specified
groupings. When the complexity of a model is properly calculated, iCAP is seen
to be parsimonious in the experiments.
Comment: Published at http://dx.doi.org/10.1214/07-AOS584 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
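The penalty construction itself — one norm applied within each group of coefficients, another applied across the resulting group norms — is simple to write down. The coefficients, groups, and the L1-across/L-infinity-within choice below are illustrative:

```python
import numpy as np

def cap_penalty(beta, groups, within_norm=np.inf, across_norm=1):
    """Composite Absolute Penalty: a norm within each coefficient group,
    then a norm across the vector of group norms."""
    beta = np.asarray(beta, float)
    group_norms = np.array([np.linalg.norm(beta[idx], within_norm)
                            for idx in groups])
    return np.linalg.norm(group_norms, across_norm)

beta = np.array([0.0, 0.0, 0.0, 2.0, -1.0, 0.5])

# Nonoverlapping groups give grouped selection: an L1 norm across groups
# zeroes out whole groups at once (here the first group contributes 0).
groups = [[0, 1, 2], [3, 4], [5]]
value = cap_penalty(beta, groups)           # max|.| per group: 0, 2, 0.5 -> sum 2.5

# Overlapping groups are how the abstract says hierarchical selection is
# encoded; the particular overlap patterns are spelled out in the paper.
hier = [[0, 1, 2, 3, 4, 5], [3, 4, 5]]
hier_value = cap_penalty(beta, hier)
```

Swapping `within_norm=2` recovers a group-lasso-style penalty, which is the sense in which CAP is a family rather than a single penalty.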
Monte Carlo for the LHC
I review the status of the general-purpose Monte Carlo event generators for
the LHC, with emphasis on areas of recent physics developments. There has been
great progress, especially in multi-jet simulation, but I mention some question
marks that have recently arisen.
Comment: 10 pages, to appear in the proceedings of Physics at the LHC 2010, DESY, Hamburg, 7-12 June 201
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
As the field of data science continues to grow, there will be an
ever-increasing demand for tools that make machine learning accessible to
non-experts. In this paper, we introduce the concept of tree-based pipeline
optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement an open source Tree-based Pipeline
Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a
series of simulated and real-world benchmark data sets. In particular, we show
that TPOT can design machine learning pipelines that provide a significant
improvement over a basic machine learning analysis while requiring little to no
input or prior knowledge from the user. We also address the tendency for TPOT
to design overly complex pipelines by integrating Pareto optimization, which
produces compact pipelines without sacrificing classification accuracy. As
such, this work represents an important step toward fully automating machine
learning pipeline design.
Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment
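The Pareto step mentioned above — keep only pipelines for which no alternative is both at least as accurate and no more complex — can be sketched without any TPOT internals. The candidate pipelines and their scores below are made up for illustration:

```python
# Each candidate pipeline is scored on two objectives: cross-validated
# accuracy (maximize) and pipeline size in operators (minimize).
candidates = {
    "scaler->logreg":            (0.91, 2),
    "scaler->pca->logreg":       (0.92, 3),
    "scaler->pca->poly->forest": (0.93, 4),
    "logreg":                    (0.89, 1),
    "poly->poly->forest":        (0.90, 5),
}

def pareto_front(items):
    """Keep pipelines not dominated by any other: a pipeline is dropped if
    some alternative is at least as accurate AND no larger, with at least
    one of those comparisons strict."""
    front = {}
    for name, (acc, size) in items.items():
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for other, (a, s) in items.items() if other != name)
        if not dominated:
            front[name] = (acc, size)
    return front

front = pareto_front(candidates)
# "poly->poly->forest" is dominated (bigger and less accurate than
# "scaler->logreg"), so it is the only candidate pruned here.
```

Selecting the final pipeline from the front (e.g., the smallest one within some accuracy tolerance of the best) is what yields compact pipelines without sacrificing classification accuracy.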
The Tools and Monte Carlo Working Group: Summary Report from the Les Houches 2009 Workshop on TeV Colliders
This is the summary and introduction to the proceedings contributions for the
Les Houches 2009 "Tools and Monte Carlo" working group.
Comment: 144 pages. Workshop site http://wwwlapp.in2p3.fr/conferences/LesHouches/Houches2009/ . Conveners were Butterworth, Maltoni, Moortgat, Richardson, Schumann and Skand