Blending of Cepheids in M33
A precise and accurate determination of the Hubble constant based on Cepheid
variables requires proper characterization of many sources of systematic error.
One of these is stellar blending, which biases the measured fluxes of Cepheids
and the resulting distance estimates. We study the blending of 149 Cepheid
variables in M33 by matching archival Hubble Space Telescope data with images
obtained at the WIYN 3.5-m telescope, which differ by a factor of 10 in angular
resolution.
We find that 55 ± 4% of the Cepheids have no detectable nearby companions that
could bias the WIYN V-band photometry, while the fraction of Cepheids affected
below the 10% level is 73 ± 4%. The corresponding values for the I band are
60 ± 4% and 72 ± 4%, respectively. We find no statistically significant
difference in blending statistics as a function of period or surface
brightness. Additionally, we report all the detected companions within 2
arcseconds of the Cepheids (equivalent to 9 pc at the distance of M33) which
may be used to derive empirical blending corrections for Cepheids at larger
distances.
Comment: v2: Fixed incorrect description of Figure 2 in text. Accepted for
publication in AJ. Full data tables can be found in ASCII format as part of
the source distribution. A version of the paper with higher-resolution
figures can be found at
http://faculty.physics.tamu.edu/lmacri/papers/chavez12.pd
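The blending bias described above can be made concrete: an unresolved companion adds flux to the Cepheid's measurement, making the blend spuriously bright and the inferred distance too small. A minimal sketch with illustrative numbers (not the paper's data):

```python
import math

def blending_bias_mag(f_cepheid, f_companions):
    """Magnitude bias from unresolved companion flux blended into a
    Cepheid measurement: the blend is brighter, so the bias is negative."""
    total = f_cepheid + sum(f_companions)
    return -2.5 * math.log10(total / f_cepheid)

# A companion contributing 10% of the Cepheid's flux (the threshold
# quoted in the abstract) biases the measured magnitude by about -0.1 mag.
bias = blending_bias_mag(1.0, [0.10])
```

Since the distance modulus is 5 log10(d) - 5, a 0.1 mag brightening translates into roughly a 5% underestimate of the distance, which is why blending matters for the Hubble constant.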
Predicting the Clustering of X-Ray Selected Galaxy Clusters in Flux-Limited Surveys
(abridged) We present a model to predict the clustering properties of X-ray
clusters in flux-limited surveys. Our technique correctly accounts for past
light-cone effects on the observed clustering and follows the non-linear
evolution in redshift of the underlying DM correlation function and cluster
bias factor. The conversion of the limiting flux of a survey into the
corresponding minimum mass of the hosting DM haloes is obtained by using
theoretical and empirical relations between mass, temperature and X-ray
luminosity of clusters. Finally, our model is calibrated to reproduce the
observed cluster counts adopting a temperature-luminosity relation moderately
evolving with redshift. We apply our technique to three existing catalogues:
BCS, XBACs and REFLEX samples. Moreover, we consider an example of possible
future space missions with fainter limiting flux. In general, we find that the
amplitude of the spatial correlation function is a decreasing function of the
limiting flux and that the EdS models always give smaller correlation
amplitudes than open or flat models with low matter density parameter. In the
case of XBACs, the comparison with previous estimates of the observational
spatial correlation shows that only the predictions of models with Omega_0m=0.3
are in good agreement with the data, while the EdS models have too low a
correlation strength. Finally, we use our technique to discuss the best
strategy for future surveys. Our results show that the choice of a wide area
catalogue, even with a brighter limiting flux, is preferable to a deeper, but
with smaller area, survey.
Comment: 20 pages, Latex using MN style, 11 figures enclosed. Version accepted
for publication in MNRA
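The conversion chain the model relies on (limiting flux → minimum luminosity → temperature → minimum halo mass) can be sketched as follows; the scaling-relation normalizations, slopes, and the example flux limit are illustrative assumptions, not the paper's calibrated values:

```python
import math

def luminosity_limit(flux_lim_cgs, d_lum_cm):
    """Minimum rest-frame luminosity detectable at luminosity distance d_L."""
    return 4.0 * math.pi * d_lum_cm**2 * flux_lim_cgs

def temperature_from_L(L_erg_s, L6=3e44, alpha=3.0, z=0.0, A=0.0):
    """Invert an assumed L-T relation L = L6 * (T/6 keV)^alpha * (1+z)^A
    for the temperature T in keV (A parametrizes the mild evolution)."""
    return 6.0 * (L_erg_s / (L6 * (1.0 + z)**A))**(1.0 / alpha)

def mass_from_T(T_keV, M6=1e15, z=0.0):
    """Virial-style scaling M ∝ T^{3/2} (1+z)^{-3/2}, normalized to an
    assumed mass M6 at 6 keV."""
    return M6 * (T_keV / 6.0)**1.5 / (1.0 + z)**1.5

# Example: a 3e-12 erg/s/cm^2 flux limit seen at d_L ~ 1 Gpc (~3.086e27 cm)
L_min = luminosity_limit(3e-12, 3.086e27)
T_min = temperature_from_L(L_min, z=0.3)
M_min = mass_from_T(T_min, z=0.3)
```

A fainter limiting flux lowers M_min, so deeper surveys sample less biased (less clustered) haloes, which is the trade-off behind the wide-versus-deep discussion above.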
The First Comparison Between Swarm-C Accelerometer-Derived Thermospheric Densities and Physical and Empirical Model Estimates
The first systematic comparison between Swarm-C accelerometer-derived
thermospheric density and both empirical and physics-based model results using
multiple model performance metrics is presented. This comparison is performed
at the satellite's high 10-s temporal resolution, which provides a meaningful
evaluation of the models' fidelity for orbit prediction and other space weather
forecasting applications. The comparison against the physical model is
influenced by the specification of the lower atmospheric forcing, the
high-latitude ionospheric plasma convection, and solar activity. Some insights
into the model response to thermosphere-driving mechanisms are obtained through
a machine learning exercise. The results of this analysis show that the
short-timescale variations observed by Swarm-C during periods of high solar and
geomagnetic activity were better captured by the physics-based model than the
empirical models. It is concluded that Swarm-C data agree well with the
climatologies inherent within the models and are, therefore, a useful data set
for further model validation and scientific research.
Comment: https://goo.gl/n4QvU
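Two common model-performance metrics for density comparisons of this kind, the mean observed-to-model ratio and the RMS of the log ratio, can be sketched as below; the 10-s density samples are hypothetical, not Swarm-C data:

```python
import math

def density_metrics(observed, modeled):
    """Mean observed-to-model ratio (systematic over/underprediction) and
    RMS of the log10 ratio (typical spread) for paired density samples."""
    ratios = [o / m for o, m in zip(observed, modeled)]
    mean_ratio = sum(ratios) / len(ratios)
    rms_log = math.sqrt(sum(math.log10(r) ** 2 for r in ratios) / len(ratios))
    return mean_ratio, rms_log

# Hypothetical 10-s thermospheric density samples in kg/m^3
obs = [3.1e-12, 3.4e-12, 2.9e-12, 3.6e-12]
mod = [3.0e-12, 3.0e-12, 3.0e-12, 3.0e-12]
mean_ratio, rms_log = density_metrics(obs, mod)
```

Ratio-based metrics are the convention in this field because density varies over orders of magnitude along an orbit, so additive errors are dominated by the largest values.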
Reducing the number of inputs in nonlocal games
In this work we show how a vector-valued version of Schechtman's empirical
method can be used to reduce the number of inputs in a nonlocal game while
preserving the quotient of the quantum over the classical
bias. We apply our method to the Khot-Vishnoi game, with exponentially many
questions per player, to produce another game with polynomially many questions
so that the quantum over the classical bias is essentially preserved.
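The quantum-over-classical bias quotient that the reduction preserves can be illustrated on the simplest XOR game, CHSH, by brute force (the Khot-Vishnoi game itself is far too large to enumerate):

```python
import itertools
import math

def classical_bias(M):
    """Classical bias of an XOR game with uniform questions: maximize
    over deterministic sign strategies a(x), b(y) in {+1, -1} the
    average of M[x][y] * a(x) * b(y), where M[x][y] = +1/-1 encodes
    whether the players should agree or disagree on (x, y)."""
    n = len(M)
    best = 0.0
    for a in itertools.product([-1, 1], repeat=n):
        for b in itertools.product([-1, 1], repeat=n):
            val = sum(M[x][y] * a[x] * b[y] for x in range(n) for y in range(n))
            best = max(best, abs(val) / n**2)
    return best

chsh = [[1, 1], [1, -1]]        # CHSH sign matrix: disagree only on (1, 1)
beta_c = classical_bias(chsh)   # classical bias = 1/2
beta_q = 1.0 / math.sqrt(2.0)   # Tsirelson's bound: the quantum bias
ratio = beta_q / beta_c         # quotient = sqrt(2)
```

For CHSH the quotient is sqrt(2); the point of games like Khot-Vishnoi is that this quotient can grow with the game size, and the reduction above shrinks the question sets while keeping it.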
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. Indeed,
almost all areas of science, technology, medicine, public health, economics,
business, linguistics and social science are bombarded by ever increasing flows
of data begging to be analyzed efficiently and effectively. In this paper, we
propose a rough idea of a possible taxonomy of big data, along with some of the
most commonly used tools for handling each particular category of bigness. The
dimensionality p of the input space and the sample size n are usually the main
ingredients in the characterization of data bigness. The specific statistical
machine learning technique used to handle a particular big data set will depend
on which category it falls in within the bigness taxonomy. Large p small n data
sets for instance require a different set of tools from the large n small p
variety. Among other tools, we discuss Preprocessing, Standardization,
Imputation, Projection, Regularization, Penalization, Compression, Reduction,
Selection, Kernelization, Hybridization, Parallelization, Aggregation,
Randomization, Replication, and Sequentialization. Indeed, it is important to
emphasize right away that the so-called no free lunch theorem applies here, in
the sense that there is no universally superior method that outperforms all
other methods on all categories of bigness. It is also important to stress the
fact that simplicity in the sense of Ockham's razor non-plurality principle of
parsimony tends to reign supreme when it comes to massive data. We conclude
with a comparison of the predictive performance of some of the most commonly
used methods on a few data sets.
Comment: 18 pages, 2 figures, 3 tables
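A minimal sketch of the (n, p) axis of such a taxonomy, routing a data set to a tool family by its sample size n and dimensionality p; the thresholds and tool pairings are illustrative assumptions rather than the paper's:

```python
def bigness_category(n, p):
    """Classify a data set by the (n, p) dimension of the taxonomy and
    suggest the matching tool family (illustrative thresholds)."""
    if p > n:
        return "large p, small n: regularization, penalization, selection"
    if n > 1_000_000:
        return "large n: parallelization, aggregation, sequentialization"
    return "moderate n and p: standard in-memory learners"

genomics_like = bigness_category(n=100, p=20_000)        # large p, small n
clickstream_like = bigness_category(n=50_000_000, p=20)  # large n
```

Consistent with the no-free-lunch point above, the router picks a tool family per category; no single branch is expected to dominate across all of them.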
Experimental Comparison of Empirical Material Decomposition Methods for Spectral CT
Material composition can be estimated from spectral information acquired using photon counting x-ray detectors with pulse height analysis. Non-ideal effects in photon counting x-ray detectors such as charge-sharing, k-escape, and pulse-pileup distort the detected spectrum, which can cause material decomposition errors. This work compared the performance of two empirical decomposition methods: a neural network estimator and a linearized maximum likelihood estimator with correction (A-table method). The two investigated methods differ in how they model the nonlinear relationship between the spectral measurements and material decomposition estimates. The bias and standard deviation of material decomposition estimates were compared for the two methods, using both simulations and experiments with a photon-counting x-ray detector. Both the neural network and A-table methods demonstrated a similar performance for the simulated data. The neural network had lower standard deviation for nearly all thicknesses of the test materials in the collimated (low scatter) and uncollimated (higher scatter) experimental data. In the experimental study of Teflon thicknesses, non-ideal detector effects demonstrated a potential bias of 11–28%, which was reduced to 0.1–11% using the proposed empirical methods. Overall, the results demonstrated preliminary experimental feasibility of empirical material decomposition for spectral CT using photon-counting detectors.
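The decomposition problem has a simple idealized core: with one effective attenuation coefficient per material per energy bin, the log measurements are linear in the thicknesses and a 2×2 solve recovers them. Empirical methods such as the A-table approach exist precisely to correct the nonlinearity that real detector distortions add to this model. A sketch with assumed coefficients:

```python
def decompose(l1, l2, mu):
    """Solve the linear two-bin model [l1, l2] = mu @ [t1, t2] for the
    material thicknesses t1, t2 (cm), where l_i = -ln(N_i / N0_i)."""
    (a, b), (c, d) = mu
    det = a * d - b * c
    t1 = (d * l1 - b * l2) / det
    t2 = (a * l2 - c * l1) / det
    return t1, t2

mu = [[0.20, 0.45],   # bin 1: [material A, material B] in 1/cm (assumed)
      [0.15, 0.30]]   # bin 2
# Forward-project 2 cm of A and 1 cm of B, then invert:
l1 = 0.20 * 2 + 0.45 * 1
l2 = 0.15 * 2 + 0.30 * 1
t1, t2 = decompose(l1, l2, mu)   # recovers (2.0, 1.0)
```

In the idealized linear setting the inversion is exact; the 11–28% biases quoted above arise because charge sharing, k-escape, and pile-up make the measured l_i nonlinear in thickness, which the empirical estimators learn to invert.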
Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification
We consider the high energy physics unfolding problem where the goal is to
estimate the spectrum of elementary particles given observations distorted by
the limited resolution of a particle detector. This important statistical
inverse problem arising in data analysis at the Large Hadron Collider at CERN
consists in estimating the intensity function of an indirectly observed Poisson
point process. Unfolding typically proceeds in two steps: one first produces a
regularized point estimate of the unknown intensity and then uses the
variability of this estimator to form frequentist confidence intervals that
quantify the uncertainty of the solution. In this paper, we propose forming the
point estimate using empirical Bayes estimation which enables a data-driven
choice of the regularization strength through marginal maximum likelihood
estimation. Observing that neither Bayesian credible intervals nor standard
bootstrap confidence intervals succeed in achieving good frequentist coverage
in this problem due to the inherent bias of the regularized point estimate, we
introduce an iteratively bias-corrected bootstrap technique for constructing
improved confidence intervals. We show using simulations that this enables us
to achieve nearly nominal frequentist coverage with only a modest increase in
interval length. The proposed methodology is applied to unfolding the boson
invariant mass spectrum as measured in the CMS experiment at the Large Hadron
Collider.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note:
substantial text overlap with arXiv:1401.827
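The bias-correction idea can be sketched on a deliberately simple biased estimator (the MLE of a variance). The paper iterates the correction inside the bootstrap; a single correction step is shown here as a hedged illustration:

```python
import random

def biased_var(xs):
    """MLE variance estimator, biased low by a factor (n-1)/n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bootstrap_bias_corrected(xs, estimator, B=2000, seed=0):
    """Bootstrap bias correction: estimate bias as the mean of the
    estimator over resamples minus the original estimate, then
    subtract it off (one step of the iterated scheme)."""
    rng = random.Random(seed)
    est = estimator(xs)
    boots = [estimator([rng.choice(xs) for _ in xs]) for _ in range(B)]
    bias_hat = sum(boots) / B - est
    return est - bias_hat

xs = [2.1, 1.9, 3.4, 2.8, 2.2, 1.5, 2.9, 3.1, 2.0, 2.6]
raw = biased_var(xs)                         # biased low
corrected = bootstrap_bias_corrected(xs, biased_var)  # pulled upward
```

The same logic applied to interval endpoints rather than point estimates is what restores frequentist coverage in the unfolding setting, since the regularized estimator there is biased by construction.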
On statistical approaches to generate Level 3 products from satellite remote sensing retrievals
Satellite remote sensing of trace gases such as carbon dioxide (CO2) has
increased our ability to observe and understand Earth's climate. However, these
remote sensing data, specifically Level 2 retrievals, tend to be irregular in
space and time, and hence, spatio-temporal prediction is required to infer
values at any location and time point. Such inferences are not only required to
answer important questions about our climate, but they are also needed for
validating the satellite instrument, since Level 2 retrievals are generally not
co-located with ground-based remote sensing instruments. Here, we discuss
statistical approaches to construct Level 3 products from Level 2 retrievals,
placing particular emphasis on the strengths and potential pitfalls when using
statistical prediction in this context. Following this discussion, we use a
spatio-temporal statistical modelling framework known as fixed rank kriging
(FRK) to obtain global predictions and prediction standard errors of
column-averaged carbon dioxide based on Version 7r and Version 8r retrievals
from the Orbiting Carbon Observatory-2 (OCO-2) satellite. The FRK predictions
allow us to validate statistically the Level 2 retrievals globally even though
the data are at locations and at time points that do not coincide with
validation data. Importantly, the validation takes into account the prediction
uncertainty, which is dependent both on the temporally-varying density of
observations around the ground-based measurement sites and on the
spatio-temporal high-frequency components of the trace gas field that are not
explicitly modelled. Here, for validation of remotely-sensed CO2 data, we
use observations from the Total Carbon Column Observing Network. We demonstrate
that the resulting FRK product based on Version 8r compares better with TCCON
data than that based on Version 7r.
Comment: 28 pages, 10 figures, 4 tables
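The fixed-rank idea behind FRK, representing the field with a small, fixed set of r basis functions so the core solve stays r × r no matter how many retrievals there are, can be sketched in one dimension. This is a ridge-penalized basis regression, not the full FRK covariance model, and all values are synthetic:

```python
import math

def gaussian_basis(s, centers, scale):
    """Evaluate r Gaussian basis functions at location s."""
    return [math.exp(-0.5 * ((s - c) / scale) ** 2) for c in centers]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small r x r system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_predict(s_obs, y_obs, s_new, centers, scale=0.5, ridge=1e-3):
    """Fit basis weights from irregular observations, predict anywhere."""
    S = [gaussian_basis(s, centers, scale) for s in s_obs]
    r = len(centers)
    StS = [[sum(S[i][a] * S[i][b] for i in range(len(S)))
            + (ridge if a == b else 0.0) for b in range(r)] for a in range(r)]
    Sty = [sum(S[i][a] * y_obs[i] for i in range(len(S))) for a in range(r)]
    eta = solve(StS, Sty)
    return [sum(w * e for w, e in zip(gaussian_basis(s, centers, scale), eta))
            for s in s_new]

s_obs = [0.0, 0.5, 1.0, 1.5, 2.0]
y_obs = [0.0, 0.4, 0.9, 1.4, 2.1]   # synthetic, roughly linear field
pred = fit_predict(s_obs, y_obs, [1.0], centers=[0.0, 1.0, 2.0])
```

Because the weight vector has fixed length r, prediction at arbitrary locations and times scales to massive Level 2 data sets, which is what makes gridded Level 3 products and their standard errors tractable.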