Adaptive Data Depth via Multi-Armed Bandits
Data depth, introduced by Tukey (1975), is an important tool in data science,
robust statistics, and computational geometry. One chief barrier to its broader
practical utility is that many common measures of depth are computationally
intensive, requiring on the order of n^d operations to exactly compute the
depth of a single point within a data set of n points in d-dimensional
space. Often however, we are not directly interested in the absolute depths of
the points, but rather in their relative ordering. For example, we may want to
find the most central point in a data set (a generalized median), or to
identify and remove all outliers (points on the fringe of the data set with low
depth). With this observation, we develop a novel and instance-adaptive
algorithm for adaptive data depth computation by reducing the problem of
exactly computing depths to an n-armed stochastic multi-armed bandit
problem which we can efficiently solve. We focus our exposition on simplicial
depth, developed by Liu (1990), which has emerged as a promising notion of
depth due to its interpretability and asymptotic properties. We provide general
instance-dependent theoretical guarantees for our proposed algorithms, which
readily extend to many other common measures of data depth including majority
depth, Oja depth, and likelihood depth. When specialized to the case where the
gaps in the data follow a power law distribution with parameter \alpha < 2, we
show that we can reduce the complexity of identifying the deepest point in the
data set (the simplicial median) from O(n^d) to
\tilde{O}(n^{d - (d-1)\alpha/2}), where \tilde{O} suppresses logarithmic
factors. We corroborate our theoretical results with numerical experiments on
synthetic data, showing the practical utility of our proposed methods.
Comment: Keywords: multi-armed bandits, data depth, adaptivity, large-scale computation, simplicial depth
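A minimal sketch of the core idea, assuming a Monte Carlo estimate of simplicial depth (each random (d+1)-subset containing a point is one Bernoulli "pull") combined with a standard successive-elimination bandit to find the deepest point; the function names, constants, and elimination schedule are illustrative and not the authors' implementation:

```python
# Illustrative sketch (assumed names, not the paper's code): estimate simplicial
# depth by Monte Carlo and find the simplicial median with successive elimination.
import numpy as np

def contains(simplex, x):
    """Check whether point x lies in the simplex spanned by the d+1 rows of `simplex`."""
    d = x.shape[0]
    # Solve for barycentric coordinates w: simplex.T @ w = x with sum(w) = 1.
    A = np.vstack([simplex.T, np.ones(d + 1)])
    b = np.append(x, 1.0)
    try:
        w = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        return False  # degenerate simplex
    return bool(np.all(w >= -1e-12))

def pull(data, i, rng):
    """One bandit pull for arm i: is data[i] inside a random (d+1)-point simplex?"""
    n, d = data.shape
    idx = rng.choice(np.delete(np.arange(n), i), size=d + 1, replace=False)
    return float(contains(data[idx], data[i]))

def simplicial_median(data, delta=0.05, max_pulls=200_000, seed=0):
    """Successive elimination: keep sampling only arms that might still be deepest."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    active = np.arange(n)
    means = np.zeros(n)
    counts = np.zeros(n, dtype=int)
    total = 0
    while len(active) > 1 and total < max_pulls:
        for i in active:
            means[i] = (means[i] * counts[i] + pull(data, i, rng)) / (counts[i] + 1)
            counts[i] += 1
            total += 1
        # Hoeffding-style confidence radius (illustrative choice of constants).
        rad = np.sqrt(np.log(4 * n * counts[active] ** 2 / delta) / (2 * counts[active]))
        best_lower = np.max(means[active] - rad)
        active = active[means[active] + rad >= best_lower]
    return active[np.argmax(means[active])]

if __name__ == "__main__":
    pts = np.random.default_rng(1).normal(size=(200, 2))
    print("estimated simplicial median index:", simplicial_median(pts))
```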
An In-Depth Look at Recent Influenza Seasons and Vaccine Effectiveness
This paper aims to present an in-depth exploration of immunology, the influenza virus, vaccination, and vaccination's effectiveness with respect to influenza. It also delves into the possible causes behind the large increase in early childhood deaths during the 2003-2004 influenza season, which was a turning point in terms of influenza incident reporting. Finally, a data analysis of the relationship between childhood flu vaccine coverage and childhood outpatient ILI (influenza-like illness) visits by region is presented as a measure of vaccine effectiveness and an identifier of trends. Although this relationship was not statistically significant (alpha=0.05) regionally, this points to other factors at work in the relationship between vaccine coverage and outpatient visits in children. The same comparison made over time with national statistics did prove statistically significant (p=0.02); however, other variables are hypothesized to be present in this relationship as well.
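A hedged sketch of the kind of significance test described, assuming toy data and a Pearson correlation (the abstract does not specify the exact test); the arrays below are placeholders, not the paper's data:

```python
# Illustrative sketch (assumed data and test choice): checking whether the
# relationship between childhood flu-vaccine coverage and outpatient ILI visits
# is statistically significant at alpha = 0.05.
import numpy as np
from scipy import stats

coverage = np.array([52.1, 55.4, 58.0, 60.2, 61.7, 63.5, 64.9, 66.3])  # % coverage (toy)
ili_visits = np.array([3.1, 2.9, 3.0, 2.6, 2.5, 2.4, 2.5, 2.2])        # % outpatient ILI (toy)

r, p_value = stats.pearsonr(coverage, ili_visits)
alpha = 0.05
verdict = "significant" if p_value < alpha else "not significant"
print(f"r = {r:.2f}, p = {p_value:.3f}, {verdict} at alpha = {alpha}")
```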
Covariance-Aware Private Mean Estimation Without Private Covariance Estimation
We present two sample-efficient differentially private mean estimators for
d-dimensional (sub)Gaussian distributions with unknown covariance.
Informally, given n \gtrsim d/\alpha^2 samples from such a distribution with
mean \mu and covariance \Sigma, our estimators output \tilde{\mu} such that
\|\tilde{\mu} - \mu\|_\Sigma \leq \alpha, where \|\cdot\|_\Sigma is
the Mahalanobis distance. All previous estimators with the same guarantee
either require strong a priori bounds on the covariance matrix or require
\Omega(d^{3/2}) samples.
Each of our estimators is based on a simple, general approach to designing
differentially private mechanisms, but with novel technical steps to make the
estimator private and sample-efficient. Our first estimator samples a point
with approximately maximum Tukey depth using the exponential mechanism, but
restricted to the set of points of large Tukey depth. Proving that this
mechanism is private requires a novel analysis. Our second estimator perturbs
the empirical mean of the data set with noise calibrated to the empirical
covariance, without releasing the covariance itself. Its sample complexity
guarantees hold more generally for subgaussian distributions, albeit with a
slightly worse dependence on the privacy parameter. For both estimators,
careful preprocessing of the data is required to satisfy differential privacy.
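A minimal sketch of the second estimator's basic shape, assuming only the high-level description above: perturb the empirical mean with Gaussian noise whose covariance is proportional to the empirical covariance. The noise multiplier is a placeholder, and the careful preprocessing and calibration that make the paper's mechanism differentially private are omitted:

```python
# Illustrative sketch (not the paper's calibrated mechanism): covariance-shaped
# Gaussian perturbation of the empirical mean. With noise proportional to the
# empirical covariance, the error is controlled in the Mahalanobis norm ||.||_Sigma
# rather than the Euclidean norm. No clipping or privacy accounting is done here.
import numpy as np

def covariance_aware_mean(X, noise_multiplier=1.0, seed=0):
    """Return empirical mean plus Gaussian noise shaped by the empirical covariance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu_hat = X.mean(axis=0)                                  # empirical mean
    cov_hat = np.cov(X, rowvar=False) + 1e-9 * np.eye(d)     # empirical covariance (regularized)
    noise = rng.multivariate_normal(np.zeros(d), (noise_multiplier / n) ** 2 * cov_hat)
    return mu_hat + noise

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])
    X = rng.multivariate_normal([2.0, -1.0], Sigma, size=5000)
    print(covariance_aware_mean(X, noise_multiplier=2.0))
```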
Reionization constraints using Principal Component Analysis
Using a semi-analytical model developed by Choudhury & Ferrara (2005) we
study the observational constraints on reionization via a principal component
analysis (PCA). Assuming that reionization at z>6 is primarily driven by
stellar sources, we decompose the unknown function N_{ion}(z), representing the
number of photons in the IGM per baryon in collapsed objects, into its
principal components and constrain the latter using the photoionization rate
obtained from Ly-alpha forest Gunn-Peterson optical depth, the WMAP7 electron
scattering optical depth and the redshift distribution of Lyman-limit systems
at z \sim 3.5. The main findings of our analysis are: (i) It is sufficient to
model N_{ion}(z) over the redshift range 2<z<14 using 5 parameters to extract
the maximum information contained within the data. (ii) All quantities related
to reionization can be severely constrained for z<6 because of a large number
of data points whereas constraints at z>6 are relatively loose. (iii) The weak
constraints on N_{ion}(z) at z>6 do not allow us to disentangle different feedback
models with present data. There is a clear indication that N_{ion}(z) must
increase at z>6, thus ruling out reionization by a single stellar population
with non-evolving IMF, and/or star-forming efficiency, and/or photon escape
fraction. The data allows for non-monotonic N_{ion}(z) which may contain sharp
features around z \sim 7. (iv) The PCA implies that reionization must be 99%
completed between 5.8<z<10.3 (95% confidence level) and is expected to be 50%
complete at z \approx 9.5-12. With future data sets, like those obtained by
Planck, the z>6 constraints will be significantly improved.
Comment: Accepted in MNRAS. Revised to match the accepted version.
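A toy sketch of the generic PCA step described above, assuming an illustrative fiducial N_ion(z) and ensemble of histories (not the paper's semi-analytical model): decompose the histories over redshift bins into principal components and keep the leading five modes as the parameters to constrain:

```python
# Illustrative sketch (toy setup, not the paper's model): decompose N_ion(z) over
# redshift bins into principal components and truncate to the leading modes.
import numpy as np

z = np.linspace(2.0, 14.0, 25)                     # redshift bins (illustrative)
rng = np.random.default_rng(0)
fiducial = 10.0 + 2.0 * np.tanh((z - 7.0) / 2.0)   # toy fiducial N_ion(z)
# Toy ensemble of allowed histories around the fiducial model, with larger
# scatter at high z where the data constrain N_ion(z) only weakly.
samples = fiducial + rng.normal(size=(2000, z.size)) * (1.0 + 0.2 * (z - 2.0))

cov = np.cov(samples, rowvar=False)                # covariance of the histories
eigvals, eigvecs = np.linalg.eigh(cov)             # principal components
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_modes = 5                                        # abstract: ~5 modes suffice

def n_ion(amplitudes):
    """Reconstruct N_ion(z) from the leading principal-component amplitudes."""
    return fiducial + eigvecs[:, :n_modes] @ amplitudes

print("variance captured by 5 modes:", eigvals[:n_modes].sum() / eigvals.sum())
```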
A Monte Carlo simulation of the Sudbury Neutrino Observatory proportional counters
The third phase of the Sudbury Neutrino Observatory (SNO) experiment added an
array of 3He proportional counters to the detector. The purpose of this Neutral
Current Detection (NCD) array was to observe neutrons resulting from
neutral-current solar neutrino-deuteron interactions. We have developed a
detailed simulation of the current pulses from the NCD array proportional
counters, from the primary neutron capture on 3He through the NCD array
signal-processing electronics. This NCD array Monte Carlo simulation was used
to model the alpha-decay background in SNO's third-phase 8B solar-neutrino
measurement.
Comment: 38 pages; submitted to the New Journal of Physics.
- …