    Adaptive Data Depth via Multi-Armed Bandits

    Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of n^d operations to exactly compute the depth of a single point within a data set of n points in d-dimensional space. Often, however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel and instance-adaptive algorithm for adaptive data depth computation by reducing the problem of exactly computing n depths to an n-armed stochastic multi-armed bandit problem which we can efficiently solve. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter \alpha < 2, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from O(n^d) to \tilde{O}(n^{d - (d-1)\alpha/2}), where \tilde{O} suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods. Comment: Keywords: multi-armed bandits, data depth, adaptivity, large-scale computation, simplicial depth
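
    The reduction described above suggests a short sketch. The following is a minimal 2-D illustration, not the authors' algorithm: each data point is a bandit arm, a "pull" checks whether a uniformly random triangle of data points contains that point (so an arm's mean reward is its simplicial depth up to normalization), and successive elimination with Hoeffding confidence bounds narrows the active set toward the estimated simplicial median. The function name, batch size, and elimination schedule are illustrative choices.

```python
import numpy as np

def _cross2(u, v):
    """z-component of the 2-D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def _in_triangle(p, a, b, c):
    """True if point p lies inside (or on the boundary of) triangle (a, b, c)."""
    s1 = _cross2(b - a, p - a) >= 0
    s2 = _cross2(c - b, p - b) >= 0
    s3 = _cross2(a - c, p - c) >= 0
    return s1 == s2 == s3

def bandit_simplicial_median(X, budget=200_000, delta=0.05, seed=0):
    """Successive elimination over data points: a pull draws a random triangle of
    data points and records whether it contains the candidate point, so each
    arm's empirical mean estimates that point's simplicial depth."""
    rng = np.random.default_rng(seed)
    n = len(X)
    active = np.arange(n)                        # arms still in contention
    pulls = np.zeros(n)
    hits = np.zeros(n)
    batch, spent = 32, 0
    while len(active) > 1 and spent < budget:
        for i in active:
            idx = rng.integers(0, n, size=(batch, 3))    # triangles sampled with replacement
            hits[i] += sum(_in_triangle(X[i], X[j], X[k], X[l]) for j, k, l in idx)
            pulls[i] += batch
            spent += batch
        means = hits[active] / pulls[active]
        radius = np.sqrt(np.log(2.0 * n * pulls[active] / delta) / (2.0 * pulls[active]))
        active = active[means + radius >= np.max(means - radius)]   # drop clearly shallower points
    return active[np.argmax(hits[active] / pulls[active])]          # index of the estimated median
```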

    An In-Depth Look at Recent Influenza Seasons and Vaccine Effectiveness

    This paper aims to present an in-depth exploration of immunology, the influenza virus, vaccination, and vaccination’s effectiveness with respect to influenza. It also delves into the possible causes behind the large increase in early childhood deaths during the 2003-2004 influenza season, which was a turning point in terms of influenza incident reporting. Finally, data analysis on the relationship between childhood flu vaccine coverage and childhood outpatient ILI (influenza-like illness) visits by region is presented as a measurement of vaccine effectiveness and an identifier of trends. Although this relationship was not statistically significant at the regional level (alpha = 0.05), this points to other factors at work in the relationship between vaccine coverage and outpatient visits in children. The same comparison made over time with national statistics was statistically significant (p = 0.02); however, other variables are hypothesized to be present in this relationship as well.
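
    As a rough illustration of the kind of test described above (not the paper's analysis or data), one could correlate regional childhood vaccine coverage with regional outpatient ILI rates and compare the resulting p-value against alpha = 0.05. The arrays below are synthetic placeholders.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder inputs: one value per region for a single season.
rng = np.random.default_rng(0)
coverage = rng.uniform(40.0, 60.0, size=10)   # % of children vaccinated (placeholder values)
ili_rate = rng.uniform(2.0, 4.0, size=10)     # % of outpatient visits that were ILI (placeholder values)

r, p_value = stats.pearsonr(coverage, ili_rate)   # linear association between the two series
alpha = 0.05
verdict = "significant" if p_value < alpha else "not significant"
print(f"r = {r:.3f}, p = {p_value:.3f} ({verdict} at alpha = {alpha})")
```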

    Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

    We present two sample-efficient differentially private mean estimators for d-dimensional (sub)Gaussian distributions with unknown covariance. Informally, given n \gtrsim d/\alpha^2 samples from such a distribution with mean \mu and covariance \Sigma, our estimators output \tilde{\mu} such that \|\tilde{\mu} - \mu\|_{\Sigma} \leq \alpha, where \|\cdot\|_{\Sigma} is the Mahalanobis distance. All previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require \Omega(d^{3/2}) samples. Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance, without releasing the covariance itself. Its sample complexity guarantees hold more generally for subgaussian distributions, albeit with a slightly worse dependence on the privacy parameter. For both estimators, careful preprocessing of the data is required to satisfy differential privacy.
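
    A toy sketch of the idea behind the second estimator (shape the noise like the empirical covariance so that the error is small in Mahalanobis distance) might look as follows. The function name is illustrative, and the preprocessing and exact calibration of `noise_scale` needed for a formal differential privacy guarantee are omitted.

```python
import numpy as np

def covariance_shaped_private_mean(X, noise_scale, seed=None):
    """Toy sketch: perturb the empirical mean with Gaussian noise whose shape
    follows the empirical covariance, which itself is never released. The data
    preprocessing and the noise calibration required for a formal
    (epsilon, delta)-DP guarantee are deliberately omitted here."""
    n, d = X.shape
    mu_hat = X.mean(axis=0)
    sigma_hat = np.cov(X, rowvar=False)                      # used only to shape the noise
    root = np.linalg.cholesky(sigma_hat + 1e-9 * np.eye(d))  # Sigma-hat^{1/2} (regularized)
    z = np.random.default_rng(seed).standard_normal(d)
    return mu_hat + (noise_scale / n) * (root @ z)           # noise is larger along high-variance directions
```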

    Reionization constraints using Principal Component Analysis

    Using a semi-analytical model developed by Choudhury & Ferrara (2005), we study the observational constraints on reionization via a principal component analysis (PCA). Assuming that reionization at z>6 is primarily driven by stellar sources, we decompose the unknown function N_{ion}(z), representing the number of photons in the IGM per baryon in collapsed objects, into its principal components and constrain the latter using the photoionization rate obtained from the Ly-alpha forest Gunn-Peterson optical depth, the WMAP7 electron scattering optical depth, and the redshift distribution of Lyman-limit systems at z \sim 3.5. The main findings of our analysis are: (i) It is sufficient to model N_{ion}(z) over the redshift range 2<z<14 using 5 parameters to extract the maximum information contained within the data. (ii) All quantities related to reionization can be severely constrained for z<6 because of a large number of data points, whereas constraints at z>6 are relatively loose. (iii) The weak constraints on N_{ion}(z) at z>6 do not allow us to disentangle different feedback models with present data. There is a clear indication that N_{ion}(z) must increase at z>6, thus ruling out reionization by a single stellar population with non-evolving IMF and/or star-forming efficiency and/or photon escape fraction. The data allow for a non-monotonic N_{ion}(z), which may contain sharp features around z \sim 7. (iv) The PCA implies that reionization must be 99% complete between 5.8<z<10.3 (95% confidence level) and is expected to be 50% complete at z \approx 9.5-12. With future data sets, such as those obtained by Planck, the z>6 constraints will be significantly improved. Comment: Accepted in MNRAS. Revised to match the accepted version.
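
    To make the PCA step concrete, the sketch below decomposes an ensemble of candidate N_{ion}(z) histories on a redshift grid into principal components and keeps the first five modes as the free parameters. The ensemble here is a synthetic placeholder; in the paper it comes from the semi-analytical model, and the mode amplitudes are then constrained with the Ly-alpha forest, WMAP7, and Lyman-limit-system data.

```python
import numpy as np

# Synthetic placeholder ensemble of N_ion(z) histories on a redshift grid (2 < z < 14);
# in the analysis these would come from the Choudhury & Ferrara (2005) model.
z = np.linspace(2.0, 14.0, 25)
rng = np.random.default_rng(1)
ensemble = 40.0 + rng.standard_normal((500, z.size)).cumsum(axis=1)

mean_history = ensemble.mean(axis=0)
_, singular_values, vt = np.linalg.svd(ensemble - mean_history, full_matrices=False)
n_modes = 5                                  # 5 modes are found to be sufficient in the paper
components = vt[:n_modes]                    # principal components of N_ion(z)

def nion_from_amplitudes(amplitudes):
    """Reconstruct an N_ion(z) history from its n_modes PCA amplitudes, which are
    the parameters one would fit against the observational constraints."""
    return mean_history + np.asarray(amplitudes) @ components
```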

    A Monte Carlo simulation of the Sudbury Neutrino Observatory proportional counters

    The third phase of the Sudbury Neutrino Observatory (SNO) experiment added an array of 3He proportional counters to the detector. The purpose of this Neutral Current Detection (NCD) array was to observe neutrons resulting from neutral-current solar neutrino-deuteron interactions. We have developed a detailed simulation of the current pulses from the NCD array proportional counters, from the primary neutron capture on 3He through the NCD array signal-processing electronics. This NCD array Monte Carlo simulation was used to model the alpha-decay background in SNO's third-phase 8B solar-neutrino measurement. Comment: 38 pages; submitted to the New Journal of Physics.
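
    A heavily simplified toy of such a pulse-simulation chain (not the SNO NCD Monte Carlo) could deposit charge at a few random drift times, give each deposit an exponentially decaying ion-tail current, and apply a crude single-pole electronics response. Every constant below is a placeholder.

```python
import numpy as np

def toy_proportional_counter_pulse(n_samples=1024, dt_ns=10.0, seed=None):
    """Toy sketch of a current-pulse simulation: random charge deposits, an
    exponential ion tail per deposit, and a single-pole low-pass filter standing
    in for the signal-processing electronics. All constants are placeholders."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * dt_ns                    # waveform time base [ns]
    current = np.zeros(n_samples)
    drift_times = rng.uniform(0.0, 2000.0, size=8)      # ns, placeholder drift-time spread
    charges = rng.exponential(1.0, size=8)              # arbitrary units
    tau_ion = 300.0                                     # ion-tail time constant [ns], placeholder
    for t0, q in zip(drift_times, charges):
        late = t >= t0
        current[late] += q * np.exp(-(t[late] - t0) / tau_ion)
    alpha = dt_ns / (dt_ns + 100.0)                     # crude electronics shaping constant
    shaped = np.zeros_like(current)
    for i in range(1, n_samples):
        shaped[i] = shaped[i - 1] + alpha * (current[i] - shaped[i - 1])
    return t, shaped
```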