Thermodynamics and concentration
We show that the thermal subadditivity of entropy provides a common basis to
derive a strong form of the bounded difference inequality and related results
as well as more recent inequalities applicable to convex Lipschitz functions,
random symmetric matrices, shortest travelling salesmen paths and weakly
self-bounding functions. We also give two new concentration inequalities.
Comment: Published at http://dx.doi.org/10.3150/10-BEJ341 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
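For reference, the standard form of the bounded difference inequality mentioned in this abstract can be written as below. This is the textbook (McDiarmid) statement, not the strengthened form the paper derives. The symbols f, c_i, X and t are notation chosen here and do not come from the abstract.

```latex
% Standard bounded difference (McDiarmid) inequality, stated for reference.
% Assumptions: X = (X_1, ..., X_n) has independent coordinates, and f changes
% by at most c_i when only its i-th argument changes.
\[
  \sup_{x_1,\dots,x_n,\,x_i'}
  \bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr|
  \le c_i \qquad (1 \le i \le n)
\]
\[
  \Pr\bigl( f(X) - \mathbb{E} f(X) \ge t \bigr)
  \le \exp\!\left( -\frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right),
  \qquad t > 0.
\]
```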
Settling the Sample Complexity of Single-parameter Revenue Maximization
This paper settles the sample complexity of single-parameter revenue
maximization by showing matching upper and lower bounds, up to a
poly-logarithmic factor, for all families of value distributions that have been
considered in the literature. The upper bounds are unified under a novel
framework, which builds on the strong revenue monotonicity by Devanur, Huang,
and Psomas (STOC 2016), and an information theoretic argument. This is
fundamentally different from the previous approaches that rely on either
constructing an ε-net of the mechanism space, explicitly or implicitly
via statistical learning theory, or learning an approximately accurate version
of the virtual values. To our knowledge, it is the first time information
theoretical arguments are used to show sample complexity upper bounds, instead
of lower bounds. Our lower bounds are also unified under a meta construction of
hard instances.
Comment: 49 pages, Accepted by STOC1
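To make the learning problem concrete, here is a minimal sketch of the simplest single-bidder case: picking a posted price from samples by maximizing empirical revenue. This is plain empirical revenue maximization, shown only as an illustration of the setting. It is not the framework of the paper, which the abstract describes as building on strong revenue monotonicity and an information theoretic argument. The function names and the sample values are hypothetical.

```python
def empirical_revenue(price, samples):
    """Average revenue of posting `price` against the sampled values."""
    sold = sum(1 for value in samples if value >= price)
    return price * sold / len(samples)


def best_empirical_price(samples):
    """Return the sampled value that maximizes empirical revenue.

    Only sampled values need to be tried: between two consecutive samples,
    raising the price increases revenue without losing any sale.
    Ties are broken in favor of the lowest price.
    """
    candidates = sorted(set(samples))
    return max(candidates, key=lambda p: empirical_revenue(p, samples))


# Hypothetical samples of a bidder's value; a single large sample
# dominates the empirical optimum.
values = [1.0, 2.0, 3.0, 10.0]
price = best_empirical_price(values)
revenue = empirical_revenue(price, values)
```

On these toy samples the single large value pulls the chosen price up to it, which hints at why naive empirical maximization can overfit and why the number of samples needed is a delicate question.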
Sequential and Adaptive Inference Based on Martingale Concentration
Randomized experiments hold a well-deserved place at the top of the hierarchy of scientific evidence, and as such have received a great deal of attention from the statistical research community. In the simplest setting, a fixed group of subjects is available to the experimenter, who assigns one of two treatments to each subject via randomization, then observes corresponding outcomes. The goal is to draw inference about the effect of the experimental treatment on the observed outcome.
Classical, frequentist statistical inference provides a powerful set of tools for this fixed-sample setting. We begin with an observed sample of some deterministic size and seek procedures which yield valid hypothesis tests, p-values, and confidence intervals: for example, a t-test of the null hypothesis that the experimental treatment has no effect, on average, or a corresponding confidence interval for the average treatment effect. The fixed-sample paradigm demands that we plan the experiment ahead of time, including the size of the experimental sample and the exact hypotheses to be tested, and that we adhere rigidly to this plan.
In contrast, modern data analysis demands adaptivity. In particular, often the sample we choose to analyze is itself selected on the basis of observed data. For example, in an online A/B test, we may observe an ongoing stream of visitors enrolled into an experiment, so that the experimental sample is growing over time. The final experimental sample will include all of the visitors observed up to the time we decide to stop the experiment. The decision to stop could be made adaptively, by monitoring observed results and stopping early if a strong effect is observed, later if not. This is the realm of sequential, as opposed to fixed-sample, analysis.
There are many other kinds of adaptivity that arise in practice. A second example is in the analysis of nonrandomized, or observational, studies of causal effects. In testing for statistical evidence of an effect, we may choose to focus on a subpopulation which we believe to be highly affected by the treatment of interest. For example, in studying the effect of fish consumption on mercury levels in the blood, we may focus on individuals whose diets are especially high in fish. Classical statistics requires that we define precisely which diets will be classified as "especially high in fish" before we analyze outcomes, but experimenters may prefer for this choice to be guided by the observed outcomes themselves.
In both of the above examples (the sequential stopping of a randomized experiment and the adaptive choice of subgroup in an observational study), the use of fixed-sample methods, which do not account for adaptivity, will lead to violations of statistical guarantees such as false positive control. These violations are commonly included under the label "p-hacking" and have received much blame for the lack of reproducibility in various fields of scientific research. Fortunately, alternative statistical methods are available, methods that explicitly account for adaptivity to yield robust inference while placing fewer restrictions on the researcher. Such methods are the ultimate aim of the present work.
This thesis develops a framework for constructing sequential and adaptive statistical procedures by taking advantage of the time-uniform concentration properties of certain martingales. Chapter 1 begins by laying out a mathematical framework for the derivation of time-uniform concentration inequalities for various classes of martingales. This framework unifies and strengthens a plethora of results from the exponential concentration literature and provides a toolbox for developing sequential and adaptive statistical procedures. The remaining three chapters develop such procedures.
Chapter 2 builds upon the techniques of Chapter 1 to develop uniform concentration bounds which are somewhat more analytically and computationally complex but are much more useful for statistical applications. We frame these methods in terms of confidence sequences, that is, sequences of confidence intervals that are uniformly valid over an unbounded time horizon. One of the key results of this work is an empirical-Bernstein confidence sequence which provides a time-uniform, nonparametric, and non-asymptotic analogue of the t-test applicable to any distribution with bounded support. We explore applications to sequential estimation of average treatment effects in a randomized experiment, our first example above, as well as sequential estimation of a covariance matrix.
Chapter 3 applies ideas from Chapters 1 and 2 to develop methods for the two related problems of estimating quantiles and estimating the entire cumulative distribution function, based on i.i.d. samples. We present confidence sequences for these estimands which are valid uniformly over time for any distribution, and we explore applications to A/B testing and best-arm identification when objectives are based on quantiles rather than means. Finally, Chapter 4 explores an application of uniform martingale concentration to the second example given above, the adaptive choice of subgroup within the analysis of an observational study. We introduce Rosenbaum's sensitivity analysis framework for observational studies, and show how our procedure yields qualitative improvements over existing methods within this framework.
The martingale-based inferential methods we explore in this work trace their origins to Abraham Wald's work on the sequential probability ratio test during the 1940s, as well as to pioneering extensions developed in the late 1960s and early 1970s by Herbert Robbins, Donald Darling, David Siegmund, and Tze Leung Lai, not to mention many others. However, despite the decades of relevant literature, we believe most of the potential of the core ideas has yet to be realized. The key to unlocking this potential, we hope, is a fuller understanding of the nonparametric applicability of these methods, a detailed study of their implementation and tuning in practice, and an exploration of their utility beyond the sequential setting. While we propose several procedures that have immediate practical utility, we hope the larger contribution of the work will be as a first step towards a deeper appreciation of the power of martingale-based methods for adaptive inference, and ultimately to the development of a new class of statistical procedures which permit the kinds of adaptivity contemporary data analysts desire.
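The idea of a confidence sequence described in this abstract can be illustrated with a deliberately crude construction. The sketch below assumes observations bounded in [0, 1], applies Hoeffding's inequality at each time t with error budget alpha / (t (t + 1)), and takes a union bound over all t (these budgets sum to alpha). This is not the thesis's martingale-based or empirical-Bernstein method, which the abstract presents as the main contribution; it is only a simple stand-in that shares the time-uniform guarantee. The function name and parameters are hypothetical.

```python
import math


def hoeffding_confidence_sequence(observations, alpha=0.05):
    """Yield (t, lower, upper) intervals for the mean of [0, 1]-valued data.

    Assumption: observations are independent and lie in [0, 1].
    At time t the interval has two-sided Hoeffding error alpha / (t * (t + 1));
    summing over t = 1, 2, ... gives total error at most alpha, so all
    intervals hold simultaneously with probability at least 1 - alpha.
    """
    total = 0.0
    for t, x in enumerate(observations, start=1):
        total += x
        running_mean = total / t
        alpha_t = alpha / (t * (t + 1))
        radius = math.sqrt(math.log(2.0 / alpha_t) / (2.0 * t))
        # Clip to [0, 1] because the mean of bounded data cannot leave it.
        yield t, max(0.0, running_mean - radius), min(1.0, running_mean + radius)


# Example: monitor a stream and stop as soon as the interval excludes 0.5.
stream = [1.0] * 200  # hypothetical data
for t, lower, upper in hoeffding_confidence_sequence(stream):
    if lower > 0.5 or upper < 0.5:
        stopping_time = t
        break
```

Because the guarantee holds at all times at once, stopping at a data-dependent time, as in the loop above, does not invalidate the interval. The price of the union bound is wider intervals than a sharper martingale argument would give.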
Inflammation, insulin resistance, and diabetes: Mendelian randomization using CRP haplotypes points upstream
Background
Raised C-reactive protein (CRP) is a risk factor for type 2 diabetes. According to the Mendelian randomization method, the association is likely to be causal if genetic variants that affect CRP level are associated with markers of diabetes development and diabetes. Our objective was to examine the nature of the association between CRP phenotype and diabetes development using CRP haplotypes as instrumental variables.
Methods and Findings
We genotyped three tagging SNPs (CRP + 2302G > A; CRP + 1444T > C; CRP + 4899T > G) in the CRP gene and measured serum CRP in 5,274 men and women at mean ages 49 and 61 y (Whitehall II Study). Homeostasis model assessment-insulin resistance (HOMA-IR) and hemoglobin A1c (HbA1c) were measured at age 61 y. Diabetes was ascertained by glucose tolerance test and self-report. Common major haplotypes were strongly associated with serum CRP levels, but unrelated to obesity, blood pressure, and socioeconomic position, which may confound the association between CRP and diabetes risk. Serum CRP was associated with these potential confounding factors. After adjustment for age and sex, baseline serum CRP was associated with incident diabetes (hazard ratio = 1.39 [95% confidence interval 1.29-1.51]), HOMA-IR, and HbA1c, but the associations were considerably attenuated on adjustment for potential confounding factors. In contrast, CRP haplotypes were not associated with HOMA-IR or HbA1c (p=0.52-0.92). The associations of CRP with HOMA-IR and HbA1c were all null when examined using instrumental variables analysis, with genetic variants as the instrument for serum CRP. Instrumental variables estimates differed from the directly observed associations (p=0.007-0.11). Pooled analysis of CRP haplotypes and diabetes in Whitehall II and Northwick Park Heart Study II produced null findings (p=0.25-0.88). Analyses based on the Wellcome Trust Case Control Consortium (1,923 diabetes cases, 2,932 controls) using three SNPs in tight linkage disequilibrium with our tagging SNPs also demonstrated null associations.
Conclusions
Observed associations between serum CRP and insulin resistance, glycemia, and diabetes are likely to be noncausal. Inflammation may play a causal role via upstream effectors rather than the downstream marker CRP.
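The logic of the instrumental variables comparison in this abstract can be sketched on a toy data set. The code below contrasts the ordinary regression slope of an outcome on an exposure with the Wald ratio estimate that uses a genetic variant as the instrument. All numbers are made up for illustration and are not from the study; the variable names (z for the variant, u for a confounder, x for the exposure, y for the outcome) are hypothetical, and the Wald ratio is one simple instrumental variables estimator, not necessarily the one the authors used.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Directly observed association: slope of y regressed on x."""
    return covariance(x, y) / covariance(x, x)


def wald_ratio(z, x, y):
    """Instrumental variables estimate of the effect of x on y, using z."""
    return covariance(z, y) / covariance(z, x)


# Toy data (assumed): the variant z raises the exposure x, the confounder u
# raises both x and y, and x has no causal effect on y at all.
z = [0.0, 0.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]                   # balanced across z, so z is a valid instrument
x = [zi + ui for zi, ui in zip(z, u)]        # exposure, e.g. a biomarker level
y = [3.0 * ui for ui in u]                   # outcome driven only by the confounder

observed = ols_slope(x, y)       # non-zero, entirely due to confounding
causal = wald_ratio(z, x, y)     # zero, matching the true null effect
```

In this toy setting the observed slope is clearly positive while the instrumental estimate is zero, which is the same pattern the abstract reports: an association in the observational data that disappears when genetic variants are used as the instrument.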