
    What Can We Learn Privately?

    Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms.
    Comment: 35 pages, 2 figures
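    As an illustrative sketch of the local (randomized response) model the abstract characterizes, the following shows the classic binary randomized-response mechanism and its debiased population estimate. The epsilon value and the made-up survey data are assumptions for the example, not details from the paper.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise
    flip it -- the classic epsilon-locally-differentially-private mechanism."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true proportion from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    noisy = sum(reports) / len(reports)
    return (noisy - (1.0 - p)) / (2.0 * p - 1.0)

# Made-up survey: 70% of respondents hold the sensitive attribute.
random.seed(0)
true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
estimate = debiased_mean(reports, epsilon=1.0)
```

    Each respondent only ever releases a single randomized bit, so no individual report reveals the true value, yet the aggregate remains estimable, which is exactly the trade-off local algorithms make.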

    Causal and Design Issues in Clinical Trials

    The first part of my dissertation focuses on post-randomization modification of intent-to-treat effects. For example, in the field of behavioral science, investigations involve the estimation of the effects of behavioral interventions on final outcomes for individuals stratified by post-randomization moderators measured during the early stages of the intervention (e.g., landmark analyses in cancer research). Motivated by this, we address several questions on the use of standard and causal approaches to assessing the modification of intent-to-treat effects of a randomized intervention by a post-randomization factor. First, we show analytically the bias of the estimators of the corresponding interaction and meaningful main effects for the standard regression model under different combinations of assumptions. These results show that the assumption of independence between the two factors involved in an interaction, which has been assumed in the literature, is not necessary for unbiased estimation. Then, we present a structural nested distribution model estimated with G-estimation equations, which does not assume that the post-randomization variable is effectively randomized to individuals. We show how to obtain efficient estimators of the parameters of the structural distribution model. Finally, we confirm with simulations the performance of these optimal estimators and further assess our approach with data from a randomized cognitive therapy trial. The second part of my dissertation is on optimal and adaptive designs for dose-finding experiments in clinical trials with multiple correlated responses. For instance, in phase I/II studies, efficacy and toxicity are often the primary endpoints; they are observed simultaneously and need to be evaluated together. Accordingly, we focus on bivariate responses with one continuous and one categorical response. We adopt the bivariate probit dose-response model and study locally optimal, two-stage optimal, and fully adaptive designs under different cost constraints. We assess the performance of the different designs through simulations and suggest that the two-stage designs are as efficient as, and may be more efficient than, the fully adaptive designs when the initial-stage sample size is moderate. In addition, two-stage designs are easier to construct and implement, and thus can be a useful approach in practice.
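    A minimal data-generating sketch of the kind of bivariate response described above: one continuous efficacy outcome and one binary (probit-threshold) toxicity outcome sharing correlated errors. All coefficients and the correlation value are illustrative placeholders, not the dissertation's fitted model.

```python
import math
import random

def draw_bivariate_normal(rho, rng):
    """Standard bivariate normal pair with correlation rho (Box-Muller)."""
    u1, u2 = 1.0 - rng.random(), rng.random()
    z1 = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    z2 = math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2

def simulate_subject(dose, rho=0.3, rng=random):
    """One continuous efficacy response and one binary toxicity response
    from a shared latent bivariate normal -- the probit-style structure the
    abstract refers to, with made-up coefficients."""
    e1, e2 = draw_bivariate_normal(rho, rng)
    efficacy = 0.5 + 1.2 * dose + e1                       # continuous
    toxicity = 1 if (-1.5 + 2.0 * dose + e2) > 0.0 else 0  # probit threshold
    return efficacy, toxicity
```

    Simulating many subjects per candidate dose from a model like this is the basic ingredient for comparing locally optimal, two-stage, and fully adaptive designs by their realized estimation variance.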

    Uplift Modeling with Multiple Treatments and General Response Types

    Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known in the literature as uplift modeling, differential response analysis, or personalized treatment learning. A key feature of uplift modeling is that the data is unlabeled. It is impossible to know whether the chosen treatment is optimal for an individual subject because the response under alternative treatments is unobserved. This presents a challenge to both the training and the evaluation of uplift models. In this paper we describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. We present a new uplift algorithm which creates a forest of randomized trees. The trees are built with a splitting criterion designed to directly optimize their uplift performance based on the proposed evaluation method. Both the evaluation method and the algorithm apply to an arbitrary number of treatments and general response types. Experimental results on synthetic data and industry-provided data show that our algorithm leads to significant performance improvement over other applicable methods.
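    The expected-response estimation problem the abstract raises can be sketched with a standard inverse-propensity idea on randomized data: keep only the subjects whose random assignment happened to match the model's recommendation, reweighting by the assignment probability. This is a generic sketch of that estimator under assumed randomization probabilities, not the paper's exact procedure.

```python
import random

def expected_response(records, policy, treat_prob):
    """Unbiased (inverse-propensity) estimate of the mean response if every
    subject received the treatment the uplift model recommends. `records`
    holds (features, assigned_treatment, observed_response) triples from a
    randomized experiment; `treat_prob[t]` is the randomization probability
    of arm t."""
    total = 0.0
    for x, t, y in records:
        if policy(x) == t:               # subject happened to receive the
            total += y / treat_prob[t]   # treatment the model recommends
    return total / len(records)

# Synthetic randomized experiment: treatment 'A' works when x == 0,
# treatment 'B' works when x == 1 (a made-up heterogeneous effect).
random.seed(0)
records = []
for _ in range(4000):
    x = random.randint(0, 1)
    t = random.choice(['A', 'B'])
    y = 1.0 if (t == 'A') == (x == 0) else 0.0
    records.append((x, t, y))

probs = {'A': 0.5, 'B': 0.5}
personalized = expected_response(records, lambda x: 'A' if x == 0 else 'B', probs)
always_a = expected_response(records, lambda x: 'A', probs)
```

    The personalized policy scores near 1.0 while the one-size-fits-all policy scores near 0.5, which is how an evaluation like this separates good uplift models from constant assignments even though counterfactual responses are never observed.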

    How to ask sensitive questions in conservation: A review of specialized questioning techniques

    Tools for social research are critical for developing an understanding of conservation problems and assessing the feasibility of conservation actions. Social surveys are an essential tool frequently applied in conservation, both to assess people’s behaviour and to understand its drivers. However, little attention has been given to the weaknesses and strengths of different survey tools. When topics of conservation concern are illegal or otherwise sensitive, data collected using direct questions are likely to be affected by non-response and social desirability biases, reducing their validity. These sources of bias associated with using direct questions on sensitive topics have long been recognised in the social sciences but have been poorly considered in conservation and natural resource management. We reviewed specialized questioning techniques developed in a number of disciplines specifically for investigating sensitive topics. These methods ensure respondent anonymity, increase willingness to answer, and, critically, make it impossible to directly link incriminating data to an individual. We describe each method and report their main characteristics, such as data requirements, possible data outputs, and the availability of evidence that they can be adapted for use in illiterate communities, and summarize their main advantages and disadvantages. Recommendations for their application in conservation are given. We suggest that the conservation toolbox should be expanded by incorporating specialized questioning techniques, developed specifically to increase response accuracy. By considering the limitations of each survey technique, we will ultimately contribute to more effective evaluations of conservation interventions and more robust policy decisions.
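    One of the specialized techniques in this family, the unmatched count (list experiment) technique, is simple enough to sketch: controls report how many of several innocuous statements apply to them, the treatment group gets the sensitive statement added to its list, and the difference in mean counts estimates prevalence without any answer being linkable to an individual. The item probabilities and prevalence below are illustrative, not drawn from the review.

```python
import random

def simulate_list_experiment(n, prevalence, n_innocuous=4, seed=0):
    """Unmatched count technique: each respondent reports only a count of
    applicable statements, never which ones. The treatment-minus-control
    difference in mean counts recovers the sensitive behaviour's prevalence."""
    rng = random.Random(seed)
    control = [sum(rng.random() < 0.5 for _ in range(n_innocuous))
               for _ in range(n)]
    treatment = [sum(rng.random() < 0.5 for _ in range(n_innocuous))
                 + (1 if rng.random() < prevalence else 0)
                 for _ in range(n)]
    return sum(treatment) / n - sum(control) / n

# Simulated survey where 30% of respondents engage in the sensitive behaviour.
estimate = simulate_list_experiment(n=20000, prevalence=0.3)
```

    The price of the anonymity guarantee is variance: the innocuous-item counts add noise, so list experiments need larger samples than direct questioning to reach the same precision.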

    Experimental designs for multiple-level responses, with application to a large-scale educational intervention

    Educational research often studies subjects that are in naturally clustered groups of classrooms or schools. When designing a randomized experiment to evaluate an intervention directed at teachers, but with effects on teachers and their students, the power or anticipated variance for the treatment effect needs to be examined at both levels. If the treatment is applied to clusters, power is usually reduced. At the same time, a cluster design decreases the probability of contamination, and contamination can also reduce power to detect a treatment effect. Designs that are optimal at one level may be inefficient for estimating the treatment effect at another level. In this paper we study the efficiency of three designs and their ability to detect a treatment effect: randomize schools to treatment, randomize teachers within schools to treatment, and completely randomize teachers to treatment. The three designs are compared for both the teacher and student level within the mixed model framework, and a simulation study is conducted to compare expected treatment variances for the three designs with various levels of correlation within and between clusters. We present a computer program that study designers can use to explore the anticipated variances of treatment effects under proposed experimental designs and settings.
    Comment: Published at http://dx.doi.org/10.1214/08-AOAS216 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
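    The trade-off between two of the designs can be sketched with a small Monte Carlo simulation: generate teacher outcomes with a shared school-level random effect, randomize either whole schools or individual teachers, and compare the empirical variance of the difference-in-means estimate. The variance components, cluster counts, and replication count are illustrative assumptions, not the paper's simulation settings.

```python
import random
import statistics

def effect_estimate_variance(randomize_schools, n_schools=20,
                             teachers_per_school=10, school_sd=1.0,
                             teacher_sd=1.0, reps=300, seed=0):
    """Empirical variance of the difference-in-means treatment-effect
    estimate under two designs: randomize whole schools to treatment, or
    completely randomize teachers. The true treatment effect is zero, so
    any spread in the estimates is pure design-driven sampling variance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        treated, control = [], []
        if randomize_schools:
            ids = list(range(n_schools))
            rng.shuffle(ids)
            treated_schools = set(ids[:n_schools // 2])
        for s in range(n_schools):
            school_effect = rng.gauss(0.0, school_sd)  # shared within cluster
            for _teacher in range(teachers_per_school):
                y = school_effect + rng.gauss(0.0, teacher_sd)
                if randomize_schools:
                    (treated if s in treated_schools else control).append(y)
                else:
                    (treated if rng.random() < 0.5 else control).append(y)
        estimates.append(statistics.mean(treated) - statistics.mean(control))
    return statistics.variance(estimates)
```

    With a nonzero school-level variance component, school-level randomization yields a markedly larger estimator variance than teacher-level randomization, illustrating the power loss from clustering that the abstract describes (before accounting for contamination, which cuts the other way).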