
    Second-Order Inference for the Mean of a Variable Missing at Random

    We present a second-order estimator of the mean of a variable subject to missingness under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This weakens the assumptions on the convergence rates needed to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed within the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rates of the initial estimators of the outcome regression and the missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator requires only one-dimensional smoothing, whereas implementing the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have better finite-sample performance than existing first-order estimators; in our simulations, it improved the coverage probability by up to 90%. We illustrate our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.
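
    As a point of reference for the first-order methods this abstract builds on, the following is a minimal sketch of a standard first-order TMLE of E[Y] under missingness at random, not the second-order estimator proposed here. It assumes a discrete covariate W, an outcome Y in [0, 1] observed only when R = 1, a missingness score g(w) = P(R = 1 | W = w) estimated by stratum proportions, and a deliberately crude initial outcome regression (the pooled mean of observed outcomes). The function name `tmle_mean_mar` and the toy data are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_mean_mar(W, R, Y, max_iter=100, tol=1e-12):
    """First-order TMLE of E[Y] when Y is missing at random given discrete W.

    Hypothetical helper for illustration; Y[i] is ignored when R[i] == 0.
    """
    n = len(W)
    # Missingness score g(w) = P(R=1 | W=w), estimated by stratum proportions.
    g = {}
    for w in set(W):
        members = [i for i in range(n) if W[i] == w]
        g[w] = sum(R[i] for i in members) / len(members)
    observed = [i for i in range(n) if R[i] == 1]
    # Deliberately crude initial outcome regression: pooled mean of observed Y.
    q0 = sum(Y[i] for i in observed) / len(observed)

    def score_and_slope(eps):
        # Score of the logistic fluctuation logit Q_eps = logit q0 + eps * H,
        # with clever covariate H(w) = 1 / g(w), summed over observed outcomes.
        score = slope = 0.0
        for i in observed:
            h = 1.0 / g[W[i]]
            p = expit(logit(q0) + eps * h)
            score += h * (Y[i] - p)
            slope -= h * h * p * (1.0 - p)
        return score, slope

    eps = 0.0
    for _ in range(max_iter):  # Newton-Raphson for the fluctuation parameter
        score, slope = score_and_slope(eps)
        step = score / slope
        eps -= step
        if abs(step) < tol:
            break
    score, _ = score_and_slope(eps)
    # Substitution estimate: average the targeted regression over all W.
    psi = sum(expit(logit(q0) + eps / g[w]) for w in W) / n
    return psi, eps, score


# Toy data: None marks a missing outcome (R == 0).
W = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
R = [1, 1, 0, 0, 1, 1, 1, 1, 1, 0]
Y = [0, 1, None, None, 1, 1, 1, 0, 1, None]
psi, eps, score = tmle_mean_mar(W, R, Y)
print(round(psi, 6))
```

    Because g is saturated here, solving the score equation pulls the crude pooled mean (5/7) to 0.68, the size-weighted average of the observed stratum means (0.5 in four units, 0.8 in six).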

    Gravitational Lensing and the Power Spectrum of Dark Matter Substructure: Insights from the ETHOS N-body Simulations

    Strong gravitational lensing has been identified as a promising astrophysical probe of the particle nature of dark matter. In this paper we present a detailed study of the power spectrum of the projected mass density (convergence) field of substructure in a Milky Way-sized halo. This power spectrum has been suggested as a key observable that can be extracted from strongly lensed images and can yield important clues about the matter distribution within the lens galaxy. We use two different $N$-body simulations from the ETHOS framework: one with cold dark matter and another with self-interacting dark matter and a cutoff in the initial power spectrum. Although earlier works identified $k \gtrsim 100\,\mathrm{kpc}^{-1}$ as the most promising scales for learning about the particle nature of dark matter, we find that even lower wavenumbers, which are within reach of observations in the near future, carry important information about dark matter. Comparing the amplitude and slope of the power spectrum on scales $0.1 \lesssim k/\mathrm{kpc}^{-1} \lesssim 10$ from lenses at different redshifts can help distinguish cold dark matter from other, exotic dark matter scenarios that alter the abundance and central densities of subhalos. Furthermore, by considering the contribution of different mass bins to the power spectrum, we find that subhalos in the mass range $10^7$ to $10^8\,M_{\odot}$ are on average the largest contributors to the power spectrum signal on scales $2 \lesssim k/\mathrm{kpc}^{-1} \lesssim 15$, despite the numerous subhalos with masses $> 10^8\,M_{\odot}$ in a typical lens galaxy. Finally, by comparing the power spectra obtained from the subhalo catalogs with those from the particle data in the simulation snapshots, we find that the seemingly too simple halo model is in fact a fairly good approximation to the much more complex array of substructure in the lens. Comment: 13 pages + appendices, 7 figures
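
    One common way to estimate the power spectrum of a pixelized convergence map is to Fourier transform the mean-subtracted field and average the squared amplitude in annuli of |k|. The sketch below does this with a direct DFT in pure Python, so it only suits small grids. The normalization P = A_pix |F|^2 / N^2 (equivalent to |kappa~(k)|^2 divided by the map area), the integer-radius binning, and the function name are assumptions for illustration, not the pipeline used with the ETHOS simulations.

```python
import cmath
import math
from collections import defaultdict


def convergence_power_spectrum(kappa, pixel_size):
    """Azimuthally averaged power spectrum of a square convergence map.

    kappa: n x n list of lists (dimensionless); pixel_size in kpc.
    Returns {r: (k in kpc^-1, mean power in kpc^2)} keyed by integer radius r
    in units of the fundamental mode 2*pi/box.
    """
    n = len(kappa)
    box = n * pixel_size
    pixel_area = pixel_size ** 2
    mean = sum(sum(row) for row in kappa) / (n * n)
    delta = [[v - mean for v in row] for row in kappa]
    # twiddle[m][x] = exp(-2*pi*i*m*x/n), shared by both axes of the DFT.
    twiddle = [[cmath.exp(-2j * math.pi * m * x / n) for x in range(n)] for m in range(n)]
    # First pass: 1D DFT of every row (along the second index).
    rows = [[sum(delta[i][y] * twiddle[my][y] for y in range(n)) for my in range(n)]
            for i in range(n)]
    power_sum = defaultdict(float)
    counts = defaultdict(int)
    for mx in range(n):
        for my in range(n):
            # Second pass: DFT along the first index gives the 2D coefficient.
            f = sum(rows[x][my] * twiddle[mx][x] for x in range(n))
            fx = mx if mx < n // 2 else mx - n  # fold indices to [-n/2, n/2)
            fy = my if my < n // 2 else my - n
            r = int(round(math.hypot(fx, fy)))
            power_sum[r] += pixel_area * abs(f) ** 2 / (n * n)
            counts[r] += 1
    return {r: (2 * math.pi * r / box, power_sum[r] / counts[r]) for r in sorted(counts)}


n = 16
# Test map: a single cosine mode with three cycles along the first axis.
kappa = [[math.cos(2 * math.pi * 3 * i / n) for _ in range(n)] for i in range(n)]
spectrum = convergence_power_spectrum(kappa, pixel_size=0.5)
for r, (k, p) in spectrum.items():
    print(f"r={r:2d}  k={k:.3f} kpc^-1  P={p:.3e} kpc^2")
```

    For this single cosine mode all of the power lands in the r = 3 annulus (k = 3 * 2pi / 8 kpc, about 2.36 kpc^-1), averaged over the 16 discrete modes in that annulus; in practice an FFT library would replace the direct DFT.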

    Targeted Data Adaptive Estimation of the Causal Dose Response Curve

    Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model with a continuous treatment, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimating the dose-response curve is a pathwise differentiable parameter, which can be estimated consistently and efficiently. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV-A-IPTW) of the risk and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent and efficient under certain consistency and regularity conditions on the initial estimators of the outcome and treatment mechanisms. We also present a methodology that uses these estimated risks to select among a library of candidate algorithms. These selectors are proven optimal in the sense that they are asymptotically equivalent to the oracle selector under certain consistency conditions on the estimators of the treatment and outcome mechanisms. Because the CV-TMLE is a substitution estimator, it is more robust than the CV-A-IPTW to empirical violations of the positivity assumption. This and other small-sample differences between the CV-TMLE and the CV-A-IPTW are explored in a simulation study.
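
    The selection step can be sketched as a V-fold cross-validated comparison of candidate algorithms. To keep the example short, the risk below is a plain IPTW-weighted squared-error loss with weights assumed to be precomputed (standing in for estimates of h(A)/g(A | W)); it is not the CV-A-IPTW or CV-TMLE risk estimator studied in the paper. The candidate library, the fold assignment by index, the weights, and all function names are hypothetical.

```python
def fit_constant(a, y, w):
    """Candidate 1: weighted mean outcome, ignoring the dose."""
    m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return lambda dose: m


def fit_linear(a, y, w):
    """Candidate 2: weighted least-squares line in the dose."""
    sw = sum(w)
    a_bar = sum(wi * ai for wi, ai in zip(w, a)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (ai - a_bar) * (yi - y_bar) for wi, ai, yi in zip(w, a, y))
    sxx = sum(wi * (ai - a_bar) ** 2 for wi, ai in zip(w, a))
    slope = sxy / sxx
    return lambda dose: y_bar + slope * (dose - a_bar)


def cv_select(a, y, w, library, folds=5):
    """Pick the candidate with the smallest cross-validated IPTW-weighted risk."""
    n = len(a)
    risks = {}
    for name, fit in library.items():
        total = 0.0
        for v in range(folds):
            train = [i for i in range(n) if i % folds != v]
            valid = [i for i in range(n) if i % folds == v]
            predict = fit([a[i] for i in train], [y[i] for i in train], [w[i] for i in train])
            # Weighted squared-error loss evaluated on the held-out fold.
            total += sum(w[i] * (y[i] - predict(a[i])) ** 2 for i in valid)
        risks[name] = total / n
    return min(risks, key=risks.get), risks


doses = [i / 2 for i in range(20)]
outcomes = [1.0 + 2.0 * d + (0.1 if i % 2 else -0.1) for i, d in enumerate(doses)]
weights = [1.5 if i % 3 == 0 else 1.0 for i in range(20)]  # hypothetical IPTW weights
best, risks = cv_select(doses, outcomes, weights,
                        {"constant": fit_constant, "linear": fit_linear})
print(best, {k: round(v, 4) for k, v in risks.items()})
```

    Swapping the weighted loss for a doubly robust (A-IPTW) or targeted risk estimate changes only the line that accumulates `total`; the fold loop and the argmin over candidates stay the same.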

    Sensitivity Analysis for Causal Inference Under Unmeasured Confounding and Measurement Error Problems

    In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two examples: a causal parameter that is not identifiable because the randomization assumption is violated, and a parameter that is not estimable in the nonparametric model because of measurement error. Existing methods for these problems assume a parametric model for the type of violation of the identifiability assumption, and require new estimators and inference procedures for every new model. The method we present can be used in conjunction with any existing asymptotically linear estimator of an observed data parameter that approximates the unidentifiable full data parameter, and does not require the study of additional models.
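
    A generic way to report such an analysis is to posit a bias delta = psi_full - psi_obs between the unidentifiable full data parameter and the observed data parameter, shift a Wald interval from an asymptotically linear estimator by each plausible delta, and find the bias at which the interval would first include zero. The sketch below is this simple bias-shift device under a normal approximation, not necessarily the exact procedure of the paper; the estimate, standard error, and function names are hypothetical.

```python
from statistics import NormalDist


def bias_shifted_intervals(psi_hat, se, deltas, level=0.95):
    """Wald intervals for psi_full = psi_obs + delta over a grid of assumed biases."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(d, psi_hat + d - z * se, psi_hat + d + z * se) for d in deltas]


def tipping_point(psi_hat, se, level=0.95):
    """Bias delta at which the shifted interval first touches zero (0.0 if it already covers zero)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = psi_hat - z * se, psi_hat + z * se
    if lower > 0:
        return -lower
    if upper < 0:
        return -upper
    return 0.0


psi_hat, se = 0.5, 0.1  # hypothetical estimate and standard error
for d, lo, hi in bias_shifted_intervals(psi_hat, se, [0.0, -0.1, -0.2, -0.3, -0.4]):
    print(f"delta={d:+.1f}: ({lo:.3f}, {hi:.3f})")
tp = tipping_point(psi_hat, se)
print(round(tp, 4))
```

    Any asymptotically linear estimator can supply `psi_hat` and `se`; the substantive work lies in judging which values of delta are plausible for the problem at hand.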

    Assessing the Causal Effect of Policies: An Approach Based on Stochastic Interventions

    Stochastic interventions are a powerful tool for defining parameters that measure the causal effect of a realistic intervention intended to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to truncate the population distribution of the exposure. We establish the observed data parameter that identifies the causal parameter of interest, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW, and targeted minimum loss-based estimators (TMLE) are proposed, and their consistency and efficiency properties are established. An extension to longitudinal data structures is presented, and its use is demonstrated with a real data example.
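
    For a discrete exposure, the IPTW estimator of the mean outcome under a stochastic intervention g* reweights each observation by g*(A | W) / g(A | W). The sketch below assumes a known exposure mechanism and takes g* to be g restricted to exposures at or below a threshold and renormalized; that particular truncation, the threshold, the toy data, and all names are illustrative assumptions rather than the intervention defined in the paper. It also requires positivity: g(a | w) > 0 wherever g*(a | w) > 0.

```python
def truncated_policy(g_w, upper):
    """Hypothetical g*: the exposure law g(.|w) restricted to a <= upper, renormalized."""
    mass = sum(p for a, p in g_w.items() if a <= upper)
    return {a: (p / mass if a <= upper else 0.0) for a, p in g_w.items()}


def iptw_stochastic(W, A, Y, g, upper):
    """IPTW estimate of E[Y] under the stochastic intervention g*(.|w)."""
    total = 0.0
    for w, a, y in zip(W, A, Y):
        g_w = g(w)  # exposure probabilities given covariates, assumed known
        g_star = truncated_policy(g_w, upper)
        total += y * g_star[a] / g_w[a]
    return total / len(Y)


def g(w):
    # Hypothetical exposure mechanism that ignores w, for illustration only.
    return {0: 0.5, 1: 0.3, 2: 0.2}


W = [0] * 10
A = [0] * 5 + [1] * 3 + [2] * 2
Y = [1.0] * 5 + [2.0] * 3 + [3.0] * 2
print(round(iptw_stochastic(W, A, Y, g, upper=1), 6))  # 1.375
```

    Here g* puts probability 0.625 on exposure 0 and 0.375 on exposure 1, so the weights are 1.25, 1.25, and 0, and the estimate matches 0.625 * 1 + 0.375 * 2 = 1.375.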

    Microarray Generation of Thousand-Member Oligonucleotide Libraries

    The ability to efficiently and economically generate libraries of defined pieces of DNA would have a myriad of applications, not least in defined or directed sequencing and synthetic biology, but also in encoding and tagging. In this manuscript, DNA microarrays were used to allow the linear amplification of immobilized DNA sequences from the array, followed by PCR amplification. Arrays of increasing sophistication (1, 10, 3,875, and 10,000 defined sequences) were used to validate the process, with sequences verified by selective hybridization to a complementary DNA microarray and by DNA sequencing, which demonstrated a PCR error rate of 9.7 × 10⁻³ per site per duplication. This technique offers an economical and efficient way of producing specific DNA libraries of hundreds to thousands of members, with the DNA arrays used as “factories” for generating specific DNA oligonucleotide pools. We also observed substantial variance between the sequence frequencies found via Solexa sequencing and via microarray analysis, highlighting the care needed in interpreting profiling data.
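
    To put the reported error rate in perspective: under the simplifying assumption that errors occur independently at each site in each duplication, the expected fraction of error-free copies of an L-nt sequence after d duplications is (1 - p)^(L d), and the expected number of errors per copy is p L d. The oligonucleotide lengths below are hypothetical and not taken from the study.

```python
def error_free_fraction(error_rate, length, duplications):
    """Expected fraction of error-free copies, assuming independent errors per site per duplication."""
    return (1.0 - error_rate) ** (length * duplications)


def expected_errors(error_rate, length, duplications):
    """Expected number of errors per copy under the same independence assumption."""
    return error_rate * length * duplications


rate = 9.7e-3  # per site per duplication, as reported above
for length in (25, 50, 100):  # hypothetical oligonucleotide lengths
    frac = error_free_fraction(rate, length, duplications=1)
    print(length, round(frac, 3), round(expected_errors(rate, length, 1), 3))
```

    For example, with p = 9.7 × 10⁻³, roughly 61% of copies of a 50-nt sequence would be error-free after a single duplication under this model.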

    Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

    Perception of offensiveness is inherently subjective, shaped by the lived experiences and socio-cultural values of the perceivers. Recent years have seen substantial efforts to build AI-based tools that can detect offensive language at scale, both to moderate social media platforms and to ensure the safety of conversational AI technologies such as ChatGPT and Bard. However, existing approaches treat this task as a technical endeavor, built on top of data annotated for offensiveness by a global crowd workforce without any attention to the crowd workers' provenance or the values their perceptions reflect. We argue that cultural and psychological factors play a vital role in the cognitive processing of offensiveness, which is critical to consider in this context. We reframe the task of determining offensiveness as essentially a matter of moral judgment: deciding the boundaries of ethically wrong vs. right language within an implied set of socio-cultural norms. Through a large-scale cross-cultural study of 4,309 participants from 21 countries across 8 cultural regions, we demonstrate substantial cross-cultural differences in perceptions of offensiveness. More importantly, we find that individual moral values play a crucial role in shaping these variations: moral concerns about Care and Purity are significant mediating factors driving cross-cultural differences. These insights are crucial as we build AI models for a pluralistic world, where the values the models espouse should respect and account for moral values in diverse geo-cultural contexts.

    Analysis of the multidimensionality of hallucination-like experiences in clinical and nonclinical Spanish samples and their relation to clinical symptoms: Implications for the model of continuity

    Numerous studies have found that hallucinatory experiences occur in the general population. But to date, few studies have compared clinical and nonclinical groups across a broad array of clinical symptoms that may co-occur with hallucinations. Likewise, hallucination-like experiences are measured as a multidimensional construct, with clinical and subclinical components related to vivid daydreams, intrusive thoughts, perceptual disturbance, and clinical hallucinatory experiences. Nevertheless, these individual subcomponents have not been examined across a broad spectrum of clinically disordered and nonclinical groups. The goal of the present study was to analyze the differences and similarities in the distribution of responses to hallucination-like experiences in clinical and nonclinical populations, and to determine the relation of these experiences to various clinical symptoms. The groups included patients with schizophrenia, non-psychotic clinically disordered patients, and a group of individuals with no psychiatric diagnoses. The results revealed that hallucination-like experiences are related to various clinical symptoms across diverse groups of individuals. Regression analysis found that the Psychoticism dimension of the Symptom Check List (SCL-90-R) was the most important predictor of hallucination-like experiences. Additionally, increased auditory and visual hallucination was the only subcomponent that differentiated patients with schizophrenia from the other groups. This distribution of responses across the dimensions of hallucination-like experiences suggests that not all dimensions are characteristic of people who hear voices. Vivid daydreams, intrusive thoughts, auditory distortions, and visual perceptual distortions may represent a state of general vulnerability that does not denote a specific risk for clinical hallucinations. Overall, these results support the notion that hallucination-like experiences are better described by a quasi-continuum approach, and that total scores on these scales reflect a state of vulnerability to general perceptual disturbance.

    Higher-order Targeted Minimum Loss-based Estimation

    Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means that an initial estimator of the underlying data-generating distribution with a sufficiently fast rate of convergence must be available; in many cases, this requirement is prohibitively difficult to satisfy. In this article, we propose a generalization of TMLE that uses a higher-order approximation of the target parameter. This approach yields asymptotically linear and efficient estimators when a higher-order remainder term is asymptotically negligible, a condition that is often much less stringent than the one arising in a regular first-order TMLE. Beyond relaxing regularity conditions, a higher-order TMLE can improve inference accuracy in finite samples because it relies explicitly on a higher-order approximation. We provide the theoretical foundations of higher-order TMLE and study its use for estimating a counterfactual mean when all potential confounders have been measured. In particular, we show that the implementation of a higher-order TMLE is nearly identical to that of a regular first-order TMLE. Since higher-order TMLE requires higher-order differentiability of the target parameter, a requirement that often fails to hold, we also discuss and study practicable approximation strategies that circumvent this failure in applications.
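
    For the counterfactual mean, the second-order remainder of the first-order expansion has a simple product form. Taking O = (W, A, Y), $\bar Q(W) = E[Y \mid A = 1, W]$, $g(W) = P(A = 1 \mid W)$, $\psi(P) = E_P[\bar Q(W)]$, and efficient influence function $D^*(P)(O) = \frac{A}{g(W)}\{Y - \bar Q(W)\} + \bar Q(W) - \psi(P)$, a standard direct calculation (shown here as a worked illustration, not taken from the article) gives:

```latex
\begin{aligned}
P_0 D^*(P) &= E_0\!\left[\frac{g_0(W)}{g(W)}\{\bar Q_0(W) - \bar Q(W)\}\right] + E_0[\bar Q(W)] - \psi(P),\\
R_2(P, P_0) &\equiv \psi(P) - \psi(P_0) + P_0 D^*(P)
= E_0\!\left[\Big(1 - \frac{g_0(W)}{g(W)}\Big)\{\bar Q(W) - \bar Q_0(W)\}\right]\\
&= E_0\!\left[\frac{\{g(W) - g_0(W)\}\{\bar Q(W) - \bar Q_0(W)\}}{g(W)}\right].
\end{aligned}
```

    By the Cauchy–Schwarz inequality, if $g \ge \delta > 0$ then $|R_2| \le \delta^{-1}\lVert g - g_0\rVert_{P_0,2}\,\lVert \bar Q - \bar Q_0\rVert_{P_0,2}$, so a first-order TMLE needs the product of the two convergence rates to be $o_P(n^{-1/2})$; this is the kind of condition a higher-order TMLE aims to relax.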