Second-Order Inference for the Mean of a Variable Missing at Random
We present a second-order estimator of the mean of a variable subject to
missingness, under the missing at random assumption. The estimator improves
upon existing methods by using an approximate second-order expansion of the
parameter functional, in addition to the first-order expansion employed by
standard doubly robust methods. This results in weaker assumptions about the
convergence rates necessary to establish consistency, local efficiency, and
asymptotic linearity. The general estimation strategy is developed under the
targeted minimum loss-based estimation (TMLE) framework. We present a
simulation comparing the sensitivity of the first and second order estimators
to the convergence rate of the initial estimators of the outcome regression and
missingness score. In our simulation, the second-order TMLE improved the
coverage probability of a confidence interval by up to 85%. In addition, we
present a first-order estimator inspired by a second-order expansion of the
parameter functional. This estimator only requires one-dimensional smoothing,
whereas implementation of the second-order TMLE generally requires kernel
smoothing on the covariate space. The first-order estimator proposed is
expected to have improved finite sample performance compared to existing
first-order estimators. In our simulations, the proposed first-order estimator
improved the coverage probability by up to 90%. We provide an illustration of
our methods using a publicly available dataset to determine the effect of an
anticoagulant on health outcomes of patients undergoing percutaneous coronary
intervention. We provide R code implementing the proposed estimator.
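For intuition about the first-order doubly robust approach the abstract improves upon, here is a minimal sketch of a standard augmented inverse-probability-weighted (AIPW) estimator of the mean under missingness at random. This is not the paper's second-order TMLE; the simulated data, the variable names, and the use of correctly specified outcome-regression and missingness-score estimates are all assumptions for the example.

```python
import numpy as np

def aipw_mean(y, delta, m_hat, pi_hat):
    """Doubly robust (AIPW) estimate of E[Y] when Y is missing at random.

    delta  : 1 if Y is observed, 0 if missing
    m_hat  : outcome-regression estimates E[Y | X] for every unit
    pi_hat : missingness-score estimates P(delta = 1 | X)
    """
    y = np.where(delta == 1, y, 0.0)  # guard: unobserved Y may be coded NaN
    return float(np.mean(m_hat + delta / pi_hat * (y - m_hat)))

# Simulated example: outcomes observed with constant probability 0.7
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)            # outcome with true mean 0
pi = np.full(n, 0.7)                  # constant missingness score
delta = rng.binomial(1, pi)
est = aipw_mean(y, delta, m_hat=x, pi_hat=pi)  # here E[Y | X] = x exactly
```

The estimator remains consistent if either `m_hat` or `pi_hat` (but not both) is misspecified, which is the double robustness property the abstract refers to.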
Gravitational Lensing and the Power Spectrum of Dark Matter Substructure: Insights from the ETHOS N-body Simulations
Strong gravitational lensing has been identified as a promising astrophysical
probe to study the particle nature of dark matter. In this paper we present a
detailed study of the power spectrum of the projected mass density
(convergence) field of substructure in a Milky Way-sized halo. This power
spectrum has been suggested as a key observable that can be extracted from
strongly lensed images and yield important clues about the matter distribution
within the lens galaxy. We use two different N-body simulations from the
ETHOS framework: one with cold dark matter and another with self-interacting
dark matter and a cutoff in the initial power spectrum. Despite earlier works
that identified kpc as the most promising scales at which to
learn about the particle nature of dark matter, we find that even at lower
wavenumbers (which are actually within reach of observations in the near
future) we can gain important information about dark matter. Comparing the
amplitude and slope of the power spectrum on scales kpc from lenses at different redshifts can help us distinguish between
cold dark matter and other exotic dark matter scenarios that alter the
abundance and central densities of subhalos. Furthermore, by considering the
contribution of different mass bins to the power spectrum we find that subhalos
in the mass range M are on average the largest
contributors to the power spectrum signal on scales kpc, despite the numerous subhalos with masses M in
a typical lens galaxy. Finally, by comparing the power spectra obtained from
the subhalo catalogs to those from the particle data in the simulation
snapshots we find that the seemingly-too-simple halo model is in fact a fairly
good approximation to the much more complex array of substructure in the lens.
Comment: 13 pages + appendices, 7 figures
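The key observable discussed above, the power spectrum of a projected (convergence) field, can be estimated by azimuthally averaging squared Fourier amplitudes in bins of wavenumber magnitude. The sketch below is a generic FFT-based estimator, not the pipeline used in the paper; the normalization convention, grid size, and white-noise test map are assumptions chosen only for illustration.

```python
import numpy as np

def convergence_power_spectrum(kappa, box_size, n_bins=10):
    """Azimuthally averaged power spectrum of a square 2D map.

    kappa    : 2D array, the projected (convergence) field
    box_size : physical side length of the map (e.g. in kpc)
    Returns bin-center wavenumbers and the binned power P(k).
    """
    n = kappa.shape[0]
    fourier = np.fft.fft2(kappa)
    # One common normalization: P(k) = |FT|^2 * L^2 / N^4 (conventions vary)
    power_2d = (np.abs(fourier) ** 2) * box_size**2 / n**4
    # Wavenumber magnitude of every Fourier mode
    kfreq = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky = np.meshgrid(kfreq, kfreq, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2).ravel()
    p_flat = power_2d.ravel()
    # Average the power in linear bins of |k|, excluding the k = 0 mode
    bins = np.linspace(k_mag[k_mag > 0].min(), k_mag.max(), n_bins + 1)
    which = np.digitize(k_mag, bins)
    pk = np.array([p_flat[which == i].mean() for i in range(1, n_bins + 1)])
    return 0.5 * (bins[1:] + bins[:-1]), pk

# Example: a white-noise map, whose spectrum should be roughly flat
rng = np.random.default_rng(1)
kappa = rng.normal(size=(64, 64))
k, pk = convergence_power_spectrum(kappa, box_size=1.0)
```

Under this normalization, unit-variance white noise on an N x N grid of side L has a flat spectrum near L^2 / N^2, which provides a quick sanity check of the estimator.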
Targeted Data Adaptive Estimation of the Causal Dose Response Curve
Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model, if the treatment is continuous, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimation of the dose-response curve is a pathwise differentiable parameter, whose consistent and efficient estimation is possible. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV-AIPTW) of the risk, and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent and efficient under certain consistency and regularity conditions on the initial estimators of the outcome and treatment mechanisms. We also present a methodology that uses these estimated risks to select among a library of candidate algorithms. These selectors are proven optimal in the sense that they are asymptotically equivalent to the oracle selector under certain consistency conditions on the estimators of the treatment and outcome mechanisms. Because the CV-TMLE is a substitution estimator, it is more robust than the CV-AIPTW against empirical violations of the positivity assumption. This and other small-sample differences between the CV-TMLE and the CV-AIPTW are explored in a simulation study.
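The selection step described above can be illustrated with plain cross-validated empirical risk; the paper's CV-AIPTW and CV-TMLE risk estimators additionally correct for the treatment mechanism, which this sketch omits. The candidate algorithms, simulated data, and squared-error loss below are assumptions for the example.

```python
import numpy as np

def cv_select(candidates, x, y, n_folds=5, seed=0):
    """Pick the candidate algorithm with the smallest cross-validated risk.

    candidates maps a name to a fit function: fit(x_train, y_train) must
    return a prediction function predict(x_new).
    """
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))  # random fold labels
    risks = {}
    for name, fit in candidates.items():
        fold_risks = []
        for v in range(n_folds):
            train, valid = folds != v, folds == v
            predict = fit(x[train], y[train])      # fit on training folds
            fold_risks.append(np.mean((y[valid] - predict(x[valid])) ** 2))
        risks[name] = float(np.mean(fold_risks))   # estimated risk
    return min(risks, key=risks.get), risks

# Toy library: a constant (overall-mean) fit vs. a linear fit
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)
candidates = {
    "mean": lambda xt, yt: (lambda xv: np.full(len(xv), yt.mean())),
    "linear": lambda xt, yt: (
        lambda xv, c=np.polyfit(xt, yt, 1): np.polyval(c, xv)
    ),
}
best, risks = cv_select(candidates, x, y)
```

Because the risk is estimated on held-out folds, the selector mimics the oracle choice between the two candidates as the sample grows, which is the optimality property the abstract establishes for its confounding-corrected analogue.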
Sensitivity Analysis for Causal Inference Under Unmeasured Confounding and Measurement Error Problems
In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation of the identifiability assumption, and require the development of new estimators and inference for every new model. The method we present can be used in conjunction with any existing asymptotically linear estimator of an observed data parameter that approximates the unidentifiable full data parameter, and does not require the study of additional models.
Assessing the Causal Effect of Policies: An Approach Based on Stochastic Interventions
Stochastic interventions are a powerful tool to define parameters that measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW, and targeted minimum loss-based (TMLE) estimators are proposed, and their consistency and efficiency properties are established. An extension to longitudinal data structures is presented, and its use is demonstrated with a real data example.
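For intuition, here is a sketch of the IPTW estimator for the simplest stochastic intervention in Díaz and van der Laan (2011), an additive shift of the exposure; the paper above studies a truncation intervention instead. The Gaussian treatment mechanism, the simulated outcome model, and all variable names are assumptions for the example.

```python
import numpy as np

def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def iptw_shift(y, a, w, delta, cond_mean):
    """IPTW estimate of E[Y] under the stochastic intervention A -> A + delta,
    assuming the treatment mechanism A | W is Normal(cond_mean(w), 1).

    The weight g(A - delta | W) / g(A | W) reweights observed exposures so
    that they mimic draws from the shifted exposure distribution.
    """
    g_obs = normal_pdf(a, cond_mean(w))              # g(A | W)
    g_shifted = normal_pdf(a - delta, cond_mean(w))  # g(A - delta | W)
    return float(np.mean(y * g_shifted / g_obs))

rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n)
a = w + rng.normal(size=n)       # treatment mechanism: A | W ~ Normal(W, 1)
y = a + w + rng.normal(size=n)   # under A -> A + 0.5, E[Y] equals 0.5
est = iptw_shift(y, a, w, delta=0.5, cond_mean=lambda w: w)
```

The augmented-IPTW and TMLE counterparts mentioned in the abstract add a correction term based on the efficient influence function, gaining robustness to misspecification of the treatment mechanism.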
Microarray Generation of Thousand-Member Oligonucleotide Libraries
The ability to efficiently and economically generate libraries of defined pieces of DNA would have a myriad of applications, not least in the area of defined or directed sequencing and synthetic biology, but also in applications associated with encoding and tagging. In this manuscript, DNA microarrays were used to allow the linear amplification of immobilized DNA sequences from the array, followed by PCR amplification. Arrays of increasing sophistication (1, 10, 3,875, and 10,000 defined sequences) were used to validate the process, with sequences verified by selective hybridization to a complementary DNA microarray and by DNA sequencing, which demonstrated a PCR error rate of 9.7×10⁻³ per site per duplication. This technique offers an economical and efficient way of producing specific DNA libraries of hundreds to thousands of members, with the DNA arrays acting as “factories” that allow specific DNA oligonucleotide pools to be generated. We also observed substantial variance between the sequence frequencies found via Solexa sequencing and microarray analysis, highlighting the care needed in the interpretation of profiling data.
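To put the reported error rate in perspective, per-site errors compound across the length of the oligonucleotide and across duplications. The sketch below assumes errors arise independently at each site in each duplication; the oligo length of 60 nt is a hypothetical value chosen only for illustration.

```python
# Probability that a length-L sequence is copied error-free through
# d duplications, assuming independent per-site errors.
p = 9.7e-3   # reported PCR error rate per site per duplication
L = 60       # hypothetical oligo length (nt), for illustration only
d = 1        # a single duplication
error_free_fraction = (1 - p) ** (L * d)
```

Even at this modest length, under these assumptions, a substantial fraction of copies acquires at least one error per duplication, which is consistent with the care the authors urge when interpreting downstream profiling data.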
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates
Perception of offensiveness is inherently subjective, shaped by the lived
experiences and socio-cultural values of the perceivers. Recent years have seen
substantial efforts to build AI-based tools that can detect offensive language
at scale, as a means to moderate social media platforms, and to ensure safety
of conversational AI technologies such as ChatGPT and Bard. However, existing
approaches treat this task as a technical endeavor, built on top of data
annotated for offensiveness by a global crowd workforce without any attention
to the crowd workers' provenance or the values their perceptions reflect. We
argue that cultural and psychological factors play a vital role in the
cognitive processing of offensiveness, which is critical to consider in this
context. We re-frame the task of determining offensiveness as essentially a
matter of moral judgment -- deciding the boundaries of ethically wrong vs.
right language within an implied set of socio-cultural norms. Through a
large-scale cross-cultural study based on 4309 participants from 21 countries
across 8 cultural regions, we demonstrate substantial cross-cultural
differences in perceptions of offensiveness. More importantly, we find that
individual moral values play a crucial role in shaping these variations: moral
concerns about Care and Purity are significant mediating factors driving
cross-cultural differences. These insights are of crucial importance as we
build AI models for the pluralistic world, where the values they espouse should
aim to respect and account for moral values in diverse geo-cultural contexts.
Analysis of the multidimensionality of hallucination-like experiences in clinical and nonclinical Spanish samples and their relation to clinical symptoms: Implications for the model of continuity
Numerous studies have found that hallucinatory experiences occur in the general population. However, to date, few studies have compared clinical and nonclinical groups across a broad array of clinical symptoms that may co-occur with hallucinations. Likewise, hallucination-like experiences are measured as a multidimensional construct, with clinical and subclinical components related to vivid daydreams, intrusive thoughts, perceptual disturbance, and clinical hallucinatory experiences. Nevertheless, these individual subcomponents have not been examined across a broad spectrum of clinically disordered and nonclinical groups. The goal of the present study was to analyze the differences and similarities in the distribution of responses to hallucination-like experiences in clinical and nonclinical populations, and to determine the relation of these hallucination-like experiences to various clinical symptoms. The groups studied included patients with schizophrenia, non-psychotic clinically disordered patients, and a group of individuals with no psychiatric diagnoses. The results revealed that hallucination-like experiences are related to various clinical symptoms across diverse groups of individuals. Regression analysis found that the Psychoticism dimension of the Symptom Checklist (SCL-90-R) was the most important predictor of hallucination-like experiences. Additionally, increased auditory and visual hallucinations formed the only subcomponent that differentiated patients with schizophrenia from the other groups. This distribution of responses across the dimensions of hallucination-like experiences suggests that not all of the dimensions are characteristic of people who hear voices. Vivid daydreams, intrusive thoughts, and auditory and visual perceptual distortions may represent a state of general vulnerability that does not denote a specific risk for clinical hallucinations.
Overall, these results support the notion that hallucination-like experiences are better described by a quasi-continuum approach, and that total scores on these scales reflect a state of vulnerability to general perceptual disturbance.
Higher-order Targeted Minimum Loss-based Estimation
Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available -- in many cases, this requirement is prohibitively difficult to satisfy. In this article, we propose a generalization of TMLE utilizing a higher-order approximation of the target parameter. This approach yields asymptotically linear and efficient estimators when a higher-order remainder term is asymptotically negligible. The latter condition is often much less stringent than that arising in a regular first-order TMLE. Beyond relaxing regularity conditions, use of a higher-order TMLE can improve inference accuracy in finite samples due to its explicit reliance on a higher-order approximation. We provide the theoretical foundations of higher-order TMLE and study its use for estimating a counterfactual mean when all potential confounders have been measured. We show, in particular, that the implementation of a higher-order TMLE is nearly identical to that of a regular first-order TMLE. Since higher-order TMLE requires higher-order differentiability of the target parameter, a requirement that often fails to hold, we also discuss and study practicable approximation strategies that allow us to circumvent this failure in applications.
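The first-order mechanics referred to above can be made concrete for the mean of a bounded outcome missing at random: fluctuate an initial outcome regression along a logistic submodel with a "clever covariate" and plug the updated fit into the parameter mapping. The sketch below is a minimal first-order TMLE with a scalar fluctuation solved by bisection; the simulated data, the correctly specified nuisance estimators, and the variable names are assumptions for illustration.

```python
import numpy as np

def tmle_mean(y, delta, m_hat, pi_hat, tol=1e-10):
    """First-order TMLE of E[Y] for an outcome Y in [0, 1] missing at random.

    delta  : 1 if Y is observed, 0 if missing
    m_hat  : initial estimate of E[Y | X] for every unit, values in (0, 1)
    pi_hat : estimate of P(delta = 1 | X), bounded away from 0
    """
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    y = np.where(delta == 1, y, 0.0)        # guard: unobserved Y may be NaN
    offset = np.log(m_hat / (1.0 - m_hat))  # logit of the initial regression
    h = 1.0 / pi_hat                        # "clever covariate"

    def score(eps):                         # efficient-score equation in eps
        return np.sum(delta * h * (y - expit(offset + eps * h)))

    lo, hi = -10.0, 10.0                    # score is decreasing in eps
    while hi - lo > tol:                    # solve score(eps) = 0 by bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    m_star = expit(offset + 0.5 * (lo + hi) * h)  # fluctuated regression
    return float(np.mean(m_star))           # plug-in (substitution) estimate

rng = np.random.default_rng(0)
n = 100_000
w = rng.uniform(size=n)
y = rng.binomial(1, w).astype(float)        # true E[Y] = 0.5
pi = 0.5 + 0.4 * w                          # missingness score, >= 0.5
delta = rng.binomial(1, pi)
est = tmle_mean(y, delta, m_hat=np.clip(w, 0.01, 0.99), pi_hat=pi)
```

Because the update solves the efficient-score equation exactly while staying inside the model (the fluctuated regression remains in [0, 1]), the result is a substitution estimator; the higher-order TMLE proposed in the abstract replaces this single fluctuation with one driven by a higher-order expansion.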