1,037 research outputs found

Second-Order Inference for the Mean of a Variable Missing at Random

Author: Carone Marco
Díaz Iván
van der Laan Mark J.
Publication date: 26/05/2015

We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator requires only one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite-sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.

arXiv.org e-Print Archive

Collection Of Biostatistics Research Archive
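
For orientation, the "standard doubly robust methods" that the abstract above takes as its first-order baseline can be sketched in a few lines. This is a minimal illustration of the usual augmented inverse-probability-weighted (AIPW) estimator of a mean under missing at random, not the second-order TMLE of the paper and not the authors' R code. The function names `aipw_mean` and `simulate`, the simulated data-generating process, and the use of the true nuisance functions in place of fitted ones are all assumptions made for the example.

```python
import numpy as np


def aipw_mean(y, a, q_hat, g_hat):
    """First-order doubly robust (AIPW) estimate of E[Y] with Y missing at random.

    y     : outcomes; entries where a == 0 are ignored (may be NaN)
    a     : 1 if the outcome was observed, 0 if missing
    q_hat : estimated outcome regression E[Y | A = 1, W] for every unit
    g_hat : estimated missingness score P(A = 1 | W) for every unit
    Returns the point estimate and an influence-function-based standard error.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    q_hat = np.asarray(q_hat, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    # Residual is used only for observed units; missing units contribute 0.
    resid = np.where(a == 1, y - q_hat, 0.0)
    # Pseudo-outcome: q(W) + A / g(W) * (Y - q(W)).
    pseudo = q_hat + resid / g_hat
    est = pseudo.mean()
    se = pseudo.std(ddof=1) / np.sqrt(len(pseudo))
    return est, se


def simulate(n, seed=0):
    """Hypothetical data: W ~ N(0, 1), Y = 1 + 2 W + noise, so E[Y] = 1."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    g = 1.0 / (1.0 + np.exp(-(0.5 + w)))  # true missingness score
    a = rng.binomial(1, g)                # 1 = outcome observed
    y = 1.0 + 2.0 * w + rng.normal(size=n)
    y = np.where(a == 1, y, np.nan)       # hide the unobserved outcomes
    return w, a, y, g


if __name__ == "__main__":
    w, a, y, g = simulate(20000)
    # True nuisance functions stand in for fitted ones in this sketch.
    est, se = aipw_mean(y, a, q_hat=1.0 + 2.0 * w, g_hat=g)
    print(f"estimate = {est:.3f}, 95% CI half-width = {1.96 * se:.3f}")
```

In practice `q_hat` and `g_hat` would come from data-adaptive fits, and it is the convergence rate of those fits that the abstract's simulation varies.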

Gravitational Lensing and the Power Spectrum of Dark Matter Substructure: Insights from the ETHOS N-body Simulations

Author: Cyr-Racine Francis-Yan
Dvorkin Cora
Rivero Ana Díaz
Vogelsberger Mark
Zavala Jesús
Publication venue: American Physical Society (APS)
Publication date: 31/08/2018

Strong gravitational lensing has been identified as a promising astrophysical probe to study the particle nature of dark matter. In this paper we present a detailed study of the power spectrum of the projected mass density (convergence) field of substructure in a Milky Way-sized halo. This power spectrum has been suggested as a key observable that can be extracted from strongly lensed images and yield important clues about the matter distribution within the lens galaxy. We use two different $N$-body simulations from the ETHOS framework: one with cold dark matter and another with self-interacting dark matter and a cutoff in the initial power spectrum. Despite earlier works that identified $k \gtrsim 100$ kpc$^{-1}$ as the most promising scales to learn about the particle nature of dark matter, we find that even at lower wavenumbers, which are actually within reach of observations in the near future, we can gain important information about dark matter. Comparing the amplitude and slope of the power spectrum on scales $0.1 \lesssim k/\mathrm{kpc}^{-1} \lesssim 10$ from lenses at different redshifts can help us distinguish between cold dark matter and other exotic dark matter scenarios that alter the abundance and central densities of subhalos. Furthermore, by considering the contribution of different mass bins to the power spectrum, we find that subhalos in the mass range $10^7 - 10^8\,M_{\odot}$ are on average the largest contributors to the power spectrum signal on scales $2 \lesssim k/\mathrm{kpc}^{-1} \lesssim 15$, despite the numerous subhalos with masses $> 10^8\,M_{\odot}$ in a typical lens galaxy. Finally, by comparing the power spectra obtained from the subhalo catalogs to those from the particle data in the simulation snapshots, we find that the seemingly-too-simple halo model is in fact a fairly good approximation to the much more complex array of substructure in the lens.

Comment: 13 pages + appendices, 7 figures

arXiv.org e-Print Archive

DSpace@MIT

Targeted Data Adaptive Estimation of the Causal Dose Response Curve

Author: Díaz Iván
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 24/01/2013

Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model, if the treatment is continuous, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimation of the dose-response curve is a pathwise differentiable parameter, whose consistent and efficient estimation is possible. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV-A-IPTW) of the risk, and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent and efficient under certain consistency and regularity conditions on the initial estimators of the outcome and treatment mechanism. We also present a methodology that uses these estimated risks to select among a library of candidate algorithms. These selectors are proven optimal in the sense that they are asymptotically equivalent to the oracle selector under certain consistency conditions on the estimators of the treatment and outcome mechanisms. Because the CV-TMLE is a substitution estimator, it is more robust than the CV-A-IPTW against empirical violations of the positivity assumption. This and other small-sample differences between the CV-TMLE and the CV-A-IPTW are explored in a simulation study.

Directory of Open Access Journals

Collection Of Biostatistics Research Archive

Sensitivity Analysis for Causal Inference Under Unmeasured Confounding and Measurement Error Problems

Author: Díaz Iván
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 05/12/2012

In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation of the identifiability assumption, and require the development of new estimators and inference for every new model. The method we present can be used in conjunction with any existing asymptotically linear estimator of an observed data parameter that approximates the unidentifiable full data parameter, and does not require the study of additional models.

Collection Of Biostatistics Research Archive

Assessing the Causal Effect of Policies: An Approach Based on Stochastic Interventions

Author: Díaz Iván
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 08/10/2012

Stochastic interventions are a powerful tool to define parameters that measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW, and targeted minimum loss-based estimators (TMLE) are proposed, and their consistency and efficiency properties are determined. An extension to longitudinal data structures is presented and its use is demonstrated with a real data example.

Collection Of Biostatistics Research Archive

Microarray Generation of Thousand-Member Oligonucleotide Libraries

Author: Bradley Mark
Díaz-Mochón Juan José
Svensen Nina
Publication venue: Public Library of Science
Publication date: 01/01/2011

The ability to efficiently and economically generate libraries of defined pieces of DNA would have a myriad of applications, not least in the area of defined or directed sequencing and synthetic biology, but also in applications associated with encoding and tagging. In this manuscript, DNA microarrays were used to allow the linear amplification of immobilized DNA sequences from the array, followed by PCR amplification. Arrays of increasing sophistication (1, 10, 3,875, 10,000 defined sequences) were used to validate the process, with sequences verified by selective hybridization to a complementary DNA microarray and by DNA sequencing, which demonstrated a PCR error rate of 9.7×10⁻³/site/duplication. This technique offers an economical and efficient way of producing specific DNA libraries of hundreds to thousands of members, with the DNA arrays being used as “factories” allowing specific DNA oligonucleotide pools to be generated. We also observed substantial variance between the sequence frequencies found via Solexa sequencing and microarray analysis, highlighting the care needed in the interpretation of profiling data.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

Author: Baker Dylan
Davani Aida
Díaz Mark
Prabhakaran Vinodkumar
Publication date: 11/12/2023

Perception of offensiveness is inherently subjective, shaped by the lived experiences and socio-cultural values of the perceivers. Recent years have seen substantial efforts to build AI-based tools that can detect offensive language at scale, as a means to moderate social media platforms, and to ensure the safety of conversational AI technologies such as ChatGPT and Bard. However, existing approaches treat this task as a technical endeavor, built on top of data annotated for offensiveness by a global crowd workforce without any attention to the crowd workers' provenance or the values their perceptions reflect. We argue that cultural and psychological factors play a vital role in the cognitive processing of offensiveness, which is critical to consider in this context. We re-frame the task of determining offensiveness as essentially a matter of moral judgment: deciding the boundaries of ethically wrong vs. right language within an implied set of socio-cultural norms. Through a large-scale cross-cultural study based on 4,309 participants from 21 countries across 8 cultural regions, we demonstrate substantial cross-cultural differences in perceptions of offensiveness. More importantly, we find that individual moral values play a crucial role in shaping these variations: moral concerns about Care and Purity are significant mediating factors driving cross-cultural differences. These insights are of crucial importance as we build AI models for the pluralistic world, where the values they espouse should aim to respect and account for moral values in diverse geo-cultural contexts.

arXiv.org e-Print Archive

Analysis of the multidimensionality of hallucination-like experiences in clinical and nonclinical Spanish samples and their relation to clinical symptoms: Implications for the model of continuity

Author: Cangas Díaz Adolfo Javier
Langer Álvaro I.
Serper Mark
Publication venue: Informa UK Limited
Publication date: 01/01/2011

Numerous studies have found that hallucinatory experiences occur in the general population. But to date, few studies have been conducted to compare clinical and nonclinical groups across a broad array of clinical symptoms that may co-occur with hallucinations. Likewise, hallucination-like experiences are measured as a multidimensional construct, with clinical and subclinical components related to vivid daydreams, intrusive thoughts, perceptual disturbance, and clinical hallucinatory experiences. Nevertheless, these individual subcomponents have not been examined across a broad spectrum of clinically disordered and nonclinical groups. The goal of the present study was to analyze the differences and similarities in the distribution of responses to hallucination-like experiences in clinical and nonclinical populations and to determine the relation of these hallucination-like experiences with various clinical symptoms. These groups included patients with schizophrenia, non-psychotic clinically disordered patients, and a group of individuals with no psychiatric diagnoses. The results revealed that hallucination-like experiences are related to various clinical symptoms across diverse groups of individuals. Regression analysis found that the Psychoticism dimension of the Symptom Check List (SCL-90-R) was the most important predictor of hallucination-like experiences. Additionally, increased auditory and visual hallucination was the only subcomponent that differentiated schizophrenic patients from other groups. This distribution of responses in the dimensions of hallucination-like experiences suggests that not all the dimensions are characteristic of people hearing voices. Vivid daydreams, intrusive thoughts, and auditory and visual perceptual distortions may represent a state of general vulnerability that does not denote a specific risk for clinical hallucinations. Overall, these results support the notion that hallucination-like experiences are closer to a quasi-continuum approach and that total scores on these scales explain a state of vulnerability to general perceptual disturbance.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositorio Institucional de la Universidad de Almería (Spain)

Higher-order Targeted Minimum Loss-based Estimation

Author: Carone Marco
Díaz Iván
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 11/12/2014

Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available; in many cases, this requirement is prohibitively difficult to satisfy. In this article, we propose a generalization of TMLE utilizing a higher-order approximation of the target parameter. This approach yields asymptotically linear and efficient estimators when a higher-order remainder term is asymptotically negligible. The latter condition is often much less stringent than that arising in a regular first-order TMLE. Beyond relaxing regularity conditions, use of a higher-order TMLE can improve inference accuracy in finite samples due to its explicit reliance on a higher-order approximation. We provide the theoretical foundations of higher-order TMLE and study its use for estimating a counterfactual mean when all potential confounders have been measured. We show, in particular, that the implementation of a higher-order TMLE is nearly identical to that of a regular first-order TMLE. Since higher-order TMLE requires higher-order differentiability of the target parameter, a requirement that often fails to hold, we also discuss and study practicable approximation strategies that allow us to circumvent this failure in applications.

Collection Of Biostatistics Research Archive
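
The first-order approximation and second-order remainder mentioned in the abstract above can be written out for the counterfactual-mean example. The notation here is an assumption made for illustration and does not come from the listing: the data are $O = (W, A, Y)$, $\bar{Q}(W) = E_P[Y \mid A = 1, W]$ is the outcome regression, $g(W) = P(A = 1 \mid W)$ is the treatment mechanism, $\Psi(P) = E_P[\bar{Q}(W)]$ is the target parameter, $D(P)$ is its efficient influence function, and a subscript $0$ marks the true distribution.

```latex
\begin{align*}
D(P)(O) &= \frac{A}{g(W)}\,\{Y - \bar{Q}(W)\} + \bar{Q}(W) - \Psi(P), \\
\Psi(P) - \Psi(P_0) &= -P_0 D(P) + R_2(P, P_0), \\
R_2(P, P_0) &= E_{P_0}\!\left[ \frac{g(W) - g_0(W)}{g(W)}\,
               \{\bar{Q}(W) - \bar{Q}_0(W)\} \right].
\end{align*}
```

Under this sketch, $R_2$ is a product of the errors in $g$ and $\bar{Q}$, so by the Cauchy-Schwarz inequality it is $o_P(n^{-1/2})$ when, for instance, both nuisance estimators converge faster than $n^{-1/4}$ in $L^2(P_0)$ and $g$ is bounded away from zero. This is the kind of rate requirement that a higher-order expansion is meant to relax.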