117 research outputs found
Simpson’s Paradox for the Cox Model
In the context of survival analysis, we define a covariate X as protective (detrimental) for the failure time T if the conditional distribution of [T | X = x] is stochastically increasing (decreasing) as a function of x. In the presence of another covariate Y, there exist situations where [T | X = x, Y = y] is stochastically decreasing in x for each fixed y, but [T | X = x] is stochastically increasing. When studying causal effects and influence of covariates on a failure time, this state of affairs appears paradoxical and raises the question of whether X should be considered protective or detrimental. In a biomedical framework, for instance when X is a treatment dose, such a question has obvious practical importance. Situations of this kind may be seen as a version of Simpson’s paradox. In this paper we study this phenomenon in terms of the well-known Cox model. More specifically, we analyze conditions on the parameters of the model and the type of dependence between X and Y required for the paradox to hold. Among other things, we show that the paradox may hold for residual failure times conditioned on T > t even when the covariates X and Y are independent. This is due to the fact that independent covariates may become dependent when conditioned on the failure time being larger than t.
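A toy simulation can make the mechanism concrete. The sketch below is illustrative and not taken from the paper: failure times are drawn from a Cox-type hazard λ(t | x, y) = exp(βx + γy) with β > 0, so x is detrimental for every fixed y, while a strong negative dependence between x and y makes x appear protective marginally. All parameter values are assumptions.

```python
import numpy as np

# Hypothetical illustration of the paradox: hazard exp(beta*x + gamma*y),
# beta > 0 so x is detrimental conditional on y, but y is strongly
# anti-correlated with x, which reverses the marginal comparison.
rng = np.random.default_rng(1)
n = 200_000
beta, gamma = 0.5, 2.0
x = rng.binomial(1, 0.5, size=n)
y = np.where(rng.random(n) < 0.9, 1 - x, x)   # y disagrees with x 90% of the time
t = rng.exponential(1.0 / np.exp(beta * x + gamma * y))

for yy in (0, 1):  # conditional on y, x shortens survival
    m = y == yy
    print(yy, t[m & (x == 1)].mean() < t[m & (x == 0)].mean())  # True, True
print(t[x == 1].mean() > t[x == 0].mean())    # True: marginally x looks protective
```

Conditioning the same draw on T > t and re-examining the association between x and y would similarly illustrate how initially independent covariates become dependent for residual lifetimes.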
Investigating Determinants of Multiple Sclerosis in Longitudinal Studies: A Bayesian Approach
Modelling data from Multiple Sclerosis longitudinal studies is a challenging topic, since the phenotype of interest is typically ordinal and the time intervals between two consecutive measurements are nonconstant and can vary among individuals. Due to these unobservable sources of heterogeneity, building statistical models for the analysis of Multiple Sclerosis severity is difficult. A few proposals have been put forward in the biostatistical literature (Heitjan, 1991; Albert, 1994) to address the issue of investigating the Multiple Sclerosis course. In this paper Bayesian P-splines (Brezger and Lang, 2006; Fahrmeir and Lang, 2001) are indicated as an appropriate tool, since they account for nonlinear smooth effects of covariates on the change in Multiple Sclerosis disability. By means of a Bayesian P-spline model we account for both the randomness affecting Multiple Sclerosis data and the ordinal nature of the response variable.
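For reference, a minimal sketch of the Bayesian P-spline construction follows (in the spirit of Lang and Brezger; the first-order random walk shown here is one common choice, second-order walks are equally standard):

```latex
\[
  f(x) \;=\; \sum_{j=1}^{J} \beta_j B_j(x), \qquad
  \beta_j \;=\; \beta_{j-1} + u_j, \quad u_j \sim \mathcal{N}(0,\tau^2), \qquad
  \tau^2 \sim \mathrm{IG}(a,b),
\]
```

where the B_j are B-spline basis functions over a fine grid of knots and the hyperprior on the variance τ² lets the data determine the amount of smoothing.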
Bayesian P-Splines to investigate the impact of covariates on Multiple Sclerosis clinical course
This paper aims to propose suitable statistical tools to address heterogeneity in repeated measures within a Multiple Sclerosis (MS) longitudinal study. Indeed, due to unobservable sources of heterogeneity, modelling the effect of covariates on MS severity is very difficult. Bayesian P-splines are suggested for modelling nonlinear smooth effects of covariates within generalized additive models. Thus, based on a pooled MS data set, we show how extending Bayesian P-splines (Lang and Brezger, 2001) to mixed effects models represents an attractive statistical approach to investigate the role of prognostic factors in affecting individual change in disability.
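The extension to mixed effects can be summarized by an additive predictor of the following form; the notation is a sketch rather than the paper's exact specification:

```latex
\[
  \eta_{it} \;=\; \sum_{k=1}^{p} f_k(x_{itk}) \;+\; z_{it}^{\top} b_i,
  \qquad b_i \sim \mathcal{N}(0,\Sigma_b),
\]
```

where each smooth effect f_k carries a Bayesian P-spline prior as above and the patient-specific random effects b_i absorb residual heterogeneity in individual disability change.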
The new challenges of epidemiology in political and social contexts
Abstract: This contribution provides a broad overview of how the evolution of biomedical information has inevitably led to a change in the tools and methods of analysis, from the standpoint of both epidemiology and public health. To face the challenges tied to changing population dynamics, the new large-scale migration flows, changing ageing dynamics, and the very rapid evolution of medicine towards «precision medicine», the socio-organizational response of the social and health system must adapt promptly to the ongoing changes and to the new needs. In the article we show how, moving from the individual perspective to the population perspective typical of epidemiology, many paradigms tied to the nature of the data also change, and one is placed within a «global-system» model in which the integration of different information sources and prevention interventions becomes indispensable.
Multivariate determinants of self-management in Health Care: assessing the Health Empowerment Model by comparing structural equation and graphical model approaches
Background. In public health, one debated issue concerns the consequences of improper self-management in health care. Theoretical models proposed in Health Communication theory highlight how components such as general literacy and specific knowledge of the disease can be very important for effective action in the healthcare system.
Methods. This paper aims to investigate the consistency of the Health Empowerment Model by means of both a graphical models approach, which is a “data driven” method, and a Structural Equation Modeling (SEM) approach, which is instead “theory driven”, showing the different information patterns that can be revealed in a health care research context.
The analyzed dataset provides data on the relationship between the Health Empowerment Model constructs and the behavioral and health status of 263 chronic low back pain (cLBP) patients. We used the graphical models approach to evaluate the dependence structure in a “blind” way, thus learning the structure from the data.
Results. The estimated dependence structure confirms the pattern of links assumed directly by the researchers in the SEM approach, thus validating the hypotheses that generated the Health Empowerment Model constructs.
Conclusions. This comparison of models helps to avoid confirmation bias. For Structural Equation Modeling we used the SPSS AMOS 21 software; graphical modeling algorithms were implemented in the R software environment (an illustrative sketch of the data-driven step follows below).
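A minimal, self-contained Python sketch of what such a “blind” structure-learning step can look like is given below. The construct names and the simulated chain literacy → knowledge → empowerment → health are assumptions for illustration only, not the study's data, and a Gaussian graphical model stands in for the paper's R implementation.

```python
import numpy as np
import pandas as pd
from sklearn.covariance import GraphicalLassoCV

# Illustrative data: four construct scores with a simple dependence chain.
rng = np.random.default_rng(42)
n = 263
literacy = rng.normal(size=n)
knowledge = 0.6 * literacy + rng.normal(scale=0.8, size=n)
empowerment = 0.7 * knowledge + rng.normal(scale=0.7, size=n)
health = 0.5 * empowerment + rng.normal(scale=0.9, size=n)
df = pd.DataFrame({"literacy": literacy, "knowledge": knowledge,
                   "empowerment": empowerment, "health": health})

X = (df - df.mean()) / df.std()        # standardize before estimation
gl = GraphicalLassoCV().fit(X.values)  # sparse precision matrix via cross-validation
prec = pd.DataFrame(gl.precision_, index=df.columns, columns=df.columns)

# Nonzero off-diagonal entries are edges: pairs of constructs that remain
# dependent after conditioning on all the others; these recovered links can
# then be compared with the paths postulated in the SEM diagram.
print(prec.round(2))
```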
Penalized inference of the hematopoietic cell differentiation network via high-dimensional clonal tracking
Background. During their lifespan, stem or progenitor cells have the ability to differentiate into more committed cell lineages. Understanding this process can be key to treating certain diseases. However, so far only limited information about the cell differentiation process is available.
Aim. The goal of this paper is to present a statistical framework able to describe the cell differentiation process at the single-clone level and to provide a corresponding inferential procedure for parameter estimation and structure reconstruction of the differentiation network.
Approach. We propose a multidimensional, continuous-time Markov model with density-dependent transition probabilities, linear in the sub-population sizes and rates. The inferential procedure is based on an iterative calculation of approximated solutions for two systems of ordinary differential equations, describing the evolution of the process moments over time, which are analytically derived from the master equation of the process. Network sparsity is induced by adding a SCAD-based penalization term to the generalized least squares objective function.
Results. The proposed methods have been tested in a simulation study and then applied to a data set from a gene therapy clinical trial, in order to investigate hematopoiesis in humans in vivo. The estimated hematopoietic structure contradicts the classical dichotomy theory of cell differentiation and supports a novel myeloid-based model recently proposed in the literature.
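As a rough illustration of the sparsity device named above, the following Python sketch implements the SCAD penalty of Fan and Li (2001) and shows how it would enter a generalized least squares objective. The interface and the residual_fn hook are hypothetical, not the authors' code.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty p_lam(|beta|) of Fan and Li (2001)."""
    b = np.abs(beta)
    linear = lam * b                                          # |beta| <= lam
    quad = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quad, const))

def penalized_gls_objective(theta, residual_fn, W, lam):
    """Generalized least squares loss plus a SCAD penalty on the rates.

    residual_fn is a hypothetical hook returning the residuals of the
    moment equations for parameter vector theta; W is the GLS weight matrix.
    """
    r = residual_fn(theta)
    return r @ W @ r + scad_penalty(theta, lam).sum()
```

Unlike the lasso, the SCAD penalty flattens out for large coefficients, so strong differentiation rates are shrunk far less than weak, presumably spurious ones.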
Can Bayesian Network empower propensity score estimation from Real World Data?
A new method, based on Bayesian Networks, to estimate propensity scores is proposed, with the purpose of drawing causal inference from real world data on the average treatment effect in the case of a binary outcome and discrete covariates. The proposed method confers maximum likelihood properties on the estimated propensity score, i.e. asymptotic efficiency, thus outperforming other available approaches. Two point estimators via inverse probability weighting are then proposed, and their main distributional properties are derived for constructing confidence intervals and for testing the hypothesis of no treatment effect. Empirical evidence of the substantial improvements offered by the proposed methodology over standard logistic modelling of the propensity score is provided in simulation settings that mimic the characteristics of a real dataset of prostate cancer patients from Milan San Raffaele Hospital.
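To fix ideas, here is a hedged Python sketch of the inverse probability weighting step. With a single discrete covariate, the maximum likelihood propensity estimate reduces to the empirical treatment frequency within each covariate stratum; this simplification stands in for the paper's Bayesian Network estimator, and all numbers are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic data: one discrete covariate z, binary treatment x, binary outcome y.
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 3, size=n)
e_true = np.array([0.2, 0.5, 0.8])[z]                 # true propensity per stratum
x = rng.binomial(1, e_true)
y = rng.binomial(1, 0.3 + 0.2 * x + 0.1 * (z == 2))   # true ATE is 0.2

df = pd.DataFrame({"z": z, "x": x, "y": y})
e_hat = df.groupby("z")["x"].transform("mean")        # ML propensity per stratum

# Horvitz-Thompson style IPW estimator of the average treatment effect.
ate = np.mean(x * y / e_hat - (1 - x) * y / (1 - e_hat))
print(f"IPW ATE estimate: {ate:.3f}")
```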
Computational Pipeline for the Identification of Integration Sites and Novel Method for the Quantification of Clone Sizes in Clonal Tracking Studies
Gene-corrected cells in Gene Therapy (GT) treated patients can be tracked in vivo by means of vector integration site (IS) analysis, since each engineered clone becomes uniquely and stably marked by an individual IS. As proper IS identification and quantification are crucial to accurately performing clonal tracking studies, we designed a customizable, tailored pipeline to analyze LAM-PCR amplicons sequenced by Illumina MiSeq/HiSeq technology. The sequencing data are initially processed through a series of quality filters and cleaned from vector and Linker Cassette (LC) sequences with customizable settings. Demultiplexing is then performed according to the recognition of the specific barcode combinations used upon library preparation, and the sequences are aligned to the reference genome. Importantly, the human genome assembly Hg19 is composed of 93 contigs, including the mitochondrial genome, unlocalized and unplaced contigs, and some alternative haplotypes of chr6. While previous approaches aligned IS sequences only to the standard 24 human chromosomes, using the whole assembled genome improved alignment accuracy and concomitantly increased the number of detectable ISs. To date, we have processed 28 independent human sample sets, retrieving 260,994 ISs from 189,270,566 sequencing reads.
Although sequencing read counts at each IS have been widely used to estimate relative IS abundance, this method carries inherent accuracy constraints: the rounds of exponential amplification required by LAM-PCR may distort the original clonal representation. More recently, a method based on genomic sonication has been proposed that exploits shear site counts to track the number of original fragments belonging to each IS before PCR amplification. However, the number of cells composing a given clone can far exceed the number of fragments of different lengths that can be generated upon fragmentation in proximity of that IS. This would rapidly saturate the available diversity of shear sites and progressively generate more and more same-site shearing on independent genomes.
To overcome these biases and reliably quantify ISs, we designed and tested a new LC encoding random barcodes. The new LC is composed of a known sequence of 29nt used as the binding site for the primers in the amplification steps, a 6nt random barcode, a fixed anchor sequence of 6nt, a second 6nt random barcode, and a final known sequence of 22nt containing sticky ends for the three main restriction enzymes in use (MluI, HpyCH4IV and AciI). This design increases the accuracy of clonal diversity estimation, since the fixed anchor sequence acts as a control for sequencing reliability in the barcode area. The theoretical number of different available barcodes per clone (4^12 = 16,777,216) far exceeds what is required to avoid saturating the original diversity of the analyzed sample (composed on average of around 50,000 cells). We validated this novel approach by performing assays on serial dilutions of individual clones carrying known ISs. The precision rate obtained averaged around 99.3%, while the worst-case error rate was at most 1.86%, confirming the reliability of the IS quantification. We successfully applied the barcoded-LC system to the analysis of clinical samples from a Wiskott-Aldrich Syndrome GT patient, collecting to date 50,215 barcoded ISs from 94,052,785 sequencing reads.
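A quick back-of-the-envelope check supports the diversity claim: two 6nt random barcodes give 12 random nucleotides, hence 4^12 labels, and a birthday-problem approximation bounds the expected number of barcode collisions for a clone of the stated size. The snippet below is an illustrative calculation, not part of the pipeline.

```python
# Expected barcode collisions for a clone of ~50,000 cells drawing
# uniformly from 4**12 possible double-barcode labels.
n_barcodes = 4 ** 12   # 16,777,216 available labels
n_cells = 50_000       # typical analyzed clone size from the abstract

# Birthday-problem approximation: expected collisions ~ n^2 / (2N).
expected_collisions = n_cells**2 / (2 * n_barcodes)
print(f"{expected_collisions:.0f} expected collisions "
      f"({expected_collisions / n_cells:.2%} of cells)")   # ~75 (~0.15%)
```

The resulting collision rate of roughly 0.15% of cells is comfortably below the reported worst-case error rate of 1.86%, consistent with the validation results above.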