Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model
Modeling viral dynamics in HIV/AIDS studies has resulted in a deep
understanding of the pathogenesis of HIV infection, from which novel antiviral
treatment guidance and strategies have been derived. Viral dynamics models
based on nonlinear differential equations have been proposed and well developed
over the past few decades. However, it is quite challenging to use experimental
or clinical data to estimate the unknown parameters (both constant and
time-varying parameters) in complex nonlinear differential equation models.
Therefore, investigators usually fix some parameter values, from the literature
or by experience, to obtain only parameter estimates of interest from clinical
or experimental data. However, when such prior information is not available, it
is desirable to determine all the parameter estimates from data. In this paper
we intend to combine the newly developed approaches, a multi-stage
smoothing-based (MSSB) method and the spline-enhanced nonlinear least squares
(SNLS) approach, to estimate all HIV viral dynamic parameters in a nonlinear
differential equation model. In particular, to the best of our knowledge, this
is the first attempt to propose a comparatively thorough procedure, accounting
for both efficiency and accuracy, to rigorously estimate all key kinetic
parameters in a nonlinear differential equation model of HIV dynamics from
clinical data. These parameters include the proliferation rate and death rate
of uninfected HIV-targeted cells, the average number of virions produced by an
infected cell, and the infection rate which is related to the antiviral
treatment effect and is time-varying. To validate the estimation methods, we
verified the identifiability of the HIV viral dynamic model and performed
simulation studies.

Comment: Published at http://dx.doi.org/10.1214/09-AOAS290 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
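The baseline that the abstract's MSSB/SNLS procedure improves upon can be sketched concretely: solve the nonlinear ODE numerically at each trial parameter value and minimize the squared residuals. The sketch below is a minimal, hypothetical illustration with a simplified target-cell-limited model and made-up parameter values, not the paper's actual model or estimation procedure.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Simplified target-cell-limited HIV model (illustrative parameter values,
# not the paper's exact model):
#   T' = lam - rho*T - beta*T*V   (uninfected target cells)
#   I' = beta*T*V - delta*I       (productively infected cells)
#   V' = N*delta*I - c*V          (free virus)
def rhs(t, y, beta, delta, lam=100.0, rho=0.1, N=1000.0, c=3.0):
    T, I, V = y
    return [lam - rho * T - beta * T * V,
            beta * T * V - delta * I,
            N * delta * I - c * V]

def viral_load(log_params, t_obs):
    beta, delta = np.exp(log_params)  # log scale keeps parameters positive
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), [1000.0, 0.0, 1e-3],
                    t_eval=t_obs, args=(beta, delta), rtol=1e-8, atol=1e-10)
    return np.maximum(sol.y[2], 1e-12)

t_obs = np.linspace(0.5, 20.0, 40)
true_log = np.log([2e-5, 0.5])                  # (beta, delta)
v_obs = viral_load(true_log, t_obs)             # noiseless synthetic "data"

# Numerical-solution-based NLS on log viral load: every trial parameter
# value requires re-solving the ODE, which is the computational burden
# that motivates smoothing-based alternatives
fit = least_squares(lambda p: np.log(viral_load(p, t_obs)) - np.log(v_obs),
                    x0=np.log([1e-5, 0.3]),
                    bounds=(np.log([1e-6, 0.05]), np.log([1e-4, 2.0])))
beta_hat, delta_hat = np.exp(fit.x)
```

With noiseless synthetic data the optimizer recovers the generating parameters; with real clinical data the loss surface is far less forgiving, which is why fixing some parameters from the literature has been common practice.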
Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error
This article considers estimation of constant and time-varying coefficients
in nonlinear ordinary differential equation (ODE) models where analytic
closed-form solutions are not available. The numerical solution-based nonlinear
least squares (NLS) estimator is investigated in this study. A numerical
algorithm such as the Runge--Kutta method is used to approximate the ODE
solution. The asymptotic properties are established for the proposed estimators
considering both numerical error and measurement error. The B-spline is used to
approximate the time-varying coefficients, and the corresponding asymptotic
theories in this case are investigated under the framework of the sieve
approach. Our results show that if the maximum step size of the $p$-th order
numerical algorithm goes to zero at a fast enough rate in the sample size, the
numerical error is negligible compared to the measurement error. This result
provides theoretical guidance for selecting the step size in numerical
evaluation of ODEs. Moreover, we have shown that the numerical solution-based
NLS estimator and the sieve NLS estimator are strongly consistent. The sieve
estimator of constant parameters is asymptotically normal with the same
asymptotic covariance as that of the case where the true ODE solution is
exactly known, while the estimator of the time-varying parameter has the
optimal convergence rate under some regularity conditions. The theoretical
results are also developed for the case when the step size of the ODE numerical
solver does not go to zero fast enough or the numerical error is comparable to
the measurement error. We illustrate our approach with both simulation studies
and clinical data on HIV viral dynamics.

Comment: Published at http://dx.doi.org/10.1214/09-AOS784 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
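The sieve idea in this abstract — approximating a time-varying coefficient by a B-spline basis whose dimension grows with the sample size, then estimating the basis coefficients by least squares — can be illustrated in a few lines. The function and settings below are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(0)

# Hypothetical smooth time-varying coefficient, observed with noise on a grid
t = np.linspace(0.0, 2.0 * np.pi, 400)
eta_true = 1.0 + 0.5 * np.sin(t)
y = eta_true + rng.normal(scale=0.05, size=t.size)

# Sieve approximation: a finite cubic B-spline basis stands in for the
# infinite-dimensional unknown function; least squares fits its coefficients
k = 3                                            # cubic B-splines
interior = np.linspace(t[0], t[-1], 12)[1:-1]    # 10 interior knots
knots = np.r_[[t[0]] * (k + 1), interior, [t[-1]] * (k + 1)]
spline = make_lsq_spline(t, y, knots, k=k)

max_err = np.max(np.abs(spline(t) - eta_true))
```

In the ODE setting of the paper, this spline representation is embedded inside the nonlinear least-squares criterion rather than fit to direct observations, but the approximation step is the same.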
A Primer on Causality in Data Science
Many questions in Data Science are fundamentally causal in that our objective
is to learn the effect of some exposure, randomized or not, on an outcome of
interest. Even studies that are seemingly non-causal, such as those with the
goal of prediction or prevalence estimation, have causal elements, including
differential censoring or measurement. As a result, we, as Data Scientists,
need to consider the underlying causal mechanisms that gave rise to the data,
rather than simply the pattern or association observed in those data. In this
work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to
provide an introduction to some key concepts in causal inference. Similar to
other causal frameworks, the steps of the Roadmap include clearly stating the
scientific question, defining the causal model, translating the scientific
question into a causal parameter, assessing the assumptions needed to express
the causal parameter as a statistical estimand, implementing statistical
estimators, including parametric and semi-parametric methods, and interpreting
our findings. We believe that using such a framework in Data Science will
help to ensure that our statistical analyses are guided by the scientific
question driving our research, while avoiding over-interpreting our results. We
focus on the effect of an exposure occurring at a single time point and
highlight the use of targeted maximum likelihood estimation (TMLE) with Super
Learner.

Comment: 26 pages (with references); 4 figures.
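The TMLE procedure mentioned above can be sketched in plain numpy for the simplest point-treatment case. This is a minimal, hypothetical example with one confounder and correctly specified parametric working models; in practice the initial fits would come from Super Learner, and all variable names and coefficient values here are illustrative.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def logistic_fit(X, y, iters=30):
    """Plain Newton-Raphson logistic regression (no regularization)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ b)
        W = p * (1.0 - p)
        b += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return b

rng = np.random.default_rng(1)
n = 20000

# Simulated point-treatment data: confounder W, binary exposure A, outcome Y
W = rng.uniform(-1, 1, n)
A = rng.binomial(1, expit(0.4 * W - 0.2))
Y = rng.binomial(1, expit(-0.5 + 1.0 * A + 0.6 * W))

# Step 1: initial outcome regression Q(A, W) -- Super Learner's job in TMLE
Xq = np.column_stack([np.ones(n), A, W])
bq = logistic_fit(Xq, Y)
Q = expit(Xq @ bq)
Q1 = expit(np.column_stack([np.ones(n), np.ones(n), W]) @ bq)
Q0 = expit(np.column_stack([np.ones(n), np.zeros(n), W]) @ bq)

# Step 2: propensity score g(W) = P(A=1 | W)
Xg = np.column_stack([np.ones(n), W])
g = expit(Xg @ logistic_fit(Xg, A))

# Step 3: targeting -- a one-dimensional logistic fluctuation of Q along
# the "clever covariate" H, solved by Newton's method in epsilon
H = A / g - (1 - A) / (1 - g)
eps = 0.0
for _ in range(30):
    p = expit(logit(Q) + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))
Q1s = expit(logit(Q1) + eps / g)
Q0s = expit(logit(Q0) - eps / (1 - g))

ate_tmle = np.mean(Q1s - Q0s)    # targeted estimate of the average effect
```

The targeting step is what distinguishes TMLE from a plain plug-in estimator: it nudges the initial outcome regression so that the efficient influence curve has mean zero, yielding valid inference even when the initial fits are data-adaptive.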
Optimal Rate of Direct Estimators in Systems of Ordinary Differential Equations Linear in Functions of the Parameters
Many processes in biology, chemistry, physics, medicine, and engineering are
modeled by a system of differential equations. Such a system is usually
characterized via unknown parameters and estimating their 'true' value is thus
required. In this paper we focus on the quite common systems for which the
derivatives of the states may be written as sums of products of a function of
the states and a function of the parameters.
For such a system linear in functions of the unknown parameters we present a
necessary and sufficient condition for identifiability of the parameters. We
develop an estimation approach that bypasses the heavy computational burden of
numerical integration and avoids the estimation of system states derivatives,
drawbacks from which many classic estimation methods suffer. We also suggest an
experimental design for which smoothing can be circumvented. The optimal rate
of the proposed estimators, i.e., their $\sqrt{n}$-consistency, is proved, and
simulation results illustrate their excellent finite sample performance and
compare it to other estimation approaches.
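The integration-bypass idea the abstract describes can be made concrete for a system linear in functions of the parameters: integrating both sides of the ODE turns parameter estimation into an ordinary linear regression, with no ODE solver in the loop. The sketch below uses logistic growth and noiseless data for illustration; the paper additionally handles noise and proposes a design under which smoothing can be avoided.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Logistic growth written as a system linear in functions of the parameters:
#   x' = theta1 * x - theta2 * x^2
theta1, theta2 = 1.0, 0.01
x0, K = 5.0, theta1 / theta2          # closed-form solution exists here

t = np.linspace(0.0, 10.0, 2001)
x = K * x0 * np.exp(theta1 * t) / (K + x0 * (np.exp(theta1 * t) - 1.0))

# Integrate both sides:
#   x(t) - x(0) = theta1 * int_0^t x ds - theta2 * int_0^t x^2 ds,
# a linear regression in (theta1, theta2) -- no numerical ODE solving and
# no estimation of state derivatives
A = np.column_stack([
    cumulative_trapezoid(x, t, initial=0.0),
    -cumulative_trapezoid(x**2, t, initial=0.0),
])
b = x - x0
theta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Because the regressors are integrals rather than derivatives of the states, the approach avoids the noise amplification that plagues derivative-matching methods.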
A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure
We often seek to estimate the impact of an exposure naturally occurring or
randomly assigned at the cluster level. For example, the literature on
neighborhood determinants of health continues to grow. Likewise, community
randomized trials are applied to learn about real-world implementation,
sustainability, and population effects of interventions with proven
individual-level efficacy. In these settings, individual-level outcomes are
correlated due to shared cluster-level factors, including the exposure, as well
as social or biological interactions between individuals. To flexibly and
efficiently estimate the effect of a cluster-level exposure, we present two
targeted maximum likelihood estimators (TMLEs). The first TMLE is developed
under a non-parametric causal model, which allows for arbitrary interactions
between individuals within a cluster. These interactions include direct
transmission of the outcome (i.e. contagion) and influence of one individual's
covariates on another's outcome (i.e. covariate interference). The second TMLE
is developed under a causal sub-model assuming the cluster-level and
individual-specific covariates are sufficient to control for confounding.
Simulations compare the alternative estimators and illustrate the potential
gains from pairing individual-level risk factors and outcomes during
estimation, while avoiding unwarranted assumptions. Our results suggest that
estimation under the sub-model can result in bias and misleading inference in
an observational setting. Incorporating working assumptions during estimation
is more robust than assuming they hold in the underlying causal model. We
illustrate our approach with an application to HIV prevention and treatment.
Efficient Principally Stratified Treatment Effect Estimation in Crossover Studies with Absorbent Binary Endpoints
Suppose one wishes to estimate the effect of a binary treatment on a binary
endpoint conditional on a post-randomization quantity in a counterfactual world
in which all subjects received treatment. It is generally difficult to identify
this parameter without strong, untestable assumptions. It has been shown that
identifiability assumptions become much weaker under a crossover design in
which subjects not receiving treatment are later given treatment. Under the
assumption that the post-treatment biomarker observed in these crossover
subjects is the same as would have been observed had they received treatment at
the start of the study, one can identify the treatment effect with only mild
additional assumptions. This remains true if the endpoint is absorbent, i.e. an
endpoint such as death or HIV infection such that the post-crossover treatment
biomarker is not meaningful if the endpoint has already occurred. In this work,
we review identifiability results for a parameter of the distribution of the
data observed under a crossover design with the principally stratified
treatment effect of interest. We describe situations in which these assumptions
would be falsifiable, and show that these assumptions are not otherwise
falsifiable. We then provide a targeted minimum loss-based estimator for the
setting that makes no assumptions on the distribution that generated the data.
When the semiparametric efficiency bound is well defined, for which the primary
condition is that the biomarker is discrete-valued, this estimator is efficient
among all regular and asymptotically linear estimators. We also present a
version of this estimator for situations in which the biomarker is continuous.
Implications for closeout designs in vaccine trials are discussed.
A Sensitivity Matrix Methodology for Inverse Problem Formulation
We propose an algorithm to select parameter subset combinations that can be estimated using an ordinary least-squares (OLS) inverse problem formulation with a given data set. First, the algorithm selects the parameter combinations whose sensitivity matrices have full rank. Second, the algorithm quantifies uncertainty using the inverse of the Fisher Information Matrix. Nominal parameter values are used to construct synthetic data sets and to explore the effects of removing certain parameters from the set estimated by OLS. We quantify these effects with a selection score for a parameter vector, defined as the norm of the vector of standard errors of the component estimates divided by the estimates themselves. In some cases the method reduces the standard error for a parameter to less than 1% of its estimate.
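The two ingredients of this methodology — a rank check on the sensitivity matrix and standard errors from the inverse Fisher Information Matrix — can be sketched for a toy model. The exponential-decay model, grid, and noise level below are hypothetical, chosen only to show the computation.

```python
import numpy as np

# Toy model y(t; theta) = theta1 * exp(-theta2 * t), observed on a grid
def model(theta, t):
    return theta[0] * np.exp(-theta[1] * t)

def sensitivity_matrix(theta, t, h=1e-6):
    """Central-difference sensitivities dy/dtheta_j, one column per parameter."""
    S = np.empty((t.size, theta.size))
    for j in range(theta.size):
        d = np.zeros_like(theta)
        d[j] = h * max(1.0, abs(theta[j]))
        S[:, j] = (model(theta + d, t) - model(theta - d, t)) / (2.0 * d[j])
    return S

theta = np.array([10.0, 0.5])               # nominal parameter values
t = np.linspace(0.0, 8.0, 50)
sigma = 0.1                                 # assumed observation-noise s.d.

S = sensitivity_matrix(theta, t)
rank = np.linalg.matrix_rank(S)             # full rank -> subset is estimable
FIM = S.T @ S / sigma**2
se = np.sqrt(np.diag(np.linalg.inv(FIM)))   # asymptotic standard errors
score = np.linalg.norm(se / theta)          # selection score: smaller is better
```

Comparing this score across candidate parameter subsets is what drives the selection: subsets whose sensitivity columns are nearly collinear produce a rank-deficient (or ill-conditioned) matrix and inflated standard errors, flagging parameters that should be fixed rather than estimated.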
A generalized linear mixed model for longitudinal binary data with a marginal logit link function
Longitudinal studies of a binary outcome are common in the health, social,
and behavioral sciences. In general, a feature of random effects logistic
regression models for longitudinal binary data is that the marginal functional
form, when integrated over the distribution of the random effects, is no longer
of logistic form. Recently, Wang and Louis [Biometrika 90 (2003) 765--775]
proposed a random intercept model in the clustered binary data setting where
the marginal model has a logistic form. An acknowledged limitation of their
model is that it allows only a single random effect that varies from cluster to
cluster. In this paper we propose a modification of their model to handle
longitudinal data, allowing separate, but correlated, random intercepts at each
measurement occasion. The proposed model allows for a flexible correlation
structure among the random intercepts, where the correlations can be
interpreted in terms of Kendall's $\tau$. For example, the marginal
correlations among the repeated binary outcomes can decline with increasing
time separation, while the model retains the property of having matching
conditional and marginal logit link functions. Finally, the proposed method is
used to analyze data from a longitudinal study designed to monitor cardiac
abnormalities in children born to HIV-infected women.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS390 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
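The mismatch that motivates this line of work — a logistic conditional model losing its logistic form after integrating over a normal random intercept — is easy to see by Monte Carlo. The bridge distribution of Wang and Louis is constructed precisely so that this attenuation does not occur; the quick check below, with illustrative values for the linear predictor and intercept variance, shows the problem under normality.

```python
import numpy as np

rng = np.random.default_rng(7)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Conditional model: logit P(Y=1 | b) = eta + b, with b ~ N(0, sigma^2)
eta, sigma = 2.0, 2.0                       # illustrative values
b = rng.normal(scale=sigma, size=1_000_000)

marginal = np.mean(expit(eta + b))          # Monte Carlo marginal probability
conditional = expit(eta)                    # probability at b = 0
# marginal < conditional: averaging over the intercept flattens the curve,
# so the marginal is no longer expit(eta) -- the coefficients change meaning
```

Under a normal intercept the marginal curve is approximately logistic with an attenuated slope, so conditional and marginal coefficients answer different questions; the bridge random intercept keeps both links exactly logistic, which is the property the proposed longitudinal model preserves.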