Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
DNA copy number and mRNA expression are widely used data types in cancer
studies, which combined provide more insight than either does separately. Whereas in
existing literature the form of the relationship between these two types of
markers is fixed a priori, in this paper we model their association. We employ
piecewise linear regression splines (PLRS), which combine good interpretation
with sufficient flexibility to identify any plausible type of relationship. The
specification of the model leads to estimation and model selection in a
constrained, nonstandard setting. We provide methodology for testing the effect
of DNA on mRNA and choosing the appropriate model. Furthermore, we present a
novel approach to obtain reliable confidence bands for constrained PLRS, which
incorporates model uncertainty. The procedures are applied to colorectal and
breast cancer data. Common assumptions are found to be potentially misleading
for biologically relevant genes. More flexible models may bring more insight into
the interaction between the two markers.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
The Cross-Validated Adaptive Epsilon-Net Estimator
Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or overly variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
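The selection procedure described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a linear regression with squared error loss, takes the subspaces to be "only the first dim coefficients may be nonzero", uses a regular grid on a bounded cube as the epsilon-net, and assigns folds by index. All function names and the bound of 2.0 are hypothetical choices made for illustration.

```python
import itertools


def epsilon_net(dim, eps, bound=2.0):
    # Assumption: a regular grid with spacing eps on [-bound, bound]^dim
    # serves as the epsilon-net of the dim-dimensional subspace.
    steps = int(round(2 * bound / eps))
    axis = [-bound + i * eps for i in range(steps + 1)]
    return itertools.product(axis, repeat=dim)


def risk(beta, data):
    # Empirical squared-error risk. zip() stops at len(beta), so any
    # coefficient beyond the subspace dimension is implicitly zero.
    total = 0.0
    for x, y in data:
        pred = sum(b * xi for b, xi in zip(beta, x))
        total += (y - pred) ** 2
    return total / len(data)


def fit_on_net(data, dim, eps):
    # Candidate estimator: empirical risk minimizer over the epsilon-net.
    return min(epsilon_net(dim, eps), key=lambda beta: risk(beta, data))


def cv_epsilon_net(data, dims, eps_values, folds=5):
    # Pick the (subspace, epsilon) pair with the smallest cross-validated
    # risk, then refit that candidate on the full sample.
    # Folds are assigned by index, which assumes the rows are in random order.
    best = None
    for dim in dims:
        for eps in eps_values:
            cv_risk = 0.0
            for v in range(folds):
                train = [d for i, d in enumerate(data) if i % folds != v]
                valid = [d for i, d in enumerate(data) if i % folds == v]
                cv_risk += risk(fit_on_net(train, dim, eps), valid) / folds
            if best is None or cv_risk < best[0]:
                best = (cv_risk, dim, eps)
    _, dim, eps = best
    return dim, eps, fit_on_net(data, dim, eps)
```

Given a list of `(x, y)` pairs with `x` a tuple of covariates, `cv_epsilon_net(data, dims=(1, 2, 3), eps_values=(1.0, 0.5, 0.25))` returns the selected dimension, the selected resolution, and the coefficient tuple. The exhaustive search over the grid grows exponentially in the dimension, so this sketch is only practical for very small problems.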
Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data
The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized
a program encouraging breastfeeding to new mothers in hospital centers. The
original studies indicated that this intervention successfully increased
duration of breastfeeding and lowered rates of gastrointestinal tract
infections in newborns. Additional scientific and popular interest lies in
determining the causal effect of longer breastfeeding on gastrointestinal
infection. In this study, we estimate the expected infection count under
various lengths of breastfeeding in order to estimate the effect of
breastfeeding duration on infection. Due to the presence of baseline and
time-dependent confounding, specialized "causal" estimation methods are
required. We demonstrate the double-robust method of Targeted Maximum
Likelihood Estimation (TMLE) in the context of this application and review some
related methods and the adjustments required to account for clustering. We
compare TMLE (implemented both parametrically and using a data-adaptive
algorithm) to other causal methods for this example. In addition, we conduct a
simulation study to determine (1) the effectiveness of controlling for
clustering indicators when cluster-specific confounders are unmeasured and (2)
the importance of using data-adaptive TMLE.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS727 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Urocortin 3 marks mature human primary and embryonic stem cell-derived pancreatic alpha and beta cells.
The peptide hormone Urocortin 3 (Ucn 3) is abundantly and exclusively expressed in mouse pancreatic beta cells where it regulates insulin secretion. Here we demonstrate that Ucn 3 first appears at embryonic day (E) 17.5 and, from approximately postnatal day (p) 7 and onwards throughout adult life, becomes a unifying and exclusive feature of mouse beta cells. These observations identify Ucn 3 as a potential beta cell maturation marker. To determine whether Ucn 3 is similarly restricted to beta cells in humans, we conducted comprehensive immunohistochemistry and gene expression experiments on macaque and human pancreas and sorted primary human islet cells. This revealed that Ucn 3 is not restricted to the beta cell lineage in primates, but is also expressed in alpha cells. To substantiate these findings, we analyzed human embryonic stem cell (hESC)-derived pancreatic endoderm that differentiates into mature endocrine cells upon engraftment in mice. Ucn 3 expression in hESC-derived grafts increased robustly upon differentiation into mature endocrine cells and localized to both alpha and beta cells. Collectively, these observations confirm that Ucn 3 is expressed in adult beta cells in both mouse and human and appears late in beta cell differentiation. Expression of Pdx1, Nkx6.1 and PC1/3 in hESC-derived Ucn 3(+) beta cells supports this. However, the expression of Ucn 3 in primary and hESC-derived alpha cells demonstrates that human Ucn 3 is not exclusive to the beta cell lineage but is a general marker for both the alpha and beta cell lineages. Ucn 3(+) hESC-derived alpha cells do not express Nkx6.1, Pdx1 or PC1/3 in agreement with the presence of a separate population of Ucn 3(+) alpha cells. Our study highlights important species differences in Ucn 3 expression, which have implications for its utility as a marker to identify mature beta cells in (re)programming strategies.
Nonparametric efficient causal estimation of the intervention-specific expected number of recurrent events with continuous-time targeted maximum likelihood and highly adaptive lasso estimation
Longitudinal settings involving outcome, competing risks and censoring events
occurring and recurring in continuous time are common in medical research, but
are often analyzed with methods that do not allow for taking post-baseline
information into account. In this work, we define statistical and causal target
parameters via the g-computation formula by carrying out interventions directly
on the product integral representing the observed data distribution in a
continuous-time counting process model framework. In recurrent events settings
our target parameter identifies the expected number of recurrent events also in
settings where the censoring mechanism or post-baseline treatment decisions
depend on past information of post-baseline covariates such as the recurrent
event process. We propose a flexible estimation procedure based on targeted
maximum likelihood estimation coupled with highly adaptive lasso estimation to
provide a novel approach for double robust and nonparametric inference for the
considered target parameter. We illustrate the methods in a simulation study
- …