5,318 research outputs found

    Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

    DNA copy number and mRNA expression are widely used data types in cancer studies, which together provide more insight than either does alone. Whereas the existing literature fixes the form of the relationship between these two types of markers a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretability with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
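As a rough illustration of what a piecewise linear regression spline looks like in practice, the sketch below fits one by ordinary least squares on a hinge (truncated-linear) basis. It is a minimal sketch under stated assumptions: the function names, the knot location, and the toy data are hypothetical, and the fit is unconstrained, so it does not reproduce the constrained estimation, testing, or confidence bands described in the abstract.

```python
import numpy as np

def hinge_design(x, knots):
    """Design matrix with columns 1, x, and max(x - k, 0) for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_plrs(x, y, knots):
    """Unconstrained least-squares fit (assumption: no sign constraints)."""
    X = hinge_design(x, knots)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_plrs(coef, x, knots):
    """Evaluate the fitted piecewise linear curve at x."""
    return hinge_design(x, knots) @ coef

# Hypothetical noise-free toy data: the slope changes from 0.5 to 2.5 at x = 0.
x = np.linspace(-2.0, 2.0, 41)
y = 1.0 + 0.5 * x + 2.0 * np.maximum(x, 0.0)
coef = fit_plrs(x, y, knots=[0.0])
```

Each hinge coefficient is the change in slope at its knot, which is what makes this basis easy to interpret.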

    The Cross-Validated Adaptive Epsilon-Net Estimator

    Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or too variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up to a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces.
    We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
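The selection procedure described in this abstract can be sketched on a deliberately small toy problem. The code below assumes a scalar parameter under squared error loss, takes the intervals [-bound, bound] as the "subspaces", and uses evenly spaced grids as the epsilon-nets. All names, the grid construction, and the simulated data are hypothetical choices for illustration and are not the authors' implementation.

```python
import numpy as np

def epsilon_net(bound, eps):
    """Grid with spacing eps covering [-bound, bound] (a 1-D epsilon-net)."""
    n_steps = int(np.floor(bound / eps))
    return eps * np.arange(-n_steps, n_steps + 1)

def erm_over_net(y, net):
    """Candidate estimator: the net point minimizing empirical squared-error risk."""
    risks = ((y[:, None] - net[None, :]) ** 2).mean(axis=0)
    return net[np.argmin(risks)]

def cv_epsilon_net(y, bounds, epsilons, n_folds=5, seed=0):
    """Pick (bound, eps) by cross-validated risk, then refit on all the data."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best = None
    for bound in bounds:
        for eps in epsilons:
            net = epsilon_net(bound, eps)
            cv_risk = 0.0
            for val_idx in folds:
                train_mask = np.ones(len(y), dtype=bool)
                train_mask[val_idx] = False
                theta = erm_over_net(y[train_mask], net)        # fit on training folds
                cv_risk += ((y[val_idx] - theta) ** 2).mean()   # score on held-out fold
            cv_risk /= n_folds
            if best is None or cv_risk < best[0]:
                best = (cv_risk, bound, eps)
    _, bound, eps = best
    return erm_over_net(y, epsilon_net(bound, eps)), bound, eps

# Hypothetical data: 200 draws centered at 0.7.
rng = np.random.default_rng(1)
y = rng.normal(loc=0.7, scale=1.0, size=200)
theta_hat, bound, eps = cv_epsilon_net(
    y, bounds=[0.5, 1.0, 2.0], epsilons=[0.5, 0.1, 0.02]
)
```

In one dimension the nets are trivial to enumerate; the construction of nets for richer subspaces is the hard part and is not attempted here.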

    Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data

    The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized a program encouraging breastfeeding to new mothers in hospital centers. The original studies indicated that this intervention successfully increased duration of breastfeeding and lowered rates of gastrointestinal tract infections in newborns. Additional scientific and popular interest lies in determining the causal effect of longer breastfeeding on gastrointestinal infection. In this study, we estimate the expected infection count under various lengths of breastfeeding in order to estimate the effect of breastfeeding duration on infection. Due to the presence of baseline and time-dependent confounding, specialized "causal" estimation methods are required. We demonstrate the double-robust method of Targeted Maximum Likelihood Estimation (TMLE) in the context of this application and review some related methods and the adjustments required to account for clustering. We compare TMLE (implemented both parametrically and using a data-adaptive algorithm) to other causal methods for this example. In addition, we conduct a simulation study to determine (1) the effectiveness of controlling for clustering indicators when cluster-specific confounders are unmeasured and (2) the importance of using data-adaptive TMLE. Comment: Published at http://dx.doi.org/10.1214/14-AOAS727 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
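To give a feel for the targeting step behind TMLE, the sketch below implements a much simpler case than the one in the abstract: a single binary treatment, one binary baseline confounder, a continuous outcome, and a linear fluctuation under squared error loss. It does not handle longitudinal data, time-dependent confounding, or clustering. The function name, the data-generating values, and the choice of a linear fluctuation are assumptions made for illustration; practical implementations commonly use a logistic fluctuation for bounded outcomes.

```python
import numpy as np

def tmle_ate(y, a, w):
    """Point-treatment TMLE of E[Y(1)] - E[Y(0)] with a binary confounder w."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=int)
    ones = np.ones_like(y)

    # Step 1: initial outcome regression Q(A, W) by ordinary least squares.
    X = np.column_stack([ones, a, w])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    q_obs = X @ beta
    q1 = np.column_stack([ones, ones, w]) @ beta               # prediction under A = 1
    q0 = np.column_stack([ones, np.zeros_like(y), w]) @ beta   # prediction under A = 0

    # Step 2: treatment mechanism g(W) = P(A = 1 | W) from stratum proportions.
    g = np.empty_like(y)
    for level in (0, 1):
        mask = w == level
        g[mask] = a[mask].mean()

    # Step 3: clever covariate and least-squares fluctuation parameter.
    h = a / g - (1.0 - a) / (1.0 - g)
    eps = np.sum(h * (y - q_obs)) / np.sum(h ** 2)

    # Step 4: update both counterfactual predictions and average the difference.
    q1_star = q1 + eps / g
    q0_star = q0 - eps / (1.0 - g)
    return np.mean(q1_star - q0_star)

# Hypothetical simulated data with a true average treatment effect of 2.
rng = np.random.default_rng(0)
n = 5000
w = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, 0.3 + 0.4 * w)
y = 1.0 + 2.0 * a + 1.5 * w + rng.normal(size=n)
ate_hat = tmle_ate(y, a, w)
```

The update in step 4 is what distinguishes TMLE from a plain plug-in estimate: the initial fit is nudged along the clever covariate so that the final estimate solves the efficient influence curve equation.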

    Urocortin 3 marks mature human primary and embryonic stem cell-derived pancreatic alpha and beta cells.

    The peptide hormone Urocortin 3 (Ucn 3) is abundantly and exclusively expressed in mouse pancreatic beta cells where it regulates insulin secretion. Here we demonstrate that Ucn 3 first appears at embryonic day (E) 17.5 and, from approximately postnatal day (p) 7 and onwards throughout adult life, becomes a unifying and exclusive feature of mouse beta cells. These observations identify Ucn 3 as a potential beta cell maturation marker. To determine whether Ucn 3 is similarly restricted to beta cells in humans, we conducted comprehensive immunohistochemistry and gene expression experiments on macaque and human pancreas and sorted primary human islet cells. This revealed that Ucn 3 is not restricted to the beta cell lineage in primates, but is also expressed in alpha cells. To substantiate these findings, we analyzed human embryonic stem cell (hESC)-derived pancreatic endoderm that differentiates into mature endocrine cells upon engraftment in mice. Ucn 3 expression in hESC-derived grafts increased robustly upon differentiation into mature endocrine cells and localized to both alpha and beta cells. Collectively, these observations confirm that Ucn 3 is expressed in adult beta cells in both mouse and human and appears late in beta cell differentiation. Expression of Pdx1, Nkx6.1 and PC1/3 in hESC-derived Ucn 3(+) beta cells supports this. However, the expression of Ucn 3 in primary and hESC-derived alpha cells demonstrates that human Ucn 3 is not exclusive to the beta cell lineage but is a general marker for both the alpha and beta cell lineages. Ucn 3(+) hESC-derived alpha cells do not express Nkx6.1, Pdx1 or PC1/3 in agreement with the presence of a separate population of Ucn 3(+) alpha cells. Our study highlights important species differences in Ucn 3 expression, which have implications for its utility as a marker to identify mature beta cells in (re)programming strategies

    Nonparametric efficient causal estimation of the intervention-specific expected number of recurrent events with continuous-time targeted maximum likelihood and highly adaptive lasso estimation

    Longitudinal settings involving outcome, competing risks and censoring events occurring and recurring in continuous time are common in medical research, but are often analyzed with methods that cannot take post-baseline information into account. In this work, we define statistical and causal target parameters via the g-computation formula by carrying out interventions directly on the product integral representing the observed data distribution in a continuous-time counting process model framework. In recurrent-event settings, our target parameter identifies the expected number of recurrent events, including when the censoring mechanism or post-baseline treatment decisions depend on past information about post-baseline covariates such as the recurrent event process. We propose a flexible estimation procedure based on targeted maximum likelihood estimation coupled with highly adaptive lasso estimation to provide a novel approach for double robust and nonparametric inference for the considered target parameter. We illustrate the methods in a simulation study.