Collection Of Biostatistics Research Archive
    1589 research outputs found

    Using molecular diet analysis to inform invasive species management: A case study of introduced rats consuming endemic New Zealand frogs

    The decline of amphibians has been of international concern for more than two decades, and the global spread of introduced fauna is a major factor in this decline. Conservation management decisions to implement control of introduced fauna are often based on diet studies. One of the most common metrics to report in diet studies is Frequency of Occurrence (FO), but this can be difficult to interpret, as it does not include a temporal perspective. Here, we examine the potential for FO data derived from molecular diet analysis to inform invasive species management, using invasive ship rats (Rattus rattus) and endemic frogs (Leiopelma spp.) in New Zealand as a case study. Only two endemic frog species persist on the mainland. One of these, Leiopelma archeyi, is Critically Endangered (IUCN 2017) and ranked as the world's most evolutionarily distinct and globally endangered amphibian (EDGE, 2018). Ship rat stomach contents were collected by kill-trapping and subjected to three methods of diet analysis (one morphological and two DNA-based). A new primer pair targeting all anuran species was developed; it exhibits good coverage, high taxonomic resolution, and reasonable specificity. Incorporating a temporal parameter allowed us to calculate the minimum number of ingestion events per rat per night, providing a more intuitive metric than the more commonly reported FO. We are not aware of other DNA-based diet studies that have incorporated a temporal parameter into FO data. The usefulness of such a metric will depend on the study system, in particular the feeding ecology of the predator. Ship rats are consuming both species of native frogs present on mainland New Zealand, and this study provides the first detections of remains of these species in mammalian stomach contents.

    Supervised Dimension Reduction for Large-scale Omics Data with Censored Survival Outcomes Under Possible Non-proportional Hazards

    The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been in studying the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from the presence of a large fraction of censored observations and potential time-varying effects of features, and methods for handling them have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that simultaneously deal with both issues. Our approach utilizes continuum power regression (CPR), a framework that includes a variety of regression methods, in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data.

    Statistical Inference for Networks of High-Dimensional Point Processes

    Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarize the entire history of the process. We apply this concentration inequality, combined with a recent result on martingale central limit theory, to give an upper bound for the convergence rate of the test statistics. We verify our theoretical results with extensive simulation and an application to a neuron spike train data set.
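
    As a rough illustration of the kind of model involved, the sketch below evaluates the conditional intensity of a univariate Hawkes process. The exponential kernel, the univariate simplification, and the names `hawkes_intensity`, `mu`, `alpha` and `beta` are assumptions made for this example; the abstract does not specify a kernel, and this is not the paper's inference procedure.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t given past event times.

    Assumed form (hypothetical, for illustration only):
        lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))
    where mu is the baseline rate and each past event adds a decaying bump.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t  # only events strictly before t contribute
    )
    return mu + excitation


# With no past events the intensity equals the baseline rate.
baseline = hawkes_intensity(1.0, [], mu=0.5, alpha=0.8, beta=1.0)

# A single event at time 0 raises the intensity at time 1.
after_event = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0)
```

    In a multivariate setting each pair of components would have its own excitation term, which is what gives rise to the network of interactions the abstract refers to.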

    Variance Estimation in Inverse Probability Weighted Cox Models

    Inverse probability weighted Cox models can be used to estimate marginal hazard ratios under different treatment interventions in observational studies. To obtain variance estimates, the robust sandwich variance estimator is often recommended to account for the induced correlation among weighted observations. However, this estimator does not incorporate the uncertainty in estimating the weights and tends to overestimate the variance, leading to inefficient inference. Here we propose a new variance estimator that combines the estimation procedures for the hazard ratio and weights using stacked estimating equations, with additional adjustments for the sum of non-independent and identically distributed terms in a Cox partial likelihood score equation. We prove analytically that the robust sandwich variance estimator is conservative and establish the asymptotic equivalence between the proposed variance estimator and one obtained through linearization (Hajage et al., 2018). In addition, we extend our proposed variance estimator to accommodate clustered data. We compare the finite sample performance of the proposed method with alternative methods through simulation studies. We illustrate these different variance methods in an inverse probability weighted application to estimate the marginal hazard ratio for postoperative hospitalization under sleeve gastrectomy versus Roux-en-Y gastric bypass in a large medical claims and billing database. To facilitate implementation of the proposed method, we have developed an R package, ipwCoxCSV.
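
    For readers unfamiliar with inverse probability weighting, the following minimal sketch shows how unstabilized weights are commonly formed from propensity scores. It assumes the propensity scores are already available, and the function name `ipw_weight` and the toy data are hypothetical. It does not implement the proposed variance estimator or reproduce the ipwCoxCSV package.

```python
def ipw_weight(treated, propensity):
    """Unstabilized inverse probability weight for one subject.

    treated:    1 (or True) if the subject received treatment, else 0.
    propensity: estimated probability of receiving treatment, in (0, 1).
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must lie strictly between 0 and 1")
    # Treated subjects are weighted by 1/p, untreated by 1/(1 - p).
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


# Toy data (hypothetical): pairs of (treatment indicator, propensity score).
subjects = [(1, 0.25), (0, 0.75), (1, 0.5), (0, 0.5)]
weights = [ipw_weight(a, p) for a, p in subjects]
```

    Because the propensity scores in practice are estimated rather than known, the weights carry their own sampling uncertainty, which is the issue the abstract says the robust sandwich estimator ignores.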

    Generalized interventional approach for causal mediation analysis with causally ordered multiple mediators

    Causal mediation analysis is advantageous for investigating mechanisms. In settings with causally ordered mediators, path-specific effects (PSEs) are introduced to specify the effect transmitted through a certain combination of mediators. However, most PSEs are unidentifiable. To address this, an alternative approach, termed the interventional analogue of PSE (iPSE), is widely applied to effect decomposition. Previous studies that have considered multiple mediators have mainly focused on two-mediator cases due to the complexity of the mediation formula. This study proposes a generalized interventional approach for settings with an arbitrary number of causally ordered mediators, covering both identification of the causal parameters and statistical estimation. It provides a general definition of iPSEs with a recursive formula, assumptions for nonparametric identification, a regression-based method, and a g-computation algorithm to estimate all iPSEs. We demonstrate that each iPSE reduces to the result of linear structural equation modeling under linear or log-linear models. This approach is applied to a Taiwanese cohort study to explore the mechanism by which hepatitis C virus infection affects mortality through hepatitis B virus infection, liver function, and hepatocellular carcinoma. Software based on a g-computation algorithm allows users to easily apply this method for data analysis under various model choices according to the substantive knowledge for each variable. All methods and software proposed in this study contribute to comprehensively decomposing a causal effect confirmed by data science and help disentangle causal mechanisms when the natural pathways are complicated.

    General approach of causal mediation analysis with causally ordered multiple mediators and survival outcome

    Causal mediation analysis with multiple mediators (causal multi-mediation analysis) is critical in understanding why an intervention works, especially in medical research. Deriving the path-specific effects (PSEs) of exposure on the outcome through a certain set of mediators can detail the causal mechanism of interest. However, the existing models of causal multi-mediation analysis are usually restricted to partial decomposition, which can only evaluate the cumulative effect of several paths. Moreover, the general form of PSEs for an arbitrary number of mediators has not been proposed. In this study, we provide a generalized definition of PSE for partial decomposition (partPSE) and for complete decomposition, which are extended to the survival outcome. We apply the interventional analogues of PSE (iPSE) for complete decomposition to address the difficulty of non-identifiability. Based on Aalen's additive hazards model and Cox's proportional hazards model, we derive the generalized analytic forms and illustrate the asymptotic properties of both iPSEs and partPSEs for the survival outcome. A simulation is conducted to evaluate the performance of estimation in several scenarios. We apply the new methodology to investigate the mechanism of methylation signals on mortality mediated through the expression of three nested genes among lung cancer patients.

    Unified Methods for Feature Selection in Large-Scale Genomic Studies with Censored Survival Outcomes

    One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback-Leibler information divergence and the Yang-Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² indices for the PH and PO models that can be interpreted in terms of explained variation. Lastly, we propose a generalized pseudo-R² measure that includes PH, PO, crossing hazards and crossing odds models as special cases and can be interpreted as the percentage of separability between subjects experiencing the event and not experiencing the event according to feature expression. We evaluate the performance of our measures using extensive simulation studies and publicly available data sets in cancer genomics. We demonstrate that the proposed methods successfully address the issue of NPH in genomic feature selection and outperform existing methods. The proposed information divergence, R² and pseudo-R² measures were implemented in R (www.R-project.org) and code is available upon request.

    Inferring a consensus problem list using penalized multistage models for ordered data

    A patient's medical problem list describes his or her current health status and aids in the coordination and transfer of care between providers, among other things. Because a problem list is generated once and then subsequently modified or updated, what is not usually observable is the provider effect. That is, to what extent does a patient's problem list in the electronic medical record actually reflect a consensus communication of that patient's current health status? To that end, we report on and analyze a unique interview-based design in which multiple medical providers independently generate problem lists for each of three patient case abstracts of varying clinical difficulty. Due to the uniqueness of both our data and the scientific objectives of our analysis, we apply and extend so-called multistage models for ordered lists and equip the models with variable selection penalties to induce sparsity. Each problem has a corresponding non-negative parameter estimate, interpreted as a relative log-odds ratio, with larger values suggesting greater importance and zero values suggesting unimportant problems. We use these fitted penalized models to quantify and report the extent of consensus. For the three case abstracts, the proportions of problems with model-estimated non-zero log-odds ratios were 10/28, 16/47, and 13/30. Physicians exhibited consensus on the highest ranked problems in the first and last case abstracts, but agreement quickly deteriorated; in contrast, physicians broadly disagreed on the relevant problems for the middle and most difficult case abstract.

    Generalized Matrix Decomposition Regression: Estimation and Inference for Two-way Structured Data

    Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. In this article, we first propose the GMD regression (GMDR) as an estimation/prediction tool that seamlessly incorporates two-way structures into high-dimensional linear models. The proposed GMDR directly regresses the outcome on a set of GMD components, selected by a novel procedure that guarantees the best prediction performance. We then propose the GMD inference (GMDI) framework to identify variables that are associated with the outcome for any model in a large family of regression models that includes GMDR. As opposed to most existing tools for high-dimensional inference, GMDI efficiently accounts for pre-specified two-way structures and can provide asymptotically valid inference even for non-sparse coefficient vectors. We study the theoretical properties of GMDI in terms of both the type-I error rate and power. We demonstrate the effectiveness of GMDR and GMDI on simulated data and an application to microbiome data.

    A simulation study of diagnostics for bias in non-probability samples

    A non-probability sampling mechanism is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is 'non-ignorable', i.e. dependent upon the unobserved outcome of interest, since it is then undetectable and thus cannot be ameliorated. We extend a simulation study by Nishimura et al. [International Statistical Review, 84, 43–62 (2016)], adding a recently published statistic, the so-called 'standardized measure of unadjusted bias', which explicitly quantifies the extent of bias under the assumption that a specified amount of non-ignorable selection exists. Our findings suggest that this new sensitivity diagnostic is considerably correlated with, and more predictive of, the true, unknown extent of selection bias than other diagnostics, even when the underlying assumed level of non-ignorability is incorrect.
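
    The effect of non-ignorable selection can be seen in a small simulation. The sketch below is an assumption-laden toy, not the study's design: the outcome is standard normal, inclusion probability follows a logistic function of the outcome itself, and the name `simulate_selection_bias` is hypothetical. It does not compute the standardized measure of unadjusted bias.

```python
import math
import random


def simulate_selection_bias(n=20000, seed=0):
    """Compare the population mean with the mean of a self-selected sample.

    Assumed mechanism (illustrative only): unit i with outcome y_i is
    included with probability 1 / (1 + exp(-y_i)), so selection depends
    directly on the outcome and is therefore non-ignorable.
    """
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    selected = [
        y for y in population
        if rng.random() < 1.0 / (1.0 + math.exp(-y))
    ]
    pop_mean = sum(population) / len(population)
    sample_mean = sum(selected) / len(selected)
    return pop_mean, sample_mean


pop_mean, sample_mean = simulate_selection_bias()
bias = sample_mean - pop_mean  # positive: high outcomes are over-represented
```

    Nothing in the selected sample alone reveals this gap, since the outcomes of the non-selected units are never observed; that is why diagnostics of this kind rest on an assumed level of non-ignorability.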

    1,373 full texts
    1,589 metadata records
    Collection Of Biostatistics Research Archive is based in United States