
    A Hybrid Bayesian Laplacian Approach for Generalized Linear Mixed Models

    The analytical intractability of generalized linear mixed models (GLMMs) has generated a substantial amount of research over the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods currently implemented in commercially available software. This article is motivated by that frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods such as adaptive Gaussian quadrature perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated into standard statistical packages. Similarly, Bayesian methods, though they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article we build on our previous method (Capanu and Begg 2010) and propose a hybrid approach that provides a bridge between the likelihood-based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients, with the goal of obtaining good statistical properties with relatively good computing speed, using widely available software. The hybrid approach is shown to perform well against the other competitors considered. Another important finding of this research is the surprisingly good performance of the Laplacian approximation in the difficult case of binary clustered data with small cluster sizes. We apply the methods to a real study of head and neck squamous cell carcinoma and illustrate their properties using simulations based on a widely analyzed salamander mating dataset and on another important dataset involving the Guatemalan Child Health survey.
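    The Laplacian stage of such a hybrid fit can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes a random-intercept logistic GLMM, treats the variance component sigma2 as fixed (for example at a posterior mean from a Bayesian first stage), and maximizes a Laplace approximation to the marginal likelihood over the regression coefficients.

```python
# Minimal sketch (not the authors' implementation) of the Laplacian stage of a
# hybrid fit for a random-intercept logistic GLMM: the variance component
# sigma2 is held fixed and the fixed effects beta maximize a Laplace
# approximation to the marginal likelihood.
import numpy as np
from scipy.optimize import minimize

def laplace_loglik(beta, sigma2, clusters):
    """Laplace approximation to the marginal log-likelihood, integrating out
    one Gaussian random intercept per cluster; additive constants in sigma2
    are omitted since they do not affect the optimization over beta."""
    total = 0.0
    for X, y in clusters:                       # one (X, y) pair per cluster
        eta0 = X @ beta

        def neg_joint(b):                       # negated joint log-density in b
            eta = eta0 + b
            loglik = y @ eta - np.sum(np.log1p(np.exp(eta)))
            return -(loglik - 0.5 * b**2 / sigma2)

        res = minimize(lambda b: neg_joint(b[0]), x0=[0.0])
        b_hat = res.x[0]

        # Curvature of the joint log-density at its mode gives the correction.
        p = 1.0 / (1.0 + np.exp(-(eta0 + b_hat)))
        curvature = np.sum(p * (1.0 - p)) + 1.0 / sigma2
        total += -res.fun + 0.5 * np.log(2.0 * np.pi / curvature)
    return total

def fit_fixed_effects(sigma2_hat, clusters, n_coef):
    """Maximize the Laplace-approximate likelihood over the fixed effects."""
    objective = lambda beta: -laplace_loglik(beta, sigma2_hat, clusters)
    return minimize(objective, x0=np.zeros(n_coef), method="Nelder-Mead").x
```

    In the hybrid scheme described above, `sigma2_hat` would be supplied by the Bayesian stage and the Laplacian stage would return the regression coefficients.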

    Optimized Variable Selection Via Repeated Data Splitting

    We introduce a new variable selection procedure that repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold, which is then used to screen for variables to include in the final model. Simulation results show that the proposed variable selection technique enjoys superior performance compared to candidate methods: it is among those with the lowest inclusion of noisy predictors, has the highest power to detect the correct model, and is unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.
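    The splitting-and-thresholding loop can be sketched as follows. This is an illustration under assumptions, not the published procedure: it uses per-variable p-values from an ordinary least-squares fit on the estimation half as the screening statistic, picks the threshold that minimizes average validation mean squared error, and leaves the number of splits and the candidate threshold grid to the user.

```python
# Illustration under assumptions (not the published procedure): p-values from
# an OLS fit on the estimation half are screened against each candidate
# threshold, and the threshold minimizing average validation MSE is returned.
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

def repeated_split_threshold(X, y, thresholds, n_splits=100, seed=0):
    rng = np.random.RandomState(seed)
    val_error = np.zeros(len(thresholds))
    for _ in range(n_splits):
        # Split into an estimation half and a validation half.
        X_est, X_val, y_est, y_val = train_test_split(
            X, y, test_size=0.5, random_state=rng)
        pvals = sm.OLS(y_est, sm.add_constant(X_est)).fit().pvalues[1:]
        for j, threshold in enumerate(thresholds):
            keep = np.where(pvals < threshold)[0]
            if keep.size == 0:                  # empty model: intercept only
                pred = np.full(len(y_val), y_est.mean())
            else:
                fit = sm.OLS(y_est, sm.add_constant(X_est[:, keep])).fit()
                pred = fit.predict(sm.add_constant(X_val[:, keep]))
            val_error[j] += np.mean((y_val - pred) ** 2)
    # The empirically optimized threshold minimizes average validation error.
    return thresholds[np.argmin(val_error)]
```

    With the selected threshold in hand, the final model would be fit on the full data using only the variables whose screening statistic passes it.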

    Comparing ROC Curves Derived From Regression Models

    In constructing predictive models, investigators frequently assess the incremental value of a predictive marker by comparing the ROC curve generated from the predictive model including the new marker with the ROC curve from the model excluding the new marker. Many commentators have noticed empirically that a test of the two ROC areas often produces a non-significant result when the corresponding Wald test from the underlying regression model is significant. A recent article showed using simulations that the widely used ROC area test [1] produces exceptionally conservative test size and extremely low power [2]. In this article we show why the ROC area test is invalid in this context. We demonstrate how a valid test of the ROC areas can be constructed that has comparable statistical properties to the Wald test. We conclude that using the Wald test to assess the incremental contribution of a marker remains the best strategy. We also examine the use of derived markers from non-nested models and the use of validation samples. We show that comparing ROC areas is invalid in these contexts as well.
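    The recommended workflow can be sketched briefly. This is a minimal illustration consistent with the conclusion above, not code from the article: it fits nested logistic regression models with and without the candidate marker, reads off the Wald test for the marker's coefficient, and reports the two model-based ROC areas for description rather than formal comparison.

```python
# Minimal sketch of the recommended workflow: assess the incremental value of a
# marker with the Wald test from the nested logistic models, reporting the two
# ROC areas descriptively rather than testing their difference.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def incremental_value(y, X_base, new_marker):
    X0 = sm.add_constant(X_base)
    X1 = sm.add_constant(np.column_stack([X_base, new_marker]))
    fit0 = sm.Logit(y, X0).fit(disp=False)
    fit1 = sm.Logit(y, X1).fit(disp=False)
    # Wald test for the new marker: last coefficient of the larger model.
    wald_z = fit1.params[-1] / fit1.bse[-1]
    return {
        "wald_z": wald_z,
        "wald_p": fit1.pvalues[-1],
        "auc_without_marker": roc_auc_score(y, fit0.predict(X0)),
        "auc_with_marker": roc_auc_score(y, fit1.predict(X1)),
    }
```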

    Statistical Evaluation of Evidence for Clonal Allelic Alterations in array-CGH Experiments

    In recent years numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction between a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparing the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, some investigators have explored the use of array CGH for this purpose. Standard clustering approaches have been used to analyze the data, but these existing statistical methods are not suited to this problem because of the paired nature of the data and the fact that there exists no “gold standard” diagnosis to provide a definitive determination of which pairs are clonal and which pairs are of independent origin. In this article we propose a new statistical method that focuses on the individual allelic gains or losses identified in both tumors, and we develop a statistical test that assesses the degree to which the locations of the markers indicating the endpoints of the allelic changes match. The validity and statistical power of the test are evaluated, and it is shown to be a promising approach for establishing clonality in tumor samples.
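    The endpoint-matching idea can be illustrated generically. The sketch below is not the published test: it counts altered segments in one tumor whose start and end marker indices both fall within an assumed tolerance of a same-type segment in the other tumor, and it calibrates the count by repeatedly relocating the second tumor's segments uniformly within their chromosome arms, which is an assumed null rather than the one used in the article.

```python
# Generic illustration (not the published test): score endpoint matching
# between two tumors' altered segments and calibrate it against random
# relocation of one tumor's segments within their arms.
import numpy as np

def match_score(segs_a, segs_b, w=5):
    """Count segments in tumor A whose start and end marker indices both lie
    within w markers of a same-type segment on the same arm in tumor B.
    Each segment is a tuple (arm, alteration_type, start_index, end_index)."""
    score = 0
    for arm_a, typ_a, s_a, e_a in segs_a:
        for arm_b, typ_b, s_b, e_b in segs_b:
            if (arm_a == arm_b and typ_a == typ_b
                    and abs(s_a - s_b) <= w and abs(e_a - e_b) <= w):
                score += 1
                break
    return score

def clonality_pvalue(segs_a, segs_b, arm_sizes, w=5, n_perm=2000, seed=0):
    """P-value for the observed match score under a null in which tumor B's
    segments are independently relocated uniformly within their arms;
    segments are assumed shorter than their arms."""
    rng = np.random.default_rng(seed)
    observed = match_score(segs_a, segs_b, w)
    hits = 0
    for _ in range(n_perm):
        relocated = []
        for arm, typ, s, e in segs_b:
            length = e - s
            new_s = rng.integers(0, arm_sizes[arm] - length)
            relocated.append((arm, typ, new_s, new_s + length))
        if match_score(segs_a, relocated, w) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```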

    Estimating the Empirical Lorenz Curve and Gini Coefficient in the Presence of Error

    The Lorenz curve is a graphical tool that is widely used to characterize the concentration of a measure, such as wealth, in a population. It is frequently the case that the measure of interest used to rank experimental units when estimating the empirical Lorenz curve, and the corresponding Gini coefficient, is subject to random error. This error can result in an incorrect ranking of experimental units, which inevitably leads to a curve that exaggerates the degree of concentration (variation) in the population. We explore this bias and discuss several widely available statistical methods that have the potential to reduce or remove the bias in the empirical Lorenz curve. The properties of these methods are examined and compared in a simulation study. This work is motivated by a health outcomes application that seeks to assess the concentration of black patient visits among primary care physicians. The methods are illustrated on data from this study.
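    The bias being addressed is easy to reproduce. The sketch below is illustrative only: the gamma data model, the normal error, and the sample size are assumptions chosen to show how the empirical Lorenz curve and Gini coefficient computed from an error-contaminated measure exaggerate concentration relative to the error-free measure.

```python
# Illustrative only: gamma "truth" plus normal error are assumptions used to
# show how measurement error inflates the empirical Gini coefficient.
import numpy as np

def lorenz_gini(x):
    """Empirical Lorenz ordinates and sample Gini coefficient for x >= 0."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    lorenz = np.insert(np.cumsum(x) / x.sum(), 0, 0.0)    # cumulative shares
    gini = 2.0 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n
    return lorenz, gini

rng = np.random.default_rng(1)
truth = rng.gamma(shape=2.0, scale=1.0, size=5000)        # error-free measure
observed = np.clip(truth + rng.normal(scale=1.0, size=truth.size), 0, None)

_, gini_true = lorenz_gini(truth)
_, gini_observed = lorenz_gini(observed)   # ranked and accumulated with error
print(f"Gini, error-free measure:  {gini_true:.3f}")
print(f"Gini, error-prone measure: {gini_observed:.3f}  (concentration exaggerated)")
```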

    A Metastasis or a Second Independent Cancer? Evaluating the Clonal Origin of Tumors Using Array-CGH Data

    When a cancer patient develops a new tumor it is necessary to determine whether this is a recurrence (metastasis) of the original cancer or an entirely new occurrence of the disease. This is accomplished by assessing the histopathology of the lesions, and it is frequently relatively straightforward. However, there are many clinical scenarios in which this pathological diagnosis is difficult. Since each tumor is characterized by a genetic fingerprint of somatic mutations, a more definitive diagnosis is possible in principle in these difficult clinical scenarios by comparing the fingerprints. In this article we develop and evaluate a statistical strategy for this comparison when the data are derived from array comparative genomic hybridization, a technique designed to identify all of the somatic allelic gains and losses across the genome. Our method involves several stages. First, a segmentation algorithm is used to estimate the regions of allelic gain and loss. Then the broad correlation in these patterns between the two tumors is assessed, leading to an initial likelihood ratio for the two diagnoses. This is further refined by comparing in detail each plausibly clonal mutation within individual chromosome arms, and the results are aggregated to determine a final likelihood ratio. The method is employed to diagnose patients from several clinical scenarios, and the results show that in many cases a strong clonal signal emerges, occasionally contradicting the clinical diagnosis. The “quality” of the arrays can be summarized by a parameter that characterizes the clarity with which allelic changes are detected. Sensitivity analyses show that most of the diagnoses are robust when the data are of high quality.
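    The staged structure of the strategy can be shown in outline. The sketch below is a structural toy, not the published model: segmentation is assumed to have been done already, per-marker calls are coded -1/0/+1 for loss/neutral/gain, and the per-arm concordance probabilities used to build the likelihood ratio are arbitrary placeholders rather than estimated quantities.

```python
# Structural toy only: overall concordance between two segmented profiles, and
# a log-likelihood ratio aggregated over chromosome arms using placeholder
# concordance probabilities for the clonal and independent hypotheses.
import numpy as np

P_CONCORDANT_CLONAL = 0.6        # placeholder, not an estimated value
P_CONCORDANT_INDEPENDENT = 0.1   # placeholder, not an estimated value

def overall_concordance(calls_a, calls_b):
    """Correlation of the concatenated per-marker calls of the two tumors;
    calls_a and calls_b map arm name -> array of -1/0/+1 calls."""
    a = np.concatenate(list(calls_a.values()))
    b = np.concatenate(list(calls_b.values()))
    return np.corrcoef(a, b)[0, 1]

def log_likelihood_ratio(calls_a, calls_b):
    """Sum per-arm log-LR contributions: arms altered in both tumors, in the
    same direction, count as concordant evidence for clonality."""
    loglr = 0.0
    for arm in calls_a:
        a, b = calls_a[arm], calls_b[arm]
        if np.any(a != 0) and np.any(b != 0):
            concordant = np.sign(a[a != 0][0]) == np.sign(b[b != 0][0])
            p1 = P_CONCORDANT_CLONAL if concordant else 1 - P_CONCORDANT_CLONAL
            p0 = (P_CONCORDANT_INDEPENDENT if concordant
                  else 1 - P_CONCORDANT_INDEPENDENT)
            loglr += np.log(p1 / p0)
    return loglr
```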