37 research outputs found

    Correlation of ePOSE score with three individual endophenotypes.

    <p>Measured endophenotype versus predicted impact (<i>ePOSE Score</i>) for 20 <i>CFTR</i> variants using classifiers trained with (A) sweat chloride, (B) chloride conductance, or (C) fraction of correctly processed CFTR protein. Each plot is the result of 20 leave-one-out cross-validation calculations (i.e., one data point for each of the 20 variants). Blue circles, green squares, and red diamonds denote benign, indeterminate, and disease-causing annotated phenotype, respectively, for each of the 20 variants. Note: increasing sweat chloride is associated with increasing disease severity, whereas for the two in vivo assays, decreasing values correspond to decreasing protein function or processing.</p>
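
The leave-one-out cross-validation mentioned in the caption above can be sketched as follows. This is a minimal illustration only: the one-feature least-squares model is a stand-in assumption (the caption does not describe the ePOSE classifier itself), and the helper names and toy data are hypothetical, not values from the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def leave_one_out_predictions(xs, ys):
    """Predict each ys[i] from a model trained on every other point."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # hold out point i
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        preds.append(a * xs[i] + b)
    return preds


# Hypothetical toy data (not from the study): measured = 2 * feature + 1
features = [0.0, 1.0, 2.0, 3.0, 4.0]
measured = [1.0, 3.0, 5.0, 7.0, 9.0]
predicted = leave_one_out_predictions(features, measured)
```

With 20 variants, the same loop would run 20 times and yield one held-out prediction per variant, which matches the "one data point for each of the 20 variants" description.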

    Six disease-associated genes with sources of variant-specific endophenotypic data.


    Hypothetical visualization of a multidimensional endophenotypic landscape for cystic fibrosis.

    <p>Each cSNV can be represented as a point in a three-dimensional space of three endophenotypic scores relevant to cystic fibrosis disease: post-translational processing (glycosylation) and trafficking of the CFTR protein to the epithelial cell plasma membrane, an in vivo cellular assay of chloride conductance that measures channel gating, and chloride concentration in a diagnostic sweat test. Each point on the landscape can be interpreted with respect to disease severity, shown in the color bar to the right of the landscape.</p>

    Five years of independent testing of cSNV variant classifiers.


    Interpolation plot of predicted endophenotypes resulting from the separate leave-one-out cross-validation calculations shown in Fig 4.

    <p>ePOSE score for the 20 <i>CFTR</i> variants from <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004725#pcbi.1004725.g004" target="_blank">Fig 4</a> plotted and interpolated (color shows ePOSE scores resulting from training with sweat chloride data). Using the resulting classifiers, each endophenotype was predicted for three additional variants (G551S, A561E, and G1349D) and subsequently validated. A561E was accurately predicted to affect disease via drastically reduced CFTR processing and channel gating. G551S was accurately predicted to affect cystic fibrosis primarily via channel gating.</p>

    Some advantages of considering endophenotypes, relative to phenotypes, illustrated using three <i>CFTR</i> variants.

    <p>Mean sweat chloride from individuals harboring the three variants (S1235R, D614G, and G551D), and results from two distinct in vivo experiments performed in cells expressing the variants. Increasing sweat chloride is associated with increasing disease severity, whereas in the two in vivo assays decreasing values correspond to decreasing protein function or abundance. Endophenotypes were scaled for presentation on a single chart, so that the three sweat chloride values could be compared with one another, the three chloride conductance measurements could be compared with one another, and so on.</p>

    Overview of SCHISM framework.

    <p>The framework decouples estimation of somatic mutation cellularities from reconstruction of subclone phylogenies. Given somatic mutation read counts from next-generation sequencing data, and somatic copy number calls if available, any tool for mutation cellularity estimation and mutation clustering can be applied. The output is used to estimate the statistical support for temporal ordering of mutation or mutation cluster pairs, using a generalized likelihood ratio test (GLRT). Other approaches to tree reconstruction can be applied by using the fitness function as the objective for optimization. GA = genetic algorithm, WGS = whole genome sequencing, WES = whole exome sequencing, DS = (targeted) deep sequencing, KDE = kernel density estimation, POV = precedence order violation.</p>

    Reconstruction of subclonal phylogenies in murine models of SCLC.

    <p><b>A. Animal 3588.</b> SCHISM identified a single maximum fitness 8-node tree using one primary and two metastatic tumors. <b>B. Animal 3151.</b> Six maximum fitness 9-node trees were identified using one primary and two metastatic tumors. <b>C. Animal 984.</b> Six maximum fitness 7-node trees were identified using one primary and one metastatic tumor. Solid arrows represent lineage relationships shared by all six trees and dashed arrows represent lineage relationships shared by only a subset of the trees. Each arrow is labeled with the fraction of maximum fitness trees that include the lineage relationship. Highlighted arrows indicate the tree manually constructed by the study authors. GL = germline state. Cluster precedence order violation (CPOV) matrices are shown to the left of each tree. Columns and rows represent subclones (or Clones in the terminology of [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004416#pcbi.1004416.ref013" target="_blank">13</a>]). Each red square represents a pair of subclones (I,J) for which the null hypothesis that I could be the parent of J was rejected. Each blue square represents a pair for which the null hypothesis could not be rejected.</p>
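
The arrow labels described in the caption above (the fraction of maximum fitness trees that include a lineage relationship) can be computed along these lines. This is a sketch under stated assumptions: trees are represented as lists of (parent, child) pairs, the function name `edge_support` is hypothetical, and the two toy trees are invented for illustration rather than taken from the study.

```python
from collections import Counter


def edge_support(trees):
    """Return, for each (parent, child) edge, the fraction of trees containing it."""
    # set(tree) guards against an edge being counted twice within one tree
    counts = Counter(edge for tree in trees for edge in set(tree))
    n_trees = len(trees)
    return {edge: count / n_trees for edge, count in counts.items()}


# Hypothetical equally fit trees; "GL" is the germline root
trees = [
    [("GL", "A"), ("A", "B"), ("A", "C")],
    [("GL", "A"), ("A", "B"), ("B", "C")],
]
support = edge_support(trees)

# Edges found in every tree would be drawn solid, the rest dashed
shared_by_all = {edge for edge, frac in support.items() if frac == 1.0}
```

Here `support[("A", "C")]` is 0.5 because only one of the two toy trees places C directly under A.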

    SubClonal Hierarchy Inference from Somatic Mutations: Automatic Reconstruction of Cancer Evolutionary Trees from Multi-region Next Generation Sequencing

    <div><p>Recent improvements in next-generation sequencing of tumor samples and the ability to identify somatic mutations at low allelic fractions have opened the way for new approaches to model the evolution of individual cancers. The power and utility of these models are increased when tumor samples from multiple sites are sequenced. Temporal ordering of the samples may provide insight into the etiology of both primary and metastatic lesions and rationalizations for tumor recurrence and therapeutic failures. Additional insights may be provided by temporal ordering of evolving subclones, that is, cellular subpopulations with unique mutational profiles. Current methods for subclone hierarchy inference tightly couple the problem of temporal ordering with that of estimating the fraction of cancer cells harboring each mutation. We present a new framework that includes a rigorous statistical hypothesis test and a collection of tools that make it possible to decouple these problems, which we believe will enable substantial progress in the field of subclone hierarchy inference. The methods presented here can be flexibly combined with methods developed by others addressing either of these problems. We provide tools to interpret hypothesis test results, which inform phylogenetic tree construction, and we introduce the first genetic algorithm designed for this purpose. The utility of our framework is systematically demonstrated in simulations. For most tested combinations of tumor purity, sequencing coverage, and tree complexity, good power (≥ 0.8) can be achieved and Type 1 error is well controlled when at least three tumor samples are available from a patient. Using data from three published multi-region tumor sequencing studies of (murine) small cell lung cancer, acute myeloid leukemia, and chronic lymphocytic leukemia, in which the authors reconstructed subclonal phylogenetic trees by manual expert curation, we show how different configurations of our tools can identify either a single tree in agreement with the authors, or a small set of trees that includes the authors’ preferred tree. Our results have implications for improved modeling of tumor evolution and the importance of multi-region tumor sequencing.</p></div>

    Power and Type 1 error of hypothesis test on simulated data.

    <p>For each combination of coverage and purity, results are shown for trees with node counts from three to eight. Each curve was computed by taking the mean over all <i>instances</i> and all replicates for each node count. Curves with circular marks show power and curves with triangular marks show Type 1 error. Transparent coloring indicates ±1<i>SE</i>. Dotted line indicates power = 0.8.</p>
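
The quantities plotted in the caption above (power, Type 1 error, and a ±1 SE band around a mean over replicates) can be computed as in the sketch below. It uses the standard textbook definitions: power is the rejection rate among false null hypotheses, and Type 1 error is the rejection rate among true ones. The function names and all numbers are hypothetical; this is not the simulation code used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def rejection_rates(decisions):
    """decisions: list of (null_is_true, rejected) booleans, one per tested pair.

    Returns (power, type1_error).
    """
    rejected_true_null = [rej for is_true, rej in decisions if is_true]
    rejected_false_null = [rej for is_true, rej in decisions if not is_true]
    type1_error = sum(rejected_true_null) / len(rejected_true_null)
    power = sum(rejected_false_null) / len(rejected_false_null)
    return power, type1_error


def mean_and_se(values):
    """Mean and standard error of the mean across replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical test outcomes for one simulated replicate
decisions = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, True), (False, False),
]
power, type1_error = rejection_rates(decisions)

# Hypothetical per-replicate power values for one node count
avg_power, se_power = mean_and_se([0.7, 0.8, 0.9])
```

Plotting `avg_power` with a shaded band of `avg_power ± se_power` for each node count would give curves of the kind the caption describes.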