29 research outputs found

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006. Abstract: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and unobserved mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.

    Conceptualizing Cancer Drugs as Classifiers

    Cancer and healthy cells have distinct distributions of molecular properties and thus respond differently to drugs. Cancer drugs ideally kill cancer cells while limiting harm to healthy cells. However, the inherent variance among cells in both cancer and healthy cell populations increases the difficulty of selective drug action. Here we formalize a classification framework based on the idea that an ideal cancer drug should maximally discriminate between cancer and healthy cells. More specifically, this discrimination should be performed on the basis of measurable cell markers. We divide the problem into three parts which we explore with examples. First, molecular markers should discriminate cancer cells from healthy cells at the single-cell level. Second, the effects of drugs should be statistically predicted by these molecular markers. Third, drugs should be optimized for classification performance. We find that expression levels of a handful of genes suffice to discriminate well between individual cells in cancer and healthy tissue. We also find that gene expression predicts the efficacy of some cancer drugs, suggesting that these cancer drugs act as suboptimal classifiers using gene profiles. Finally, we formulate a framework that defines an optimal drug, and predicts drug cocktails that may target cancer more accurately than the individual drugs alone. Conceptualizing cancer drugs as solving a discrimination problem in the high-dimensional space of molecular markers promises to inform the design of new cancer drugs and drug cocktails.

    Cancer treatment optimization.

    Better discrimination between cell populations is achieved by including an additional drug. The classification threshold shown as a line is, in reality, a gradient related to the “probability of cell death”, which is indicated by shading. See text for full description.

    Discriminability of healthy versus cancer cells as a function of the number of genes considered.

    When measuring accuracy of cell classification as cancerous or healthy, one should consider both types of errors: false positives and false negatives (or, more conventionally, true positives). This is illustrated by the Receiver Operating Characteristic (ROC) curve. Lines indicate mean values, and error bars indicate bootstrapped 95% confidence intervals. Accuracy was measured using cross-validation, and the chance value was determined using a shuffle control.

    The idea of a classifier.

    This illustrates how one can combine information from two cellular markers to construct a classifier that separates the two populations (cancerous and healthy cells) better than either marker alone.
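    The idea in this caption can be sketched with a toy example. The marker values below are invented for illustration and do not come from the paper; the helper `best_threshold_accuracy` and the combined score `x - y` are likewise assumptions, chosen only so that each marker alone overlaps between populations while their combination does not.

```python
def best_threshold_accuracy(pos, neg):
    """Best accuracy of a single-threshold rule on a 1-D score.

    Tries every threshold t and both orientations of the rule
    ('score > t means positive' and its reverse).
    """
    values = sorted(set(pos) | set(neg))
    thresholds = [values[0] - 1] + values
    total = len(pos) + len(neg)
    best = 0.0
    for t in thresholds:
        correct = sum(s > t for s in pos) + sum(s <= t for s in neg)
        acc = correct / total
        best = max(best, acc, 1.0 - acc)
    return best


# Hypothetical (marker_x, marker_y) measurements per cell.
cancer = [(1, 0), (2, 1), (3, 2)]
healthy = [(0, 1), (1, 2), (2, 3)]

# Each marker on its own: the two populations overlap.
acc_x = best_threshold_accuracy([c[0] for c in cancer], [h[0] for h in healthy])
acc_y = best_threshold_accuracy([c[1] for c in cancer], [h[1] for h in healthy])

# A linear combination of both markers (an assumed choice of weights).
acc_xy = best_threshold_accuracy(
    [c[0] - c[1] for c in cancer], [h[0] - h[1] for h in healthy]
)

print(acc_x, acc_y, acc_xy)
```

    In this constructed data set the combined score separates the populations perfectly, whereas neither single marker does. Real expression data would need a fitted classifier and cross-validation rather than hand-picked weights.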

    Discriminability of healthy versus cancer cells as a function of the number of cells and genes measured.

    Classification performance was measured as the area under the ROC curve (AUC). A perfect classifier would achieve an AUC of 1, whereas a random classifier would achieve an AUC of 0.5. Each colored line represents a different number of cells used to train the classifier, showing that performance improves as more cells are used. Lines indicate mean values, and shaded areas indicate bootstrapped 95% confidence intervals. Accuracy was measured using cross-validation, and the chance value was determined using a shuffle control.
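    As a rough illustration of the AUC values quoted in this caption, the sketch below computes AUC through its rank interpretation: the probability that a randomly chosen positive (cancer) cell receives a higher score than a randomly chosen negative (healthy) cell, with ties counted as one half. The function name `roc_auc` and the example scores are assumptions for illustration, not the authors' code or data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC via pairwise comparison of positive and negative scores."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores.
perfect = roc_auc([0.9, 0.8], [0.2, 0.1])        # every positive outranks every negative
uninformative = roc_auc([0.5, 0.5], [0.5, 0.5])  # scores carry no information
partial = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])

print(perfect, uninformative, partial)
```

    The pairwise loop is quadratic in the number of cells; for larger data sets a sort-based computation or an established library routine would be the more practical choice.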