2,185 research outputs found

    The case for cloud computing in genome informatics

    With DNA sequencing now getting cheaper more quickly than data storage, the time may have come to use cloud computing for genome informatics

    Inferring clonal evolution of tumors from single nucleotide somatic mutations

    High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available

    A human functional protein interaction network and its application to cancer data analysis

    A high-quality human functional protein interaction network is constructed. Its utility is demonstrated in the identification of cancer candidate genes

    ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

    Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the genome of the tumor to the reference genome sequence derived from a normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison. Results: In this work, we describe an algorithm to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples using a machine learning approach. Our algorithm was evaluated using a family of supervised learning classifications across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues). Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
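
The F1-measure quoted in this abstract is the harmonic mean of precision and recall. The sketch below shows that calculation only; the function name and the counts in the usage note are hypothetical and are not taken from the ISOWN evaluation or its code.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-measure (harmonic mean of precision and recall).

    The counts are hypothetical inputs: somatic SNVs called correctly
    (true_pos), germline variants miscalled as somatic (false_pos), and
    somatic SNVs that were missed (false_neg).
    """
    called = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with made-up counts of 95 true positives, 30 false positives, and 5 false negatives, recall is 0.95 but precision is only 0.76, so `f1_score(95, 30, 5)` is about 0.84. This illustrates how a classifier can recover most somatic mutations and still show a noticeably lower F1-measure.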

    Vennter – An interactive analysis tool for WormBase interaction data using Venn diagrams

    WormBase curates four different types of gene-to-gene interaction data: genetic, regulatory, physical, and predicted. These data are found in the Interactions widget in each gene page. Aside from the predicted interactions, the other three types are curated with direct experimental evidence from the literature. In WormBase, genetic interaction data is defined as a phenotypic deviation of double mutants (or any other genetic perturbations) from single mutant phenotypes and the control phenotype. Regulatory interactions are defined by how perturbation of one gene or gene product affects the expression of another gene or the localization of its gene product. Physical interactions represent molecular associations between genes and gene products from C. elegans (Grove et al. 2018). Each type of interaction data is essential to understanding certain aspects of the biological process mediated by the two interacting genes. However, integrating information from these three types of interaction data is critical to reading the biological context. Understanding the logical relations between different types of interaction data provides a vital clue on how to tackle a gene-to-gene interaction within this context. To achieve this goal, we introduce a new tool for analyzing these logical relations among the interaction data using a Venn diagram, named Vennter (Venn diagram for interaction). Venn diagrams are very useful for displaying similarities and distinctions between different data sets of interest, especially when they are area-proportional to the amount of information presented. Vennter consists of three circles, each representing one of the three different interaction data types. The size of any area within the interactive Venn diagram corresponds to the number of unique interactor genes pertaining to that region. 
Figure 1A shows an example of a Vennter diagram for selected genes (daf-2, daf-15, daf-16, and let-363) which play key roles in the Insulin/IGF-1 and TOR-dependent signaling pathway. These results exhibit distinct patterns of overlap among their interaction data. In some cases, the differences might reflect differences in prior scientific approaches to studying these genes as well as the availability of annotated interaction data in WormBase. Vennter is fully interactive for analyzing interactor genes from different sets of interaction data. By clicking on any single or multiple area in combination, one can easily obtain all gene names corresponding to the selected area (represented by hashed lines) in Vennter. The number of selected genes is shown next to the ‘View selected genes’ button (see arrows). From there, one can open a popup window called ‘Browse gene set’ which contains a comprehensive list of interactor genes shown in alphabetic order. This list can be copied or downloaded in diverse formats, and for the user’s convenience, each gene name is linked to its unique WormBase gene page. Under the same ‘Browse gene set’ window, Vennter also offers other functions for further analysis of selected genes, such as direct links to the batch analysis tools ‘SimpleMine’ and ‘Gene-set Enrichment Analysis’ (Figure 1B) which can help to query the gene list more conveniently and efficiently. To demonstrate yet another aspect of Vennter, we can look at the high-throughput physical protein-protein interaction data, which comprise long lists of genes. To date, WormBase has curated 86.9% of physical protein-protein interaction data from these high-throughput studies (Cho et al. 2018). However, identifying the functional relevance of such large amounts of interacting gene candidates can often be quite challenging to researchers. 
To improve this, Vennter provides more organized and relevant information to researchers by evaluating overlapping information between physical, genetic, and/or regulatory interactions, which enables researchers to more easily measure the confidence and biological relevance of their gene candidates of interest. For the C. elegans research community, Vennter can help compare and integrate the different types of interaction data available. It also provides hyperlinks to other useful WormBase tools in order to analyze the gene interactor list in a more efficient and automated manner
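
The region counts that a three-circle diagram of this kind displays can be sketched as a partition of interactor genes by the exact combination of interaction types in which each gene appears. This is a minimal illustration under assumed inputs, not Vennter's implementation; the function name and the gene identifiers are hypothetical.

```python
def venn_regions(sets_by_type: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Partition genes into disjoint Venn regions.

    Each key of the result is the exact combination of interaction types
    a gene belongs to; each value is the set of genes in that region.
    """
    regions: dict[frozenset, set[str]] = {}
    all_genes = set().union(*sets_by_type.values())
    for gene in all_genes:
        key = frozenset(
            label for label, members in sets_by_type.items() if gene in members
        )
        regions.setdefault(key, set()).add(gene)
    return regions


# Hypothetical interactor lists for one gene of interest.
interactors = {
    "genetic": {"gene-a", "gene-b", "gene-c"},
    "regulatory": {"gene-b", "gene-c", "gene-d"},
    "physical": {"gene-c", "gene-e"},
}
regions = venn_regions(interactors)
# Number of unique interactor genes per region, as an area-proportional
# diagram would need.
region_sizes = {key: len(genes) for key, genes in regions.items()}
```

Here `gene-c` falls in the central region shared by all three types, which corresponds to the kind of overlap the abstract describes as raising confidence in a candidate.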

    A cancer cell-line titration series for evaluating somatic classification.

    Background: Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells confounding the available evidence for the somatic variants. Furthermore, tumours are heterogeneous so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines with the goal of reliably calling true somatic mutations at low allele frequencies. Results: Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data was processed with several different variant calling pipelines and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants undetected as cellularity and coverage decrease. Conclusions: Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data is available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium's Data Access Compliance Office (ICGC DACO)
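
To see why low cellularity makes somatic variants hard to call, the expected variant allele frequency in a tumour-normal mixture can be sketched as follows. This is a simplified model and not the paper's method: it assumes a clonal variant, a fixed copy number in both tumour and normal cells (diploid by default), and that cellularity equals the fraction of tumour cells in the mix. The function name and example values are hypothetical.

```python
def expected_vaf(
    cellularity: float,
    variant_copies: int = 1,
    tumour_copies: int = 2,
    normal_copies: int = 2,
) -> float:
    """Expected variant allele frequency of a clonal somatic variant.

    cellularity is the assumed fraction of tumour cells in the mixture.
    variant_copies of the tumour_copies alleles at the locus carry the
    variant; normal cells contribute normal_copies reference alleles.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    variant_alleles = cellularity * variant_copies
    total_alleles = (
        cellularity * tumour_copies + (1.0 - cellularity) * normal_copies
    )
    return variant_alleles / total_alleles
```

Under these assumptions a heterozygous variant at a hypothetical 10% cellularity has an expected allele frequency of about 0.05, so at 300× depth only around 15 reads would be expected to carry it.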

    Cost-Effectiveness of Sponge-Based Surveillance with Genetic Testing For Early Diagnosis of Esophageal Adenocarcinoma
