    ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

    Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the tumor genome with the genome of a matched normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison. Results: In this work, we describe a machine learning algorithm that distinguishes somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples. Our algorithm was evaluated using a family of supervised learning classifiers across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested it with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues). Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
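
    The abstract reports two figures: the share of somatic mutations classified correctly, and an F1-measure. F1 also penalizes germline variants that are wrongly called somatic. If the first figure is read as recall, the two can diverge widely. The minimal sketch below shows how a 95% recall can coexist with a much lower F1 when false positives accumulate. The function name `f1_score` and all counts are hypothetical illustrations, not taken from ISOWN or its evaluation data.

```python
def f1_score(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a binary somatic-vs-germline call set.

    Hypothetical helper: tp = true somatic SNVs called somatic,
    fp = germline variants wrongly called somatic,
    fn = somatic SNVs missed (called germline).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 95 of 100 somatic SNVs recovered (recall 0.95),
# but 30 germline polymorphisms are also called somatic.
p, r, f = f1_score(tp=95, fp=30, fn=5)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
# -> precision=0.760 recall=0.950 F1=0.844
```

    Because F1 is a harmonic mean, it is pulled toward the weaker of precision and recall. A classifier that catches nearly all somatic mutations can still score a modest F1 if germline variants leak through.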

    Exome sequencing identifies nonsegregating nonsense ATM and PALB2 variants in familial pancreatic cancer.

    We sequenced 11 germline exomes from five families with familial pancreatic cancer (FPC). One proband had a germline nonsense variant in ATM with somatic loss of the variant allele. Another proband had a nonsense variant in PALB2 with somatic loss of the variant allele. Both variants were absent in a relative with FPC. These findings question the causal mechanisms of ATM and PALB2 in these families and highlight challenges in identifying the causes of familial cancer syndromes using exome sequencing.

    A Dynamic Ontology Mapping Architecture for a Grid Database System

    Most large-scale heterogeneous distributed computing systems, such as Grids, rely on Service Oriented Architectures (SOA) to interact with other systems across different platforms and programming languages. However, the semantic heterogeneity of data remains unsolved: data from different systems must be interpreted in semantically related ways. Ontologies are the most common and widely accepted way to handle this problem at multiple levels of granularity across different systems. Nevertheless, using ontologies to share common concepts in a dynamic environment such as a Grid is still a challenge. Static mappings between ontologies are difficult to maintain, because changes to the corresponding semantic mappings must be applied consistently. We therefore adopt the concept of a Tuple Space and propose a flexible approach for managing ontologies in a Grid. It enables systems and users to interoperate semantically and dynamically by sharing and managing concepts and semantic ontology mappings.
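
    The abstract does not detail the Tuple Space operations the architecture relies on. As a rough illustration of the general concept, the sketch below stores ontology mappings as tuples in a shared pool and updates one atomically, so readers never observe a half-applied change. It uses Linda-style `out` (write), `rd` (non-destructive read), and `inp` (non-blocking take) operations. The `TupleSpace` class, the `replace` helper, and the five-field mapping layout are hypothetical and are not the paper's design.

```python
import threading
from typing import Optional


class TupleSpace:
    """Minimal in-memory Linda-style tuple space (hypothetical illustration)."""

    def __init__(self) -> None:
        self._tuples: list[tuple] = []
        self._lock = threading.Lock()  # serializes all access to the shared pool

    @staticmethod
    def _matches(pattern: tuple, candidate: tuple) -> bool:
        # A None field in the pattern acts as a wildcard.
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def out(self, tup: tuple) -> None:
        """Write a tuple into the space."""
        with self._lock:
            self._tuples.append(tup)

    def rd(self, pattern: tuple) -> Optional[tuple]:
        """Return the first matching tuple without removing it, or None."""
        with self._lock:
            return next((t for t in self._tuples if self._matches(pattern, t)), None)

    def inp(self, pattern: tuple) -> Optional[tuple]:
        """Remove and return the first matching tuple, or None (non-blocking)."""
        with self._lock:
            for i, t in enumerate(self._tuples):
                if self._matches(pattern, t):
                    return self._tuples.pop(i)
            return None

    def replace(self, pattern: tuple, new: tuple) -> Optional[tuple]:
        """Under one lock, swap the first matching tuple for `new` and return the old one.

        If nothing matches, `new` is appended and None is returned.
        """
        with self._lock:
            for i, t in enumerate(self._tuples):
                if self._matches(pattern, t):
                    self._tuples[i] = new
                    return t
            self._tuples.append(new)
            return None


# Hypothetical mapping tuple: ("map", source ontology, source concept, target ontology, target concept)
space = TupleSpace()
space.out(("map", "OntoA", "Patient", "OntoB", "Person"))
space.replace(("map", "OntoA", "Patient", "OntoB", None),
              ("map", "OntoA", "Patient", "OntoB", "Subject"))
print(space.rd(("map", "OntoA", "Patient", "OntoB", None)))
# -> ('map', 'OntoA', 'Patient', 'OntoB', 'Subject')
```

    The update is done as a single locked swap rather than a separate `inp` followed by `out`. This way, a concurrent reader cannot see the mapping disappear between the two steps.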

    Spatiotemporal integration of molecular and anatomical data in virtual reality using semantic mapping

    We have developed a computational framework for spatiotemporal integration of molecular and anatomical datasets in a virtual reality environment. Using two case studies involving gene expression data and pharmacokinetic data, respectively, we demonstrate how existing knowledge bases for molecular data can be semantically mapped onto a standardized anatomical context of the human body. Our data mapping methodology uses ontological representations of heterogeneous biomedical datasets and an ontology reasoner to create complex semantic descriptions of biomedical processes. This framework provides a means to systematically combine an increasing amount of biomedical imaging and numerical data into spatiotemporally coherent graphical representations. Our work enables medical researchers with different expertise to simulate complex phenomena visually and to develop insights through the use of shared data, thus paving the way for pathological inference, developmental pattern discovery and biomedical hypothesis testing.

    Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants.

    A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer (CRC) in a genome-wide association study; this finding has been replicated in case-control studies worldwide. To identify biological factors at this locus that are related to the etiopathology of CRC, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1,030 patients with CRC (cases) and 1,061 individuals without CRC (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated genes that are transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (p = 0.014), with COLCA1 levels in the lamina propria (p = 0.00016), and with COLCA2 levels (tumor cells, p = 0.0041; lamina propria, p = 6 × 10⁻⁵). In conclusion, genetic, expression and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways.

    Mainstreaming Grassroots Adaptation and Building Climate Resilient Agriculture in SAT Vietnam

    Vietnam has a population of more than 86 million people and a land area of 33,115,000 ha, of which forest and agricultural land cover 44.7% and 28.4%, respectively. The agricultural sector, including crops, livestock, fisheries and aquaculture, accounts for more than 20% of national GDP, 65% of employment and 30% of export value. The sector has a considerable influence on national economic growth, poverty eradication and the elimination of malnutrition (GSO 2010)...

    Association between germline variants and somatic mutations in colorectal cancer

    Colorectal cancer (CRC) is a heterogeneous disease with evidence of distinct tumor types that develop through different somatically altered pathways. To better understand the impact of the host genome on somatically mutated genes and pathways, we assessed associations of germline variation with somatic events via two complementary approaches. We first analyzed the association between individual germline genetic variants and the presence of non-silent somatic mutations in genes in 1375 CRC cases with genome-wide SNP data and a tumor sequencing panel targeting 205 genes. In the second analysis, we tested whether germline variants located within previously identified regions of somatic allelic imbalance were associated with overall CRC risk, using summary statistics from a recent large-scale GWAS (n ≈ 125k CRC cases and controls). The first analysis revealed that a variant (rs78963230) located within a CNA region associated with TLR3 was also associated with a non-silent mutation in the FBXW7 gene. In the second analysis, the variant rs2302274, located in the CDX1/PDGFRB region that is frequently gained or lost in colorectal tumors, was associated with overall CRC risk (OR = 0.96, p = 7.50 × 10⁻⁷). In summary, we demonstrate that an integrative analysis of somatic and germline variation can lead to new insights about CRC.

    Identifying colorectal cancer caused by biallelic MUTYH pathogenic variants using tumor mutational signatures

    Carriers of germline biallelic pathogenic variants in the MUTYH gene have a high risk of colorectal cancer. We test 5649 colorectal cancers to evaluate the discriminatory potential of a tumor mutational signature specific to MUTYH for identifying biallelic carriers and classifying variants of uncertain clinical significance (VUS). Using a tumor and matched germline targeted multi-gene panel approach, our classifier identifies all biallelic MUTYH carriers and all known non-carriers in an independent test set of 3019 colorectal cancers (accuracy = 100%; 95% confidence interval 99.87-100%). All monoallelic MUTYH carriers are classified with the non-MUTYH carriers. The classifier provides evidence for a pathogenic classification for two VUS and a benign classification for five VUS. Somatic hotspot mutations KRAS p.G12C and PIK3CA p.Q546K are associated with colorectal cancers from biallelic MUTYH carriers compared with non-carriers (p = 2 × 10⁻²³ and p = 6 × 10⁻¹¹, respectively). Here, we demonstrate the potential application of mutational signatures to tumor sequencing workflows to improve the identification of biallelic MUTYH carriers.

    Germline biallelic pathogenic MUTYH variants predispose patients to colorectal cancer (CRC); however, approaches to identify MUTYH variant carriers are lacking. Here, the authors evaluated mutational signatures that could distinguish MUTYH carriers in large CRC cohorts, and found MUTYH-associated somatic mutations
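
    A 95% confidence interval that stops short of 100% is expected even with a perfect test-set accuracy, because the sample is finite. The abstract does not state which interval method was used. The sketch below assumes the exact (Clopper-Pearson) binomial interval and that all 3019 test tumors were scored. In the all-correct case, that interval's lower bound reduces to (α/2)^(1/n), which gives roughly 99.88%, close to the reported 99.87%. The function name is hypothetical.

```python
def clopper_pearson_all_correct(n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a proportion when all n trials succeed.

    Assumption: this is one standard interval choice, not necessarily the paper's.
    With x == n successes, the upper bound is 1 and the lower bound is (alpha/2) ** (1/n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    lower = (alpha / 2) ** (1 / n)
    return lower, 1.0


# Assumes every one of the 3019 test-set tumors was classified correctly.
low, high = clopper_pearson_all_correct(3019)
print(f"95% CI: {100 * low:.3f}% - {100 * high:.1f}%")
# -> 95% CI: 99.878% - 100.0%
```

    Halving the test set would widen the interval: the lower bound scales as 0.025^(1/n), so a smaller n pushes it further below 100%.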