14 research outputs found

    Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

    Background: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature. Methods: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase. Results: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications. Conclusions: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general. All data is publicly available and distributed with a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/

    Phagocytosis of Aspergillus fumigatus by Human Bronchial Epithelial Cells Is Mediated by the Arp2/3 Complex and WIPF2

    Aspergillus fumigatus is an opportunistic fungal pathogen capable of causing severe infection in humans. One of the limitations in our understanding of how A. fumigatus causes infection concerns the initial stages of infection, notably the initial interaction between inhaled spores or conidia and the human airway. Using publicly available datasets, we identified the Arp2/3 complex and the WAS-Interacting Protein Family Member 2 (WIPF2) as being potentially responsible for internalization of conidia by airway epithelial cells. Using a cell culture model, we demonstrate that RNAi-mediated knockdown of WIPF2 significantly reduces internalization of conidia into airway epithelial cells. Furthermore, we demonstrate that inhibition of Arp2/3 by a small molecule inhibitor causes similar effects. Using super-resolution fluorescence microscopy, we demonstrate that WIPF2 is transiently localized to the site of bound conidia. Overall, we demonstrate the active role of the Arp2/3 complex and WIPF2 in mediating the internalization of A. fumigatus conidia into human airway epithelial cells.

    Genome-Wide Discovery of Somatic Regulatory Variants in Diffuse Large B-Cell Lymphoma

    Diffuse large B-cell lymphoma (DLBCL) is an aggressive cancer originating from mature B-cells. Prognosis is strongly associated with molecular subgroup, although the driver mutations that distinguish the two main subgroups remain poorly defined. Through an integrative analysis of whole genomes, exomes, and transcriptomes, we have uncovered genes and non-coding loci that are commonly mutated in DLBCL. Our analysis has identified novel cis-regulatory sites, and implicates recurrent mutations in the 3′ UTR of NFKBIZ as a novel mechanism of oncogene deregulation and NF-κB pathway activation in the activated B-cell (ABC) subgroup. Small amplifications associated with over-expression of FCGR2B (the Fcγ receptor protein IIB), primarily in the germinal centre B-cell (GCB) subgroup, correlate with poor patient outcomes, suggestive of a novel oncogene. These results expand the list of subgroup driver mutations that may facilitate implementation of improved diagnostic assays and could offer new avenues for the development of targeted therapeutics.

    Copy number variation in metastatic cancer: methods and analysis of somatic copy number variation in advanced human cancers

    Genome sequencing has transformed our understanding of human genetic diseases in recent years, not least of which is cancer. Among the genetic abnormalities commonly observed within cancer are copy number variants, alterations in the abundance of DNA, which often affect cellular function and contribute to disease. Whole-genome sequencing has allowed for high-throughput examination and identification of mutations such as single nucleotide variants within cancer, while the identification of copy number variants remains comparatively difficult. The critical task for accurate identification of copy number variants from these data remains segmentation, the task of aggregating sequences of DNA abundance observations into contiguous segments of presumably constant DNA copy number. In this dissertation, we propose a novel method for performing copy number segmentation of sequenced whole cancer genomes. We apply a novel bottom-up, coarse-to-fine segmentation algorithm alongside statistical techniques to identify tumor heterogeneity and accurately perform copy number variant detection. We compare our method with a number of other methods in a variety of contexts, including fully synthetic data, resequenced cell line data, and a large cohort of sequenced metastatic cancer genomes. Next, we apply the results of our method in the analysis of chromosomal instability patterns throughout the genome. We assess genome-wide patterns of homozygous deletion and methods of measuring chromosomal instability and its numerous interfaces with tumor biology, including mutational signatures and gene dosage effects. Finally, we investigate the prospect of identifying copy number variants using long-read data from Oxford Nanopore instruments. We present two case reports of copy number variation analysis in these data, and subsequently assess our ability to identify these variants in metastatic cancer biopsies as compared to traditional short-read sequencing methods. We subsequently investigate factors influencing our ability to identify copy number events in nanopore data. In this work, we have focused on the methods and analysis of copy number variants in human cancer. The methods and analyses performed herein will assist in research concerning these mutations and their greater role in cancer and human biology.

    Transcriptomic and proteomic host response to Aspergillus fumigatus conidia in an air-liquid interface model of human bronchial epithelium.

    Aspergillus fumigatus (A. fumigatus) is a widespread fungus that is a potent allergen in hypersensitive individuals but also an opportunistic pathogen in immunocompromised patients. It reproduces asexually by releasing airborne conidiospores (conidia). Upon inhalation, fungal conidia are capable of reaching the airway epithelial cells (AECs) in bronchial and alveolar tissues. Previous studies have predominantly used submerged monolayer cultures for studying this host-pathogen interaction; however, these cultures do not recapitulate the mucociliary differentiation phenotype of the in vivo epithelium in the respiratory tract. Thus, the aim of this study was to use well-differentiated primary human bronchial epithelial cells (HBECs) grown at the air-liquid interface (ALI) to determine their transcriptomic and proteomic responses following interaction with A. fumigatus conidia. We visualized conidial interaction with HBECs using confocal laser scanning microscopy (CLSM), and applied NanoString nCounter and shotgun proteomics to assess gene expression changes in the human cells upon interaction with A. fumigatus conidia. Western blot analysis was used to assess the expression of the top three differentially expressed proteins, CALR, SET and NUCB2. CLSM showed that, unlike submerged monolayer cultures, well-differentiated ALI cultures of primary HBECs were estimated to internalize less than 1% of bound conidia. Nevertheless, transcriptomic and proteomic analyses revealed numerous differentially expressed host genes; these were enriched for pathways including apoptosis/autophagy, translation, unfolded protein response and cell cycle (up-regulated); complement and coagulation pathways, iron homeostasis, nonsense-mediated decay and rRNA binding (down-regulated). CALR and SET were confirmed to be up-regulated in ALI cultures of primary HBECs upon exposure to A. fumigatus via western blot analysis. Therefore, using transcriptomics and proteomics approaches, ALI models recapitulating the bronchial epithelial barrier in the conductive zone of the respiratory tract can provide novel insights into the molecular response of bronchial epithelial cells upon exposure to A. fumigatus conidia.

    The genome of the forest insect pest Pissodes strobi reveals genome expansion and evidence of a Wolbachia endosymbiont

    The highly diverse insect family of true weevils, Curculionidae, includes many agricultural and forest pests.

    The Genome of the Steller Sea Lion (Eumetopias jubatus)

    The Steller sea lion is the largest member of the Otariidae family and is found in the coastal waters of the northern Pacific Rim. Here, we present the Steller sea lion genome, determined through DNA sequencing approaches that utilized microfluidic partitioning library construction, as well as nanopore technologies. These methods constructed a highly contiguous assembly with a scaffold N50 length of over 14 megabases, a contig N50 length of over 242 kilobases and a total length of 2.404 gigabases. As a measure of completeness, 95.1% of 4104 highly conserved mammalian genes were found to be complete within the assembly. Further annotation identified 19,668 protein-coding genes. The assembled genome sequence and underlying sequence data can be found at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA475770.