11,845 research outputs found

    Finding Conflicting Statements in the Biomedical Literature

    ExTRI: Extraction of transcription regulation interactions from literature

    The regulation of gene transcription by transcription factors is a fundamental biological process, yet the relations between transcription factors (TFs) and their target genes (TGs) are still only sparsely covered in databases. Text-mining tools can offer broad and complementary solutions to help locate and extract mentions of these biological relationships in articles. We have generated ExTRI, a knowledge graph of TF-TG relationships, by applying a high-recall text-mining pipeline to MedLine abstracts, identifying over 100,000 candidate sentences with TF-TG relations. Validation procedures indicated that about half of the candidate sentences contain true TF-TG relationships. Post-processing identified 53,000 high-confidence sentences containing TF-TG relationships, with a cross-validation F1-score close to 75%. The resulting collection of TF-TG relationships covers 80% of the relations annotated in existing databases. It adds 11,000 other potential interactions, including relationships for ~100 TFs currently not in public TF-TG relation databases. The high-confidence abstract sentences contribute 25,000 literature references not available from other resources and offer a wealth of direct pointers to functional aspects of the TF-TG interactions. Our compiled resource encompassing ExTRI together with publicly available resources delivers literature-derived TF-TG interactions for more than 900 of the 1500–1600 proteins considered to function as specific DNA binding TFs. The obtained result can be used by curators, for network analysis and modelling, for causal reasoning or knowledge graph mining approaches, or serve to benchmark text mining strategies. We thank the participants of the COST Action GREEKC (CA15205) for fruitful discussions during workshops supported by COST (European Cooperation in Science and Technology). Peer Reviewed. Postprint (published version).

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is resulting in an emerging paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets are increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive-sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data sciences and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene-regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are disseminated in public repositories for larger scientific community use. This dissertation is organized in three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways. 
We provide a parallelized R-package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interaction between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and meta-data in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology, and predict mutations in key genes. The main challenges were to determine morphological features underlying a genetic status and assess whether these features were common in other cancer types. We trained an Inception-v3 based model to predict TP53 mutation in five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2 positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions highly agree with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction. 
We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach to use spatial transcriptomics data to predict spatially-resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in the region. This project bridged the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial-regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue

    Playing hide and seek on the genomic playground: unveiling biological function from literature

    Effects of fermented wheat germ extract on oral cancer cells and research of biomarkers for diagnosis and prognosis of oral cancer

    Oral squamous cell carcinoma (OSCC) represents one of the most aggressive types of cancer. The disease occurs when the accumulation of multiple genetic mutations in the oral epithelial cells leads to irreversible DNA damage and the cells lose their normal life cycle. The prognosis correlates with several factors, and an early diagnosis certainly improves the outcome. The treatment strategy for OSCC incorporates both surgical and oncologic approaches. Current research in cancer treatment faces two main challenges: the first is the development of more personalized and effective therapies, since not all tumors of the same stage respond to therapy in the same way, and the second is the development of more targeted therapy that affects only the cancer cells without destroying healthy ones. Many efforts are made to find compounds that can support and improve cancer therapy, and great attention is focused on some natural products known to have beneficial effects on the human organism. The aim of this thesis is to present the results of research investigating the possible use of a natural compound, Fermented Wheat Germ Extract (FWGE), for the treatment of OSCC. In order to summarize the scientific evidence on the use of FWGE for the treatment of cancer cells, a systematic review of the literature was performed. Sixteen articles were included in the final qualitative analysis. Various types of cancer cells treated with FWGE have been analyzed, showing mainly cytotoxic effects, alteration of the cell cycle, antiproliferative effects, and induction of apoptosis. After that, a series of in vitro experiments, including MTT assay, invasion and migration assays, were performed to investigate the effects of treating OSCC cells (HSC-3, SAS and SCC-25) with different concentrations of FWGE. 
The inhibitory effect on the viability of OSCC cells, exerted by chemotherapeutic drugs (cisplatin and 5-fluorouracil) and the combination of these with FWGE, was also evaluated. The results showed a significant reduction in cell viability after treatment with FWGE. Regarding migration and invasion capacity, the HSC-3 cells proved to be the most sensitive to treatment with FWGE. The combination of chemotherapeutic drugs and FWGE at 10 mg/ml led to a significantly higher decrease in cell viability. A secondary purpose of this thesis was to investigate the prognostic significance of certain mutations and of the expression of proteins characterizing OSCC. Firstly, a histologic and bioinformatic analysis of Musashi 2 (MSI2) expression was performed and its correlation with clinicopathologic and prognostic features of OSCC evaluated. Musashi-2 is an RNA-binding protein that plays a fundamental role in the oncogenesis of several cancers. A bioinformatic analysis was performed on data downloaded from The Cancer Genome Atlas (TCGA) database. The MSI2 expression data were analysed for their correlation with clinicopathological and prognostic features. In addition, an immunohistochemical evaluation of MSI2 expression on 108 OSCC samples included in a tissue microarray and 13 healthy mucosae samples was performed. Data from 241 TCGA patients were included in the final analysis. No DNA mutations were detected for the MSI2 gene, but a hypermethylated condition of the gene emerged. MSI2 mRNA expression correlated with grading (p = 0.009) and overall survival (p = 0.045), but not with disease-free survival (p = 0.549). Males presented a higher MSI2 mRNA expression than females. The immunohistochemical evaluation revealed a weak expression of MSI2 in both OSCC samples and healthy oral mucosae. In addition, MSI2 expression directly correlated with Cyclin-D1 expression (p = 0.022). 
However, no correlation has been detected with prognostic outcomes (overall and disease-free survival). MSI2 expression in OSCC seems not to be as closely correlated with prognosis as in other human neoplasms. The correlation with Cyclin-D1 expression suggests an indirect role that MSI2 might have in the proliferation of OSCC cells, but further studies are needed to confirm such results. Secondly, the role of programmed death ligand 1 (PD‐L1) in tumour immunity and its potential function as a marker for OSCC prognosis were investigated through a meta‐analysis. The studies were identified by searching PubMed, SCOPUS and Web of Science and were assessed by two of the authors. After the selection process, 11 articles met eligibility criteria and were included in the meta‐analysis. Quality assessment of studies was performed according to the REMARK guidelines, and the risk of bias across studies was investigated through Q and I² tests. Meta‐analysis was performed to investigate the association between PD‐L1 expression and overall survival (OS), disease‐free survival (DFS), disease‐specific survival (DSS), gender and lymph node metastasis. A total of 1060 patients were analysed in the 11 studies included in the meta‐analysis. Pooled analysis revealed that the expression of PD‐L1 did not correlate with poor OS (HR, 0.60; 95% CI: [0.33, 1.10]; P = 0.10), DFS (HR, 0.62; 95% CI: [0.21, 1.88]; P = 0.40), DSS (HR, 2.05; 95% CI: [0.53, 7.86]; P = 0.29) and lymph node metastasis (HR, 1.15; 95% CI: [0.74, 1.81]; P = 0.53). Furthermore, results of the meta‐analysis showed that high expression of PD‐L1 is twice as frequent in female patients as in males (OR, 0.5; 95% CI: [0.36, 0.69]; P < 0.0001). For all three outcomes analysed, a high rate of heterogeneity was detected (I² > 50%). High PD‐L1 expression did not correlate with poor prognosis of patients suffering from oral squamous cell carcinoma. 
Studies published on the topic showed a significant variation in results, limiting the use of PD‐L1 expression by immunohistochemistry as a prognostic biomarker in clinical practice. Lastly, the role of the tumour-suppressor gene TP53 was evaluated in different head and neck squamous cell carcinomas (HNSCC). A systematic bioinformatics appraisal of TP53 mutations was performed on 415 HNSCC cases available on The Cancer Genome Atlas (TCGA). The following features were analysed and correlated with known clinicopathological variables: mutational profile of TP53, location (within secondary structure and predicted domains of p53 protein) and well-known hotspot mutations. Interactome–genome–transcriptome network analysis highlighted different gene networks. An algorithm was generated to develop a new prognostic classification system based on patients’ overall survival. TP53 mutations in HNSCCs exhibited distinct differences in different anatomical sites. The mutational profile of TP53 was an independent prognostic factor in HNSCC. Mutations carrying a high risk of death, identified by our novel classification algorithm, were an independent prognostic factor in the TCGA HNSCC database. Finally, network analysis suggested that distinct p53 molecular pathways exist in a site- and mutation-specific manner. The mutational profile of TP53 may serve as an independent prognostic factor in HNSCC patients, and is associated with distinctive site-specific biological networks.

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory network and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbation within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting the connectivity significance instead of connection density. The past few years have witnessed the increasing interest in understanding the molecular mechanism of post-transcriptional regulation with a special emphasis on non-coding RNAs since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs—including long non-coding RNAs (lncRNAs) —competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of the co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes—called switch genes—critically associated with drastic changes in cell phenotype. 
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.

    MENGA: a new comprehensive tool for the integration of neuroimaging data and the Allen human brain transcriptome atlas

    Brain-wide mRNA mappings offer great potential for neuroscience research as they can provide information about system proteomics. In previous work we correlated mRNA maps with the binding patterns of radioligands targeting specific molecular systems and imaged with positron emission tomography (PET) in unrelated control groups. This approach is potentially applicable to any imaging modality as long as an efficient procedure of imaging-genomic matching is provided. In the original work we considered mRNA brain maps of the whole human genome derived from the Allen human brain database (ABA) and we performed the analysis with a specific region-based segmentation with a resolution that was limited by the PET data parcellation. There we identified the need for a platform for imaging-genomic integration that should be usable with any imaging modality and fully exploit the high-resolution mapping of the ABA dataset. In this work we present MENGA (Multimodal Environment for Neuroimaging and Genomic Analysis), a software platform that allows the investigation of the correlation patterns between neuroimaging data of any sort (both functional and structural) and mRNA gene expression profiles derived from the ABA database at high resolution. We applied MENGA to six different imaging datasets from three modalities (PET, single photon emission tomography and magnetic resonance imaging) targeting the dopamine and serotonin receptor systems and the myelin molecular structure. We further investigated imaging-genomic correlations in the case of mismatch between selected proteins and imaging targets.