624 research outputs found
Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases
Phenotypes are the observable characteristics of an organism arising from its
response to the environment. Phenotypes associated with engineered and natural
genetic variation are widely recorded using phenotype ontologies in model
organisms, as are signs and symptoms of human Mendelian diseases in databases
such as OMIM and Orphanet. Exploiting these resources, several computational
methods have been developed for integration and analysis of phenotype data to
identify the genetic etiology of diseases or suggest plausible interventions. A
similar resource would be highly useful not only for rare and Mendelian
diseases, but also for common, complex and infectious diseases. We apply a
semantic text-mining approach to identify the phenotypes (signs and symptoms)
associated with over 8,000 diseases. We demonstrate that our method generates
phenotypes that correctly identify known disease-associated genes in mice and
humans with high accuracy. Using a phenotypic similarity measure, we generate a
human disease network in which diseases that share signs and symptoms cluster
together, and we use this network to identify phenotypic disease modules
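The network construction described in this abstract can be illustrated with a minimal sketch. It assumes a plain Jaccard index over phenotype sets and an arbitrary edge threshold; the abstract does not specify the similarity measure actually used, and the disease names, phenotypes, and threshold below are hypothetical.

```python
from itertools import combinations

# Hypothetical disease -> phenotype annotations (illustrative only, not from the paper).
disease_phenotypes = {
    "disease_A": {"fever", "cough", "fatigue"},
    "disease_B": {"fever", "cough", "headache"},
    "disease_C": {"tremor", "rigidity"},
}


def jaccard(a, b):
    """Jaccard index of two sets: shared items divided by all items (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(annotations, threshold=0.3):
    """Return (disease_1, disease_2, score) edges for pairs scoring at or above the threshold."""
    edges = []
    for d1, d2 in combinations(sorted(annotations), 2):
        score = jaccard(annotations[d1], annotations[d2])
        if score >= threshold:
            edges.append((d1, d2, score))
    return edges


edges = similarity_network(disease_phenotypes)
print(edges)  # [('disease_A', 'disease_B', 0.5)]
```

Diseases that share signs and symptoms end up connected, so clusters in the resulting graph play the role of the phenotypic modules the abstract mentions.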
Uberon: towards a comprehensive multi-species anatomy ontology
The lack of a single unified species-neutral ontology covering the anatomy of a variety of metazoans is a hindrance to translating model organism research to human health. We have developed an Uber-anatomy ontology to fill this need, bridging the gap between the CARO upper-level ontology and species-specific anatomical ontologies
Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes
Ontologies are widely used in the biomedical community for annotation and integration of databases. Formal definitions can relate classes from different ontologies and thereby integrate data across different levels of granularity, domains and species. We have applied this methodology to the Ascomycete Phenotype Ontology (APO), enabling the reuse of various orthogonal ontologies and we have converted the phenotype associated data found in the SGD following our proposed patterns. We have integrated the resulting data in the cross-species phenotype network PhenomeNET, and we make both the cross-species integration of yeast phenotypes and a similarity-based comparison of yeast phenotypes across species available in the PhenomeBrowser. Furthermore, we utilize our definitions and the yeast phenotype annotations to suggest novel functional annotations of gene products in yeast
Analysis of translesion polymerases in colorectal cancer cells following cetuximab treatment: A network perspective
Introduction: Adaptive mutagenesis observed in colorectal cancer (CRC) cells upon exposure to EGFR inhibitors contributes to the development of resistance and recurrence. Multiple investigations have indicated a parallel between cancer cells and bacteria in terms of exhibiting adaptive mutagenesis. This phenomenon entails a transient and coordinated escalation of error-prone translesion synthesis polymerases (TLS polymerases), resulting in mutagenesis of a magnitude sufficient to drive the selection of resistant phenotypes. Methods: In this study, we conducted a comprehensive pan-transcriptome analysis of the regulatory framework within CRC cells, with the objective of identifying potential transcriptome modules encompassing certain translesion polymerases and the associated transcription factors (TFs) that govern them. Our sampling strategy involved the collection of transcriptomic data from tumors treated with cetuximab, an EGFR inhibitor, untreated CRC tumors, and colorectal-derived cell lines, resulting in a diverse dataset. Subsequently, we identified co-regulated modules using weighted correlation network analysis with a minKMEtostay threshold set at 0.5 to minimize false-positive module identifications and mapped the modules to STRING annotations. Furthermore, we explored the putative TFs influencing these modules using KBoost, a kernel PCA regression model. Results: Our analysis did not reveal a distinct transcriptional profile specific to cetuximab treatment. Moreover, we elucidated co-expression modules housing genes such as POLK, POLI, POLQ, REV1, POLN, and POLM. Specifically, POLK, POLI, and POLQ were assigned to the “blue” module, which also encompassed critical DNA damage response enzymes, for example, BRCA1, BRCA2, MSH6, and MSH2. To delineate the transcriptional control of this module, we investigated associated TFs, highlighting the roles of prominent cancer-associated TFs, such as CENPA, HNF1A, and E2F7. Conclusion: We found that translesion polymerases are co-regulated with DNA mismatch repair and cell cycle-associated factors. We did not, however, identify any networks specific to cetuximab treatment, indicating that the response to EGFR inhibitors relates to a general stress response mechanism
PhenomeNET: a whole-phenome approach to disease gene discovery
Phenotypes are investigated in model organisms to understand and reveal the molecular mechanisms underlying disease. Phenotype ontologies were developed to capture and compare phenotypes within the context of a single species. Recently, these ontologies were augmented with formal class definitions that may be utilized to integrate phenotypic data and enable the direct comparison of phenotypes between different species. We have developed a method to transform phenotype ontologies into a formal representation, combine phenotype ontologies with anatomy ontologies, and apply a measure of semantic similarity to construct the PhenomeNET cross-species phenotype network. We demonstrate that PhenomeNET can identify orthologous genes, genes involved in the same pathway and gene–disease associations through the comparison of mutant phenotypes. We provide evidence that the Adam19 and Fgf15 genes in mice are involved in the tetralogy of Fallot, and, using zebrafish phenotypes, propose the hypothesis that the mammalian homologs of Cx36.7 and Nkx2.5 lie in a pathway controlling cardiac morphogenesis and electrical conductivity which, when defective, cause the tetralogy of Fallot phenotype. Our method implements a whole-phenome approach toward disease gene discovery and can be applied to prioritize genes for rare and orphan diseases for which the molecular basis is unknown
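The idea of comparing phenotypes through an ontology, as this abstract describes, can be sketched as follows. This is only an illustration under stated assumptions: the toy is-a hierarchy and the term names are hypothetical, and the Jaccard index over ancestor-closed term sets is a stand-in technique, since the abstract does not name the semantic similarity measure PhenomeNET applies.

```python
# Hypothetical is-a hierarchy, child -> set of parents (illustrative only).
PARENTS = {
    "ventricular septal defect": {"heart defect"},
    "atrial septal defect": {"heart defect"},
    "heart defect": {"cardiovascular phenotype"},
    "cardiovascular phenotype": set(),
    "short tail": {"tail phenotype"},
    "tail phenotype": set(),
}


def with_ancestors(terms):
    """Expand a set of terms with every ancestor reachable through is-a links."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS.get(term, ()))
    return closed


def semantic_similarity(terms_a, terms_b):
    """Jaccard index over ancestor-closed term sets (a stand-in measure, see lead-in)."""
    a, b = with_ancestors(terms_a), with_ancestors(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


mouse_gene = {"ventricular septal defect"}  # hypothetical mutant phenotype
human_disease = {"atrial septal defect"}    # hypothetical disease phenotype
unrelated = {"short tail"}

print(semantic_similarity(mouse_gene, human_disease))  # 0.5
print(semantic_similarity(mouse_gene, unrelated))      # 0.0
```

The two septal defects share no term directly, yet they score 0.5 because they share ancestors in the hierarchy. That is the property that lets annotations made at different levels of granularity, or in different species, be compared at all.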
The role of ontologies in biological and biomedical research: a functional perspective.
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications. This is the final version of the article. It first appeared from Oxford University Press via http://dx.doi.org/10.1093/bib/bbv01
The Units Ontology: a tool for integrating units of measurement in science
Units are basic scientific tools that give meaning to numerical data. Their standardization and formalization support the reporting, exchange, processing, reproducibility and integration of quantitative measurements. Ontologies facilitate the integration of data and knowledge, allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurement
A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data
Class imbalance remains a large problem in high-throughput omics analyses, causing bias towards the over-represented class when training machine learning-based classifiers. Oversampling is a common method used to balance classes, allowing for better generalization of the training data. More naive approaches can introduce other biases into the data, being especially sensitive to inaccuracies in the training data, a problem considering the characteristically noisy data obtained in healthcare. This is especially a problem with high-dimensional data. A generative adversarial network-based method is proposed for creating synthetic samples from small, high-dimensional data, to improve upon other more naive generative approaches. The method was compared with ‘synthetic minority over-sampling technique’ (SMOTE) and ‘random oversampling’ (RO). Generative methods were validated by training classifiers on the balanced data
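The two baseline methods named in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: `smote_like` interpolates between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE, and the sample values, function names, and parameters are hypothetical. The GAN-based method itself is not sketched here.

```python
import random


def random_oversample(samples, target_size, rng):
    """Random oversampling: duplicate randomly chosen minority samples up to target_size."""
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return samples + extra


def smote_like(samples, target_size, rng, k=2):
    """SMOTE-style oversampling: synthesize points between a sample and a near neighbour."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    while len(samples) + len(synthetic) < target_size:
        base = rng.choice(samples)
        # The k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (s for s in samples if s is not base), key=lambda s: dist2(base, s)
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return samples + synthetic


rng = random.Random(0)  # fixed seed so the sketch is reproducible
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]  # hypothetical minority-class samples
balanced_ro = random_oversample(minority, 6, rng)
balanced_smote = smote_like(minority, 6, rng)
```

Random oversampling only repeats existing points, so any noise in them is copied verbatim. The interpolating variant creates new points, but only on segments between existing ones, which limits how much new variation it can add in high-dimensional data.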
The anatomy of phenotype ontologies: principles, properties and applications
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally. Funding: The National Science Foundation (IOS:1340112 to G.V.G.), the European Commission H2020 (grant agreement number 731075) to G.V.G. and the King Abdullah University of Science and Technology (to R.H.)
A sentence classification framework to identify geometric errors in radiation therapy from relevant literature
The objective of systematic reviews is to address a research question by summarizing relevant studies following a detailed, comprehensive, and transparent plan and search protocol to reduce bias. Systematic reviews are very useful in the biomedical and healthcare domain; however, the data extraction phase of the systematic review process necessitates substantive expertise and is labour-intensive and time-consuming. The aim of this work is to partially automate the process of building systematic radiotherapy treatment literature reviews by summarizing the required data elements of geometric errors of radiotherapy from relevant literature using machine learning and natural language processing (NLP) approaches. A framework is developed in this study that initially builds a training corpus by extracting sentences containing different types of geometric errors of radiotherapy from relevant publications. The publications are retrieved from PubMed following a given set of rules defined by a domain expert. Subsequently, the method develops a training corpus by extracting relevant sentences using a sentence similarity measure. A support vector machine (SVM) classifier is then trained on this training corpus to extract the sentences from new publications which contain relevant geometric errors. To demonstrate the proposed approach, we have used 60 publications containing geometric errors in radiotherapy to automatically extract the sentences stating the mean and standard deviation of different types of errors between planned and executed radiotherapy. The experimental results show that the recall and precision of the proposed framework are, respectively, 97% and 72%. The results clearly show that the framework is able to extract almost all sentences containing required data of geometric errors
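For readers who want to relate the reported recall and precision to a single score, the sketch below shows how the two metrics are defined and how they combine into an F1 score. The F1 value is derived here from the two reported percentages and is not a figure stated in the abstract; the confusion counts in the second call are hypothetical and serve only to exercise the function.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_score(precision, recall)
    return precision, recall, f1


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# F1 implied by the reported precision (72%) and recall (97%).
reported_f1 = f1_score(0.72, 0.97)
print(round(reported_f1, 3))  # 0.827

# Hypothetical counts: 97 relevant sentences found, 38 false alarms, 3 missed.
precision, recall, f1 = precision_recall_f1(tp=97, fp=38, fn=3)
```

A recall of 97% with a precision of 72% means the classifier misses very few relevant sentences but roughly one in four extracted sentences still needs to be discarded by a reviewer.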