113 research outputs found
Recommended from our members
Enriching the Human Phenotype Ontology with inferred axioms from textual descriptions
The Human Phenotype Ontology (HP) is a reference vocabulary of human phenotypic abnormalities. Apart from the textual information (general definitions, descriptions, synonyms, etc.) of each ontology concept, HP also provides computer-readable logical definitions (axioms) of terms that allow human phenotypic abnormalities to be related to entities from anatomy, pathology, biochemistry and other areas. In this paper we present a prototype to generate new axiomatic knowledge from the textual descriptions of each HP term. The prototype (i) detects terms that appear in the textual descriptions but are not found in the given logical expressions, (ii) generates pair combinations of those terms, (iii) builds triples after detecting the most probable relation between each pair of terms using a statistical model and, finally, (iv) suggests the most probable triples to the user so she can decide which ones can be added to the original axioms.
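The four steps listed in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' prototype: the function name `candidate_triples`, the `relation_scores` lookup table (standing in for the unspecified statistical model) and the example terms and relation names are all assumptions.

```python
from itertools import combinations

def candidate_triples(description_terms, axiom_terms, relation_scores, top_k=3):
    """Suggest (subject, relation, object, probability) triples for review.

    relation_scores is a hypothetical stand-in for the statistical model:
    it maps a pair of terms to {relation_name: probability}.
    """
    # (i) terms mentioned in the description but absent from the axioms
    new_terms = sorted(set(description_terms) - set(axiom_terms))
    triples = []
    # (ii) all unordered pairs of those terms
    for a, b in combinations(new_terms, 2):
        scores = relation_scores.get((a, b), {})
        if not scores:
            continue  # the model has no opinion on this pair
        # (iii) keep the most probable relation for the pair
        relation, prob = max(scores.items(), key=lambda kv: kv[1])
        triples.append((a, relation, b, prob))
    # (iv) rank so the user sees the most probable triples first
    triples.sort(key=lambda t: t[3], reverse=True)
    return triples[:top_k]

# Toy input; the terms, relations and probabilities are made up.
suggestions = candidate_triples(
    description_terms=["heart", "ventricle", "enlargement", "abnormality"],
    axiom_terms=["abnormality"],
    relation_scores={
        ("enlargement", "heart"): {"inheres_in": 0.7, "part_of": 0.1},
        ("heart", "ventricle"): {"has_part": 0.6, "inheres_in": 0.2},
    },
)
```

In this sketch the final decision is left to the user, as in the abstract: the function only ranks candidates and does not modify any axioms.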
The Ontology of Biological Attributes (OBA)-computational traits for the life sciences.
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos
Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021. There are still more than 1,400 Mendelian conditions whose molecular cause is unknown. In addition, almost all medical conditions are somehow influenced by human genetic variation. This challenge also presents itself as an opportunity to understand the mechanisms of diseases, thus allowing the development of better mitigation strategies and the discovery of diagnostic markers and therapeutic targets. Deciphering the link between genes and diseases is one of the most demanding tasks in biomedical research. Computational approaches to gene-disease association prediction can greatly accelerate this process, and recent developments that explore the scientific knowledge described in ontologies have achieved good results. State-of-the-art approaches that take advantage of ontologies or knowledge graphs for these predictions are typically based on semantic similarity measures that only take hierarchical relations into consideration. New developments in the area of knowledge graph embeddings support more powerful representations but are usually limited to a single ontology, which may be insufficient in multi-domain applications such as the prediction of gene-disease associations. This dissertation proposes a novel approach to gene-disease association prediction that explores both the Human Phenotype Ontology and the Gene Ontology, using knowledge graph embeddings to represent gene and disease features in a shared semantic space that covers both gene function and phenotypes. Our approach integrates different methods for building the shared semantic space, as well as multiple knowledge graph embedding algorithms and machine learning methods. The prediction performance was evaluated on curated gene-disease associations from DisGeNET and compared to classical semantic similarity measures.
Our experiments demonstrate the value of employing knowledge graph embeddings based on random walks and highlight the need for closer integration of different ontologies
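As a rough illustration of what "embeddings based on random walks" involves, the sketch below generates walks over a tiny graph that joins gene-function and phenotype nodes. It is not the dissertation's code: the graph, the node identifiers and the function name `random_walks` are invented for the example, and the walks would still need to be passed to a word2vec-style model to obtain actual embeddings.

```python
import random

def random_walks(graph, start, num_walks=4, depth=3, seed=0):
    """Return num_walks random walks of at most `depth` steps from `start`.

    graph maps each node to a list of neighbouring nodes.
    """
    rng = random.Random(seed)  # seeded so the walks are reproducible
    walks = []
    for _ in range(num_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            neighbours = graph.get(node, [])
            if not neighbours:
                break  # dead end: stop this walk early
            node = rng.choice(neighbours)
            walk.append(node)
        walks.append(walk)
    return walks

# Hypothetical shared graph: a gene linked to a function term and a
# phenotype term, and a disease linked to the same phenotype term.
toy_graph = {
    "gene:G1": ["GO:function_a", "HP:phenotype_x"],
    "GO:function_a": ["GO:function_parent"],
    "HP:phenotype_x": ["HP:phenotype_parent", "disease:D1"],
    "disease:D1": ["HP:phenotype_x"],
}
walks = random_walks(toy_graph, "gene:G1")
```

Because the gene and the disease reach the same phenotype node, their walks share context, which is the intuition behind placing both in one semantic space.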
Covid19/IT the digital side of Covid19: A picture from Italy with clustering and taxonomy
The Covid19 pandemic has significantly affected our lives, triggering a strong reaction that has resulted in vaccines, more effective diagnoses and therapies, and policies to contain the pandemic outbreak, to name but a few. A significant contribution to their success comes from the computer science and information technology communities, both in support of other disciplines and as the primary driver of solutions for, e.g., diagnostics, social distancing, and contact tracing. In this work, we surveyed the initiatives of the Italian computer science and engineering community against the Covid19 pandemic. The 128 responses collected document the response of this community during the first pandemic wave in Italy (February-May 2020), through several initiatives carried out by both single researchers and research groups able to react promptly to Covid19, even remotely. The data obtained from the survey are reported here, discussed and further investigated with Natural Language Processing techniques, to generate semantic clusters based on embedding representations of the surveyed activity descriptions. The resulting clusters were then used to extend an existing Covid19 taxonomy with the classification of related research activities in computer science and information technology areas, summarizing the contribution of this work through a reproducible survey-to-taxonomy methodology.
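To make the idea of grouping activity descriptions by their embeddings concrete, here is a minimal sketch. The abstract does not say which clustering algorithm was used, so a simple greedy cosine-similarity threshold is substituted here; the function names, the threshold and the two-dimensional vectors are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def greedy_clusters(vectors, threshold=0.8):
    """Assign each item to the first cluster whose seed item is similar enough.

    vectors maps an item label to its embedding; a new cluster is opened
    when no existing cluster seed reaches the threshold.
    """
    clusters = []
    for label, vec in vectors.items():
        for cluster in clusters:
            seed_vec = vectors[cluster[0]]
            if cosine(vec, seed_vec) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters

# Made-up embeddings for three survey answers.
toy_vectors = {
    "contact tracing app": [1.0, 0.1],
    "exposure notification": [0.9, 0.2],
    "CT scan classifier": [0.1, 1.0],
}
groups = greedy_clusters(toy_vectors)
```

With real sentence embeddings the vectors would have hundreds of dimensions, and a standard clustering method would likely replace the greedy pass shown here.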
Aquisição e Interrogação de Conhecimento de Prática Clínica usando Linguagem Natural (Clinical Practice Knowledge Acquisition and Interrogation using Natural Language)
The scientific concepts, methodologies and tools in the Knowledge Representation (KR) subdomain of applied Artificial Intelligence (AI) have come a long way, with enormous strides in recent years. The use of Ontologies as domain conceptualizations is now powerful enough to aim at computable reasoning over complex realities. One of the most challenging scientific and technical human endeavors is the daily Clinical Practice (CP) of Cardiovascular (CV) specialty healthcare providers. Such a complex domain can benefit greatly from clinical reasoning aids that are now on the verge of becoming available. We investigate a complete, solid end-to-end ontological infrastructure for CP knowledge representation, as well as the associated processes to automatically acquire knowledge from clinical texts and reason over it.
Clinical practice knowledge acquisition and interrogation using natural language: aquisição e interrogação de conhecimento de prática clínica usando linguagem natural
The scientific concepts, methodologies and tools in the Knowledge Representation (KR) subdomain of applied Artificial Intelligence (AI) have come a long way, with enormous strides in recent years. The use of Ontologies as domain conceptualizations is now powerful enough to aim at computable reasoning over complex realities. One of the most challenging scientific and technical human endeavors is the daily Clinical Practice (CP) of Cardiovascular (CV) specialty healthcare providers. Such a complex domain can benefit greatly from clinical reasoning aids that are now on the verge of becoming available. We investigate a complete, solid end-to-end ontological infrastructure for CP knowledge representation, as well as the associated processes to automatically acquire knowledge from clinical texts and reason over it.
Addressing Semantic Interoperability and Text Annotations Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy
Electronic Health Records (EHRs) generate a huge number of databases that are updated dynamically. A major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data and to provide an environment that supports interoperability and the secure transfer of data. Healthcare organisations face difficulties in exchanging patients' healthcare information, laboratory reports, etc., due to a lack of semantic interoperability. Hence, there is a need for semantic web technologies that address healthcare interoperability problems by enabling healthcare standards from various healthcare entities (doctors, clinics, hospitals, etc.) to exchange data and its semantics in a form that can be understood by both machines and humans. Thus, a framework with a similarity analyser that deals with semantic interoperability is proposed in this thesis. While dealing with semantic interoperability, another consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering explicit and implicit clinical context with a high degree of precision and accuracy. For semantic similarity of medical text at different levels (conceptual, sentence and document level), different methods and techniques have been widely presented, but I made sure that the semantic content of a text that is presented includes the correct meaning of words and sentences. A comparative analysis of two approaches, ontology followed by word embedding and vice versa, was carried out to determine which approach gives better results for gaining higher semantic similarity. Using the Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances. However, the approach in which ontology is applied first to enrich the data, followed by word embedding, has shown better results. Apart from enriching the EHR, extracting relevant information is also challenging. To address this challenge, the concept of analogy has been applied to explain similarities between two different contents, as analogies play a significant role in understanding new concepts. Analogies help healthcare professionals to communicate effectively with patients and help patients understand their disease and treatment. So, I used analogies in this thesis to support the extraction of relevant information from medical text. Since accessing EHRs has been challenging, tweet text is used as an alternative, as social media has emerged as a relevant data source in recent years. An algorithm has been proposed to analyse medical tweets based on analogous words. The results have been used to validate the proposed methods. Two experts from the medical domain have given their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results have shown that the proposed analogy-based methods bring diversity and are helpful in analysing a specific disease or in text classification.
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 tables
Drug repositioning and indication discovery using description logics
Drug repositioning is the discovery of new indications for approved or failed drugs. This practice is commonly done within the drug discovery process in order to adjust or expand the application line of an active molecule. Nowadays, an increasing number of computational methodologies aim at predicting repositioning opportunities in an automated fashion. Some approaches rely on the direct physical interaction between molecules and protein targets (docking) and some methods consider more abstract descriptors, such as a gene expression signature, in order to characterise the potential pharmacological action of a drug (Chapter 1).
On a fundamental level, repositioning opportunities exist because drugs perturb multiple biological entities (on- and off-targets), which are themselves involved in multiple biological processes. Therefore, a drug can play multiple roles or exhibit various modes of action responsible for its pharmacology. The work done for my thesis aims at characterising these various modes and mechanisms of action for approved drugs, using a mathematical framework called description logics.
In this regard, I first specify how living organisms can be compared to complex black box machines and how this analogy can help to capture biomedical knowledge using description logics (Chapter 2). Secondly, the theory is implemented in the Functional Therapeutic Chemical Classification System (FTC - https://www.ebi.ac.uk/chembl/ftc/), a resource defining over 20,000 new categories representing the modes and mechanisms of action of approved drugs. The FTC also indexes over 1,000 approved drugs, which have been classified into the mode of action categories using automated reasoning. The FTC is evaluated against a gold standard, the Anatomical Therapeutic Chemical Classification System (ATC), in order to characterise its quality and content (Chapter 3).
Finally, from the information available in the FTC, a series of drug repositioning hypotheses were generated and made publicly available via a web application (https://www.ebi.ac.uk/chembl/research/ftc-hypotheses). A subset of the hypotheses, related to cardiovascular hypertension and to Alzheimer's disease, is discussed in more detail as an example application (Chapter 4).
The work performed illustrates how new valuable biomedical knowledge can be automatically generated by integrating and leveraging the content of publicly available resources using description logics and automated reasoning. The newly created classification (FTC) is a first attempt to formally and systematically characterise the function or role of approved drugs using the concept of mode of action. The open hypotheses derived from the resource are available to the community to analyse and design further experiments. This work was supported by the European Molecular Biology Laboratory (EMBL)