A Knowledge Graph Framework for Dementia Research Data
Dementia research encompasses diverse data modalities, including advanced
imaging, deep phenotyping, and multi-omics analysis. However, integrating these disparate data
sources has historically posed a significant challenge, obstructing the unification and comprehensive
analysis of collected information. In recent years, knowledge graphs have emerged as a powerful
tool to address such integration issues by enabling the consolidation of heterogeneous data sources
into a structured, interconnected network of knowledge. In this context, we introduce DemKG, an
open-source framework designed to facilitate the construction of a knowledge graph integrating
dementia research data, comprising three core components: a KG-builder that integrates diverse
domain ontologies and data annotations, an extensions ontology providing necessary terms tailored
for dementia research, and a versatile transformation module for incorporating study data. In contrast
with other current solutions, our framework provides a stable foundation by leveraging established
ontologies and community standards and simplifies study data integration while delivering solid
ontology design patterns, broadening its usability. Furthermore, the modular approach of its components enhances flexibility and scalability. We showcase how DemKG might aid and improve
multi-modal data investigations through a series of proof-of-concept scenarios focused on relevant
Alzheimer’s disease biomarkers.
QueryOR: a comprehensive web platform for genetic variant analysis and prioritization
Background: Whole genome and exome sequencing are contributing to the extraordinary progress in the study of
human genetic variants. In this fast developing field, appropriate and easily accessible tools are required to facilitate
data analysis.
Results: Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well
as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive,
flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying
formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future
expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features.
Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a
global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy
to use, and its intuitive interface lets users handle different kinds of inheritance as well as features related to
variants shared among different patients. QueryOR is suitable for investigating single patients, families or cohorts.
Conclusions: QueryOR is a comprehensive and flexible web platform suited to easy, user-driven variant
prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/
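A minimal Python sketch of the contrast drawn in the QueryOR abstract above, between discarding variants one criterion at a time and a global positive selection over all transcript isoforms. The criteria names, the feature fields, the scoring rule (number of criteria met by at least one isoform) and the choice to filter on the first isoform only are assumptions made for illustration; the abstract does not specify QueryOR's actual scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    name: str
    # One feature dict per transcript isoform overlapping the variant.
    isoforms: list = field(default_factory=list)


# Hypothetical criteria: each maps one isoform's features to True/False.
CRITERIA = {
    "rare": lambda iso: iso["allele_freq"] < 0.01,
    "damaging": lambda iso: iso["effect"] in {"missense", "stop_gained"},
    "conserved": lambda iso: iso["conservation"] > 0.8,
}


def positive_score(variant):
    """Count the criteria satisfied by at least one isoform of the variant."""
    return sum(
        any(test(iso) for iso in variant.isoforms)
        for test in CRITERIA.values()
    )


def prioritize(variants):
    """Rank every variant by score, highest first; nothing is discarded."""
    return sorted(variants, key=positive_score, reverse=True)


def sequential_filter(variants):
    """Contrast case: drop a variant as soon as its first isoform fails a criterion."""
    kept = list(variants)
    for test in CRITERIA.values():
        kept = [v for v in kept if test(v.isoforms[0])]
    return kept


# Toy data, invented for this example.
v1 = Variant("v1", [
    {"allele_freq": 0.001, "effect": "synonymous", "conservation": 0.9},
    {"allele_freq": 0.001, "effect": "missense", "conservation": 0.9},
])
v2 = Variant("v2", [
    {"allele_freq": 0.2, "effect": "missense", "conservation": 0.5},
])
v3 = Variant("v3", [
    {"allele_freq": 0.005, "effect": "intron", "conservation": 0.95},
])

ranked = prioritize([v1, v2, v3])
survivors = sequential_filter([v1, v2, v3])
```

In this toy case the positive selection ranks v1 first because its second isoform carries a missense effect, whereas the one-criterion-at-a-time filter, which looks only at the first isoform, discards all three variants.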
Corporate Smart Content Evaluation
Nowadays, a wide range of information sources are available due to the
evolution of the web and the collection of data. Much of this information is
consumable and usable by humans but not understandable and processable by
machines. Some data may be directly accessible in web pages or via data feeds,
but most of the meaningful existing data is hidden within deep web databases
and enterprise information systems. Besides the inability to access a wide
range of data, manual processing by humans is laborious, error-prone and no
longer up to date. Semantic web technologies deliver capabilities for
machine-readable, exchangeable content and metadata for automatic processing
of content. The enrichment of heterogeneous data with background knowledge
described in ontologies induces re-usability and supports automatic processing
of data. The establishment of “Corporate Smart Content” (CSC) - semantically
enriched data with high information content with sufficient benefits in
economic areas - is the main focus of this study. We describe three current
research areas in the field of CSC concerning scenarios and datasets
applicable for corporate applications, algorithms and research. Aspect-
oriented Ontology Development advances modular ontology development and
partial reuse of existing ontological knowledge. Complex Entity Recognition
enhances traditional entity recognition techniques to recognize clusters of
related textual information about entities. Semantic Pattern Mining combines
semantic web technologies with pattern learning to mine for complex models by
attaching background knowledge. This study introduces the afore-mentioned
topics by analyzing applicable scenarios with economic and industrial focus,
as well as research emphasis. Furthermore, a collection of existing datasets
for the given areas of interest is presented and evaluated. The target
audience includes researchers and developers of CSC technologies - people
interested in semantic web features, ontology development, automation,
extracting and mining valuable information in corporate environments. The aim
of this study is to provide a comprehensive and broad overview of the three
topics, give assistance for decision making in interesting scenarios and
choosing practical datasets for evaluating custom problem statements. Detailed
descriptions of the attributes and metadata of the datasets should serve as
a starting point for individual ideas and approaches.
Development of a text mining approach to disease network discovery
Scientific literature is one of the major sources of knowledge for systems biology, in the form of papers, patents and other types of written reports. Text mining methods aim at automatically extracting relevant information from the literature. The hypothesis of this thesis was that biological systems could be elucidated by the development of text mining solutions that can automatically extract relevant information from documents. The first objective consisted in developing software components to recognize biomedical entities in text, which is the first step to generate a network about a biological system. To this end, a machine learning solution was developed, which can be trained for specific biological entities using an annotated dataset, obtaining high-quality results. Additionally, a rule-based solution was developed, which can be easily adapted to various types of entities.
The second objective consisted in developing an automatic approach to link the recognized entities to a reference knowledge base. A solution based on the PageRank algorithm was developed in order to match the entities to the concepts that most contribute to the overall coherence.
The third objective consisted in automatically extracting relations between entities, to generate knowledge graphs about biological systems. Due to the lack of annotated datasets available for this task, distant supervision was employed to train a relation classifier on a corpus of documents and a knowledge base. The applicability of this approach was demonstrated in two case studies: microRNA-gene relations for cystic fibrosis, obtaining a network of 27 relations using the abstracts of 51 recently published papers; and cell-cytokine relations for tolerogenic cell therapies, obtaining a network of 647 relations from 3264 abstracts.
Through a manual evaluation, the information contained in these networks was determined to be relevant. Additionally, a solution combining deep learning techniques with ontology information was developed, to take advantage of the domain knowledge provided by ontologies.
This thesis contributed several solutions that demonstrate the usefulness of text mining methods to systems biology by extracting domain-specific information from the literature. These solutions make it easier to integrate various areas of research, leading to a better understanding of biological systems.
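A minimal Python sketch of the PageRank-based linking idea described in the second objective above: candidate concepts for different mentions are connected when the knowledge base relates them, and each mention is linked to its highest-ranked candidate. The graph construction, the function names, the damping factor and the toy identifiers are assumptions for illustration; the abstract does not describe how the thesis builds or weights its graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph {node: set(neighbours)}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def link_entities(candidates, related):
    """Pick one concept per mention.

    candidates: {mention: [concept ids]}, ids assumed unique across mentions.
    related: iterable of (concept, concept) pairs taken from a knowledge base.
    """
    graph = {c: set() for cands in candidates.values() for c in cands}
    owner = {c: m for m, cands in candidates.items() for c in cands}
    for a, b in related:
        # Only connect candidates that belong to different mentions.
        if a in graph and b in graph and owner[a] != owner[b]:
            graph[a].add(b)
            graph[b].add(a)
    rank = pagerank(graph)
    return {m: max(cands, key=rank.get) for m, cands in candidates.items()}


# Toy example with invented identifiers.
candidates = {
    "CF": ["cystic_fibrosis", "cardiac_failure"],
    "CFTR": ["CFTR_gene"],
    "miR-145": ["MIR145"],
}
related = [("cystic_fibrosis", "CFTR_gene"), ("CFTR_gene", "MIR145")]

linked = link_entities(candidates, related)
```

Here the ambiguous mention "CF" resolves to the candidate that is connected to the other mentions' concepts, since the isolated candidate receives only the baseline rank.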
Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records
The extraction of phenotype information which is naturally contained in
electronic health records (EHRs) has been found to be useful in various
clinical informatics applications such as disease diagnosis. However, due to
imprecise descriptions, lack of gold standards and the demand for efficiency,
annotating phenotypic abnormalities on millions of EHR narratives is still
challenging. In this work, we propose a novel unsupervised deep learning
framework to annotate the phenotypic abnormalities from EHRs via semantic
latent representations. The proposed framework takes advantage of the Human
Phenotype Ontology (HPO), which is a knowledge base of phenotypic
abnormalities, to standardize the annotation results. Experiments have been
conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative
analyses have shown that the proposed framework achieves state-of-the-art annotation
performance and computational efficiency compared with other methods.
Comment: Accepted by BIBM 2019 (Regular
- …