    Accessing scientific data through knowledge graphs with Ontop.

    In this tutorial, we learn how to set up and exploit the virtual knowledge graph (VKG) approach to access data stored in legacy relational systems and to enrich such data with domain knowledge coming from different heterogeneous (biomedical) resources. The VKG approach is based on an ontology that describes a domain of interest in terms of a vocabulary familiar to the user and exposes a high-level conceptual view of the data. Users access the data through this conceptual view and therefore do not need to be aware of low-level storage details. They can easily integrate ontologies coming from different sources and obtain richer answers thanks to the interaction between data and domain knowledge.
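
    The tutorial itself uses Ontop's own tooling; as a minimal sketch of what querying such a VKG can look like from a script, the snippet below sends a SPARQL query to an Ontop SPARQL endpoint with the SPARQLWrapper library. The endpoint URL and the :Patient/:hasDiagnosis vocabulary are illustrative placeholders, not taken from the tutorial.

```python
# Query a virtual knowledge graph exposed as a SPARQL endpoint by Ontop.
# Endpoint URL and vocabulary below are placeholders for illustration only.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://localhost:8080/sparql")  # assumed local Ontop endpoint
endpoint.setQuery("""
PREFIX : <http://example.org/biomedical#>
SELECT ?patient ?diagnosis WHERE {
  ?patient a :Patient ;
           :hasDiagnosis ?diagnosis .
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["patient"]["value"], row["diagnosis"]["value"])
```

    Because the graph is virtual, the endpoint rewrites this query into SQL over the underlying relational database at query time; no data is materialised as RDF.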

    Ontology-Based Framework for the Automatic Recognition of Activities of Daily Living Using Class Expression Learning Techniques

    The miniaturization and price reduction of sensors have encouraged the proliferation of smart environments, in which multitudinous sensors detect and describe the activities carried out by inhabitants. In this context, the recognition of activities of daily living (ADL) has been one of the most active research areas in recent years; its objective is to determine which daily activity is being performed by the inhabitants of a smart environment. Many proposals have been presented in the literature, often based on ad hoc ontologies that formalize logical rules, which hinders their reuse in other contexts. In this work, we propose the use of class expression learning (CEL), an ontology-based data mining technique, for the recognition of ADL. The technique combines the entities in the ontology, trying to find the expressions that best describe the target activities. As far as we know, this is the first time the technique has been applied to this problem. To evaluate the performance of CEL for the automatic recognition of activities, we first developed a framework that can convert many of the available datasets to all the ontology models we found in the literature for dealing with ADL. Two different CEL algorithms were employed to recognize eighteen activities in two different datasets. Although all the available ontologies in the literature focus on describing the context of the activities, the results show that, in general terms, the sequence of the events produced by the sensors is more relevant for their automatic recognition.
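
    To make the setting concrete, the following sketch builds a toy ADL ontology with owlready2 and asserts the kind of activity/sensor-event facts a CEL algorithm would generalise over. All class and property names are invented for illustration and are not taken from the paper, and the example does not run an actual CEL algorithm.

```python
# Toy ADL ontology: activities linked to the sensor events observed while they
# were performed. A CEL algorithm searches for a class expression that covers
# such positive examples, e.g. Activity and (hasEvent some StoveEvent).
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/adl-demo.owl")  # illustrative IRI

with onto:
    class Activity(Thing): pass
    class PrepareMeal(Activity): pass
    class SensorEvent(Thing): pass
    class StoveEvent(SensorEvent): pass
    class hasEvent(ObjectProperty):
        domain = [Activity]
        range = [SensorEvent]

    # One positive example of the PrepareMeal activity.
    cooking = PrepareMeal("cooking_01")
    cooking.hasEvent = [StoveEvent("stove_on_0930"), SensorEvent("fridge_open_0931")]

print(list(onto.classes()))
print(cooking.hasEvent)
```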

    Documenting Data Integration Using Knowledge Graphs

    With the increasing volume of data on the Web and the proliferation of published knowledge graphs, there is a growing need for improved data management and information extraction. However, heterogeneity across the data sources, i.e., various formats and systems, hinders efficient access to, management, reuse, and analysis of the data. A data integration system (DIS) provides uniform access to heterogeneous data sources and their relationships; it offers a unified and comprehensive view of the data. DISs resort to mapping rules, expressed in declarative languages like RML, to align data from various sources to classes and properties defined in an ontology. This work defines a knowledge graph in which data integration systems are represented as factual statements, with the aim of providing a basis for integrated analysis of data collected from heterogeneous data silos. The proposed knowledge graph is itself specified as a data integration system that integrates all data integration systems. The proposed solution includes a unified schema, which defines and explains the relationships between all elements of a data integration system DIS = ⟚G, S, M, F⟩. The results suggest that factual statements from the proposed knowledge graph improve the understanding of the features that characterize knowledge graphs declaratively defined as data integration systems.
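
    As a minimal sketch of the idea of documenting a DIS = ⟚G, S, M, F⟩ with factual statements, the snippet below records one source, one RML mapping rule and the unified schema as RDF triples with rdflib. The dis: vocabulary is invented for illustration and is not the ontology used in the paper.

```python
# Describe a data integration system (schema G, sources S, mappings M,
# transformation functions F) as plain RDF facts.
from rdflib import Graph, Literal, Namespace, RDF

DIS = Namespace("http://example.org/dis#")  # illustrative vocabulary
g = Graph()
g.bind("dis", DIS)

g.add((DIS.myDIS, RDF.type, DIS.DataIntegrationSystem))
g.add((DIS.myDIS, DIS.hasUnifiedSchema, DIS.domainOntology))      # G
g.add((DIS.myDIS, DIS.hasSource, DIS.clinicalCSV))                # S
g.add((DIS.clinicalCSV, DIS.format, Literal("CSV")))
g.add((DIS.myDIS, DIS.hasMappingRule, DIS.patientTriplesMap))     # M
g.add((DIS.patientTriplesMap, DIS.expressedIn, Literal("RML")))
g.add((DIS.myDIS, DIS.hasTransformation, DIS.dateNormalisation))  # F

print(g.serialize(format="turtle"))
```

    Such statements can then be queried like any other knowledge graph, for example to list every source that a given mapping rule touches.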

    Ontology-Driven Semantic Annotations for Multiple Engineering Viewpoints in Computer Aided Design

    Engineering design involves a series of activities to handle data, including capturing, storing, retrieving and manipulating data, throughout the entire product lifecycle (PLC). Unfortunately, a closed-loop knowledge and information management system has not yet been implemented across the PLC. As part of product lifecycle management (PLM) approaches, computer-aided design (CAD) systems are extensively used from the embodiment and detailed design stages onwards in mechanical engineering. However, current CAD systems lack the ability to handle semantically rich information, to represent, manage and use knowledge among multidisciplinary engineers, and to integrate various tools/services with distributed data and knowledge. To address these challenges, a general-purpose semantic annotation approach based on CAD systems in the mechanical engineering domain is proposed, which contributes to knowledge management and reuse, data interoperability and tool integration. In present-day PLM systems, annotation approaches are embedded in software applications and use diverse data and anchor representations, making them static, inflexible and difficult to integrate with external systems. This research argues that it is possible to take a generalised approach to annotation, with formal annotation content structures and anchoring mechanisms described using general-purpose ontologies. In this way, viewpoint-oriented annotations may readily be captured, represented and incorporated into PLM systems together with existing annotations in a common framework, and the knowledge collected or generated from multiple engineering viewpoints may be reasoned with to derive additional knowledge that enables downstream processes. Knowledge can therefore be propagated and evolved through the PLC. Within this framework, a knowledge modelling methodology is also proposed for developing knowledge models in various situations. In addition, a prototype system has been designed and developed to evaluate the core contributions of the proposed concept. Following an evaluation plan, cost estimation and finite element analysis case studies have been used to validate the usefulness, feasibility and generality of the proposed framework, and the results are discussed. In conclusion, the presented research meets its original aim and objectives, and directions for further research are suggested.
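
    The thesis does not prescribe a particular serialization, but the sketch below illustrates what a viewpoint-oriented annotation with an explicit anchor could look like in RDF, loosely following the W3C Web Annotation vocabulary. The CAD feature, viewpoint and note identifiers are invented for the example.

```python
# A semantic annotation attached to a CAD feature: the target is the anchor in
# the CAD model, the body carries viewpoint-specific (cost estimation) knowledge.
from rdflib import Graph, Literal, Namespace, RDF

OA = Namespace("http://www.w3.org/ns/oa#")
EX = Namespace("http://example.org/cad#")  # illustrative identifiers
g = Graph()
g.bind("oa", OA)
g.bind("ex", EX)

g.add((EX.ann1, RDF.type, OA.Annotation))
g.add((EX.ann1, OA.hasTarget, EX.hole_feature_42))   # anchor: a feature in the CAD model
g.add((EX.ann1, OA.hasBody, EX.costNote))            # content: annotation knowledge
g.add((EX.costNote, EX.viewpoint, Literal("cost estimation")))
g.add((EX.costNote, EX.comment, Literal("Tight tolerance on this hole drives machining cost")))

print(g.serialize(format="turtle"))
```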

    Enabling Complex Semantic Queries to Bioinformatics Databases through Intuitive Search Over Data

    Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data already available publicly. However, the heterogeneity of the existing data sources still poses significant challenges for achieving interoperability among biological databases. Furthermore, merely solving the technical challenges of data integration, for example through the use of common data representation formats, leaves open a larger problem: the steep learning curve required for understanding the data models of each public source, as well as the technical language through which the sources can be queried and joined. As a consequence, most of the available biological data remain practically unexplored today. In this thesis, we address these problems jointly, by first introducing an ontology-based data integration solution in order to mitigate the data source heterogeneity problem. We illustrate through the concrete example of Bgee, a gene expression data source, how relational databases can be exposed as virtual Resource Description Framework (RDF) graphs, through relational-to-RDF mappings. This has the important advantage that the original data source can remain unmodified, while still becoming interoperable with external RDF sources. We complement our methods with applied case studies designed to guide domain experts in formulating expressive federated queries targeting the integrated data across the domains of evolutionary relationships and gene expression. More precisely, we introduce two comparative analyses, first within the same domain (using orthology data from multiple, interoperable, data sources) and second across domains, in order to study the relation between expression change and evolution rate following a duplication event. Finally, in order to bridge the semantic gap between users and data, we design and implement Bio-SODA, a question answering system over domain knowledge graphs that does not require training data for translating user questions to SPARQL. Bio-SODA uses a novel ranking approach that combines syntactic and semantic similarity, while also incorporating node centrality metrics, to rank candidate matches for a given user question. Our results from testing Bio-SODA across several real-world databases spanning multiple domains (both within and outside bioinformatics) show that it can answer complex, multi-fact queries, beyond the current state of the art in the more well-studied open-domain question answering.
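
    A minimal sketch of the kind of federated SPARQL query such case studies rely on is shown below: one endpoint answers the gene expression pattern while a second one is delegated the orthology pattern via SERVICE. The endpoint URLs and the genex:/orth: vocabulary are placeholders, not the exact schemas used in the thesis.

```python
# Federated query over two (placeholder) SPARQL endpoints: gene expression
# locally, orthology via a SERVICE clause.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/expression/sparql")  # placeholder endpoint
sparql.setQuery("""
PREFIX genex: <http://example.org/genex#>
PREFIX orth:  <http://example.org/orth#>
SELECT ?gene ?ortholog ?tissue WHERE {
  ?gene genex:isExpressedIn ?tissue .
  SERVICE <https://example.org/orthology/sparql> {
    ?gene orth:hasOrtholog ?ortholog .
  }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)

for b in sparql.query().convert()["results"]["bindings"]:
    print(b["gene"]["value"], b["ortholog"]["value"], b["tissue"]["value"])
```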

    Protein Ontology: Enhancing and scaling up the representation of protein entities

    The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps in using text mining to identify protein-related entities, the large-scale import of proteoform information from expert-curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely related terms, including, for example, an interactive multiple sequence alignment. Finally, we describe recent improvements in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate the discoverability and aggregation of data relating to protein entities.
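
    Since the article announces a SPARQL endpoint but its address is not given here, the sketch below uses a placeholder URL; it simply lists direct subclasses of PRO's root term (obo:PR_000000001, "protein") to show what programmatic access could look like.

```python
# List direct subclasses of the PRO root term from a (placeholder) SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

pro = SPARQLWrapper("https://example.org/pro/sparql")  # placeholder endpoint URL
pro.setQuery("""
PREFIX obo:  <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label WHERE {
  ?term rdfs:subClassOf obo:PR_000000001 ;
        rdfs:label ?label .
}
LIMIT 20
""")
pro.setReturnFormat(JSON)

for b in pro.query().convert()["results"]["bindings"]:
    print(b["term"]["value"], b["label"]["value"])
```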

    Semantic reasoning on the edge of internet of things

    The Internet of Things (IoT) is a paradigm in which physical objects with identifying, sensing, networking and processing capabilities are connected to each other over the Internet. Millions of new devices will be added to IoT networks, generating huge amounts of data, and how to represent, store, interconnect, search, and organize the information generated by IoT devices becomes a challenge. Semantic technologies can play an important role here by encoding meaning into data, enabling computer systems to hold knowledge and perform reasoning, while edge computing reduces both network latency and resource consumption by deploying services and distributing computing tasks from the core network to the edge. We identify the following challenges in IoT systems: first, a centralized server may incur long latency because of physical distance; second, resource-constrained IoT devices have limited computing ability for processing heavy tasks; third, data generated by heterogeneous devices can hardly be understood and utilized by other devices or systems. Our research addresses these challenges with a solution based on edge computing and semantic technologies. Edge computing distributes tasks to reasoning devices, called edge nodes, which are close to the terminal devices and provide services; these additional resources balance the workload of the system and improve its computing capability. We annotate the data with the Resource Description Framework, giving heterogeneous machines a way to understand and utilize it, and we use semantic reasoning as a general-purpose intelligent processing method. The thesis studies the performance of semantic reasoning in an IoT system under the edge computing paradigm. We develop an edge-based IoT system with semantic technologies that deploys semantic reasoning services on edge nodes, and we design five experiments to evaluate the performance of the integrated system. We demonstrate how the edge computing paradigm can facilitate IoT in terms of data transformation, semantic reasoning and service experience, and we analyze how performance can be improved by properly distributing tasks between cloud and edge nodes. The results show that edge computing can improve the performance of semantic reasoning in IoT.
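
    The annotation step described above can be illustrated with a short rdflib sketch: a raw sensor reading is lifted into RDF on an edge node before reasoning. The SOSA vocabulary is a common choice for this; the thesis does not state which vocabulary it actually uses, and the sensor identifiers are invented.

```python
# Lift a raw temperature reading into RDF so that other devices and services
# can interpret it; an edge node can then reason over it locally and forward
# only the inferred facts to the cloud.
from rdflib import RDF, XSD, Graph, Literal, Namespace

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/iot#")  # illustrative identifiers
g = Graph()
g.bind("sosa", SOSA)
g.bind("ex", EX)

g.add((EX.obs1, RDF.type, SOSA.Observation))
g.add((EX.obs1, SOSA.madeBySensor, EX.tempSensor7))
g.add((EX.obs1, SOSA.observedProperty, EX.roomTemperature))
g.add((EX.obs1, SOSA.hasSimpleResult, Literal(23.5, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```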

    CLO: The cell line ontology

    Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper-level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions.
    Construction and content: Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built on the hierarchy of the in vivo cell types defined in CL and of the tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms.
    Utility and discussion: The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types: the alignment of the CLO with related ontologies, combined with the use of ontological reasoners, will support sophisticated inferencing to advance translational informatics development.
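
    As a small sketch of programmatic access to CLO, the snippet below loads the ontology from its OBO Foundry PURL with owlready2 and browses one term's subclasses. Loading the full ontology can take a while, and using CLO_0000001 as the 'cell line cell' root is an assumption made for the example.

```python
# Load CLO and list the direct subclasses of one term.
from owlready2 import get_ontology

clo = get_ontology("http://purl.obolibrary.org/obo/clo.owl").load()  # standard OBO PURL

term = clo.search_one(iri="*CLO_0000001")  # assumed 'cell line cell' root term
if term is not None:
    print(term.label)
    for sub in list(term.subclasses())[:10]:
        print("  ", sub.label)
```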
    • 

    corecore