    A foundation for ontology modularisation

    There has been great interest in realising the Semantic Web. Ontologies are used to define Semantic Web applications. Ontologies have grown so large and complex that they cause cognitive overload for humans, in understanding and maintaining them, and for machines, in processing and reasoning over them. Furthermore, building ontologies from scratch is time-consuming and not always necessary. Prospective ontology developers could consider using existing ontologies that are of good quality. However, an entire large ontology is not always required for a particular application; often only a subset of the knowledge is relevant. Modularity deals with simplifying an ontology into smaller ontologies, either for a particular context or by structure, while preserving the contextual knowledge. There are a number of benefits in modularising an ontology, including simplified maintenance and machine processing, as well as collaborative efforts whereby work can be shared among experts. Modularity has been successfully applied to a number of different ontologies to improve usability and reduce complexity. However, problems exist for modularity that have not been satisfactorily addressed. Currently, modularity tools generate large modules that do not exclusively represent the context. Partitioning tools, which ought to generate disjoint modules, sometimes create overlapping modules. These problems arise from a number of issues: different module types have not been clearly characterised, it is unclear what the properties of a 'good' module are, and it is unclear which evaluation criteria apply to specific module types. In order to solve these problems, a number of theoretical aspects have to be investigated. It is important to determine which ontology module types are the most widely used and to characterise each such type by distinguishing properties. One must also identify the properties that a 'good' or 'usable' module meets.
In this thesis, we investigate these problems with modularity systematically. We begin by identifying dimensions for modularity to define its foundation: use-case, technique, type, property, and evaluation metric. Each dimension is populated with sub-dimensions as fine-grained values. The dimensions are used to create an empirically-based framework for modularity by classifying a set of ontologies with them, which reveals dependencies among the dimensions. The formal framework can be used to guide the user in modularising an ontology and as a starting point in the modularisation process. To solve the problem with module quality, new and existing metrics were implemented in a novel tool, TOMM, and an experimental evaluation with a set of modules was performed, resulting in dependencies between the metrics and module types. These dependencies can be used to determine whether a module is of good quality. For the issue with existing modularity techniques, we created five new algorithms to improve the current tools and techniques and evaluated them experimentally. The algorithms of the tool, NOMSA, perform as well as other tools for most performance criteria. The modules generated by two of NOMSA's algorithms are of good quality when compared to the expected dependencies of the framework. The modules generated by the remaining three algorithms correspond to some of the expected values for the metrics for the ontology set in question.
Solving these problems with modularity resulted in a formal foundation for modularity, which comprises: an exhaustive set of modularity dimensions with dependencies between them, a framework for guiding the modularisation process and annotating modules, a way to measure the quality of modules using the novel TOMM tool, which has new and existing evaluation metrics, the SUGOI tool for module management, which has been investigated for module interchangeability, and an implementation of new algorithms to fill the gaps left by insufficient tools and techniques.

    A semantic framework for ontology usage analysis

    The Semantic Web envisions a Web where information is accessible and processable by computers as well as humans. Ontologies are the cornerstone for realizing this vision: they capture domain knowledge by defining terms and the relationships between them, providing a formal representation of the domain with machine-understandable semantics. Ontologies are used for semantic annotation, data interoperability, and knowledge assimilation and dissemination. In the literature, different approaches have been proposed to build and evolve ontologies, but one more important concept needs to be considered in the ontology lifecycle: usage. Measuring the "usage" of ontologies will help us to effectively and efficiently make use of semantically annotated structured data published on the Web (formalized knowledge published on the Web), improve the state of ontology adoption and reusability, provide a usage-based feedback loop to the ontology maintenance process for a pragmatic conceptual model update, and source information accurately and automatically so that it can be utilized in other areas of the ontology lifecycle. Ontology Usage Analysis is the area which evaluates, measures and analyses the use of ontologies on the Web. However, in spite of its importance, no formal approach is present in the literature which focuses on measuring the use of ontologies on the Web. This is in contrast to the approaches proposed in the literature on the other concepts of the ontology lifecycle, such as ontology development, ontology evaluation and ontology evolution. To address this gap, this thesis sets out to assess, analyse and represent the use of ontologies on the Web. In order to address the problem and realize the abovementioned benefits, an Ontology Usage Analysis Framework (OUSAF) is presented.
OUSAF implements a methodological approach comprising identification, investigation, representation and utilization phases. These phases provide a complete solution for usage analysis by allowing users to identify the key ontologies, and to investigate, represent and utilize usage analysis results. Various computational components with several methods, techniques, and metrics for each phase are presented and evaluated using Semantic Web data crawled from the Web. To disseminate ontology-usage-related information in a form accessible to machines and humans, the U Ontology is presented to formalize the conceptual model of the ontology usage domain. The evaluation of the framework, solution components, methods, and formalized conceptual model is presented, indicating the usefulness of the overall proposed solution.

    Helping scientists integrate and interact with biomedical data

    Master's thesis, Bioinformatics and Computational Biology, 2021, Universidade de Lisboa, Faculdade de Ciências. Over the past decades, the amount and complexity of available biomedical data have increased and far exceeded the human capacity to process it. To cope with this, knowledge graphs and ontologies have been increasingly used, allowing semantic integration of heterogeneous data within and across domains. However, the independent development of biomedical ontologies has created heterogeneity problems, since ontologies are designed with overlapping domains or significant differences. Automated ontology alignment techniques have been developed to tackle the semantic heterogeneity problem by establishing meaningful correspondences between entities of two ontologies. However, their performance is limited, and the alignments they produce can contain erroneous, incoherent, or missing mappings. Therefore, manual validation of automated ontology alignments remains essential to ensure their quality. Given the complexity of the ontology matching process, it is important to provide visualization and a user interface with the necessary features to support the exploration, validation, and editing of alignments. However, these aspects are often overlooked: few alignment systems feature user interfaces enabling alignment visualization, fewer allow editing alignments, and fewer still provide the functionalities needed to make the task seamless for users. This dissertation developed VOWLMap, an extension of the standalone web application WebVOWL, for visualizing, editing, and validating biomedical ontology alignments. This work extended the Visual Notation for OWL Ontologies (VOWL), which defines a visual representation for most language constructs of OWL, to support graphical representations of alignments, and restructured WebVOWL to load and visualize alignments.
VOWLMap employs modularization techniques to facilitate the visualization of large alignments while maintaining the context of each mapping, and offers a dynamic visualization that supports interaction mechanisms, including direct interaction with and editing of graph representations. A user study was conducted to evaluate the usability and performance of VOWLMap; it yielded positive feedback and an excellent score in a standard usability questionnaire.

    Academia/Industry DynAmics (AIDA): A knowledge Graph within the scholarly domain and its applications

    Scholarly knowledge graphs are a form of knowledge representation that aims to capture and organize the information and knowledge contained in scholarly publications, such as research papers, books, patents, and datasets. Scholarly knowledge graphs can provide a comprehensive and structured view of the scholarly domain, covering various aspects such as authors, affiliations, research topics, methods, results, citations, and impact. Scholarly knowledge graphs can enable various applications and services that can facilitate and enhance scholarly communication, such as information retrieval, data analysis, recommendation systems, semantic search, and knowledge discovery. However, constructing and maintaining scholarly knowledge graphs is a challenging task that requires dealing with large-scale, heterogeneous, and dynamic data sources. Moreover, extracting and integrating the relevant information and knowledge from unstructured or semi-structured text is not trivial, as it involves natural language processing, machine learning, ontology engineering, and semantic web technologies. Furthermore, ensuring the quality and validity of the scholarly knowledge graphs is essential for their usability and reliability.

    Yavaa: supporting data workflows from discovery to visualization

    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public. Scientists publish their raw data as supplements to articles or even as standalone artifacts to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Data is often provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can help bridge this gap. However, the visualization options on offer are generally very limited, and next to no support is given in applying any of them. The same holds true for data wrangling: only very few options exist to adjust the data to current needs, and barely any protection is in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that search for datasets across institutional borders reasonably well. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when it remains unclear how they came about. The foundations for a lively exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skills required. It covers the whole workflow, from identifying suitable datasets, through possible transformations, to the export of the result in the form of suitable visualizations.

    Linked Data Entity Summarization

    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.

    Debugging scripts in SPipes editor

    The SPipes language is a technology that enables the processing of structured Semantic Web data. This thesis improves the existing SPipes script editor. The thesis first introduces the principles of the Semantic Web and related technologies. Based on a thorough analysis of the existing editor and a survey of related work, the application architecture was redesigned and functional and non-functional requirements for the editor were defined. The main contributions of this work include the re-implementation of the backend from Scala to Java, which eliminates the problems arising from the incompatibility between Scala and the Spring framework that is used. Special attention was paid to writing tests for most parts of the application, which simplifies the detection of potential bugs in the application.
A major change in architecture was to split the originally monolithic application into several separate services using Docker and docker-compose, leading to simpler configuration and easier deployment of the application. Last but not least, this thesis introduces new, non-trivial features of the editor: the capability to validate and debug SPipes scripts and modules.