    Views on graph databases: a multifocus approach for heterogeneous data

    Advisor: Claudia Maria Bauzer Medeiros. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação. Abstract: Scientific research has become data-intensive and data-dependent. This new research paradigm requires sophisticated computer science techniques and technologies to support both the life cycle of scientific data and collaboration among scientists from distinct areas. A recurring requirement of data-intensive interdisciplinary teams is the construction of multiple perspectives of the world over the same datasets. Present solutions cover a wide range of aspects, from the design of interoperability standards to the use of non-relational database management systems. None of these efforts, however, adequately meets the need for multiple perspectives, which are called foci in this thesis. 
Basically, a focus is designed and built to cater to a research group (even within a single project) that needs to deal with a subset of data of interest at multiple aggregation/generalization levels. The definition and creation of a focus are complex tasks that require mechanisms and engines to manipulate multiple representations of the same real-world phenomenon. This PhD research aims to provide multiple foci over heterogeneous data. To meet this challenge, we deal with four research problems. The first two were (1) choosing an appropriate data management paradigm; and (2) eliciting multifocus requirements. Our work on these problems led us to choose graph databases to answer (1) and the concept of views, from relational databases, for (2). However, there is no consensus on a data model for graph databases, and views are seldom discussed in this context. Thus, research problems (3) and (4) are: (3) specifying an adequate graph data model and (4) defining a framework to handle views on graph databases. 
Our research on these four problems resulted in the main contributions of this thesis: (i) to present the case for the use of graph databases as the persistence layer in multifocus research - a schemaless and relationship-driven type of database that provides a broad understanding of data connections; (ii) to define views for graph databases to support the need for multiple foci, considering graph data manipulation, graph algorithms and traversal tasks; (iii) to propose a property graph data model (PGDM) to address the absence of a full-fledged data model for graphs; (iv) to specify and implement a framework, named Graph-Kaleidoscope, that supports views over graph databases; and (v) to validate our framework with real data in two application domains - biodiversity and environmental resources - typical examples of multidisciplinary research that involve the analysis of interactions of phenomena using heterogeneous data. Degree: Doctorate in Computer Science.
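
To make the idea of a view over a property graph concrete, the following is a minimal sketch in Python. It is not the Graph-Kaleidoscope API and does not follow the PGDM definition from the thesis; the function name `make_view`, the dictionary-based graph encoding, and the toy node and edge names are all assumptions made for illustration. A view is modeled here simply as the subgraph induced by the nodes that satisfy a predicate.

```python
# Hypothetical sketch: a "view" as a node-filtered, induced subgraph of a
# property graph. Names and data are illustrative assumptions only.

def make_view(nodes, edges, node_filter):
    """Return (nodes, edges) of the subgraph induced by nodes passing node_filter."""
    kept = {nid: props for nid, props in nodes.items() if node_filter(props)}
    # Keep only edges whose two endpoints both survive the filter.
    kept_edges = [(s, rel, t) for (s, rel, t) in edges if s in kept and t in kept]
    return kept, kept_edges


# Toy property graph: node id -> properties, edges as (source, label, target).
nodes = {
    "n1": {"label": "Species", "name": "species A"},
    "n2": {"label": "Species", "name": "species B"},
    "n3": {"label": "Location", "name": "site X"},
}
edges = [("n1", "INTERACTS_WITH", "n2"), ("n1", "OBSERVED_AT", "n3")]

# One focus: only species and the interactions among them.
species_nodes, species_edges = make_view(
    nodes, edges, lambda props: props["label"] == "Species"
)
```

Under these assumptions, a second focus over the same data would be another call to `make_view` with a different predicate, leaving the underlying graph untouched.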

    A software system for agent-assisted ontology building

    This thesis investigates how one can design a team of intelligent software agents that helps its human partner develop a formal ontology from a relational database and enhance it with higher-level abstractions. The resulting efficiency of ontology development could facilitate the building of intelligent decision support systems that allow high-level semantic queries on legacy relational databases, autonomous implementation within a host organization, and incremental deployment without affecting the underlying database or its conventional use. We introduce a set of design principles, formulate the prototype system requirements and architecture, elaborate agent roles and interactions, develop suitable design techniques, and test the approach through practical implementation of selected features. We endow each agent with a model meta-ontology, which enables it to reason and communicate about ontology, and a planning meta-ontology, which captures the role-specific know-how of the ontology building method. We also assess the maturity of development tools for a larger-scale implementation. --Leaf i. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b214471

    Neural Networks for Building Semantic Models and Knowledge Graphs

    Applications of flexible querying to graph data

    Graph data models provide flexibility and extensibility that make them well-suited to modelling data that may be irregular, complex, and evolving in structure and content. A consequence, however, is that users may not be familiar with the full structure of the data, which may itself change over time, making it hard for them to formulate queries that precisely match the data graph and meet their information-seeking requirements. There is therefore a need for flexible querying systems over graph data that can automatically change the user's query to find additional or different answers, and so help the user retrieve relevant information. This chapter describes recent work in this area, looking at a variety of graph query languages, applications, flexible querying techniques and implementations.
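
As a rough illustration of what automatically changing a query can mean, the sketch below relaxes an edge label to a broader one when the exact query returns nothing. It is only one possible relaxation strategy and is not taken from the chapter; the function names, the `broader` label hierarchy, and the toy edges are assumptions, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical sketch of query relaxation over a labelled graph.
# `broader` maps a label to its more general parent label (assumed acyclic).

def is_a(label, target, broader):
    """True if `label` equals `target` or has `target` as an ancestor."""
    while label is not None:
        if label == target:
            return True
        label = broader.get(label)
    return False


def match(edges, source, label, broader):
    """Targets reachable from `source` by one edge whose label is a kind of `label`."""
    return [t for (s, l, t) in edges if s == source and is_a(l, label, broader)]


def flexible_match(edges, source, label, broader):
    """Try the exact label first, then progressively broader labels.

    Returns (answers, number_of_relaxation_steps_applied).
    """
    steps = 0
    current = label
    while current is not None:
        answers = match(edges, source, current, broader)
        if answers:
            return answers, steps
        current = broader.get(current)  # relax to the parent label, if any
        steps += 1
    return [], steps


edges = [("alice", "colleagueOf", "bob"), ("alice", "friendOf", "carol")]
broader = {"colleagueOf": "knows", "friendOf": "knows", "siblingOf": "knows"}

# No "siblingOf" edge exists, so the query is relaxed once to "knows".
relaxed = flexible_match(edges, "alice", "siblingOf", broader)
exact = flexible_match(edges, "alice", "friendOf", broader)
```

The step count returned alongside the answers gives a simple way to rank results by how far the query had to be relaxed.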

    Linked data as medium for distributed Multi-Agent Systems

    The conceptual design and discussion of multi-agent systems (MAS) typically focuses on agents and their models, and on the elements and effects in the environment that they perceive. This view, however, leaves out potential pitfalls in the later implementation of the system that may stem from limitations in the data models, interfaces, or protocols by which agents and environments exchange information. Today, the research community agrees that the environment should also be understood as an abstraction layer through which agents access, interpret, and modify elements within the environment. This, however, blurs the line between the environment as the sum of interactive elements and phenomena perceivable by agents, and the underlying technology by which this information and these interactions are offered to agents. As a remedy, this thesis proposes to consider the digital medium by which the environment is provided to agents as a third component of multi-agent systems, besides agents and environments. "Medium" then refers to exactly this technological component, via which environment data is published interactively to the agents and via which agents perceive, interpret, and finally modify the underlying environment data. Furthermore, this thesis details how MAS may use the capabilities of a properly chosen medium to achieve coordinated system behavior. A suitable candidate technology for digital agent media comes from the Semantic Web in the form of Linked Data. In addition to conceptual discussions of the notion of digital agent media, this thesis provides a detailed specification of a Linked Data agent medium and describes means of implementing MAS around Linked Data media technologies.

    Optimizing Analytical Queries over Semantic Web Sources

    Visual exploration of semantic-web-based knowledge structures

    Humans have a curious nature and seek a better understanding of the world. Data, information, and knowledge became assets of our modern society through the information technology revolution in the form of the internet. However, with the growing size of accumulated data, new challenges emerge, such as searching and navigating in these large collections of data, information, and knowledge. The current developments in academic and industrial contexts target the corresponding challenges using Semantic Web technologies. The Semantic Web is an extension of the Web and provides machine-readable representations of knowledge for various domains. These machine-readable representations allow intelligent machine agents to understand the meaning of the data and information, and enable additional inference of new knowledge. Generally, the Semantic Web is designed for information exchange and its processing and does not focus on presenting such semantically enriched data to humans. Visualizations support exploration, navigation, and understanding of data by exploiting humans’ ability to comprehend complex data through visual representations. In the context of Semantic-Web-Based knowledge structures, various visualization methods and tools are available, and new ones are being developed every year. However, suitable visualizations are highly dependent on individual use cases and targeted user groups. In this thesis, we investigate visual exploration techniques for Semantic-Web-Based knowledge structures by addressing the following challenges: i) how to engage various user groups in modeling such semantic representations; ii) how to facilitate understanding using customizable visual representations; and iii) how to ease the creation of visualizations for various data sources and different use cases. The achieved results indicate that visual modeling techniques facilitate the engagement of various user groups in ontology modeling. 
Customizable visualizations enable users to adjust visualizations to their current needs and provide different views on the data. Additionally, customizable visualization pipelines enable rapid visualization generation for various use cases, data sources, and user groups.

    Semantic Data Management in Data Lakes

    In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables