42 research outputs found

    LIPIcs, Volume 258, SoCG 2023, Complete Volume


    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.

    Enabling Complex Semantic Queries to Bioinformatics Databases through Intuitive Search Over Data

    Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data already available publicly. However, the heterogeneity of the existing data sources still poses significant challenges for achieving interoperability among biological databases. Furthermore, merely solving the technical challenges of data integration, for example through the use of common data representation formats, leaves open a larger problem: the steep learning curve required for understanding the data models of each public source, as well as the technical language through which the sources can be queried and joined. As a consequence, most of the available biological data remain practically unexplored today. In this thesis, we address these problems jointly, by first introducing an ontology-based data integration solution in order to mitigate the data source heterogeneity problem. We illustrate, through the concrete example of Bgee, a gene expression data source, how relational databases can be exposed as virtual Resource Description Framework (RDF) graphs through relational-to-RDF mappings. This has the important advantage that the original data source can remain unmodified while still becoming interoperable with external RDF sources. We complement our methods with applied case studies designed to guide domain experts in formulating expressive federated queries targeting the integrated data across the domains of evolutionary relationships and gene expression. More precisely, we introduce two comparative analyses, first within the same domain (using orthology data from multiple, interoperable data sources) and second across domains, in order to study the relation between expression change and evolution rate following a duplication event. Finally, in order to bridge the semantic gap between users and data, we design and implement Bio-SODA, a question answering system over domain knowledge graphs that does not require training data for translating user questions to SPARQL. Bio-SODA uses a novel ranking approach that combines syntactic and semantic similarity, while also incorporating node centrality metrics to rank candidate matches for a given user question. Our results in testing Bio-SODA across several real-world databases that span multiple domains (both within and outside bioinformatics) show that it can answer complex, multi-fact queries, beyond the current state-of-the-art in the more well-studied open-domain question answering.
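The candidate-ranking idea mentioned in this abstract (combining syntactic and semantic similarity with node centrality) can be illustrated with a minimal Python sketch. This is not the Bio-SODA implementation: the weights, the use of difflib for string similarity, and the example candidates are assumptions made only for illustration.

```python
# Minimal sketch: ranking candidate knowledge-graph matches for a question term
# by combining syntactic similarity, a semantic-similarity placeholder, and
# node centrality. Weights and scores are illustrative assumptions only.
from difflib import SequenceMatcher


def syntactic_similarity(a: str, b: str) -> float:
    # String-level similarity between a question term and a candidate label.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(term, candidates, w_syn=0.5, w_sem=0.3, w_cent=0.2):
    """candidates: dicts with 'label', 'semantic_sim', 'centrality' in [0, 1]."""
    scored = []
    for c in candidates:
        score = (w_syn * syntactic_similarity(term, c["label"])
                 + w_sem * c["semantic_sim"]    # e.g. ontology- or embedding-based similarity
                 + w_cent * c["centrality"])    # e.g. normalised degree or PageRank in the graph
        scored.append((score, c["label"]))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate graph elements for the question term "gene expression".
    candidates = [
        {"label": "expressed in", "semantic_sim": 0.8, "centrality": 0.9},
        {"label": "Gene", "semantic_sim": 0.6, "centrality": 0.7},
        {"label": "expression level", "semantic_sim": 0.7, "centrality": 0.4},
    ]
    for score, label in rank_candidates("gene expression", candidates):
        print(f"{label}: {score:.2f}")
```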

    Knowledge hypergraph-based approach for multi-source data integration and querying: Application for Earth Observation domain

    Early warning against natural disasters, to save lives and reduce damage, has drawn increasing interest in developing systems that observe, monitor, and assess changes in the environment. Over recent years, numerous environmental monitoring systems and Earth Observation (EO) programs have been implemented. Nevertheless, these systems generate a large amount of EO data while using different vocabularies and different conceptual schemas. Accordingly, data resides in many siloed systems and is largely untapped for integrated operations, insights, and decision making. To overcome this insufficient exploitation of EO data, a data integration system is crucial to break down data silos and create a common information space where data are semantically linked. Within this context, we propose a semantic data integration and querying approach, which aims to semantically integrate EO data and provide enhanced query processing in terms of accuracy, completeness, and semantic richness of responses. To do so, we defined three main objectives. The first objective is to capture the knowledge of the environmental monitoring domain. To do so, we propose MEMOn, a domain ontology that provides a common vocabulary for the environmental monitoring domain in order to support the semantic interoperability of heterogeneous EO data. While creating MEMOn, we adopted a development methodology based on three fundamental principles. First, we used a modularization approach: the idea is to create separate modules, one for each context of the environment domain, in order to ensure the clarity of the global ontology's structure and guarantee the reusability of each module separately. Second, we used the upper-level ontology Basic Formal Ontology and the mid-level Common Core Ontologies to facilitate the integration of the ontological modules into the global one. Third, we reused existing domain ontologies such as ENVO and SSN to avoid creating the ontology from scratch; this also improves its quality, since the reused components have already been evaluated. MEMOn was then evaluated using real use case studies, according to the Sahara and Sahel Observatory experts' requirements. The second objective of this work is to break down the data silos and provide a common environmental information space. Accordingly, we propose a knowledge hypergraph-based data integration approach to provide experts and software agents with a virtually integrated and linked view of the data. This approach generates RML mappings between the developed ontology and the metadata, and then creates a knowledge hypergraph that semantically links these mappings in order to identify more complex relationships across data sources. One of the strengths of the proposed approach is that it goes beyond combining data retrieved from multiple independent sources and allows virtual data integration in a highly semantic and expressive way, using hypergraphs. The third objective of this thesis concerns the enhancement of query processing in terms of accuracy, completeness, and semantic richness of responses, in order to make the returned results more relevant and richer in terms of relationships. Accordingly, we propose knowledge hypergraph-based query processing that improves the selection of the sources contributing to the final result of an input query. Indeed, the proposed approach moves beyond the discovery of simple one-to-one equivalence matches and relies on the identification of more complex relationships across data sources by referring to the knowledge hypergraph. This enhancement significantly increases answer completeness and semantic richness. The proposed approach was implemented in an open-source tool and has proved its effectiveness through a real use case in the environmental monitoring domain.
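As a rough illustration of the hypergraph-driven source selection described in this abstract, the following Python sketch models hyperedges linking each data source to the ontology concepts it covers, and greedily picks sources for a query. The source names, concepts, and greedy set-cover strategy are hypothetical assumptions, not the thesis's actual algorithm or tool.

```python
# Minimal sketch: a "knowledge hypergraph" reduced to hyperedges that each link
# one EO data source to the set of ontology concepts it covers. Given the
# concepts a query needs, greedily pick sources until all concepts are covered.
# Source names and concepts are hypothetical examples.

HYPEREDGES = {
    "sentinel_images": {"Observation", "Vegetation", "Region"},
    "weather_stations": {"Observation", "Temperature", "Precipitation"},
    "land_cover_db": {"Vegetation", "Region", "LandCoverClass"},
}


def select_sources(query_concepts: set[str]) -> list[str]:
    """Greedy set cover over hyperedges: choose sources covering the query concepts."""
    remaining = set(query_concepts)
    chosen = []
    while remaining:
        # Pick the source covering the most still-uncovered concepts.
        best = max(HYPEREDGES, key=lambda s: len(HYPEREDGES[s] & remaining))
        covered = HYPEREDGES[best] & remaining
        if not covered:
            raise ValueError(f"No source covers: {remaining}")
        chosen.append(best)
        remaining -= covered
    return chosen


if __name__ == "__main__":
    # A query asking for vegetation per region together with temperature context.
    print(select_sources({"Vegetation", "Region", "Temperature"}))
    # e.g. ['sentinel_images', 'weather_stations']
```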

    Neural Networks for Building Semantic Models and Knowledge Graphs

    The abstract is provided in the attachment (author: Futia, Giuseppe).

    Engineering Background Knowledge for Social Robots

    Social robots are embodied agents that continuously perform knowledge-intensive tasks involving several kinds of information coming from different heterogeneous sources. Providing a framework for engineering robots' knowledge raises several problems, such as identifying sources of information and modeling solutions suitable for robots' activities, integrating knowledge coming from different sources, evolving this knowledge with information learned during robots' activities, grounding perceptions in robots' knowledge, and assessing robots' knowledge with respect to humans'. In this thesis we investigated the feasibility and benefits of engineering the background knowledge of social robots with a framework based on Semantic Web technologies and Linked Data. This research has been supported and guided by a case study that provided a proof of concept through a prototype tested in a real socially assistive context.
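To give a concrete flavour of a Linked Data-based background knowledge framework of this kind, the sketch below uses Python's rdflib to assert a few triples about a perceived object and query them with SPARQL. The namespace, classes, and facts are hypothetical examples, not the framework actually developed in the thesis.

```python
# Minimal sketch: a fragment of a social robot's background knowledge as RDF
# triples, queried with SPARQL via rdflib (requires the rdflib package).
# The namespace, classes, and facts are hypothetical examples.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/robot-kb#")

g = Graph()
g.bind("ex", EX)

# Background knowledge: cups are containers and afford grasping.
g.add((EX.Cup, RDFS.subClassOf, EX.Container))
g.add((EX.Cup, EX.affords, EX.Grasping))

# A grounded perception: the robot observed cup_1 on table_1.
g.add((EX.cup_1, RDF.type, EX.Cup))
g.add((EX.cup_1, EX.locatedOn, EX.table_1))
g.add((EX.cup_1, RDFS.label, Literal("blue cup")))

# Query: which perceived objects afford grasping via their class?
results = g.query("""
    PREFIX ex: <http://example.org/robot-kb#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?obj ?label WHERE {
        ?obj a ?cls .
        ?cls ex:affords ex:Grasping .
        OPTIONAL { ?obj rdfs:label ?label }
    }
""")
for obj, label in results:
    print(obj, label)
```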

    Evolutionary genomics: statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Extraction and Integration of Semi-Structured Data Using Declarative Descriptions for Automating LOD Dataset Generation

    Master's thesis (Informatics), University of Tsukuba, conferred March 25, 2019 (No. 41278).