    Incremental schema integration for data wrangling via knowledge graphs

    Virtual data integration is the current go-to approach for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema, and the mappings between them. Based on these constructs, virtual data integration systems enable fast, on-demand data exploration via query rewriting. Unfortunately, these constructs are currently generated in a largely manual manner, which hinders their feasibility in real scenarios. The problem is aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged, semi-automatic and incremental approach grounded on knowledge graphs that generates the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, we present a comprehensive evaluation that scrutinizes our approach. This work was partly supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00, and the D3M project, funded by the Spanish Agencia Estatal de Investigación (AEI) under project PDC2021-121195-I00. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union – NextGenerationEU, under project FJC2020-045809-I. Peer reviewed. Postprint (published version).
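    The matching and integration steps can be illustrated, very loosely, with a small sketch. It is not NextiaDI and does not use knowledge graphs: it assumes each source schema is just a list of attribute names, swaps in plain string similarity (Python's `difflib.SequenceMatcher`) as the matcher, and uses a hypothetical `integrate_source` helper with an arbitrary 0.8 threshold. It does show the incremental idea: each new source is matched against the current global schema, and the global-to-source mappings grow as sources are added.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Global schema: global attribute -> list of (source, source attribute) mappings.
GlobalSchema = Dict[str, List[Tuple[str, str]]]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def integrate_source(global_schema: GlobalSchema, source: str,
                     attributes: List[str], threshold: float = 0.8) -> GlobalSchema:
    """Incrementally merge one source schema into the global schema (hypothetical helper).

    Each source attribute is mapped to the most similar existing global
    attribute; if no candidate reaches `threshold`, it becomes a new one.
    """
    for attr in attributes:
        best, best_score = None, 0.0
        for candidate in global_schema:
            score = similarity(attr, candidate)
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            global_schema[best].append((source, attr))
        else:
            global_schema[attr] = [(source, attr)]
    return global_schema


# Usage with two hypothetical sources, integrated one after the other.
schema: GlobalSchema = {}
integrate_source(schema, "hospital_a", ["patient_id", "birth_date", "city"])
integrate_source(schema, "hospital_b", ["patientId", "birthdate", "zip_code"])
for global_attr, mappings in schema.items():
    print(global_attr, "<-", mappings)
```

    Here `patientId` lands on `patient_id` only because the names are lexically close; a realistic matcher would typically also need data types, instances, or semantic information about the attributes.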

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and the challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Bench-Ranking: a prescriptive analysis method for querying large knowledge graphs

    Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs creates great interest in optimizing query performance. Modern BD systems are, however, complicated data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can only be classified as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In addition, designing mature pipelines for processing big graphs entails further design decisions that emerge with the non-native (relational) graph processing paradigm, such as the choice of relational schema, partitioning technique, and storage format. These decisions cannot be made automatically. In this thesis, we discuss how our work fills this timely research gap. First, we show the impact of the trade-offs among these design decisions on the replicability of BD systems' performance when querying large knowledge graphs. We then show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. Finally, we investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions. https://www.ester.ee/record=b553332
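    As a rough illustration of prescriptive ranking (not the thesis's actual ranking functions, which are not reproduced here), the sketch below ranks configurations per query by runtime, averages the ranks, and then keeps the Pareto-optimal configurations across two criteria. The configuration labels, runtimes, and storage sizes are invented toy values, and `average_ranks` and `pareto_front` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def average_ranks(runtimes: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank configurations per query (1 = fastest) and average the ranks over queries.

    Ties are broken by insertion order, since Python's sort is stable.
    """
    configs = list(runtimes)
    n_queries = len(runtimes[configs[0]])
    totals = {c: 0 for c in configs}
    for q in range(n_queries):
        ordered = sorted(configs, key=lambda c: runtimes[c][q])
        for rank, c in enumerate(ordered, start=1):
            totals[c] += rank
    return {c: totals[c] / n_queries for c in configs}


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return configurations not dominated by any other (lower is better on every criterion)."""
    front = []
    for name, s in scores.items():
        dominated = any(
            other != s and all(o <= v for o, v in zip(other, s))
            for other_name, other in scores.items() if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical runtimes in seconds for three queries, per configuration
# (ST/VP/PT = single-table, vertically partitioned, property-table schema).
runtimes = {
    "ST-csv": [12.0, 30.0, 8.0],
    "VP-parquet": [5.0, 9.0, 7.0],
    "PT-orc": [6.0, 7.0, 10.0],
}
storage_gb = {"ST-csv": 40.0, "VP-parquet": 25.0, "PT-orc": 18.0}  # hypothetical

ranks = average_ranks(runtimes)
front = pareto_front({c: (ranks[c], storage_gb[c]) for c in runtimes})
print(ranks)
print(front)
```

    In this toy setup `ST-csv` drops out because `VP-parquet` is at least as good on both criteria, while `VP-parquet` and `PT-orc` trade speed against storage; a short list like this, rather than raw runtime tables, is the kind of actionable output the abstract points to.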

    Automatically selecting patients for clinical trials with justifications

    Clinical trials are human research studies used to evaluate the effectiveness of a surgical, medical, or behavioral intervention. Researchers widely use them to determine whether a new treatment, such as a new medication, is safe and effective in humans. A clinical trial is frequently performed to determine whether a new treatment is more successful than the current treatment or has fewer harmful side effects. However, clinical trials have a high failure rate. One method applied is to find suitable patients by searching their patient records. Unfortunately, this is a difficult process: it is typically performed manually, which makes it time-consuming and error-prone. Consequently, clinical trial deadlines are often missed and studies do not move forward, and time can be a determining factor for success. It would therefore be advantageous to have automatic support for this process. Since it is also important to validate whether patients were selected correctly for the trial, and thereby avoid potential health problems, there should also be a mechanism that presents justifications for the selected patients. In this dissertation, we present one possible solution to the problem of patient selection for clinical trials. We developed the necessary algorithms and created a simple, intuitive web application that selects patients for clinical trials automatically. This was achieved by combining knowledge expressed in different formalisms: we integrated medical knowledge expressed in ontologies with trial criteria expressed as nonmonotonic rules. To automate the validation procedure, we developed a mechanism that generates a justification for each selection together with the results for the selected patients. The intended outcome is that a user can easily enter a set of trial criteria, and the application will return the selected patients and their respective justifications, based on the criteria entered, medical information, and a database of patient information.
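    The combination of ontological knowledge, nonmonotonic criteria, and justifications can be sketched in a few lines. This is not the dissertation's implementation: it assumes a toy single-parent is-a hierarchy in place of a real medical ontology, hard-codes one inclusion concept, one exclusion concept, and a minimum age in place of user-entered rules, and uses hypothetical names (`IS_A`, `derive`, `select`, and the concept names). The nonmonotonic part is the exclusion check, which follows negation as failure: a patient is excluded only if the exclusion can be derived from the record, so missing information does not block selection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Toy single-parent is-a hierarchy (hypothetical concepts, not a real ontology).
IS_A: Dict[str, str] = {
    "Type1Diabetes": "DiabetesMellitus",
    "Type2Diabetes": "DiabetesMellitus",
    "DiabetesMellitus": "MetabolicDisease",
    "ChronicKidneyDisease": "KidneyDisease",
}


def ancestors(concept: str) -> List[str]:
    """The concept followed by all of its superclasses in IS_A."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def derive(conditions: Set[str], target: str) -> Optional[List[str]]:
    """Return an is-a path from a recorded condition up to `target`, or None."""
    for condition in sorted(conditions):  # sorted for deterministic output
        chain = ancestors(condition)
        if target in chain:
            return chain[: chain.index(target) + 1]
    return None


@dataclass
class Patient:
    pid: str
    age: int
    conditions: Set[str] = field(default_factory=set)


def select(patients: List[Patient], include: str, exclude: str,
           min_age: int) -> Dict[str, List[str]]:
    """Map each selected patient id to the justification for selecting them."""
    selected: Dict[str, List[str]] = {}
    for p in patients:
        path = derive(p.conditions, include)
        if p.age < min_age or path is None:
            continue
        # Negation as failure: exclude only if the exclusion is derivable.
        if derive(p.conditions, exclude) is not None:
            continue
        selected[p.pid] = [
            f"age {p.age} >= {min_age}",
            "recorded condition " + " is-a ".join(path),
            f"no derivable {exclude} (negation as failure)",
        ]
    return selected


patients = [
    Patient("p1", 54, {"Type2Diabetes"}),
    Patient("p2", 61, {"Type1Diabetes", "ChronicKidneyDisease"}),
    Patient("p3", 16, {"Type2Diabetes"}),
    Patient("p4", 45),
]
result = select(patients, include="DiabetesMellitus",
                exclude="KidneyDisease", min_age=18)
for pid, reasons in result.items():
    print(pid, reasons)
```

    Adding `ChronicKidneyDisease` to `p1`'s record would withdraw `p1` from the selection, which is the nonmonotonic behaviour in action: a new fact retracts an earlier conclusion.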

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    Automated Deduction – CADE 28

    This open access book constitutes the proceedings of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized into the following topics: logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions.

    28th International Symposium on Temporal Representation and Reasoning (TIME 2021)

    The 28th International Symposium on Temporal Representation and Reasoning (TIME 2021) was planned to take place in Klagenfurt, Austria, but had to move online due to the uncertainties and restrictions caused by the pandemic. Since its first edition in 1994, the TIME Symposium has been quite unique among scientific conferences, as its main goal is to bring together researchers from distinct research areas involving the management and representation of temporal data, as well as reasoning about temporal aspects of information. Moreover, the TIME Symposium aims to bridge theoretical and applied research and to serve as an interdisciplinary forum for exchange among researchers from the areas of artificial intelligence, database management, logic and verification, and beyond.