
    The social value of digital ghosts

    The Vadalog System: Datalog-based Reasoning for Knowledge Graphs

    Over the past years, there has been a resurgence of Datalog-based systems in the database community as well as in industry. In this context, it has been recognized that to handle the complex knowledge-based scenarios encountered today, such as reasoning over large knowledge graphs, Datalog has to be extended with features such as existential quantification. Yet, Datalog-based reasoning in the presence of existential quantification is in general undecidable. Many efforts have been made to define decidable fragments. Warded Datalog+/- is a very promising one, as it captures PTIME complexity while allowing ontological reasoning. Yet so far, no implementation of Warded Datalog+/- was available. In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. The Vadalog system is Oxford's contribution to the VADA research programme, a joint effort of the universities of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the main contribution of this paper, we illustrate the first implementation of Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive termination control strategy. We also provide a comprehensive experimental evaluation.
    Comment: Extended version of VLDB paper <https://doi.org/10.14778/3213880.3213888>

    Addressing and Presenting Quality of Satellite Data via Web-Based Services

    With the recent attention to climate change and the proliferation of remote-sensing data utilization, climate models and various environmental monitoring and protection applications have begun to rely increasingly on satellite measurements. Research application users seek good quality satellite data, with uncertainties and biases provided for each data point. However, different communities address remote sensing quality issues inconsistently. We describe our attempt to systematically characterize, capture, and provision quality and uncertainty information as it applies to the NASA MODIS Aerosol Optical Depth data product. In particular, we note the semantic differences in quality/bias/uncertainty at the pixel, granule, product, and record levels. We outline various factors contributing to the uncertainty or error budget. Web-based science analysis and processing tools allow users to access, analyze, and generate visualizations of data while relieving users of having to directly manage complex data processing operations. These tools provide value by streamlining the data analysis process, but usually shield users from details of the data processing steps, algorithm assumptions, caveats, etc. Correct interpretation of the final analysis requires user understanding of how data has been generated and processed and what potential biases, anomalies, or errors may have been introduced. By providing services that leverage data lineage provenance and domain expertise, expert systems can be built to aid the user in understanding data sources, processing, and the suitability for use of products generated by the tools. We describe our experiences developing a semantic, provenance-aware, expert-knowledge advisory system applied to the NASA Giovanni web-based Earth science data analysis tool as part of the ESTO AIST-funded Multi-sensor Data Synergy Advisor project.

    A population genetic study of Pasqueflower: In situ and Ex situ Conservation Genetics of a Vulnerable UK Plant Species

    The population genetic structure of the vulnerable UK plant species Anemone pulsatilla L. reflects geographic patterns of historical range fragmentation and the influence of population decline and restoration intervention. Positive spatial auto-correlation of natural in situ populations of A. pulsatilla lends support to a scenario in which genetic drift (i.e. random drift of allelic frequencies) drives the emergence of population genetic structure as a consequence of fragmentation. Multivariate and STRUCTURE analyses estimate the partitioning of genetic variation among four natural population genetic clusters (broadly defined by geographical regions of the species’ range) and a fifth, highly differentiated, genetic cluster defined by introduced genotypes of unverifiable genetic origin to the casually augmented AN population. It is recommended that restoration intervention (i.e. to augment declining populations or introduce populations to enhance gene flow) source propagules for introduction from within the local population genetic cluster in order to maximise the potential for introduction/exchange of locally adaptive genetic variation. The existing ex situ gene conservation strategy for A. pulsatilla can be predicted to under-represent the species’ natural genetic variability due to limited sampling effort. At a minimum, a representative ex situ gene conservation strategy for the safeguard of A. pulsatilla UK variability should aim to capture representative accessions from the most diverse population/s of each of the four natural population genetic clusters. It is also recommended that the six native AN genotypes are sampled for ex situ conservation due to a disproportionately high level of unique genetic variation. A pilot study of regenerated ex situ accessions supports a prediction that the following factors act on genetic diversity: (a) survivorship; (b) number of generations removed from the wild; (c) effective population size.

    Knowledge Base Population using Semantic Label Propagation

    A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by distant supervision, a method that constructs training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase in precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.
    Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processing
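
    The distant supervision step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names Fact, distant_supervision, KB and SENTENCES are hypothetical, the sentences are made up for illustration, and plain substring matching stands in for real entity linking. It only shows how aligning text with known facts yields training instances, some of them noisy.

```python
from typing import List, NamedTuple, Tuple


class Fact(NamedTuple):
    """A known knowledge-base fact: relation(subject, obj). Hypothetical type."""
    relation: str
    subject: str
    obj: str


def distant_supervision(sentences: List[str], facts: List[Fact]) -> List[Tuple[str, str]]:
    """Label each sentence that mentions both arguments of a known fact
    as a positive training instance for that fact's relation.

    Assumption: substring matching replaces proper entity linking.
    """
    labeled = []
    for sentence in sentences:
        for fact in facts:
            if fact.subject in sentence and fact.obj in sentence:
                labeled.append((sentence, fact.relation))
    return labeled


# Toy knowledge base using the abstract's own example relation.
KB = [Fact("born_in", "Barack Obama", "US")]

# Made-up sentences for illustration only.
SENTENCES = [
    "Barack Obama was born in Hawaii, US.",                  # expresses born_in
    "Barack Obama returned to the US after a state visit.",  # noise: both entities, wrong relation
    "The US economy grew last quarter.",                     # subject missing, not labeled
]

TRAINING = distant_supervision(SENTENCES, KB)

for sentence, relation in TRAINING:
    print(f"{relation}: {sentence}")
```

    The second sentence is labeled born_in although it does not express that relation, which is the kind of noise the abstract proposes to reduce with feature labeling.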

    What Exactly is an Insight? A Literature Review

    Insights are often considered the ideal outcome of visual analysis sessions. However, there is no single definition of what an insight is. Some scholars define insights as correlations, while others define them as hypotheses or aha moments. This lack of a clear definition can make it difficult to build visualization tools that effectively support insight discovery. In this paper, we contribute a comprehensive literature review that maps the landscape of existing insight definitions. We summarize key themes regarding how insight is defined, with the goal of helping readers identify which definitions of insight align closely with their research and tool development goals. Based on our review, we also suggest interesting research directions, such as synthesizing a unified formalism for insight and connecting theories of insight to other critical concepts in visualization research.
    Comment: Technical report. arXiv admin note: text overlap with arXiv:2206.0476

    The pragmatic proof: hypermedia API composition and execution

    Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this paper, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs' functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application's actual state indicated by the server's response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.
    Peer Reviewed. Postprint (author's final draft)