The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowledge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- was available. In this paper we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we illustrate the first implementation of
Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation.
Comment: Extended version of VLDB paper, <https://doi.org/10.14778/3213880.3213888>
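The abstract above describes Datalog extended with existential quantification, where naive rule application can run forever. A minimal sketch of one "chase" step in Python may make this concrete; the rule, predicate names, and labeled-null convention are illustrative assumptions, not the Vadalog implementation.

```python
# Illustrative sketch only: one chase step for a Datalog+/- rule with an
# existential head, satisfying the existential by inventing a labeled null.
# Rule (assumed for illustration): Person(x) -> exists y. HasParent(x, y), Person(y)
from itertools import count

_null_ids = count(1)

def chase_step(facts):
    """Apply the existential rule once to every Person without a known
    parent, creating a fresh labeled null as the existential witness."""
    persons = {args[0] for (pred, args) in facts if pred == "Person"}
    with_parent = {args[0] for (pred, args) in facts if pred == "HasParent"}
    new_facts = set()
    for x in persons - with_parent:
        y = f"_null{next(_null_ids)}"  # fresh labeled null
        new_facts.add(("HasParent", (x, y)))
        new_facts.add(("Person", (y,)))
    return facts | new_facts

facts = chase_step({("Person", ("alice",))})
# Each invented parent is itself a Person without a parent, so iterating
# this step never terminates on its own -- which is why decidable fragments
# such as Warded Datalog+/- need an explicit termination-control strategy.
```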
Addressing and Presenting Quality of Satellite Data via Web-Based Services
With the recent attention to climate change and the proliferation of remote-sensing data, climate models and various environmental monitoring and protection applications have begun to rely increasingly on satellite measurements. Research users seek good-quality satellite data, with uncertainties and biases provided for each data point. However, different communities address remote-sensing quality issues inconsistently. We describe our attempt to systematically characterize, capture, and provision quality and uncertainty information as it applies to the NASA MODIS Aerosol Optical Depth data product. In particular, we note the semantic differences in quality/bias/uncertainty at the pixel, granule, product, and record levels, and we outline various factors contributing to the uncertainty or error budget. Web-based science analysis and processing tools allow users to access, analyze, and generate visualizations of data while relieving them of having to manage complex data-processing operations directly. These tools provide value by streamlining the data analysis process, but they usually shield users from details of the data processing steps, algorithm assumptions, caveats, etc. Correct interpretation of the final analysis requires users to understand how the data have been generated and processed and what potential biases, anomalies, or errors may have been introduced. By providing services that leverage data lineage (provenance) and domain expertise, expert systems can be built to aid users in understanding data sources, processing, and the suitability for use of products generated by the tools. We describe our experiences developing a semantic, provenance-aware, expert-knowledge advisory system applied to the NASA Giovanni web-based Earth science data analysis tool as part of the ESTO AIST-funded Multi-sensor Data Synergy Advisor project.
A population genetic study of Pasqueflower: In situ and Ex situ Conservation Genetics of a Vulnerable UK Plant Species
The population genetic structure of the vulnerable UK plant species Anemone pulsatilla L. reflects geographic patterns of historical range fragmentation and the influence of population decline and restoration intervention. Positive spatial autocorrelation in natural in situ populations of A. pulsatilla supports a scenario in which genetic drift (i.e. random drift of allelic frequencies) drives the emergence of population genetic structure as a consequence of fragmentation. Multivariate and STRUCTURE analyses estimate the partitioning of genetic variation among four natural population genetic clusters (broadly defined by geographical regions of the species’ range) and a fifth, highly differentiated, genetic cluster defined by introduced genotypes of unverifiable genetic origin in the casually augmented AN population. It is recommended that restoration intervention (i.e. to augment declining populations or introduce populations to enhance gene flow) source propagules for introduction from within the local population genetic cluster, in order to maximise the potential for introduction/exchange of locally adaptive genetic variation.
The existing ex situ gene conservation strategy for A. pulsatilla can be predicted to under-represent the species’ natural genetic variability due to limited sampling effort. At a minimum, a representative ex situ gene conservation strategy for safeguarding the UK variability of A. pulsatilla should aim to capture representative accessions from the most diverse population(s) of each of the four natural population genetic clusters. It is also recommended that the six native AN genotypes be sampled for ex situ conservation, due to their disproportionately high level of unique genetic variation. A pilot study of regenerated ex situ accessions supports a prediction that the following factors act on genetic diversity: (a) survivorship; (b) number of generations removed from the wild; (c) effective population size.
Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora, is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances, to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge-Based Systems, special issue on Knowledge Bases for Natural Language Processing
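The two ideas in the abstract above can be sketched in a few lines of Python. The knowledge-base contents, entity pair, and similarity threshold here are invented for illustration; the paper's actual feature-labeling and representation-learning machinery is not reproduced.

```python
# Hedged sketch of (1) distant supervision: label a sentence positive when
# it mentions an entity pair the knowledge base already knows to hold the
# relation; and (2) semantic label propagation: extend labels to unlabeled
# candidates whose low-dimensional representation is close to a labeled one.
import math

kb_born_in = {("Barack Obama", "US")}  # toy knowledge base of known facts

def distant_label(sentence, pair):
    """Distant supervision: positive iff both entities appear in the
    sentence and the pair is a known fact for the relation."""
    subj, obj = pair
    return subj in sentence and obj in sentence and pair in kb_born_in

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def propagate(labeled, unlabeled, threshold=0.9):
    """Semantic label propagation: copy a label to any unlabeled vector
    whose cosine similarity to a labeled vector exceeds the threshold."""
    out = []
    for vec in unlabeled:
        for lvec, label in labeled:
            if cosine(vec, lvec) >= threshold:
                out.append((vec, label))
                break
    return out
```

As the abstract notes, distant supervision alone is noisy (a sentence mentioning both entities need not express the relation), which is why the propagation step operates on a denoised seed set.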
What Exactly is an Insight? A Literature Review
Insights are often considered the ideal outcome of visual analysis sessions.
However, there is no single definition of what an insight is. Some scholars
define insights as correlations, while others define them as hypotheses or aha
moments. This lack of a clear definition can make it difficult to build
visualization tools that effectively support insight discovery. In this paper,
we contribute a comprehensive literature review that maps the landscape of
existing insight definitions. We summarize key themes regarding how insight is
defined, with the goal of helping readers identify which definitions of insight
align closely with their research and tool development goals. Based on our
review, we also suggest interesting research directions, such as synthesizing a
unified formalism for insight and connecting theories of insight to other
critical concepts in visualization research.
Comment: Technical report. arXiv admin note: text overlap with arXiv:2206.0476
The pragmatic proof: hypermedia API composition and execution
Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this paper, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs' functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application's actual state indicated by the server's response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.
Peer reviewed. Postprint (author's final draft).
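The contrast the abstract draws between an upfront procedural plan and runtime link-following can be sketched minimally in Python. The resource names and dict-based "hypermedia" here are invented stand-ins; the paper's actual proof-guided reasoning is not modeled, only the goal-directed navigation loop.

```python
# Illustrative sketch only: a client consuming a toy hypermedia API by
# following links at runtime toward a preset goal, re-reading the server's
# response (here, a dict) at every step instead of executing a fixed plan.
api = {  # toy resources: each response exposes the links available next
    "start":   {"links": {"catalog": "catalog"}},
    "catalog": {"links": {"item": "item"}},
    "item":    {"links": {}, "goal": True},
}

def navigate(state, goal_key="goal", max_steps=10):
    """Follow links until a resource marked as the goal is reached.
    Returns the path of visited resources, or None on a dead end."""
    path = [state]
    for _ in range(max_steps):
        resource = api[state]          # "server response" for this state
        if resource.get(goal_key):
            return path                # goal reached
        links = resource["links"]
        if not links:
            return None                # no outgoing links: infeasible
        state = next(iter(links.values()))  # pick a link (a proof would guide this choice)
        path.append(state)
    return None
```

In the paper's terms, a pre-execution proof would justify that such a path can exist before any request is made, and a post-execution proof would check the actual responses against the goal.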