18 research outputs found

    Research directions in data wrangling: Visualizations and transformations for usable and credible data

    In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of ‘data wrangling’ often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations.

    Biodiversity and Health: Implications for Conservation

    The human health and well-being benefits of contact with nature are becoming increasingly recognised and well understood, yet the implications of nature experiences for biodiversity conservation are far less clear. Theoretically, there are two plausible pathways that could lead to positive conservation outcomes. The first is a direct win-win scenario where biodiverse areas of high conservation value are also disproportionately beneficial to human health and well-being, meaning that the two sets of objectives can be simultaneously and directly achieved, as long as such green spaces are safeguarded appropriately. The second is that experiencing nature can stimulate people’s interest in biodiversity, concern for its fate, and willingness to take action to protect it, therefore generating conservation gains indirectly. To date, the two pathways have rarely been distinguished and scarcely studied. Here we consider how they may potentially operate in practice, while acknowledging that the mechanisms by which biodiversity might underpin human health and well-being benefits are still being determined.

    Query Reformulation in PDMS Based on Social Relevance

    We consider peer-to-peer data management systems (PDMS), where each peer maintains mappings between its schema and some acquaintances, along with social links with peer friends. In this context, we deal with reformulating conjunctive queries from a peer’s schema into other peers’ schemas. Precisely, queries against a peer node are rewritten into queries against other nodes using schema mappings, thus obtaining query rewritings. Unfortunately, not all the obtained rewritings are relevant to a given query, as the information gain may be negligible or the peer is not worth exploring. On the other hand, the existence of social links with peer friends might be useful to get relevant rewritings. Therefore, we propose a new notion of ‘relevance’ of a query with respect to a mapping that encompasses both a local relevance (the relevance of the query w.r.t. the mapping) and a global relevance (the relevance of the query w.r.t. the entire network). Based on this notion, we have conceived a new query reformulation approach for social PDMS which achieves great accuracy and flexibility. To this purpose, we combine several techniques: (i) social links are expressed as FOAF (Friend of a Friend) links to characterize peers’ friendships; (ii) concise mapping summaries are used to obtain mapping descriptions; (iii) local semantic views (LSV) are special views that contain information about mappings captured from the network by using gossiping techniques. Our experimental evaluation, based on a prototype built on top of PeerSim and a simulated network, demonstrates that our solution yields greater recall compared to traditional query translation approaches proposed in the literature.

    Efficient Ontology-Based Data Integration with Canonical IRIs

    In this paper, we study how to efficiently integrate multiple relational databases using an ontology-based approach. In ontology-based data integration (OBDI) an ontology provides a coherent view of multiple databases, and SPARQL queries over the ontology are rewritten into (federated) SQL queries over the underlying databases. Specifically, we address the scenario where records with different identifiers in different databases can represent the same entity. The standard approach in this case is to use sameAs to model the equivalence between entities. However, the standard semantics of sameAs may cause an exponential blow up of query results, since all possible combinations of equivalent identifiers have to be included in the answers. The large number of answers is not only detrimental to the performance of query evaluation, but also makes the answers difficult to understand due to the redundancy they introduce. This motivates us to propose an alternative approach, which is based on assigning canonical IRIs to entities in order to avoid redundancy. Formally, we present our approach as a new SPARQL entailment regime and compare it with the sameAs approach. We provide a prototype implementation and evaluate it in two experiments: in a real-world data integration scenario in Statoil and in an experiment extending the Wisconsin benchmark. The experimental results show that the canonical IRI approach is significantly more scalable.

    The What-To-Ask Problem for Ontology-Based Peers

    The issue of cooperation, integration, and coordination between information peers has been addressed over the years both in the context of the Semantic Web and in several other networked environments, including data integration, Peer-to-Peer and Grid computing, service-oriented computing, distributed agent systems, and collaborative data sharing. One of the main problems arising in such contexts is how to exploit the mappings between peers in order to answer queries posed to one peer. We address this issue for peers managing data through ontologies and in particular focus on ontologies specified in logics of the DL-Lite family. Our goal is to present some basic, fundamental results on this problem. In particular, we focus on a simplified setting based on just two interoperating peers, and we investigate how to solve the so-called “What-To-Ask” problem: find a way to answer queries posed to a peer by relying only on the query answering service available at the queried peer and at the other peer. We show both a positive and a negative result. Namely, we first prove that a solution to this problem always exists when the ontology is specified in, and we provide an algorithm to compute it. Then, we show that for the case of the problem may have no solution. We finally illustrate that a solution to our problem can still be found even for more general networks of peers, and for any language of the DL-Lite family, provided that we interpret mappings according to an epistemic semantics, rather than the usual first-order semantics