    Explaining Queries over Web Tables to Non-Experts

    Designing a reliable natural language (NL) interface for querying tables has been a longtime goal of researchers in both the data management and natural language processing (NLP) communities. Such an interface receives an NL question as input, translates it into a formal query, executes the query and returns the results. Errors in the translation process are not uncommon, and users typically struggle to understand whether their query has been mapped correctly. We address this problem by explaining the obtained formal queries to non-expert users. Two methods for query explanation are presented: the first translates queries into NL, while the second provides a graphic representation of the query's cell-based provenance (in its execution on a given table). Our solution augments a state-of-the-art NL interface over web tables, enhancing it in both its training and deployment phases. Experiments, including a user study conducted on Amazon Mechanical Turk, show that our solution improves both the correctness and reliability of an NL interface. Comment: Short paper version to appear in ICDE 201

    Ontology-based explanation of classifiers

    The rise of data mining and machine learning in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reason behind a classifier's prediction. Indeed, understanding the behaviour of a classifier is widely recognized as a very important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains and in dealing with bias. We present preliminary work on a proposal to use the Ontology-Based Data Management paradigm to explain the behaviour of a classifier in terms of the concepts and relations that are meaningful in the domain relevant to the classifier.

    An electronic healthcare record server implemented in PostgreSQL

    This paper describes the implementation of an Electronic Healthcare Record server inside a PostgreSQL relational database without dependency on any further middleware infrastructure. The five-part international standard for communicating healthcare records (ISO EN 13606) is used as the information basis for the design of the server. We describe some of the features demanded by this standard that the server provides, and other areas where assumptions about the durability of communications or the presence of middleware lead to a poor fit. Finally, we discuss the use of the server in two real-world scenarios, including a commercial application.

    Data reliability assessment in a data warehouse opened on the Web

    This paper presents an ontology-driven workflow that feeds and queries a data warehouse opened on the Web. Data are extracted from data tables in Web documents. As Web documents are very heterogeneous in nature, a key issue in this workflow is the ability to assess the reliability of retrieved data. We first recall the main steps of our method to annotate and query Web data tables driven by a domain ontology. Then we propose an original method to assess Web data table reliability from a set of criteria by means of evidence theory. Finally, we show how we extend the workflow to integrate the reliability assessment step.

    Nanotechnology Publications and Patents: A Review of Social Science Studies and Search Strategies

    This paper provides a comprehensive review of more than 120 social science studies in nanoscience and technology, all of which analyze publication and patent data. We conduct a comparative analysis of the bibliometric search strategies that these studies use to harvest publication and patent data related to nanoscience and technology. We implement these strategies on 2006 publication data and find that Mogoutov and Kahane (2007), with their evolutionary lexical query search strategy, extract the highest number of records from the Web of Science. The strategies of Glanzel et al. (2003), Noyons et al. (2003), Porter et al. (2008) and Mogoutov and Kahane (2007) produce very similar ranking tables of the top ten nanotechnology subject areas and the top ten most prolific countries and institutions. Keywords: nanotechnology, research and development, productivity, publications, patents, bibliometric analysis, search strategy