Explaining Queries over Web Tables to Non-Experts
Designing a reliable natural language (NL) interface for querying tables has
been a longtime goal of researchers in both the data management and natural
language processing (NLP) communities. Such an interface receives as input an
NL question, translates it into a formal query, executes the query and returns
the results. Errors in the translation process are not uncommon, and users
typically struggle to understand whether their query has been mapped correctly.
We address this problem by explaining the obtained formal queries to non-expert
users. Two methods for query explanation are presented: the first translates
queries into NL, while the second provides a graphic representation of the
query's cell-based provenance (in its execution on a given table). Our
solution augments a state-of-the-art NL interface over web tables, enhancing it
in both its training and deployment phases. Experiments, including a user study
conducted on Amazon Mechanical Turk, show that our solution improves both the
correctness and reliability of an NL interface.
Comment: Short paper version to appear in ICDE 201
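To make the two explanation styles concrete, here is a minimal sketch that explains one query both in words and as highlighted table cells. It is not the authors' system: the single lookup query form (`LookupQuery`), the NL template in `explain_in_nl`, the provenance categories, and the `*`/`+` cell markers are all hypothetical simplifications assumed for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Table = List[Dict[str, str]]
Cell = Tuple[int, str]  # (row index, column name)


@dataclass
class LookupQuery:
    """Hypothetical query form: return `target` values where `column` == `value`."""
    target: str
    column: str
    value: str


def explain_in_nl(q: LookupQuery) -> str:
    # Method 1 (sketch): a fixed template turns the formal query into an utterance.
    return f"Find the {q.target} of rows whose {q.column} is {q.value}."


def execute_with_provenance(q: LookupQuery, table: Table) -> Tuple[List[str], Dict[str, Set[Cell]]]:
    # Method 2 (sketch): record which cells the execution touched, matched, and returned.
    results: List[str] = []
    prov: Dict[str, Set[Cell]] = {"examined": set(), "matched": set(), "output": set()}
    for i, row in enumerate(table):
        prov["examined"].add((i, q.column))
        if row[q.column] == q.value:
            prov["matched"].add((i, q.column))
            prov["output"].add((i, q.target))
            results.append(row[q.target])
    return results, prov


def render(table: Table, prov: Dict[str, Set[Cell]]) -> str:
    # Text stand-in for a graphic view: '*' marks output cells, '+' marks matched condition cells.
    cols = list(table[0].keys())
    lines = [" | ".join(cols)]
    for i, row in enumerate(table):
        cells = []
        for c in cols:
            mark = "*" if (i, c) in prov["output"] else "+" if (i, c) in prov["matched"] else ""
            cells.append(f"{row[c]}{mark}")
        lines.append(" | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    table = [
        {"Year": "2004", "City": "Athens"},
        {"Year": "2008", "City": "Beijing"},
        {"Year": "2012", "City": "London"},
    ]
    q = LookupQuery(target="City", column="Year", value="2008")
    print(explain_in_nl(q))
    results, prov = execute_with_provenance(q, table)
    print(results)
    print(render(table, prov))
```

A user who asked "Where were the 2008 games held?" can check the utterance against their intent, or check that the highlighted condition cell and output cell are the ones they expected.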
Ontology-based explanation of classifiers
The growing use of data mining and machine learning in many applications has brought new challenges related to classification. Here, we deal with the following challenge: how to interpret and understand the reasons behind a classifier's prediction. Indeed, understanding the behavior of a classifier is widely recognized as an important task for the wide and safe adoption of machine learning and data mining technologies, especially in high-risk domains and in dealing with bias. We present preliminary work on a proposal to use the Ontology-Based Data Management paradigm to explain the behavior of a classifier in terms of the concepts and relations that are meaningful in the domain relevant to the classifier.
An electronic healthcare record server implemented in PostgreSQL
This paper describes the implementation of an Electronic Healthcare Record server inside a PostgreSQL relational database, without dependency on any further middleware infrastructure. The five-part international standard for communicating healthcare records (ISO EN 13606) is used as the information basis for the design of the server. We describe some of the features demanded by this standard that the server provides, as well as other areas where assumptions about the durability of communications or the presence of middleware lead to a poor fit. Finally, we discuss the use of the server in two real-world scenarios, including a commercial application.
Data reliability assessment in a data warehouse opened on the Web
This paper presents an ontology-driven workflow that feeds and queries a data warehouse opened on the Web. Data are extracted from data tables in Web documents. Because Web documents are very heterogeneous, a key issue in this workflow is the ability to assess the reliability of the retrieved data. We first recall the main steps of our method to annotate and query Web data tables driven by a domain ontology. Then we propose an original method to assess Web data table reliability from a set of criteria by means of evidence theory. Finally, we show how we extend the workflow to integrate the reliability assessment step.
Nanotechnology Publications and Patents: A Review of Social Science Studies and Search Strategies
This paper provides a comprehensive review of more than 120 social science studies in nanoscience and technology, all of which analyze publication and patent data. We conduct a comparative analysis of the bibliometric search strategies that these studies use to harvest publication and patent data related to nanoscience and technology. We implement these strategies on 2006 publication data and find that Mogoutov and Kahane (2007), with their evolutionary lexical query search strategy, extract the highest number of records from the Web of Science. The strategies of Glanzel et al. (2003), Noyons et al. (2003), Porter et al. (2008) and Mogoutov and Kahane (2007) produce very similar ranking tables of the top ten nanotechnology subject areas and the top ten most prolific countries and institutions.
Keywords: nanotechnology, research and development, productivity, publications, patents, bibliometric analysis, search strategy
- …