SQL Query Completion for Data Exploration
Amid the big data tsunami, relational databases and SQL are still here and
remain mandatory in most cases for accessing data. On the one hand, SQL is
easy for non-specialists to use and allows them to identify pertinent initial
data at the very beginning of the data exploration process. On the other hand, it is
not always so easy to formulate SQL queries: nowadays, it is more and more
frequent to have several databases available for one application domain, some
of them with hundreds of tables and/or attributes. Identifying the pertinent
conditions to select the desired data, or even identifying relevant attributes
is far from trivial. To make it easier to write SQL queries, we propose the
notion of SQL query completion: given a query, it suggests additional
conditions to be added to its WHERE clause. This completion is semantic, as it
relies on the data from the database, unlike current completion tools that are
mostly syntactic. Since the process can be repeated over and over again,
until the data analyst reaches her data of interest, SQL query completion
facilitates the exploration of databases. SQL query completion has been
implemented in a SQL editor on top of a database management system. For the
evaluation, two questions need to be studied: first, does the completion speed
up the writing of SQL queries? Second, is the completion easily adopted by
users? A thorough experiment has been conducted on a group of 70 computer
science students divided into two groups (one with the completion and the other
one without) to answer those questions. The results are positive and very
promising.
Answering Why-Not Questions on Reverse Skyline Queries over Incomplete Data
Recently, the development of preference-based queries has received considerable attention from researchers and data users. One of the most popular preference-based queries is the skyline query, which returns the subset of superior records that are not dominated by any other record. The reverse skyline query arose as an extension of the skyline query. It retrieves the query points whose skyline query results include a given record. Furthermore, data-oriented IT development requires scientists to be able to process data under all conditions. In the real world, multidimensional data may be incomplete because of damage, loss, or privacy restrictions. In order to increase the usability of such data sets, this study discusses one of the problems in processing reverse skyline queries over incomplete data, namely the "why-not" problem. The solution considered for this "why-not" problem consists of advice and steps by which a query point that initially does not include an incomplete record in its result can later include that record. This study further discusses the dominance relationship between incomplete data, along with the solution to the problem. Moreover, performance evaluations are conducted to measure efficiency and effectiveness.
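The skyline notion used in this abstract can be illustrated with a minimal sketch. It assumes complete records with numeric attributes where smaller values are preferred, and the usual dominance rule (no worse in every dimension, strictly better in at least one). The function names and the sample data are hypothetical and are not taken from the paper, which addresses the harder case of incomplete data.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    it is no worse everywhere and strictly better in at least one dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(records: List[Record]) -> List[Record]:
    """Return the records that no other record dominates (naive O(n^2) scan)."""
    return [r for r in records if not any(dominates(o, r) for o in records)]


# Hypothetical data: (price, distance) pairs, lower is better for both.
hotels = [(50.0, 8.0), (80.0, 2.0), (90.0, 9.0), (60.0, 5.0)]
print(skyline(hotels))  # [(50.0, 8.0), (80.0, 2.0), (60.0, 5.0)]
```

Here (90.0, 9.0) is excluded because (50.0, 8.0) is better on both attributes. The remaining three records each win on at least one attribute against every other record.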
Crowdsourcing Multiple Choice Science Questions
We present a novel method for obtaining high-quality, domain-targeted
multiple choice questions from crowd workers. Generating these questions can be
difficult without trading away originality, relevance or diversity in the
answer options. Our method addresses these problems by leveraging a large
corpus of domain-specific text and a small set of existing questions. It
produces model suggestions for document selection and answer distractor choice
which aid the human question generation process. With this method we have
assembled SciQ, a dataset of 13.7K multiple choice science exam questions
(Dataset available at http://allenai.org/data.html). We demonstrate that the
method produces in-domain questions by providing an analysis of this new
dataset and by showing that humans cannot distinguish the crowdsourced
questions from original questions. When using SciQ as additional training data
to existing questions, we observe accuracy improvements on real science exams.
Comment: accepted for the Workshop on Noisy User-generated Text (W-NUT) 201