SQL Query Completion for Data Exploration
Amid the big data tsunami, relational databases and SQL are still there and
remain mandatory in most cases for accessing data. On the one hand, SQL is
easy for non-specialists to use and helps identify pertinent initial data at
the very beginning of the data exploration process. On the other hand, it is
not always easy to formulate SQL queries: it is increasingly common to have
several databases available for one application domain, some
of them with hundreds of tables and/or attributes. Identifying the pertinent
conditions to select the desired data, or even identifying relevant attributes,
is far from trivial. To make it easier to write SQL queries, we propose the
notion of SQL query completion: given a query, it suggests additional
conditions to be added to its WHERE clause. This completion is semantic, as it
relies on the data from the database, unlike current completion tools that are
mostly syntactic. Since the process can be repeated over and over again,
until the data analyst reaches her data of interest, SQL query completion
facilitates the exploration of databases. SQL query completion has been
implemented in a SQL editor on top of a database management system. For the
evaluation, two questions need to be studied: first, does the completion speed
up the writing of SQL queries? Second, is the completion easily adopted by
users? A thorough experiment has been conducted on 70 computer
science students divided into two groups (one with the completion and the other
without) to answer those questions. The results are positive and very
promising.
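To make the idea of data-driven completion concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is not the method of the paper: it assumes a simple frequency heuristic, in which candidate conditions of the form column = value are ranked by how many rows of the current result they keep, and conditions that would not narrow the result are skipped. The function suggest_conditions, the stars table, and its columns are hypothetical names chosen for illustration.

```python
import sqlite3


def suggest_conditions(conn, table, where, columns, top_k=3):
    """Suggest extra `column = value` conditions for an existing WHERE clause.

    Hypothetical frequency heuristic, not the paper's algorithm. Identifiers
    are interpolated directly, so this is only safe for trusted input.
    """
    total = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {where}"
    ).fetchone()[0]
    suggestions = []
    for col in columns:
        rows = conn.execute(
            f"SELECT {col}, COUNT(*) AS n FROM {table} WHERE {where} "
            f"GROUP BY {col} ORDER BY n DESC, {col}"
        ).fetchall()
        for value, n in rows:
            # Skip NULLs and conditions that would keep every row.
            if value is not None and n < total:
                suggestions.append((n, f"{col} = {value!r}"))
    # Most frequent values first; ties broken alphabetically.
    suggestions.sort(key=lambda s: (-s[0], s[1]))
    return [cond for _, cond in suggestions[:top_k]]


# Toy data (hypothetical) to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (kind TEXT, band TEXT, mag REAL)")
conn.executemany(
    "INSERT INTO stars VALUES (?, ?, ?)",
    [("dwarf", "red", 12.0), ("dwarf", "red", 13.5),
     ("dwarf", "blue", 9.0), ("giant", "red", 2.0)],
)

suggestions = suggest_conditions(conn, "stars", "mag > 5", ["kind", "band"])
print(suggestions)
```

Because the suggestions are computed from the rows that the current query returns, the analyst can append one of them to the WHERE clause and call the function again, which mirrors the iterative exploration loop described in the abstract.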
Logical Separability of Incomplete Data under Ontologies
Finding a logical formula that separates positive and negative examples given in the form of labeled data items is fundamental in applications such as concept learning, reverse engineering of database queries, and generating referring expressions. In this paper, we investigate the existence of a separating formula for incomplete data in the presence of an ontology. Both for the ontology language and the separation language, we concentrate on first-order logic and three important fragments thereof: the description logic , the guarded fragment, and the two-variable fragment. We consider several forms of separability that differ in the treatment of negative examples and in whether or not they admit the use of additional helper symbols to achieve separation. We characterize separability in a model-theoretic way, compare the separating power of the different languages, and determine the computational complexity of separability as a decision problem.
Niffler: A Reference Architecture and System Implementation for View Discovery over Pathless Table Collections by Example
Identifying a project-join view (PJ-view) over collections of tables is the
first step of many data management projects, e.g., assembling a dataset to feed
into a business intelligence tool, creating a training dataset to fit a machine
learning model, and more. When the table collections are large and lack join
information--such as when combining databases, or on data lakes--query by
example (QBE) systems can help identify relevant data, but they are designed
under the assumption that join information is available in the schema, and do
not perform well on pathless table collections that do not have join path
information.
We present a reference architecture that explicitly divides the end-to-end
problem of discovering PJ-views over pathless table collections into a human
and a technical problem. We then present Niffler, a system built to address the
technical problem. We introduce algorithms for the main components of Niffler,
including a signal generation component that helps reduce the number of
candidate views, which may be large due to errors and ambiguity in both the data
and the input queries. We evaluate Niffler on real datasets to demonstrate the
effectiveness of the new engine in discovering PJ-views over pathless table
collections.
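The setting described above can be illustrated with a small Python sketch. It does not reproduce Niffler's algorithms: it assumes two simple steps, namely finding the columns that contain the user's example values and guessing join columns across tables from value overlap, since a pathless collection carries no join information in its schema. The TABLES data, both function names, and the min_containment threshold are hypothetical.

```python
from itertools import combinations

# Toy "pathless" collection (hypothetical): table -> column -> values.
TABLES = {
    "employees": {"emp_name": ["Ann", "Bob", "Cy"], "dept_id": [1, 2, 2]},
    "departments": {"id": [1, 2, 3], "dept_name": ["Sales", "R&D", "Ops"]},
}


def candidate_columns(tables, examples):
    """Return (table, column) pairs whose values contain every example."""
    wanted = set(examples)
    return [
        (table, col)
        for table, cols in tables.items()
        for col, values in cols.items()
        if wanted <= set(values)
    ]


def joinable_pairs(tables, min_containment=0.9):
    """Guess join columns across tables from value overlap.

    Two columns are paired when the smaller value set is (almost) contained
    in the larger one. This is an assumed heuristic for illustration only.
    """
    cols = [
        (table, col, set(values))
        for table, table_cols in tables.items()
        for col, values in table_cols.items()
    ]
    pairs = []
    for (t1, c1, v1), (t2, c2, v2) in combinations(cols, 2):
        if t1 == t2:
            continue  # only look for joins between different tables
        smaller = min(len(v1), len(v2))
        if smaller and len(v1 & v2) / smaller >= min_containment:
            pairs.append(((t1, c1), (t2, c2)))
    return pairs


name_cols = candidate_columns(TABLES, ["Ann"])
dept_cols = candidate_columns(TABLES, ["Sales"])
joins = joinable_pairs(TABLES)
print(name_cols, dept_cols, joins)
```

In this toy case the examples "Ann" and "Sales" land in different tables, and the inferred pair of join columns is what would connect them into a candidate project-join view. On real data, noisy values and ambiguous examples would produce many such candidates, which is the problem the abstract says the signal generation component is meant to reduce.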