Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. This challenge manifests itself in two specific ways:
keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system, Templar, that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, yielding
up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.
Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
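As a rough illustration of join path inference informed by a query log, the sketch below picks the cheapest path between two relations in a schema graph, where joins that appear more often in logged queries cost less. This is not Templar's algorithm; the schema, the relation names, the log format, and the cost formula `1 / (1 + count)` are all assumptions made for the example.

```python
import heapq
from collections import Counter

# Hypothetical schema graph: each pair is a joinable (foreign-key) link.
SCHEMA_EDGES = [
    ("author", "writes"),
    ("writes", "publication"),
    ("author", "reviews"),
    ("reviews", "publication"),
]


def join_counts(log):
    """Count how often each join appears in a log of per-query join lists."""
    counts = Counter()
    for joins in log:
        for a, b in joins:
            counts[frozenset((a, b))] += 1
    return counts


def infer_join_path(edges, counts, source, target):
    """Return the cheapest join path from source to target, or None.

    Assumed cost model: a join seen n times in the log costs 1 / (1 + n),
    so frequently logged joins are preferred (Dijkstra's algorithm).
    """
    graph = {}
    for a, b in edges:
        cost = 1.0 / (1 + counts[frozenset((a, b))])
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None


# Hypothetical log: three past queries joined author-writes-publication.
log = [[("author", "writes"), ("writes", "publication")]] * 3
print(infer_join_path(SCHEMA_EDGES, join_counts(log), "author", "publication"))
```

With this log, the path through `writes` costs less than the path through `reviews`, so it is chosen even though both paths have the same number of joins.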
Automatic view schema generation in object-oriented databases
An object-oriented data schema is a complex structure of classes interrelated via generalization and property decomposition relationships. We define an object-oriented view to be a virtual schema graph with possibly restructured generalization and decomposition hierarchies, rather than just one individual virtual class as proposed in the literature. In this paper, we propose a methodology, called MultiView, for supporting multiple such view schemata. MultiView is anchored on the following complementary ideas: (a) the view definer derives virtual classes and then integrates them into one consistent global schema graph and (b) the view definer specifies arbitrarily complex view schemata on this augmented global schema. The focus of this paper is, however, on the second, less explored, issue. This part of the view definition is performed using the following two steps: (1) view class selection and (2) view schema graph generation. For the first, we have developed a view definition language that can be used by the view definer to specify the selection of the desired view classes from the global schema. For the second, we have developed two algorithms that automatically augment the set of selected view classes to generate a complete, minimal and consistent view class generalization hierarchy. The first algorithm has linear complexity but it assumes that the global schema graph is a tree. The second algorithm overcomes this restrictive assumption and thus allows for multiple inheritance, but it does so at the cost of a higher complexity.
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main developments, and study the main current
systems that implement them.
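To make the definition concrete, the following minimal sketch models instances as a labeled, directed graph and expresses a query as a graph-oriented operation (label-restricted reachability). The class name `LabeledGraph`, its methods, and the sample data are hypothetical and do not come from any particular graph database system.

```python
from collections import defaultdict


class LabeledGraph:
    """A tiny in-memory labeled, directed graph (illustrative only)."""

    def __init__(self):
        # Maps a source node to its outgoing (label, destination) pairs.
        self._out = defaultdict(list)

    def add_edge(self, src, label, dst):
        self._out[src].append((label, dst))

    def neighbors(self, node, label):
        """Nodes reachable from `node` by one edge carrying `label`."""
        return [dst for lbl, dst in self._out.get(node, []) if lbl == label]

    def reachable(self, start, label):
        """Nodes reachable from `start` by one or more `label` edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


g = LabeledGraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("alice", "works_at", "acme")
print(sorted(g.reachable("alice", "knows")))
```

The reachability query follows only `knows` edges, so `acme` is excluded even though it is adjacent to `alice`.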
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve the prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.