Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge exhibits itself are
through keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system, Templar, that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, yielding
up to a 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.
Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
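The join path inference idea described above can be illustrated with a minimal sketch. It assumes a hypothetical schema graph and a hypothetical query log reduced to the joins each logged query used; joins that appear more often in the log are made cheaper, and the cheapest path between two relations is taken as the inferred join path. The relation names, the cost formula, and the function names are illustrative assumptions, not Templar's actual algorithm.

```python
import heapq
from collections import Counter

# Hypothetical schema: each pair is a foreign-key join between two relations.
SCHEMA_EDGES = [
    ("author", "writes"),
    ("writes", "publication"),
    ("author", "organization"),
    ("organization", "publication"),
    ("publication", "journal"),
]

# Hypothetical query log, reduced to the joins used by previously issued queries.
LOGGED_JOINS = [
    ("author", "writes"), ("writes", "publication"),
    ("author", "writes"), ("writes", "publication"),
    ("publication", "journal"),
]


def edge_key(a, b):
    """Order-independent key for an undirected join edge."""
    return tuple(sorted((a, b)))


def build_graph(edges, logged_joins):
    """Build an adjacency list where frequently logged joins cost less."""
    counts = Counter(edge_key(a, b) for a, b in logged_joins)
    graph = {}
    for a, b in edges:
        cost = 1.0 / (1 + counts[edge_key(a, b)])  # assumed weighting scheme
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def cheapest_join_path(graph, source, target):
    """Dijkstra's algorithm; returns the list of relations to join, or None."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return None


graph = build_graph(SCHEMA_EDGES, LOGGED_JOINS)
path = cheapest_join_path(graph, "author", "journal")
print(" JOIN ".join(path))
```

Without the log, the two routes from author to publication would tie; with the log-derived weights, the route through the frequently used writes relation is preferred.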
SpatialNLI: A Spatial Domain Natural Language Interface to Databases Using Spatial Comprehension
A natural language interface (NLI) to databases is an interface that
translates a natural language question to a structured query that is executable
by database management systems (DBMS). However, an NLI that is trained in the
general domain is hard to apply in the spatial domain due to the idiosyncrasy
and expressiveness of the spatial questions. Inspired by the machine
comprehension model, we propose a spatial comprehension model that is able to
recognize the meaning of spatial entities based on the semantics of the
context. The spatial semantics learned from the spatial comprehension model are
then injected into the natural language question to ease the burden of capturing
the spatial-specific semantics. With our spatial comprehension model and
information injection, our NLI for the spatial domain, named SpatialNLI, is
able to capture the semantic structure of the question and translate it to the
corresponding syntax of an executable query accurately. We also experimentally
ascertain that SpatialNLI outperforms state-of-the-art methods.
Comment: 10 page
A Shallow Parsing Approach to Natural Language Queries of a Database
Copyright © 2019 by author(s) and Scientific Research Publishing Inc. The performance and reliability of converting natural language into structured query language can be problematic when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so it has to be asked why nuance must be handled at all. This paper examines an alternative solution for converting a Natural Language Query into a Structured Query Language (SQL) statement capable of being used to search a relational database. The process uses the natural language concept of Part of Speech to identify words that can be mapped to database tables and table columns. Open NLP based grammar files, together with additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data, the next step is to create the SQL statement.
Natural language interfaces to relational databases
Máster Universitario en Lógica, Computación e Inteligencia Artificia
Natural Language Processing on Data Warehouses
The main problem addressed in this research was to use natural language to query data in a data warehouse. To this effect, two natural language processing models were developed and compared on a classic star-schema sales data warehouse with sales facts and date, location and item dimensions. Utterances are queries that people make with natural language, for example, "What is the sales value for mountain bikes in Georgia for 1 July 2005?" The first model, the heuristics model, implemented an algorithm that steps through the sequence of utterance words and matches the longest number of consecutive words at the highest grain of the hierarchy. In contrast, the embedding model implemented the word2vec algorithm to create different kinds of vectors from the data warehouse. These vectors were aggregated, and the cosine similarity between vectors was then used to identify concepts in the utterances that can be converted to a programming language. To understand question style, a survey was set up, which then helped shape the random utterances created for the evaluation of both methods. The first key insight, and the main premise for the embedding model to work, is a three-step process of creating three types of vectors. The first step is to train vectors (word vectors) for each individual word in the data warehouse; this is called word embedding. For instance, the word 'bike' will have a vector. In the next step, the word vectors are averaged for each unique column value (column vectors) in the data warehouse, leaving an entry like 'mountain bike' with one vector, which is the average of the vectors for 'mountain' and 'bike'. Lastly, the user's utterance is averaged into an utterance vector using the word vectors created in step one, and then, using cosine similarity, the utterance vector is matched to the closest column vectors in order to identify data warehouse concepts in the utterance.
The second key insight was to train word vectors first for location and then separately for item; in other words, per dimension (one set for location, and one set for item). Removing stop words was the third key insight, and the last key insight was to use Global Vectors (GloVe) to instantiate the training of the word vectors. The results of the evaluation indicated that the embedding model was ten times faster than the heuristics model. In terms of accuracy, the embedding model (95.6% accurate) also outperformed the heuristics model (70.1% accurate). The practical application of the research is that these models can be used as a component in a chatbot on data warehouses. Combined with a Structured Query Language query generation component, and with Application Programming Interfaces built on top of it, this facilitates the quick and easy distribution of data; no knowledge of a programming language such as Structured Query Language is needed to query the data.
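The three-step vector process described above (word vectors, averaged column vectors, averaged utterance vector matched by cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions: the word vectors are hand-written toy values rather than trained word2vec or GloVe embeddings, and the vocabulary, stop-word list, and function names are hypothetical.

```python
import math

# Step 1 (assumed): toy word vectors standing in for trained embeddings.
WORD_VECTORS = {
    "mountain": [0.9, 0.1, 0.0],
    "bike": [0.8, 0.2, 0.1],
    "road": [0.1, 0.9, 0.0],
    "helmet": [0.0, 0.2, 0.9],
}

# Hypothetical unique values of an item-dimension column.
COLUMN_VALUES = ["mountain bike", "road bike", "helmet"]

# Hypothetical stop-word list; the abstract only says stop words are removed.
STOP_WORDS = {"what", "is", "the", "for", "in"}


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def embed(text):
    """Average the word vectors of all known, non-stop words in the text."""
    words = [w.strip("?.,!\"'") for w in text.lower().split()]
    known = [WORD_VECTORS[w] for w in words if w not in STOP_WORDS and w in WORD_VECTORS]
    return average(known) if known else None


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_column_value(utterance):
    """Steps 2 and 3: compare the utterance vector with each column vector."""
    utterance_vector = embed(utterance)
    if utterance_vector is None:
        return None
    return max(COLUMN_VALUES, key=lambda value: cosine(utterance_vector, embed(value)))


match = closest_column_value("What is the sales value for mountain bike in Georgia?")
print(match)
```

Words outside the toy vocabulary (such as "sales" or "Georgia") are simply skipped here; a per-dimension setup, as the abstract describes, would keep a separate vocabulary and set of column vectors for each dimension.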
Using domain specific language and sequence to sequence models as a hybrid framework for a natural language interface to a database solution
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The aim of this project is to provide a new approach to solving the problem of
converting natural language into a language capable of querying a database or data
repository. This problem has been around for a while: in the 1970s the US Navy
developed a solution called LADDER, and since then there has been an array of
solutions, approaches and tweaks that have kept the research community busy. The
introduction of electronic assistants into the smartphone in 2010 has given new
impetus to this problem.
With the increasingly pervasive nature of data and its ever-expanding use to answer
questions within business, science and medicine, extracting data is becoming more important.
The idea behind this project is to make data more democratised by allowing access to it
without the need for specialist languages. The performance and reliability of converting
natural language into structured query language can be problematic in handling nuances
that are prevalent in natural language. Relational databases are not designed to understand
language nuance.
This project introduces the following components as part of a holistic approach to improving
the conversion of a natural language statement into a language capable of querying a data
repository.
● The idea proposed in this project combines sequence-to-sequence models
with natural language part-of-speech technologies and domain-specific
languages to convert natural language queries into SQL. The approach
proposed in this chapter is to use natural language processing to perform an
initial shallow pass of the incoming query and then use Google's TensorFlow to
refine the query with a sequence-to-sequence model.
● This thesis also proposes using a Domain Specific Language (DSL) as part of the
conversion process. The DSL has the potential to allow the natural
language query to be translated not just into an SQL statement but into any query
language, such as NoSQL or XQuery.
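The role of the DSL as an intermediate representation can be sketched as follows. This is a minimal illustration, assuming a hypothetical DSL with only an entity, a field list, and equality filters; the class and function names, and the shape of the document-store query, are assumptions for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class DslQuery:
    """Hypothetical intermediate form produced from a natural language query."""
    entity: str
    fields: list
    filters: dict = field(default_factory=dict)


def to_sql(query):
    """Render the DSL query as parameterised SQL plus its parameter values."""
    sql = f"SELECT {', '.join(query.fields)} FROM {query.entity}"
    if query.filters:
        conditions = " AND ".join(f"{column} = ?" for column in query.filters)
        sql += f" WHERE {conditions}"
    return sql, list(query.filters.values())


def to_document_query(query):
    """Render the same DSL query as a generic document-store request (assumed shape)."""
    return {
        "collection": query.entity,
        "filter": dict(query.filters),
        "projection": {name: 1 for name in query.fields},
    }


dsl = DslQuery(entity="sales", fields=["amount"], filters={"state": "Georgia"})
sql, params = to_sql(dsl)
document_query = to_document_query(dsl)
print(sql, params)
print(document_query)
```

Because both back ends consume the same `DslQuery` object, supporting another target language would mean adding one more rendering function rather than changing the natural language front end.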
Maximizing User Domain Expertise to Clarify Oblique Specifications of Relational Queries
While there is abundant access to data management technology today, working with data is still challenging for the average user. One common means of manipulating data is with SQL on relational databases, but this requires knowledge of SQL as well as the database's schema and contents. Consequently, previous work has proposed oblique query specification (OQS) methods such as natural language or programming-by-example to allow users to imprecisely specify their query intent. These methods, however, suffer from either low precision or low expressivity and, in addition, produce a list of candidate SQL queries that make it difficult for users to select their final target query.
My thesis is that OQS systems should maximize user domain expertise to triangulate the user's desired query. First, I demonstrate how to leverage previously-issued SQL queries to improve the accuracy of natural language interfaces. Second, I propose a system allowing users to specify a query with both natural language and programming-by-example. Finally, I develop a system where users provide feedback on system-suggested tuples to select a SQL query from a set of candidate queries generated by an OQS system.
PhD
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155114/1/cjbaik_1.pd