Recent advances in NLU and NLP have resulted in renewed interest in natural
language interfaces to data, which provide an easy mechanism for non-technical
users to access and query the data. While early systems evolved from keyword
search and focused on simple factual queries, the complexity of both the input
sentences as well as the generated SQL queries has evolved over time. More
recently, there has also been a lot of focus on using conversational interfaces
for data analytics, empowering a line of non-technical users with quick
insights into the data. There are three main challenges in natural language
querying (NLQ): (1) identifying the entities involved in the user utterance,
(2) connecting the different entities in a meaningful way over the underlying
data source to interpret user intents, and (3) generating a structured query in
the form of SQL or SPARQL.
There are two main approaches for interpreting a user's NLQ. Rule-based
systems make use of semantic indices, ontologies, and KGs to identify the
entities in the query, understand the intended relationships between those
entities, and utilize grammars to generate the target queries. With the
advances in deep learning (DL)-based language models, there have been many
text-to-SQL approaches that try to interpret the query holistically using DL
models. Hybrid approaches that utilize both rule-based techniques as well as DL
models are also emerging by combining the strengths of both approaches.
Conversational interfaces are the next natural step to one-shot NLQ by
exploiting query context between multiple turns of conversation for
disambiguation. In this article, we review the background technologies that are
used in natural language interfaces, and survey the different approaches to
NLQ. We also describe conversational interfaces for data analytics and discuss
several benchmarks used for NLQ research and evaluation.Comment: The full version of this manuscript, as published by Foundations and
Trends in Databases, is available at http://dx.doi.org/10.1561/190000007