Building a question-answering agent currently requires large annotated
datasets, which are prohibitively expensive. This paper proposes Schema2QA, an
open-source toolkit that can generate a Q&A system from a database schema
augmented with a few annotations for each field. The key concept is to cover
the space of possible compound queries on the database with a large number of
in-domain questions synthesized with the help of a corpus of generic query
templates. The synthesized data and a small paraphrase set are used to train a
novel neural network based on the BERT pretrained model. We use Schema2QA to
generate Q&A systems for five Schema.org domains (restaurants, people, movies,
books, and music) and obtain an overall accuracy between 64% and 75% on
crowdsourced questions for these domains. Once annotations and paraphrases are
obtained for a Schema.org schema, no additional manual effort is needed to
create a Q&A agent for any website that uses the same schema. Furthermore, we
demonstrate that learning can be transferred from the restaurant to the hotel
domain, obtaining a 64% accuracy on crowdsourced questions with no manual
effort. Schema2QA achieves an accuracy of 60% on popular restaurant questions
that can be answered using Schema.org. Its performance is comparable to Google
Assistant, 7% lower than Siri, and 15% higher than Alexa. It outperforms all
these assistants by at least 18% on more complex, long-tail questions.
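
The template-based synthesis described above can be illustrated with a minimal sketch. It is not the Schema2QA implementation: the schema dictionary, the annotation format, the two templates, and the query string syntax are all hypothetical, chosen only to show how a few per-field annotations combined with generic templates can enumerate compound question/query pairs.

```python
from itertools import combinations

# Hypothetical schema: each field carries a short hand-written annotation
# (a natural-language phrase) and a few sample values.
SCHEMA = {
    "table": "restaurant",
    "fields": {
        "servesCuisine": {"phrase": "that serve {value} food",
                          "values": ["Italian", "Thai"]},
        "addressLocality": {"phrase": "in {value}",
                            "values": ["Palo Alto", "Seattle"]},
    },
}

# Hypothetical generic templates; they mention no domain-specific words.
TEMPLATES = ["show me {noun_phrase}", "what are the {noun_phrase}?"]


def filter_options(schema):
    """Expand every (field, value) pair into (field, phrase, predicate)."""
    options = []
    for name, annotation in schema["fields"].items():
        for value in annotation["values"]:
            phrase = annotation["phrase"].format(value=value)
            predicate = f'{name} == "{value}"'
            options.append((name, phrase, predicate))
    return options


def synthesize(schema, max_filters=2):
    """Enumerate (question, query) pairs with up to max_filters filters."""
    options = filter_options(schema)
    table = schema["table"]
    pairs = []
    for k in range(1, max_filters + 1):
        for combo in combinations(options, k):
            # Skip combinations that filter the same field twice.
            if len({field for field, _, _ in combo}) < k:
                continue
            noun_phrase = f"{table}s " + " ".join(p for _, p, _ in combo)
            query = f"{table} filter " + " && ".join(q for _, _, q in combo)
            for template in TEMPLATES:
                pairs.append((template.format(noun_phrase=noun_phrase), query))
    return pairs


if __name__ == "__main__":
    for question, query in synthesize(SCHEMA):
        print(f"{question!r} -> {query}")
```

Even this toy schema, with two fields and two values each, yields compound questions such as "show me restaurants that serve Italian food in Palo Alto" paired with a two-filter query. The number of pairs grows quickly as fields, values, and templates are added, which is what lets synthesized data cover the query space without manual annotation of each question.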