Improving short text classification through global augmentation methods
We study the effect of different approaches to text augmentation. To do this
we use 3 datasets that include social media and formal text in the form of news
articles. Our goal is to provide insights for practitioners and researchers on
making choices for augmentation for classification use cases. We observe that
Word2vec-based augmentation is a viable option when one does not have access to
a formal synonym model (like WordNet-based augmentation). The use of
\emph{mixup} further improves the performance of all text-based augmentations and
reduces the effects of overfitting on a tested deep learning model. Round-trip
translation with a translation service proves to be harder to use due to cost
and as such is less accessible for both normal and low resource use-cases.Comment: Final version published in CD-MAKE 2020: Machine Learning and
Knowledge Extraction pp 385-39
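The \emph{mixup} technique credited above with reducing overfitting can be sketched in a few lines: interpolate two training examples and their one-hot labels with a Beta-sampled weight. The feature vectors below are hypothetical stand-ins for sentence embeddings, and the parameter alpha=0.2 is a commonly used default, not a value taken from the paper.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples (x1, y1) and (x2, y2) with a Beta-sampled weight.

    x's are feature vectors (e.g. sentence embeddings), y's are one-hot labels.
    Returns the interpolated example, its soft label, and the mixing weight.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in (0, 1)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Two toy "embeddings" with opposite labels: each coordinate of the blended
# example equals lam, and the soft label becomes [lam, 1 - lam].
x, y, lam = mixup([1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0])
```

The soft labels are what make mixup a regularizer: the model is trained to predict intermediate label distributions rather than hard classes, which smooths the decision boundary.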
Towards textual data augmentation for neural networks: synonyms and maximum loss
Data augmentation is one way of dealing with labeled-data scarcity and overfitting. Both problems are crucial for modern deep learning algorithms, which require massive amounts of data, and both are better explored in the context of image analysis than of text. This work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on the substitution of words using a thesaurus as well as the Princeton WordNet. Our method improves upon the baseline in almost all cases; in terms of accuracy, the best of the variants is 1.2 percentage points better than the baseline.
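The substitution scheme described in this abstract can be sketched as follows. The tiny THESAURUS dictionary is a hypothetical stand-in for WordNet or a commercial thesaurus, and the replacement probability p is illustrative, not a parameter reported by the paper.

```python
import random

# Toy thesaurus standing in for WordNet (hypothetical entries).
THESAURUS = {
    "movie": ["film", "picture"],
    "great": ["excellent", "superb"],
    "bad": ["poor", "terrible"],
}

def augment(sentence, p=0.5, rng=None):
    """Replace each word with a random thesaurus synonym with probability p."""
    rng = rng or random.Random()
    out = []
    for w in sentence.split():
        syns = THESAURUS.get(w.lower())
        if syns and rng.random() < p:
            out.append(rng.choice(syns))  # substitute a synonym
        else:
            out.append(w)                 # keep the original word
    return " ".join(out)

print(augment("a great movie", p=1.0))
```

Each pass over a sentence yields a slightly different paraphrase, so the same labeled example can be presented to the network many times without exact repetition.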
Improving Natural Language Inference Using External Knowledge in the Science Questions Domain
Natural Language Inference (NLI) is fundamental to many Natural Language
Processing (NLP) applications including semantic search and question answering.
The NLI problem has gained significant attention thanks to the release of large
scale, challenging datasets. Present approaches to the problem largely focus on
learning-based methods that use only textual information in order to classify
whether a given premise entails, contradicts, or is neutral with respect to a
given hypothesis. Surprisingly, the use of methods based on structured
knowledge -- a central topic in artificial intelligence -- has not received
much attention vis-a-vis the NLI problem. While there are many open knowledge
bases that contain various types of reasoning information, their use for NLI
has not been well explored. To address this, we present a combination of
techniques that harness knowledge graphs to improve performance on the NLI
problem in the science questions domain. We present the results of applying our
techniques on text, graph, and text-to-graph based models, and discuss
implications for the use of external knowledge in solving the NLI problem. Our
model achieves the new state-of-the-art performance on the NLI problem over the
SciTail science questions dataset.
Comment: 9 pages, 3 figures, 5 tables
Knowledge Graph semantic enhancement of input data for improving AI
Intelligent systems designed using machine learning algorithms require a
large number of labeled data. Background knowledge provides complementary, real
world factual information that can augment the limited labeled data to train a
machine learning algorithm. The term Knowledge Graph (KG) is in vogue because,
for many practical applications, it is convenient and useful to organize this
background knowledge in the form of a graph. Recent academic research and
implemented industrial intelligent systems have shown promising performance for
machine learning algorithms that combine training data with a knowledge graph.
In this article, we discuss the use of relevant KGs to enhance input data for
two applications that use machine learning -- recommendation and community
detection. The KG improves both accuracy and explainability.
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge manifests itself are
keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, yielding
up to a 138% improvement in top-1 accuracy for existing NLIDBs by leveraging SQL
query log information.
Comment: Accepted to IEEE International Conference on Data Engineering (ICDE) 201
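The keyword-mapping task described in this abstract can be approximated with plain string similarity, as in the sketch below, which ranks schema elements for a keyword using Python's difflib. The schema names are invented for illustration, and a system like Templar additionally scores candidates by their frequency in the SQL query log, which this toy scorer omits.

```python
import difflib

# Hypothetical schema elements; Templar would also weight these by how often
# they appear in the database's SQL query log (omitted in this sketch).
SCHEMA = ["author", "author.name", "paper", "paper.title", "paper.year"]

def map_keyword(keyword, schema=SCHEMA, cutoff=0.5):
    """Rank schema elements by fuzzy string similarity to an NLQ keyword."""
    scored = [
        (difflib.SequenceMatcher(None, keyword.lower(), s.lower()).ratio(), s)
        for s in schema
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [s for score, s in scored if score >= cutoff]

print(map_keyword("papers"))  # "paper"-like elements rank first
```

Pure string similarity cannot resolve the mental-model/diction gap the abstract describes (a user saying "article" when the table is named "paper"), which is exactly where log-derived popularity and synonym evidence would come in.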