Neural Fine-Grained Entity Type Classification with Hierarchy-Aware Loss
The task of Fine-grained Entity Type Classification (FETC) consists of
assigning types from a hierarchy to entity mentions in text. Existing methods
rely on distant supervision and are thus susceptible to noisy labels that can
be out-of-context or overly-specific for the training sentence. Previous
methods that attempt to address these issues do so with heuristics or with the
help of hand-crafted features. Instead, we propose an end-to-end solution with
a neural network model that uses a variant of the cross-entropy loss function to
handle out-of-context labels, and hierarchical loss normalization to cope with
overly-specific ones. Also, previous work solves FETC as a multi-label
classification task followed by ad-hoc post-processing. In contrast, our solution is
more elegant: we use public word embeddings to train a single-label model that
jointly learns representations for entity mentions and their context. We show
experimentally that our approach is robust against noise and consistently
outperforms the state-of-the-art on established benchmarks for the task.
Comment: Camera-ready for NAACL HLT 201
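The hierarchical loss normalization described above can be sketched as follows. The toy hierarchy, the `beta` weight, and the function names are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Toy type hierarchy (illustrative): index -> parent index, -1 for roots.
# e.g. 0=/person, 1=/person/artist, 2=/person/athlete, 3=/location
PARENT = {0: -1, 1: 0, 2: 0, 3: -1}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hierarchical_normalize(logits, parent=PARENT, beta=0.3):
    """Shift a fraction of each subtype's probability mass to its parent,
    so an overly-specific noisy label is penalized less harshly."""
    p = softmax(np.asarray(logits, dtype=float))
    q = p.copy()
    for child, par in parent.items():
        if par >= 0:
            q[par] += beta * p[child]
            q[child] -= beta * p[child]
    return q / q.sum()
```

The normalized distribution is then plugged into the cross-entropy variant in place of the raw softmax output.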
Fine Grained Classification of Personal Data Entities
Entity Type Classification can be defined as the task of assigning category
labels to entity mentions in documents. While neural networks have recently
improved the classification of general entity mentions, pattern matching and
other systems continue to be used for classifying personal data entities (e.g.
classifying an organization as a media company or a government institution for
GDPR, and HIPAA compliance). We propose a neural model to expand the class of
personal data entities that can be classified at a fine grained level, using
the output of existing pattern matching systems as additional contextual
features. We introduce new resources, a personal data entities hierarchy with
134 types, and two datasets from the Wikipedia pages of elected representatives
and Enron emails. We hope these resources will aid research in the area of
personal data discovery, and to that effect, we provide baseline results on
these datasets and compare our method with state-of-the-art models on the
OntoNotes dataset.
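A minimal sketch of the idea of feeding an existing pattern matcher's output to a neural classifier as extra contextual features. Every name and shape here is a hypothetical illustration, not the authors' actual pipeline:

```python
# Illustrative only: combine learned vectors with pattern-matcher outputs.
def combine_features(mention_vec, context_vec, pattern_labels, label_vocab):
    """Append one-hot indicators for the labels an existing pattern-matching
    system fired (e.g. a GDPR/HIPAA rule engine) to the learned features.
    The concatenation is then fed to the classification layer."""
    one_hot = [1.0 if lab in pattern_labels else 0.0 for lab in label_vocab]
    return list(mention_vec) + list(context_vec) + one_hot
```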
Neural architectures for fine-grained entity type classification
In this work, we investigate several neural network architectures for fine-grained entity type classification and make three key contributions. Despite being a natural comparison and addition, previous work on attentive neural architectures has not considered hand-crafted features; we combine these with learnt features and establish that they complement each other. Additionally, through quantitative analysis we establish that the attention mechanism learns to attend over syntactic heads and the phrase containing the mention, both of which are known to be strong hand-crafted features for our task. We introduce parameter sharing between labels through a hierarchical encoding method that, in low-dimensional projections, shows clear clusters for each type hierarchy. Lastly, despite using the same evaluation dataset, the literature frequently compares models trained using different data. We demonstrate that the choice of training data has a drastic impact on performance, which decreases by as much as 9.85% loose micro F1 score for a previously proposed method. Despite this discrepancy, our best model achieves state-of-the-art results with 75.36% loose micro F1 score on the well-established FIGER (GOLD) dataset, and we report the best results for models trained using publicly available data on the OntoNotes dataset with 64.93% loose micro F1 score.
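The hierarchical label encoding with parameter sharing can be sketched as a binary ancestor matrix. The toy type inventory and function below are assumptions for illustration, not the paper's code:

```python
import numpy as np

# Toy type inventory (illustrative); paths encode the hierarchy.
TYPES = ["/person", "/person/artist", "/location", "/location/city"]

def hierarchy_matrix(types):
    """Binary encoding: each type's row activates itself and all of its
    ancestors, so subtypes share parameters with their parents when label
    embeddings are computed as S @ V for a learned matrix V."""
    S = np.zeros((len(types), len(types)))
    for i, t in enumerate(types):
        for j, anc in enumerate(types):
            if t == anc or t.startswith(anc + "/"):
                S[i, j] = 1.0
    return S
```

Because sibling subtypes share their parent's component, their embeddings naturally cluster, matching the low-dimensional projections the abstract mentions.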
An Attention-Based Word-Level Interaction Model: Relation Detection for Knowledge Base Question Answering
Relation detection plays a crucial role in Knowledge Base Question Answering
(KBQA) because of the high variance of relation expression in the question.
Traditional deep learning methods follow an encoding-comparing paradigm, where
the question and the candidate relation are represented as vectors to compare
their semantic similarity. The max- or average-pooling operation, which compresses
the sequence of words into fixed-dimensional vectors, becomes the bottleneck of
information. In this paper, we propose to learn attention-based word-level
interactions between questions and relations to alleviate the bottleneck issue.
Similar to the traditional models, the question and relation are firstly
represented as sequences of vectors. Then, instead of merging the sequence into
a single vector with pooling operation, soft alignments between words from the
question and the relation are learned. The aligned words are subsequently
compared with the convolutional neural network (CNN) and the comparison results
are merged finally. Through performing the comparison on low-level
representations, the attention-based word-level interaction model (ABWIM)
relieves the information loss issue caused by merging the sequence into a
fixed-dimensional vector before the comparison. The experimental results of
relation detection on both SimpleQuestions and WebQuestions datasets show that
ABWIM achieves state-of-the-art accuracy, demonstrating its effectiveness.
Comment: Paper submitted to Neurocomputing at 11.12.201
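The soft word-level alignment step might look roughly like this NumPy sketch; the CNN comparison stage is omitted, and all names are illustrative:

```python
import numpy as np

def soft_align(Q, R):
    """Q: (m, d) question word vectors; R: (n, d) relation word vectors.
    Returns, for each question word, an attention-weighted mix of relation
    word vectors (the soft alignment), avoiding an early pooling bottleneck."""
    scores = Q @ R.T                              # (m, n) word-pair similarity
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn @ R                               # (m, d) aligned vectors
```

Each question word is then compared against its aligned vector before any merging, which is where the model avoids the fixed-vector information loss.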
A Question-Focused Multi-Factor Attention Network for Question Answering
Neural network models recently proposed for question answering (QA) primarily
focus on capturing the passage-question relation. However, they have minimal
capability to link relevant facts distributed across multiple sentences which
is crucial in achieving deeper understanding, such as performing multi-sentence
reasoning, co-reference resolution, etc. They also do not explicitly focus on
the question and answer type which often plays a critical role in QA. In this
paper, we propose a novel end-to-end question-focused multi-factor attention
network for answer extraction. Multi-factor attentive encoding using
tensor-based transformation aggregates meaningful facts even when they are
located in multiple sentences. To implicitly infer the answer type, we also
propose a max-attentional question aggregation mechanism to encode a question
vector based on the important words in a question. During prediction, we
incorporate sequence-level encoding of the first wh-word and its immediately
following word as an additional source of question type information. Our
proposed model achieves significant improvements over the best prior
state-of-the-art results on three large-scale challenging QA datasets, namely
NewsQA, TriviaQA, and SearchQA.
Comment: 8 pages, AAAI 201
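A rough sketch of max-attentional question aggregation, under the assumption that `A` holds passage-to-question attention scores. This illustrates the idea, not the paper's implementation:

```python
import numpy as np

def max_attentional_aggregation(Qw, A):
    """Qw: (m, d) question word vectors; A: (n, m) passage-to-question
    attention scores (assumed given). Each question word is scored by the
    maximum attention it receives from any passage word; a softmax over
    these scores weights the question vector toward the important words."""
    scores = A.max(axis=0)            # (m,) max attention per question word
    w = np.exp(scores - scores.max())
    w /= w.sum()                      # softmax over question words
    return w @ Qw                     # (d,) aggregated question vector
```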
Embedding Based Link Prediction for Knowledge Graph Completion
Knowledge Graphs (KGs) are the most widely used representation of structured information about a particular domain, consisting of billions of facts in the form of entities (nodes) and relations (edges) between them. In addition, KGs also encapsulate the semantic type information of the entities. The last two decades have witnessed a constant growth of KGs in various domains such as government, scholarly data, biomedical domains, etc. KGs have been used in Machine Learning-based applications such as entity linking, question answering, recommender systems, etc. Open KGs are mostly heuristically created, automatically generated from heterogeneous resources such as text, images, etc., or are human-curated. However, these KGs are often incomplete, i.e., there are missing links between the entities and missing links between the entities and their corresponding entity types. This thesis focuses on addressing these two challenges of link prediction for Knowledge Graph Completion (KGC):
\textbf{(i)} General Link Prediction in KGs that include head and tail prediction, triple classification, and
\textbf{(ii)} Entity Type Prediction.
Most of the graph mining algorithms are proven to be of high complexity, deterring their usage in KG-based applications. In recent years, KG embeddings have been trained to represent the entities and relations in the KG in a low-dimensional vector space preserving the graph structure. In most published works such as the translational models, convolutional models, semantic matching, etc., the triple information is used to generate the latent representation of the entities and relations.
In this dissertation, it is argued that contextual information about the entities, obtained from random walks and textual entity descriptions, is the key to improving their latent representation for KGC. The experimental results show that the knowledge obtained from the context of the entities supports this hypothesis. Several methods have been proposed for KGC, and their effectiveness is shown empirically in this thesis. Firstly, a novel multi-hop attentive KG embedding model, MADLINK, is proposed for Link Prediction. It considers the contextual information of the entities by using random walks as well as their textual descriptions. Secondly, a novel architecture exploiting the information contained in a pre-trained contextual Neural Language Model (NLM) is proposed for Triple Classification. Thirdly, the limitations of the current state-of-the-art (SoTA) entity type prediction models have been analysed, and a novel entity typing model, CAT2Type, is proposed that exploits Wikipedia categories, one of the most under-used features of KGs. This model can also be used to predict missing types of unseen entities, i.e., newly added entities in the KG.
Finally, another novel architecture GRAND is proposed to predict the missing entity types in KGs using multi-label, multi-class, and hierarchical classification by leveraging different strategic graph walks in the KGs. The extensive experiments and ablation studies show that all the proposed models outperform the current SoTA models and set new baselines for KGC.
The proposed models establish that the NLMs and the contextual information of the entities in the KGs together with the different neural network architectures benefit KGC. The promising results and observations open up interesting scopes for future research involving exploiting the proposed models in domain-specific KGs such as scholarly data, biomedical data, etc. Furthermore, the link prediction model can be exploited as a base model for the entity alignment task as it considers the neighbourhood information of the entities
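As background for the translational models the thesis builds on, a TransE-style scoring function for link prediction can be sketched as follows (a generic baseline, not the thesis's MADLINK model):

```python
import numpy as np

def transe_score(h, r, t):
    """Plausibility of a triple under the translation assumption h + r ≈ t;
    lower is better (L1 distance)."""
    return np.abs(h + r - t).sum()

def rank_tails(h, r, entities):
    """Rank all candidate tail entities for the query (h, r, ?).
    entities: (num_entities, d) embedding matrix."""
    scores = np.abs((h + r)[None, :] - entities).sum(axis=1)
    return np.argsort(scores)         # best candidate first
```

Link prediction metrics such as mean rank and hits@k are computed from exactly this kind of ranking over all candidate entities.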
Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog
Recent studies have shown remarkable success in end-to-end task-oriented
dialog systems. However, most neural models rely on large training data, which
are only available for a certain number of task domains, such as navigation and
scheduling. This makes it difficult to scale to a new domain with limited
labeled data. However, there has been relatively little research on how to effectively
use data from all domains to improve the performance of each domain and also
unseen domains. To this end, we investigate methods that can make explicit use
of domain knowledge and introduce a shared-private network to learn shared and
specific knowledge. In addition, we propose a novel Dynamic Fusion Network
(DF-Net), which automatically exploits the relevance between the target domain
and each source domain. Results show that our model outperforms existing methods on
multi-domain dialogue, achieving the state-of-the-art in the literature. Besides,
with little training data, we show its transferability by outperforming the prior
best model by 13.9% on average.
Comment: ACL202
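A toy analogue of the dynamic fusion idea, gating per-domain private features by their relevance to the target representation. The gating form here is an assumption for illustration, not DF-Net's published architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dynamic_fusion(query, domain_feats):
    """query: (d,) target-domain representation; domain_feats: (k, d) private
    features from k source domains. A softmax gate weights each source domain
    by its relevance to the query before fusing them into one vector."""
    gate = softmax(domain_feats @ query)   # (k,) relevance weights
    return gate @ domain_feats             # (d,) fused representation
```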
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that the MQAN's
multi-pointer-generator decoder is key to this success and performance further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
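Casting heterogeneous tasks as question answering over a context can be illustrated with a small formatting function. The question templates and field names below are hypothetical, not decaNLP's actual schema:

```python
def as_qa(task, example):
    """Cast a task-specific example into a (question, context, answer) triple,
    so one QA model can train on every task with no task-specific modules."""
    if task == "sentiment":
        return ("Is this review positive or negative?",
                example["text"], example["label"])
    if task == "translation":
        return ("What is the translation from English to German?",
                example["en"], example["de"])
    if task == "summarization":
        return ("What is the summary?",
                example["document"], example["summary"])
    raise ValueError(f"unknown task: {task}")
```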
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Sentiment analysis as a field has come a long way since it was first
introduced as a task nearly 20 years ago. It has widespread commercial
applications in various domains like marketing, risk management, market
research, and politics, to name a few. Given its saturation in specific
subtasks -- such as sentiment polarity classification -- and datasets, there is
an underlying perception that this field has reached its maturity. In this
article, we discuss this perception by pointing out the shortcomings and
under-explored, yet key aspects of this field that are necessary to attain true
sentiment understanding. We analyze the significant leaps responsible for its
current relevance. Further, we attempt to chart a possible course for this
field that covers many overlooked and unanswered questions.
Comment: Published in the IEEE Transactions on Affective Computing (TAFFC
Iterative Alternating Neural Attention for Machine Reading
We propose a novel neural attention architecture to tackle machine
comprehension tasks, such as answering Cloze-style queries with respect to a
document. Unlike previous models, we do not collapse the query into a single
vector, instead we deploy an iterative alternating attention mechanism that
allows a fine-grained exploration of both the query and the document. Our model
outperforms state-of-the-art baselines in standard machine comprehension
benchmarks such as CNN news articles and the Children's Book Test (CBT)
dataset.
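The iterative alternating attention loop might be sketched as follows; the glimpse update rule is a simplification of the published mechanism, and all names are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def alternating_attention(query_words, doc_words, steps=3):
    """query_words: (m, d); doc_words: (n, d). Alternate between a query
    glimpse and a document glimpse for a few steps instead of collapsing
    the query into one static vector; the glimpse from the previous step
    conditions the next attention pass."""
    state = np.zeros(query_words.shape[1])
    for _ in range(steps):
        q_attn = softmax(query_words @ state)    # attend over query words
        q_glimpse = q_attn @ query_words
        d_attn = softmax(doc_words @ q_glimpse)  # attend over document words
        state = d_attn @ doc_words               # document glimpse feeds back
    return state
```

The final document glimpse (or the last attention distribution over candidate answer positions) is then used to answer the Cloze query.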