Undivided Attention: Are Intermediate Layers Necessary for BERT?
In recent times, BERT-based models have been extremely successful at solving
a variety of natural language processing (NLP) tasks such as reading
comprehension, natural language inference, and sentiment analysis. All
BERT-based architectures have a self-attention block followed by a block of
intermediate layers as the basic building component. However, a strong
justification for the inclusion of these intermediate layers remains missing in
the literature. In this work, we investigate the importance of intermediate
layers to overall network performance on downstream tasks. We show that
reducing the number of intermediate layers and modifying the architecture for
BERT-Base results in minimal loss in fine-tuning accuracy for downstream tasks
while decreasing the number of parameters and training time of the model.
Additionally, we use the centered kernel alignment (CKA) similarity metric and
probing classifiers to demonstrate that removing intermediate layers has little
impact on the learned self-attention representations.
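For readers unfamiliar with the metric, the sketch below computes linear CKA (centered kernel alignment, Kornblith et al., 2019) between two activation matrices; the abstract does not say which CKA variant the authors use, and the shapes and synthetic data here are illustrative assumptions.

```python
# Minimal sketch of linear CKA between two sets of layer activations,
# assuming the linear-kernel variant of Kornblith et al. (2019).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """CKA in [0, 1] between representations X (n, d1) and Y (n, d2),
    where the n rows are the same examples fed through two layers/models."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(num / den)

# Hypothetical usage: compare a full-model layer to its reduced counterpart.
rng = np.random.default_rng(0)
acts_full = rng.normal(size=(512, 768))              # n examples x hidden dim
acts_reduced = acts_full + 0.1 * rng.normal(size=(512, 768))
print(linear_cka(acts_full, acts_reduced))           # near 1 => similar representations
```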
A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training
The successful application of large pre-trained models such as BERT in
natural language processing has attracted increasing attention from researchers.
Since BERT typically acts as an end-to-end black box, classification
systems built on it are usually difficult to interpret and lack
robustness. This paper proposes a visual interpretation-based self-improving
classification model with a combination of virtual adversarial training (VAT)
and BERT models to address the above problems. Specifically, a fine-tuned BERT
model is used as a classifier to classify the sentiment of the text. Then, the
predicted sentiment labels are used as part of the input to
another BERT for spam classification, trained in a semi-supervised manner
with VAT. Additionally, visualization techniques, including visualizing the
importance of words and normalizing the attention head matrix, are employed to
analyze the relevance of each component to classification accuracy. Moreover,
the visual analysis uncovers new features that further improve classification
performance. Experimental results on a Twitter tweet dataset
demonstrate the effectiveness of the proposed model on the classification task.
Furthermore, the ablation study results illustrate the effect of different
components of the proposed model on the classification results.
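As context for the training objective, here is a minimal sketch of a VAT loss applied to input embeddings in the spirit of Miyato et al. (2017); the `model` interface and the `xi` and `eps` values are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a virtual adversarial training (VAT) loss on text
# embeddings, in the spirit of Miyato et al. (2017). The model interface
# and hyperparameters are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def vat_loss(model, embeddings, xi=1e-6, eps=1.0):
    """KL divergence between predictions on clean and adversarially
    perturbed embeddings; needs no labels, so it suits semi-supervised use."""
    with torch.no_grad():
        p_clean = F.softmax(model(embeddings), dim=-1)  # reference distribution

    # One power-iteration step to estimate the most sensitive direction.
    d = torch.randn_like(embeddings)
    d = xi * F.normalize(d.flatten(1), dim=1).view_as(embeddings)
    d.requires_grad_(True)
    logp_pert = F.log_softmax(model(embeddings + d), dim=-1)
    kl = F.kl_div(logp_pert, p_clean, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    r_adv = eps * F.normalize(grad.flatten(1), dim=1).view_as(embeddings)

    # Regularizer: predictions should stay smooth inside the eps-ball.
    logp_adv = F.log_softmax(model(embeddings + r_adv.detach()), dim=-1)
    return F.kl_div(logp_adv, p_clean, reduction="batchmean")
```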
Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs
To make machines better understand sentiments, research needs to move from
polarity identification to understanding the reasons that underlie the
expression of sentiment. Categorizing the goals or needs of humans is one way
to explain the expression of sentiment in text. Humans are good at
understanding situations described in natural language and can easily connect
them to the character's psychological needs using commonsense knowledge. We
present a novel method to extract, rank, filter and select multi-hop relation
paths from a commonsense knowledge resource to interpret the expression of
sentiment in terms of their underlying human needs. We efficiently integrate
the acquired knowledge paths in a neural model that interfaces context
representations with knowledge using a gated attention mechanism. We assess the
model's performance on a recently published dataset for categorizing human
needs. Selectively integrating knowledge paths boosts performance and
establishes a new state-of-the-art. Our model offers interpretability through
the learned attention map over commonsense knowledge paths. Human evaluation
highlights the relevance of the encoded knowledge.
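The sketch below shows one plausible form of such a gated attention mechanism: the context representation attends over encoded knowledge paths, and a learned gate mixes the attended knowledge back into the context; the layer shapes and scoring function are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch of gating context representations with attended
# knowledge-path encodings; sizes and scoring are illustrative assumptions.
import torch
import torch.nn as nn

class GatedKnowledgeAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, dim, bias=False)  # bilinear-style scoring
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, context: torch.Tensor, paths: torch.Tensor) -> torch.Tensor:
        """context: (batch, dim); paths: (batch, n_paths, dim) encodings of
        multi-hop knowledge paths. Returns a knowledge-enriched context."""
        # Attention weights over candidate paths; this attention map is the
        # kind of artifact the abstract cites as a source of interpretability.
        scores = torch.einsum("bd,bpd->bp", self.score(context), paths)
        weights = torch.softmax(scores, dim=-1)
        knowledge = torch.einsum("bp,bpd->bd", weights, paths)
        # Per-dimension gate decides how much knowledge to let in.
        g = torch.sigmoid(self.gate(torch.cat([context, knowledge], dim=-1)))
        return g * context + (1 - g) * knowledge
```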