2 research outputs found
Improving BERT with Self-Supervised Attention
One of the most popular paradigms for applying large pre-trained NLP models
such as BERT is to fine-tune them on a smaller dataset. One challenge remains,
however: the fine-tuned model often overfits on smaller datasets. A symptom
of this phenomenon is that irrelevant or misleading words in the sentence,
which are easy to understand for human beings, can substantially degrade the
performance of these fine-tuned BERT models. In this paper, we propose a novel
technique, called Self-Supervised Attention (SSA), to help address this
generalization challenge. Specifically, SSA automatically generates weak,
token-level attention labels iteratively by probing the fine-tuned model from
the previous iteration. We investigate two different ways of integrating SSA
into BERT and propose a hybrid approach to combine their benefits. Empirically,
through a variety of public datasets, we demonstrate significant performance
improvements using our SSA-enhanced BERT model.
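The abstract leaves the probing procedure unspecified. Below is a minimal sketch of one plausible reading, using Hugging Face transformers: a token receives a weak "important" label if deleting it noticeably shifts the fine-tuned model's prediction, and labels produced from the model at iteration t would supervise an auxiliary attention loss at iteration t+1. The leave-one-out probe, the 0.05 threshold, and the helper name are illustrative assumptions, not the paper's code.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def weak_attention_labels(sentence, threshold=0.05):
    """Probe the current fine-tuned model: label a token 1 ('important')
    if removing it shifts the predicted class probability by more than
    `threshold`, else 0 ('irrelevant'). Both the probe and the threshold
    are assumptions for illustration."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    with torch.no_grad():
        probs = model(**enc).logits.softmax(-1)[0]
    pred = probs.argmax().item()          # class the full sentence gets
    base_p = probs[pred].item()           # its probability, as a baseline
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    labels = []
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):     # never drop special tokens
            labels.append(0)
            continue
        # Leave-one-out: re-score the sentence with token i removed.
        reduced = torch.cat([input_ids[:i], input_ids[i + 1:]]).unsqueeze(0)
        with torch.no_grad():
            p = model(input_ids=reduced).logits.softmax(-1)[0, pred].item()
        labels.append(int(abs(base_p - p) > threshold))
    # In the iterative scheme, these labels would supervise an auxiliary
    # attention loss while re-fine-tuning, and the loop then repeats with
    # the newly fine-tuned model.
    return tokens, labels
```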
Comparative study on Judgment Text Classification for Transformer Based Models
This work applies various NLP models to predict the winner of a particular
judgment by extracting and summarizing text from the judgment document. Such
documents are valuable in legal proceedings: they can be cited as precedent in
lawsuits and cases, strengthening the argument of the party that invokes them.
Establishing precedent requires consulting a large number of documents to
collect the legal points relevant to a case, yet these documents take a long
time to analyze because of their complex wording and length. This work
presents a comparative study of six self-attention-based transformer models
and of how their performance changes under four different activation
functions. The models are trained on 200 judgment contexts, and their results
are evaluated against several benchmark parameters. The models ultimately
reach confidence levels of up to 99% when predicting the judgment, which
makes it possible to retrieve a relevant judgment document without spending
excessive time searching for related cases and reading them in full.
Comment: 28 pages with 9 figures
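The abstract names neither the six transformer models nor the four activation functions. As a purely illustrative sketch of how such a grid could be set up with Hugging Face transformers, the snippet below varies BERT's `hidden_act` configuration option; the activation list, the binary label count, and the dummy input are all assumptions.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumed set of four activations supported by BertConfig's `hidden_act`.
activations = ["gelu", "relu", "silu", "gelu_new"]
results = {}

for act in activations:
    # Build a fresh (randomly initialised) classifier with this activation;
    # binary labels stand in for the two possible judgment outcomes.
    config = BertConfig(hidden_act=act, num_labels=2)
    model = BertForSequenceClassification(config)
    # ... fine-tune on the 200 judgment contexts here ...
    # Placeholder forward pass, just to show the model runs end to end:
    ids = torch.randint(0, config.vocab_size, (1, 32))
    with torch.no_grad():
        results[act] = model(input_ids=ids).logits

for act, logits in results.items():
    print(act, logits.softmax(-1).tolist())
```

In the study itself, each of the six models would be trained once per activation and the resulting grid scored against the benchmark parameters the abstract mentions.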