BERT-Embedding and Citation Network Analysis based Query Expansion Technique for Scholarly Search
The enormous growth of research publications has made it challenging for
academic search engines to retrieve the most relevant papers for a given
search query. Numerous solutions have been proposed over the years to improve
the effectiveness of academic search, including exploiting query expansion and
citation analysis. Query expansion techniques mitigate the mismatch between the
language used in a query and the indexed documents. However, these techniques
risk introducing non-relevant information when expanding the original
query. Recently, applying the contextualized BERT model to document retrieval
has proven quite successful for query expansion. Motivated by these issues and
inspired by the success of BERT, this paper proposes a novel approach called
QeBERT. QeBERT
exploits BERT-based embedding and Citation Network Analysis (CNA) in query
expansion for improving scholarly search. Specifically, we use the
context-aware BERT-embedding and CNA for query expansion in Pseudo-Relevance
Feedback (PRF) fashion. Initial experimental results on the ACL dataset show
that BERT-embedding can provide a valuable augmentation to query expansion and
improve search relevance when combined with CNA.
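The embedding-based term selection at the heart of this PRF-style expansion can be sketched as follows. This is a loose illustration only, not the paper's actual method: toy vectors stand in for real BERT embeddings, and the function names (`expand_query`, `cosine`) are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_query(query_vec, candidate_terms, top_n=2):
    """Rank candidate expansion terms (drawn from pseudo-relevant
    documents) by embedding similarity to the query; keep the top_n."""
    scored = sorted(candidate_terms.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [term for term, _ in scored[:top_n]]

# Toy 3-d vectors standing in for contextual BERT embeddings.
query = np.array([1.0, 0.0, 0.5])
candidates = {
    "neural":   np.array([0.9, 0.1, 0.4]),
    "citation": np.array([0.2, 0.9, 0.1]),
    "ranking":  np.array([0.8, 0.0, 0.6]),
}
print(expand_query(query, candidates))  # ['neural', 'ranking']
```

In a PRF setting the candidate terms would come from the top-ranked documents of an initial retrieval pass; citation-network signals could then re-weight or filter these candidates.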
Attentional Encoder Network for Targeted Sentiment Classification
Targeted sentiment classification aims to determine the sentiment polarity
towards specific targets. Most previous approaches model
context and target words with RNNs and attention. However, RNNs are difficult to
parallelize, and truncated backpropagation through time makes it hard to
capture long-term patterns. To address this issue, this paper proposes an
Attentional Encoder Network (AEN) which eschews recurrence and employs
attention-based encoders to model the interaction between context and target.
We raise the label unreliability issue and introduce label smoothing
regularization. We
also apply pre-trained BERT to this task and obtain new state-of-the-art
results. Experiments and analysis demonstrate the effectiveness and lightweight
design of our model.
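Label smoothing, mentioned above as the regularizer for unreliable labels, redistributes a small amount of probability mass from the gold class to all classes. A minimal sketch (the function name and smoothing factor are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def label_smoothing(one_hot, eps=0.1):
    """Soften a hard one-hot label: each of K classes receives eps/K
    probability mass; the gold class keeps 1 - eps + eps/K."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

y = np.array([0.0, 1.0, 0.0])
print(label_smoothing(y))  # approximately [0.033, 0.933, 0.033]
```

Training against the softened distribution (with a cross-entropy loss) discourages the model from becoming over-confident on possibly mislabeled examples.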
Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning
This paper presents a deep learning-based pipeline for categorizing Bengali
toxic comments, in which a binary classification model first determines
whether a comment is toxic, and a multi-label classifier then
determines which toxicity types the comment belongs to. For this
purpose, we have prepared a manually labeled dataset of 16,073
instances, of which 8,488 are toxic; a toxic comment may belong to
one or more of six toxicity categories (vulgar, hate, religious, threat,
troll, and insult) simultaneously. A Long Short-Term Memory (LSTM) model with BERT
Embedding achieved 89.42% accuracy on the binary classification task, while
for multi-label classification, a combination of Convolutional Neural Network and
Bi-directional Long Short-Term Memory (CNN-BiLSTM) with an attention mechanism
achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the
predictions and interpret word feature importance during classification by
the proposed models, we utilized the Local Interpretable Model-Agnostic
Explanations (LIME) framework. Our dataset is publicly available at
https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classificatio
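The two-stage pipeline described in this abstract (binary toxicity gate, then multi-label typing) can be sketched as below. The stub models and every name here (`classify_comment`, the 0.5 threshold, the scoring interfaces) are illustrative assumptions; the real models are the LSTM and CNN-BiLSTM classifiers the paper trains.

```python
def classify_comment(text, binary_clf, multilabel_clf, labels, threshold=0.5):
    """Two-stage pipeline: the binary toxicity gate runs first; only
    comments it flags as toxic reach the multi-label stage."""
    if not binary_clf(text):
        return []                       # non-toxic: no category labels
    scores = multilabel_clf(text)       # one score per toxicity category
    return [lab for lab, s in zip(labels, scores) if s >= threshold]

LABELS = ["vulgar", "hate", "religious", "threat", "troll", "insult"]

# Stub models for illustration; real ones would be the trained
# LSTM (binary) and attention CNN-BiLSTM (multi-label) networks.
binary = lambda t: "bad" in t
multi = lambda t: [0.9, 0.1, 0.0, 0.0, 0.7, 0.2]

print(classify_comment("a bad comment", binary, multi, LABELS))
# ['vulgar', 'troll']
```

Gating on the binary model keeps the multi-label classifier focused on the (minority) toxic class, which is the design choice the paper's pipeline reflects.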