566 research outputs found

    BERT-Embedding and Citation Network Analysis based Query Expansion Technique for Scholarly Search

    The enormous growth of research publications has made it challenging for academic search engines to retrieve the most relevant papers for a given search query. Numerous solutions have been proposed over the years to improve the effectiveness of academic search, including query expansion and citation analysis. Query expansion techniques mitigate the mismatch between the language used in a query and that of the indexed documents. However, these techniques can introduce non-relevant information while expanding the original query. Recently, applying the contextualized language model BERT to document retrieval has proven quite successful, including for query expansion. Motivated by these issues and inspired by the success of BERT, this paper proposes a novel approach called QeBERT. QeBERT exploits BERT-based embeddings and Citation Network Analysis (CNA) for query expansion to improve scholarly search. Specifically, we use context-aware BERT embeddings and CNA for query expansion in Pseudo-Relevance Feedback (PRF) fashion. Initial experimental results on the ACL dataset show that BERT embeddings can provide a valuable augmentation to query expansion and improve search relevance when combined with CNA.
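    A minimal sketch of the PRF-style expansion loop described above, assuming a generic retrieval function and an off-the-shelf sentence-embedding model ("all-MiniLM-L6-v2"); the paper's exact BERT model, term-selection rule, and the CNA component are not reproduced here.

```python
# Hedged sketch of BERT-embedding query expansion in a
# pseudo-relevance feedback (PRF) loop. `retrieve`, the embedding
# model, and k/n_terms are illustrative assumptions, not the
# paper's configuration; the CNA re-ranking step is omitted.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def expand_query(query, retrieve, k=10, n_terms=5):
    """Expand `query` with terms drawn from the top-k retrieved documents."""
    top_docs = retrieve(query, k)                 # PRF: treat top-k hits as relevant
    candidates = {t for doc in top_docs for t in doc.lower().split()}
    candidates -= set(query.lower().split())      # drop terms already in the query
    c_list = sorted(candidates)
    q_vec = model.encode([query])[0]
    c_vecs = model.encode(c_list)
    # Rank candidate terms by cosine similarity to the query embedding.
    sims = c_vecs @ q_vec / (np.linalg.norm(c_vecs, axis=1) * np.linalg.norm(q_vec))
    best = [c_list[i] for i in np.argsort(-sims)[:n_terms]]
    return query + " " + " ".join(best)
```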

    Attentional Encoder Network for Targeted Sentiment Classification

    Targeted sentiment classification aims to determine the sentiment polarity towards specific targets. Most previous approaches model context and target words with RNNs and attention. However, RNNs are difficult to parallelize, and truncated backpropagation through time makes it hard to capture long-term patterns. To address these issues, this paper proposes an Attentional Encoder Network (AEN), which eschews recurrence and employs attention-based encoders to model the interaction between context and target. We raise the label unreliability issue and introduce label smoothing regularization. We also apply pre-trained BERT to this task and obtain new state-of-the-art results. Experiments and analysis demonstrate the effectiveness and light weight of our model.
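    A minimal sketch of the label smoothing regularization mentioned above, written in PyTorch; the smoothing factor eps=0.1 is an assumed value, not taken from the paper.

```python
# Hedged sketch of label smoothing regularization (LSR): train against
# (1 - eps) * one_hot + eps * uniform instead of hard one-hot targets,
# so the model is not forced to be fully confident in possibly
# unreliable labels. eps is an assumed hyperparameter.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / n_classes)   # uniform mass
    # Gold class receives the remaining (1 - eps) plus its uniform share.
    smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - eps + eps / n_classes)
    return -(smooth * log_probs).sum(dim=-1).mean()
```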

    Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning

    This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments: first, a binary classification model determines whether a comment is toxic, and then a multi-label classifier determines which toxicity types the comment belongs to. For this purpose, we prepared a manually labeled dataset of 16,073 instances, of which 8,488 are toxic; each toxic comment may belong to one or more of six categories simultaneously: vulgar, hate, religious, threat, troll, and insult. A Long Short-Term Memory (LSTM) model with BERT embeddings achieved 89.42% accuracy on the binary classification task, while as a multi-label classifier, a combination of Convolutional Neural Network and Bi-directional Long Short-Term Memory (CNN-BiLSTM) with an attention mechanism achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the predictions and interpret word feature importance during classification by the proposed models, we used the Local Interpretable Model-Agnostic Explanations (LIME) framework. Our dataset is public and can be accessed at https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classificatio
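    A minimal sketch of the two-stage pipeline described above: a binary model first flags a comment as toxic, and only then does a multi-label model assign toxicity types. The model objects, their predict_proba interface, and the 0.5 threshold are hypothetical placeholders; in the paper the binary stage is an LSTM with BERT embeddings and the multi-label stage is an attention-based CNN-BiLSTM.

```python
# Hedged sketch of the two-stage toxic-comment pipeline. The
# `binary_model` / `multilabel_model` objects and their
# `predict_proba` methods are hypothetical placeholders, not the
# paper's actual models.
LABELS = ["vulgar", "hate", "religious", "threat", "troll", "insult"]

def classify_comment(text, binary_model, multilabel_model, threshold=0.5):
    """Return [] for non-toxic text, else the predicted toxicity types."""
    if binary_model.predict_proba(text) < threshold:    # stage 1: toxic or not
        return []
    probs = multilabel_model.predict_proba(text)        # stage 2: per-label scores
    return [label for label, p in zip(LABELS, probs) if p >= threshold]
```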