59 research outputs found
Natural Language Engineering: Methods, Tasks and Applications
Editorial of the Special Issue “Natural Language Engineering: Methods, Tasks and Applications”
HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition
This paper introduces HeBERT and HebEMO. HeBERT is a Transformer-based model
for modern Hebrew text, which relies on a BERT (Bidirectional Encoder
Representations for Transformers) architecture. BERT has been shown to
outperform alternative architectures in sentiment analysis, and is suggested to
be particularly appropriate for MRLs. Analyzing multiple BERT specifications,
we find that while model complexity correlates with high performance on
language tasks that aim to understand terms in a sentence, a more-parsimonious
model better captures the sentiment of entire sentence. Either way, out
BERT-based language model outperforms all existing Hebrew alternatives on all
common language tasks. HebEMO is a tool that uses HeBERT to detect polarity and
extract emotions from Hebrew UGC. HebEMO is trained on a unique
Covid-19-related UGC dataset that we collected and annotated for this study.
Data collection and annotation followed an active learning procedure that aimed
to maximize predictability. We show that HebEMO yields a high F1-score of 0.96
for polarity classification. Emotion detection reaches F1-scores of 0.78-0.97
for various target emotions, with the exception of surprise, which the model
failed to capture (F1 = 0.41). These results are better than the best-reported
performance, even among English-language models of emotion detection
Multilabel Classification for News Article Using Long Short-Term Memory
oai:ojs.sjia.ilkom.unsri.ac.id:article/14Multilabel text classification is a task of categorizing text into one or more categories. Like other machine learning, multilabel classification performance is limited when there is small labeled data and leads to the difficulty of capturing semantic relationships. In this case, it requires a multi-label text classification technique that can group four labels from news articles. Deep Learning is a proposed method for solving problems in multi-label text classification techniques. By comparing the seven proposed Long Short-Term Memory (LSTM) models with large-scale datasets by dividing 4 LSTM models with 1 layer, 2 layer and 3-layer LSTM and Bidirectional LSTM to show that LSTM can achieve good performance in multi-label text classification. The results show that the evaluation of the performance of the 2-layer LSTM model in the training process obtained an accuracy of 96 with the highest testing accuracy of all models at 94.3. The performance results for model 3 with 1-layer LSTM obtained the average value of precision, recall, and f1-score equal to the 94 training process accuracy. This states that model 3 with 1-layer LSTM both training and testing process is better. The comparison among seven proposed LSTM models shows that model 3 with 1 layer LSTM is the best model
Text Classification Using Long Short-Term Memory With GloVe Features
In the classification of traditional algorithms, problems of high features dimension and data sparseness often occur when classifying text. Classifying text with traditional machine learning algorithms has high efficiency and stability characteristics. However, it has certain limitations with regard to large-scale dataset training. Deep Learning is a proposed method for solving problems in text classification techniques. By tuning the parameters and comparing the eight proposed Long Short-Term Memory (LSTM) models with a large-scale dataset, to show that LSTM with features GloVe can achieve good performance in text classification. The results show that text classification using LSTM with GloVe obtain the highest accuracy is in the sixth model with 95.17, the average precision, recall, and F1-score are 9
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)
Multiword expressions at length and in depth
The annual workshop on multiword expressions takes place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point point of reference for future work
MKPM: Multi keyword-pair matching for natural language sentences
Sentence matching is widely used in various natural language tasks, such as natural language inference, paraphrase identification and question answering. For these tasks, we need to understand the logical and semantic relationship between two sentences. Most current methods use all information within a sentence to build a model and hence determine its relationship to another sentence. However, the information contained in some sentences may cause redundancy or introduce noise, impeding the performance of the model. Therefore, we propose a sentence matching method based on multi keyword-pair matching (MKPM), which uses keyword pairs in two sentences to represent the semantic relationship between them, avoiding the interference of redundancy and noise. Specifically, we first propose a sentence-pair-based attention mechanism sp-attention to select the most important word pair from the two sentences as a keyword pair, and then propose a Bi-task architecture to model the semantic information of these keyword pairs. The Bi-task architecture is as follows: 1. In order to understand the semantic relationship at the word level between two sentences, we design a word-pair task (WP-Task), which uses these keyword pairs to complete sentence matching independently. 2. We design a sentence-pair task (SP-Task) to understand the sentence level semantic relationship between the two sentences by sentence denoising. Through the integration of the two tasks, our model can understand sentences more accurately from the two granularities of word and sentence. Experimental results show that our model can achieve state-of-the-art performance in several tasks. Our source code is publicly availabl
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
- …