12 research outputs found

Improving sentiment analysis on PeduliLindungi comments: a comparative study with CNN-Word2Vec and integrated negation handling

Author: Arianti Berliana Andra
Cahyana Nur Heri
Dreżewski Rafał
Jayadianti Herlina
Saifullah Shoffan
Publication venue: Association for Scientific Computing Electronics and Engineering (ASCEE)
Publication date: 23/11/2023

This study investigates sentiment analysis in Google Play reviews of the PeduliLindungi application, focusing on the integration of negation handling into text preprocessing and comparing the effectiveness of two prominent methods: CNN-Word2Vec CBOW and CNN-Word2Vec SkipGram. Through a meticulous methodology, negation handling is incorporated into the preprocessing phase to enhance sentiment analysis. The results demonstrate a noteworthy improvement in accuracy for both methods with the inclusion of negation handling, with CNN-Word2Vec SkipGram emerging as the superior performer, achieving an impressive 76.2% accuracy rate. Leveraging a dataset comprising 13,567 comments, this research introduces a novel approach by emphasizing the significance of negation handling in sentiment analysis. The study not only contributes valuable insights into the optimization of sentiment analysis processes but also provides practical considerations for refining methodologies, particularly in the context of mobile application reviews
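The abstract above does not say how negation handling is implemented. As a minimal sketch of one common preprocessing technique, the code below prefixes every token that follows a negation cue with `NOT_` until the next punctuation mark. The cue list, the `NOT_` prefix, and the function name `mark_negation` are assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical cue list (a few Indonesian and English negation words);
# the paper's actual list, if any, is not given in the abstract.
NEGATION_CUES = {"tidak", "bukan", "belum", "jangan", "tak", "not", "no", "never"}
# In this sketch, punctuation ends the scope of a negation.
SCOPE_END = {".", ",", "!", "?", ";", ":"}


def mark_negation(text):
    """Return lowercase tokens, with tokens inside a negation scope prefixed by NOT_."""
    tokens = re.findall(r"\w+|[.,!?;:]", text.lower())
    marked = []
    negated = False
    for tok in tokens:
        if tok in SCOPE_END:
            negated = False          # punctuation closes the scope
            marked.append(tok)
        elif tok in NEGATION_CUES:
            negated = True           # the cue itself is kept unchanged
            marked.append(tok)
        elif negated:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
    return marked


print(mark_negation("Aplikasi tidak bisa dibuka, bagus."))
```

The marked tokens could then be passed to an embedding step such as Word2Vec, so that a word and its negated form receive separate vectors.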

Association for Scientific Computing Electronics and Engineering (ASCEE): Open Journal Systems

Optimisation Method for Training Deep Neural Networks in Classification of Non-functional Requirements

Author: Sabir M.
Publication venue: London South Bank University
Publication date: 01/01/2022

Non-functional requirements (NFRs) are regarded as critical to a software system's success. The majority of NFR detection and classification solutions have relied on supervised machine learning models. This approach is hindered by the lack of labelled data for training and necessitates a significant amount of time spent on feature engineering. In this work we explore emerging deep learning techniques to reduce the burden of feature engineering. The goal of this study is to develop an autonomous system that can classify NFRs into multiple classes based on a labelled corpus. In the first section of the thesis, we standardise the NFR ontology and annotations to produce a corpus based on five attributes: usability, reliability, efficiency, maintainability, and portability. In the second section, the design and implementation of four neural networks (an artificial neural network, a convolutional neural network, a long short-term memory network, and a gated recurrent unit) are examined for classifying NFRs. These models require a large corpus. To overcome this limitation, we propose a new paradigm for data augmentation. This method uses a sort-and-concatenate strategy to combine two phrases from the same class, resulting in a two-fold increase in data size while keeping the domain vocabulary intact. We compared our method to a baseline (no augmentation) and to an existing approach, Easy Data Augmentation (EDA), with pre-trained word embeddings. All training was performed under two settings: augmentation of the entire dataset before the train/validation split, and augmentation of the training set only. Our findings show that, compared to EDA and the baseline, the NFR classification model improved greatly, and the CNN performed best, when trained using our suggested technique in the first setting. However, we saw only a slight boost in the second experimental setup, with training-set augmentation alone.
As a result, we conclude that augmentation of the validation set is required in order to achieve acceptable results with our proposed approach. We hope that our ideas will inspire new data augmentation techniques, whether generic or task-specific. Furthermore, it would also be useful to apply this strategy to other languages.
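The abstract describes the augmentation only at a high level: two phrases from the same class are sorted and concatenated, which doubles the data while reusing the existing vocabulary. The following is a rough sketch of one possible reading. The sort key (text length), the wrap-around pairing rule, and the function name `sort_and_concatenate` are assumptions, not the thesis's actual procedure.

```python
from collections import defaultdict


def sort_and_concatenate(samples):
    """Return the original samples plus one concatenated sample per original.

    samples: list of (text, label) pairs.
    Assumption: within each class, texts are sorted by length and each text
    is joined to its successor in that order, wrapping around at the end.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)

    augmented = list(samples)
    for label, texts in by_label.items():
        ordered = sorted(texts, key=len)
        if len(ordered) < 2:
            continue  # a class with a single sample has no partner
        for i, text in enumerate(ordered):
            partner = ordered[(i + 1) % len(ordered)]
            # The new sample reuses only words already present in the class.
            augmented.append((text + " " + partner, label))
    return augmented


data = [
    ("the system shall be easy to learn", "usability"),
    ("menus shall be consistent", "usability"),
    ("the system shall recover from failure", "reliability"),
    ("uptime shall exceed 99 percent", "reliability"),
]
print(len(sort_and_concatenate(data)))
```

Under this reading, the output is twice the size of the input whenever every class has at least two samples. Applying the function only to the training split corresponds to the second setting mentioned in the abstract; applying it before the split corresponds to the first.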

LSBU Research Open

Predicting the age of social network users from user-generated texts with word embeddings

Author: Alekseev A.
Nikolenko S.
Publication date: 01/01/2017

© 2016 FRUCT. Many web-based applications, such as advertising or recommender systems, often depend critically on demographic information, which may be unavailable for new or anonymous users. We study the problem of predicting demographic information from user-generated texts, using a Russian-language dataset from a large social network. We evaluate the efficiency of age prediction algorithms based on word2vec word embeddings and conduct a comprehensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches.

Kazan Federal University Digital Repository

Music emotion recognition: a multimodal machine learning approach

Author: Gökalp Cemre
Publication date: 19/07/2019

Music emotion recognition (MER) is an emerging domain within the Music Information Retrieval (MIR) scientific community, and searching for music by emotion is one of the main selection methods preferred by web users. As the world goes digital, the musical content in online databases such as Last.fm has expanded exponentially, requiring substantial manual effort to manage it and keep it up to date. Therefore, the demand for innovative and adaptable search mechanisms that can be personalized according to users’ emotional state has gained increasing consideration in recent years. This thesis addresses the music emotion recognition problem by presenting several classification models fed by textual features as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs in four research experiments that address the emotional role of audio features, such as tempo, acousticness, and energy, and the impact of textual features extracted by two different approaches, TF-IDF and Word2Vec. Furthermore, we propose a multi-modal approach using a combined feature set consisting of features from the audio content as well as from context-aware data. For this purpose, we generated a ground-truth dataset containing over 1500 labeled song lyrics, together with a large unlabeled corpus of more than 2.5 million Turkish documents, in order to build an accurate automatic emotion classification system. The analytical models were built by applying several algorithms to the cross-validated data using Python. In the experiments, the best performance attained was 44.2% when employing only audio features, whereas textual features yielded better performance, with accuracy scores of 46.3% and 51.3% for the supervised and semi-supervised learning paradigms, respectively.
Lastly, even though we created a comprehensive feature set combining audio and textual features, this approach did not yield any significant improvement in classification performance.

Sabanci University Research Database

False textual information detection, a deep learning approach

Author: Alkhawaldeh Fatima
Publication date: 01/02/2022

Many approaches exist for analysing fact checking for fake news identification, which is the focus of this thesis. Current approaches still perform badly on a large scale due to a lack of authority, insufficient evidence, or, in certain cases, reliance on a single piece of evidence. To address the lack of evidence and the inability of models to generalise across domains, we propose a style-aware model for detecting false information and improving existing performance. We found that our model was effective at detecting false information when we evaluated its generalisation ability using news articles and Twitter corpora. We then propose to improve fact checking performance by incorporating warrants. We developed a highly efficient prediction model based on the results and demonstrated that incorporating warrants is beneficial for fact checking. Due to a lack of external warrant data, we develop a novel model for generating warrants that aid in determining the credibility of a claim. The results indicate that when a pre-trained language model is combined with a multi-agent model, high-quality, diverse warrants are generated that contribute to task performance improvement. To resolve biased opinions and support rational judgments, we propose a model that can generate multiple perspectives on the claim. Experiments confirm that our Perspectives Generation model generates perspectives with a higher degree of quality and diversity than any other baseline model. Additionally, we propose to improve the model's detection capability by generating an explainable alternative factual claim that assists the reader in identifying subtle issues that result in factual errors. The examination demonstrates that it does indeed increase the veracity of the claim. Finally, whereas current research has focused on stance detection and fact checking separately, we propose a unified model that integrates both tasks.
Classification results demonstrate that our proposed model outperforms state-of-the-art methods.

White Rose E-theses Online

Outlier Detection Using K-Means Clustering with Minkowski-Chebyshev distances for Inquiry-Based Learning Results in Students Dataset

Author: Joko Eliyanto Joko
Sugiyarto Sugiyarto
Wahyuni Endang

Universitas Ahmad Dahlan Repository

Learning-based classification of software logs generated by a test automation framework

Author: Voloskin Juri
Publication date: 21/03/2022

Managing large software development systems has become increasingly challenging, as large volumes of raw data generated by the production telemetry are intractable for manual processing. The client of this thesis seeks an effective scalable approach to tackle this issue by automatically classifying the software logs generated in case of integration test failures during software production. This thesis has developed two machine learning candidate solutions to demonstrate the feasibility of a learning-based approach for log classification. The first solution represents a canonical natural language processing pipeline, which performs step-by-step transformation of the input data using text preprocessing and numerical representation methods as well as permits using any traditional machine learning model for classification. The second solution employs the transfer learning approach and a deep neural language model from the family of bidirectional transformers, which incorporates an encoder for contextual text representation that is fine-tuned on a domain-specific corpus to improve classification performance. Both solutions achieved high accuracy scores, thus confirming the feasibility of a learning-based approach for software log classification. Experiments showed that contextual text representations using no text preprocessing contributed more to classification accuracy than other representation schemes attempted in this work. A transformer neural language model pre-trained on the general natural language domain successfully adapted to the domain of software logs with minimal preprocessing effort. At the same time, the experimental results indicated that careful vocabulary management and methodical log preprocessing could enhance similarity between the domains and thus further improve the classification accuracy of the transfer learning solution

Aaltodoc Publication Archive

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

Publication venue: OpenEdition
Publication date: 10/06/2022

Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)

Directory of Open Access Books (DOAB)

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

Author: Agerri Rodrigo
Aliprandi Carlo
Alkhalifa Rabab
Alzetta Chiara
Angel Jason
Anselmi Guido
Appiah Balaji Nitin Nikamanth
Aroyehun Segun Taofeek
Artigas Herold Maria Fernanda
Attanasio Giuseppe
Attardi Giuseppe
Badryzlova Yulia
Bai Yang
Baldissin Gioia
Ballarè Silvia
Barrón-Cedeño Alberto
Bartle Anna-Sophie
Basile Pierpaolo
Basile Valerio
Basili Roberto
Belotti Federico
Bennici Mauro
Bharathi B.
Bhuvana J.
Bianchi Federico
Bisconti Elia
Bolanos Luis
Bondielli Alessandro
Bosco Cristina
Breazzano Claudia
Brivio Matteo
Brunato Dominique
Cafagna Michele
Caputo Annalina
Caselli Tommaso
Cassotti Pierluigi
Castañeda Enrique
Castro Castro Daniel
Centeno Roberto
Cercel Dumitru-Clementin
Cerruti Massimo
Chandrabose Aravindan
Chesi Cristiano
Chiarello Filippo
Cignarella Alessandra Teresa
Cimino Andrea
Comandini Gloria
Croce Danilo
Dai Hongbing
Dascalu Mihai
Dell’Orletta Felice
Delmonte Rodolfo
Deng Tao
De Francesco Nazareno
De Martino Graziella
De Mattei Lorenzo
Di Buccio Emanuele
Di Maro Maria
di Nuovo Elisa
Di Rosa Emanuele
dos S.R. da Silva Adriano
Durante Alberto
El Abassi Samer
Espinosa María S.
Fabrizi Samuel
Fantoni Gualtiero
Ferilli Stefano
Ferraccioli Federico
Fersini Elisabetta
Finos Livio
Fiorucci Stefano
Fontana Michele
Frenda Simona
Gambino Giuseppe
Gatt Albert
Gelbukh Alexander
Giorgi Giulia
Giorgioni Simone
Girardi Paolo
Goria Eugenio
Gregori Lorenzo
Hoffmann Julia
Iacono Maria
Iovine Andrea
Izzi Giovanni Luca
Jimenez Sergio
Kaiser Jens
Kayalvizhi S.
Kivlichan Ian
Klaus Svea
Koceva Frosina
Kovács György
Kruschwitz Udo
Labadie Tamayo Roberto
Lai Mirko
Laicher Severin
Lapesa Gabriella
Lavergne Eric
Lebani Gianluca E.
Lees Alyssa
Lenci Alessandro
Leonardelli Elisa
Li Hongling
Liakata Maria
Lovetere Marco
Madonna Domenico
Massidda Riccardo
Mattei Lorenzo De
Mauri Caterina
Mele Francesco
Melucci Massimo
Menini Stefano
Miaschi Alessio
Miliani Martina
Moggio Alessio
Montagnani Matteo
Montefinese Maria
Montemagni Simonetta
Monti Johanna
Moraca Maurizio
Moretti Giovanni
Morra Simone
Murphy Killian
Muti Arianna
Nakov Preslav
Nisioi Sergiu
Nissim Malvina
Nozza Debora
Occhipinti Daniela
Ortega Bueno Reynier
Ou Xiaozhi
Palmonari Matteo
Parizzi Andrea
Pascucci Antonio
Passaro Lucia C.
Pastor Eliana
Patti Viviana
Pirrone Roberto
Polignano Marco
Politi Marcello
Pont Mattia Da
Pražák Ondřej
Proisl Thomas
Puccetti Giovanni
Přibáň Pavel
Radicioni Daniele P.
Rama Ilir
Rambelli Giulia
Ravelli Andrea Amelio
Rodrigo Alvaro
Rodriguez-Diaz Carlos A.
Rodriguez Cisnero Mariano Jason
Roman Norton T.
Roman Norton Trevisan
Rossmann Daniela
Rosso Paolo
Rotaru Armand Stefan
Rubino Edoardo
Russo Irene
Sabella Gianluca
Saini Rajkumar
Salman Samir
Sangati Federico
Sanguinetti Manuela
Sarti Gabriele
Schlechtweg Dominik
Schulte im Walde Sabine
Sciandra Andrea
Setpal Jinen
Siciliani Lucia
Solari Dario
Sorensen Jeffrey
Sorgente Antonio
Sprugnoli Rachele
Stranisci Marco
Tamburini Fabio
Taylor Stephen
Tesei Andrea
Thenmozhi D.
Tonelli Sara
Torre Ilaria
Tsakalidis Adam
Varvara Rossella
Venturi Giulia
Vettigli Giuseppe
Vlad George-Alexandru
Wang Benyou
Zaharia George-Eduard
Zamparelli Roberto
Zubiaga Arkaitz
Publication venue: OpenEdition
Publication date: 11/05/2021

OpenEdition

Prioritisation of requests, bugs and enhancements pertaining to apps for remedial actions. Towards solving the problem of which app concerns to address initially for app developers

Author: Malgaonkar Saurabh Ramakant
Publication venue: University of Otago Library
Publication date: 16/04/2021

Useful app reviews contain information related to the bugs reported by the app’s end-users along with the requests or enhancements (i.e., suggestions for improvement) pertaining to the app. App developers expend exhaustive manual efforts towards the identification of numerous useful reviews from a vast pool of reviews and converting such useful reviews into actionable knowledge by means of prioritisation. By doing so, app developers can resolve the critical bugs and simultaneously address the prominent requests or enhancements in short intervals of apps’ maintenance and evolution cycles. That said, the manual efforts towards the identification and prioritisation of useful reviews have limitations. The most common limitations are: high cognitive load required to perform manual analysis, lack of scalability associated with limited human resources to process voluminous reviews, extensive time requirements and error-proneness related to the manual efforts. While prior work from the app domain has proposed prioritisation approaches to convert reviews pertaining to an app into actionable knowledge, these studies have limitations and lack benchmarking of the prioritisation performance. Thus, the problem of prioritising numerous useful reviews still persists. In this study, initially, we conducted a systematic mapping study of the requirements prioritisation domain to explore the knowledge on prioritisation that exists and seek inspiration from the eminent empirical studies to solve the problem related to the prioritisation of numerous useful reviews. Findings of the systematic mapping study inspired us to develop automated approaches for filtering useful reviews, and then to facilitate their subsequent prioritisation. To filter useful reviews, this work developed six variants of the Multinomial Naïve Bayes method.
Next, to prioritise the order in which useful reviews should be addressed, we proposed a group-based prioritisation method which initially classified the useful reviews into specific groups using an automatically generated taxonomy, and later prioritised these reviews using a multi-criteria heuristic function. Subsequently, we developed an individual prioritisation method that directly prioritised the useful reviews after filtering using the same multi-criteria heuristic function. Some of the findings of the conducted systematic mapping study not only provided the necessary inspiration towards the development of automated filtering and prioritisation approaches but also revealed crucial dimensions such as accuracy and time that could be utilised to benchmark the performance of a prioritisation method. With regard to the proposed automated filtering approach, we observed that the performance of the Multinomial Naïve Bayes variants varied based on their algorithmic structure and the nature of labelled reviews (i.e., balanced or imbalanced) that were made available for training purposes. The outcome related to the automated taxonomy generation approach for classifying useful reviews into specific groups showed a substantial match with the manual taxonomy generated from domain knowledge. Finally, we validated the performance of the group-based prioritisation and individual prioritisation methods, where we found that the performance of the individual prioritisation method was superior to that of the group-based prioritisation method when outcomes were assessed for the accuracy and time dimensions. In addition, we performed a full-scale evaluation of the individual prioritisation method which showed promising results. Given the outcomes, it is anticipated that our individual prioritisation method could assist app developers in filtering and prioritising numerous useful reviews to support app maintenance and evolution cycles.
Beyond app reviews, the utility of our proposed prioritisation solution can be evaluated on software repositories tracking bugs and requests such as Jira, GitHub and so on

Te Tumu Eprints Repository