4,786 research outputs found

    Hierarchical semantic representations of online news comments for emotion tagging using multiple information sources

    With the development of online news services, users can now actively respond to online news by expressing subjective emotions, which can help us understand the predilections and opinions of individual users and help news publishers provide more relevant services. Neural network methods have achieved promising results but still face challenges in emotion tagging. First, these methods treat the whole document as a stream or bag of words and cannot encode the intrinsic relations between sentences, so they cannot properly express the semantic meaning of a document whose sentences may be logically related. Second, these methods use only the semantics of the document itself and ignore the accompanying information sources, which can significantly influence the interpretation of the sentiment contained in documents. This paper therefore presents a hierarchical semantic representation model of news comments using multiple information sources, called the Hierarchical Semantic Neural Network (HSNN). In particular, we begin with a novel neural network model that learns document representations in a bottom-up way, capturing not only the semantics within each sentence but also the semantic or logical relations between sentences. On top of this, we tackle the task of predicting emotions for online news comments by exploiting multiple information sources, including the content of comments, the content of news articles, and the user-generated emotion votes. A series of experiments on real-world datasets demonstrates the effectiveness of the proposed approach.

    Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish

    Tagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time-consuming, and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, so redundant classification is required to validate the assigned tag. Although the number of emotion-tagged text datasets in Spanish has been growing in recent years, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as not many datasets in Spanish included a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss Kappa interrater agreement measure. Error analysis is performed using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundred thousand tagged comments and does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented is in the moderate agreement zone and, as such, the dataset is suitable for training classification algorithms in the field of sentiment analysis.
    Affiliation: Tessore, Juan Pablo. Universidad Nacional del Noroeste de la Pcia. de Bs. As., Escuela de Tecnologia, Instituto de Investigacion y Transferencia En Tecnologia - Comision de Investigaciones Cientificas de la Provincia de Buenos Aires, Instituto de Investigacion y Transferencia En Tecnologia; Argentina
    Affiliation: Esnaola, Leonardo Martín. Universidad Nacional del Noroeste de la Pcia. de Bs. As., Escuela de Tecnologia, Instituto de Investigacion y Transferencia En Tecnologia - Comision de Investigaciones Cientificas de la Provincia de Buenos Aires, Instituto de Investigacion y Transferencia En Tecnologia; Argentina
    Affiliation: Lanzarini, Laura Cristina. Universidad Nacional de La Plata, Facultad de Informática, Instituto de Investigación en Informática Lidi; Argentina
    Affiliation: Baldassarri, Sandra Silvia. Universidad de Zaragoza; Españ
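
The abstract above validates its tags with the Fleiss Kappa interrater agreement measure. As a minimal sketch of how that statistic can be computed, the following uses the standard textbook formula; the function name `fleiss_kappa` and the small rating table are hypothetical illustrations, not the authors' code or data. By one widely used convention (Landis and Koch), values between 0.41 and 0.60 are read as moderate agreement, although the abstract does not state which scale the authors applied.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 comments, 3 raters, 3 emotion categories.
ratings = [
    [3, 0, 0],  # all three raters chose the first category
    [0, 2, 1],
    [1, 1, 1],  # complete disagreement
    [0, 0, 3],
]
kappa = fleiss_kappa(ratings)
print(f"Fleiss' kappa: {kappa:.3f}")
```

Taking a count table rather than raw labels keeps the sketch independent of how many raters or categories a study uses.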

    Towards the Global SentiWordNet


    Comparison of machine learning for sentiment analysis in detecting anxiety based on social media data

    All groups of people have felt the impact of the COVID-19 pandemic. The situation triggers anxiety, which is harmful to everyone. The government's role, through its work programs, is very influential in solving these problems, but the programs also have pros and cons that cause public anxiety. It is therefore necessary to detect anxiety in order to improve government programs and raise public expectations. This study applies machine learning to detect anxiety based on social media comments about government programs for dealing with the pandemic. The approach adopts sentiment analysis, detecting anxiety from positive and negative comments by netizens. The machine learning methods implemented are K-NN, Bernoulli, Decision Tree Classifier, Support Vector Classifier, Random Forest, and XG-Boost. The data sample was obtained by crawling YouTube comments and comprises 4862 comments, of which 3211 are negative and 1651 are positive. Negative data indicate anxiety, while positive data indicate hope (not anxious). Features are extracted with count vectorization and TF-IDF. The sentiment data were divided into 3889 and 973 comments for training and testing, and the greatest accuracy was achieved by Random Forest, at 84.99% with count vectorization and 82.63% with TF-IDF. The best precision was obtained by K-NN and the best recall by XG-Boost. Thus, Random Forest is the most accurate method for detecting anxiety from social media data.
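
The two feature extraction schemes named in the abstract above, count vectorization and TF-IDF, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's pipeline: whitespace tokenization, raw term counts, and the unsmoothed textbook weighting `idf = ln(N / df)` are all assumptions, and library implementations often use smoothed or normalized variants. The helper names `count_vectorize` and `tfidf` and the three sample comments are hypothetical.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, count vectors) using lowercase whitespace tokens."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors


def tfidf(vectors):
    """Reweight count vectors by idf = ln(N / df) (assumed textbook variant)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in vectors if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in vectors]


# Hypothetical comments, not taken from the study's YouTube data.
comments = [
    "the program helps",
    "the program worries me",
    "the news worries me",
]
vocab, counts = count_vectorize(comments)
weights = tfidf(counts)
for term, weight in zip(vocab, weights[0]):
    print(f"{term}: {weight:.3f}")
```

Under this weighting, a term that appears in every comment (here "the") receives weight zero, while rarer terms are emphasized. Either set of vectors could then be passed to a classifier such as those compared in the abstract.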