    A HYBRID DEEP LEARNING APPROACH FOR SENTIMENT ANALYSIS IN PRODUCT REVIEWS

    Product reviews play a crucial role in providing valuable insights to consumers and producers. Analyzing the vast amount of data generated around a product, such as posts, comments, and views, can be challenging for business intelligence purposes. Sentiment analysis of this content helps both consumers and producers better understand the market, enabling them to make informed decisions. In this study, we propose a novel hybrid approach based on deep neural networks (DNNs) for sentiment analysis in product reviews, focusing on classifying the sentiments expressed. Our approach uses the recursive neural network (RNN) algorithm for sentiment classification. To address the imbalanced distribution of positive and negative samples in social network data, we employ a resampling technique that balances the dataset by increasing the number of minority-class samples and decreasing the number of majority-class samples. We evaluate our approach on Amazon data comprising four product categories: clothing, cars, luxury goods, and household appliances. Experimental results demonstrate that the proposed approach performs well in sentiment analysis of product reviews, particularly in the context of digital marketing. Furthermore, the attention-based RNN algorithm outperforms the baseline RNN by approximately 5%. Notably, the study reveals that consumer sentiment varies across products, particularly with respect to appearance and price.
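    The resampling step above, which grows the minority class and shrinks the majority class until both are the same size, can be sketched as follows. This is a minimal illustration under stated assumptions: random duplication for oversampling, random removal for undersampling, and a target size halfway between the two class counts. The abstract does not specify the exact technique or target, and the helper name `balance_binary` is hypothetical.

```python
import random
from collections import Counter

def balance_binary(samples, labels, seed=0):
    """Hypothetical helper: oversample the minority class (with replacement)
    and undersample the majority class (without replacement) so that both
    classes end up with the same number of examples."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    # Order the two classes by size: smaller one first.
    (minority, min_items), (majority, maj_items) = sorted(
        by_class.items(), key=lambda kv: len(kv[1]))
    # Assumed target: meet halfway between the two class sizes.
    target = (len(min_items) + len(maj_items)) // 2
    # Grow the minority class by duplicating randomly chosen examples.
    grown = min_items + [rng.choice(min_items)
                         for _ in range(target - len(min_items))]
    # Shrink the majority class by keeping a random subset.
    shrunk = rng.sample(maj_items, target)
    out = [(x, minority) for x in grown] + [(x, majority) for x in shrunk]
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]

reviews = [f"review {i}" for i in range(10)]
labels = ["pos"] * 8 + ["neg"] * 2
X, y = balance_binary(reviews, labels)
print(sorted(Counter(y).items()))  # [('neg', 5), ('pos', 5)]
```

    Meeting halfway limits both the number of duplicated minority reviews and the number of discarded majority reviews; other target sizes are equally plausible.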

    Detection of Offensive YouTube Comments, a Performance Comparison of Deep Learning Approaches

    Social media data is open, free, and available in massive quantities. However, making sense of this data is significantly limited by its high volume, variety, uncertain veracity, velocity, value, and variability. This work provides a comprehensive framework for processing and analyzing YouTube comments with offensive and non-offensive content. YouTube is a platform where people of every age group log in to find the content that appeals to them most. At the same time, a massive increase in the use of offensive language has become apparent. Because new comments arrive in massive volume, they cannot all be removed manually, and disabling the comment section altogether is bad for YouTubers' business, since they would then receive no feedback of any kind.

    A performance comparison of oversampling methods for data generation in imbalanced learning tasks

    Dissertation presented as a partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM.

    The class imbalance problem is one of the most fundamental challenges faced by the machine learning community. Imbalance means that the number of instances in the class of interest is relatively low compared with the rest of the data. Sampling is a common technique for dealing with this problem, and a number of oversampling approaches have been applied in an attempt to balance the classes. This study provides an overview of the class imbalance issue and examines some common oversampling approaches for dealing with it. To illustrate their differences, an experiment is conducted on multiple simulated data sets, comparing the performance of these oversampling methods on different classifiers according to various evaluation criteria. The effect of parameters such as the number of features and the imbalance ratio on classifier performance is also evaluated.
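    To make one family of oversampling approaches concrete, the sketch below implements the interpolation idea behind SMOTE (Synthetic Minority Over-sampling Technique): each synthetic point lies on the segment between a minority example and one of its k nearest minority neighbours. It is a simplified pure-Python illustration, not the dissertation's code; the function name `smote_like`, the default `k=3`, and the choice of base points at random are assumptions.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority-class points, SMOTE-style: pick a
    minority example, pick one of its k nearest minority neighbours, and
    place a new point at a random position on the segment between them."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Euclidean k nearest neighbours of `base` among the other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=4)
print(len(new_points))  # 4
```

    Unlike plain duplication, every synthetic point is new, but it always stays inside the region spanned by existing minority examples, which is why the number of features and the imbalance ratio can change how much it helps a given classifier.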

    Relationship Between Personality Patterns and Harmfulness: Analysis and Prediction Based on Sentence Embedding

    This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, and the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, the authors propose a method to predict the presence or absence of harmful categories. In addition, the authors believe that combining this method with research on analyzing a speaker's personality from statements on social networking sites makes it possible to analyze users who incite others. The results confirm that the proposed method identifies potentially harmful comments more accurately than simple dictionary-based models or models using distributed word representations. An analysis based on the harmfulness judgment model also confirmed a relationship between personality patterns and harmful expressions.
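    The abstract frames prediction as deciding, for each harmful category, whether it is present. One natural way to structure that step, assumed here rather than taken from the paper, is multi-label classification with an independent logistic output per category on top of a fixed sentence embedding. The embedding values, category names, and weights below are hypothetical; in practice the embedding would come from the general-purpose language model and the weights from training.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_harm_categories(embedding, heads, threshold=0.5):
    """Multi-label prediction: each harmful category has its own logistic
    head (weights, bias) over the same sentence embedding, so any number of
    categories can be present at once."""
    present = set()
    for category, (weights, bias) in heads.items():
        z = sum(w * x for w, x in zip(weights, embedding)) + bias
        if sigmoid(z) >= threshold:
            present.add(category)
    return present

# Toy 3-dimensional "sentence embedding" and hand-set heads (illustrative only).
heads = {
    "insult": ([2.0, 0.0, 0.0], -1.0),
    "threat": ([0.0, 2.0, 0.0], -1.0),
}
print(predict_harm_categories([1.0, 0.1, 0.0], heads))  # {'insult'}
```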

    Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers

    In natural language processing, accurate categorization of tweets, including hate speech detection, plays a pivotal role in efficient information organization and analysis. This paper presents a Natural Language Content Evaluation System tailored for multi-class tweet categorization, with a focus on hate speech detection. Our system improves classification accuracy and efficiency by using the Transformer models BERT and DistilBERT. Feature extraction techniques capture pertinent information from tweets, enabling practical analysis, categorization, and identification of hate speech instances. During training, we also tackle imbalanced corpora by employing techniques that ensure fair representation of the different tweet categories, including hate speech. After extensive training, our system reaches an accuracy of 95% on the training data, showing the effectiveness of Transformers in comprehending and categorizing tweets, including identifying hate speech. Furthermore, it maintains an accuracy of 83% on the test data, highlighting the robustness and generalizability of the trained models for hate speech detection. This system contributes to advancing automated tweet categorization, specifically in hate speech detection, providing a reliable and efficient solution for organizing and analyzing diverse tweet datasets.

    Universidad Tecnología de Bolíva
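    The abstract does not say which imbalance techniques were used. One common option, shown here purely as an assumed example, is inverse-frequency class weighting: each class gets the weight n_samples / (n_classes × class_count), and these weights scale the per-example loss so that rare categories contribute as much in total as frequent ones. The function name and the category labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    every class contributes the same total weight (n_samples / n_classes)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical, skewed tweet labels: hate speech is the rarest class.
labels = ["neutral"] * 6 + ["offensive"] * 3 + ["hate"] * 1
weights = balanced_class_weights(labels)
print({c: round(w, 2) for c, w in weights.items()})
# {'neutral': 0.56, 'offensive': 1.11, 'hate': 3.33}
```

    With these weights, the single hate-speech tweet carries about six times the loss weight of each neutral tweet, offsetting its sixfold rarity without duplicating or discarding any data.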

    Irony Detection in Twitter with Imbalanced Class Distributions

    Irony detection is a nontrivial problem, and solving it can help improve natural language processing tasks such as sentiment analysis. When dealing with social media data in real scenarios, an important issue to address is data skew, i.e., the imbalance between the available ironic and non-ironic samples. In this work, the main objective is to address irony detection in Twitter under various degrees of imbalance between classes. We rely on the emotIDM irony detection model and evaluate it on both benchmark corpora and skewed Twitter datasets collected to simulate a realistic distribution of ironic tweets. We carry out a set of classification experiments aimed at determining the impact of class imbalance on irony detection, and we evaluate irony detection performance under different scenarios. We experiment with a set of classifiers, applying class imbalance techniques to compensate for the class distribution. Our results indicate that such techniques can improve the performance of irony detection in imbalanced class scenarios.

    The first author was funded by CONACYT project FC-2016/2410. Ronaldo Prati was supported by the São Paulo State (Brazil) research council FAPESP under project 2015/20606-6. Francisco Herrera was partially supported by the Spanish National Research Project TIN2017-89517-P. The work of Paolo Rosso was partially supported by the Spanish MICINN under the research project MISMIS (PGC2018-096212-B-C31) and by the Generalitat Valenciana under the grant PROMETEO/2019/121.

    Hernandez-Farias, DI.; Prati, R.; Herrera, F.; Rosso, P. (2020). Irony Detection in Twitter with Imbalanced Class Distributions. Journal of Intelligent & Fuzzy Systems. 39(2):2147-2163. https://doi.org/10.3233/JIFS-179880
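    Why skew matters for evaluation can be shown with a small worked example. The abstract does not name its metrics; F1 on the ironic (minority) class is assumed here as a typical choice. A classifier that never predicts "ironic" looks excellent by accuracy yet is useless by F1. The function name and labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="ironic"):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return acc, f1

# 1 ironic tweet in 20, and a classifier that always answers "not ironic".
y_true = ["ironic"] + ["not_ironic"] * 19
y_pred = ["not_ironic"] * 20
print(accuracy_and_f1(y_true, y_pred))  # (0.95, 0.0)
```

    The more skewed the distribution, the higher this trivial accuracy becomes, so minority-class metrics are the ones that reveal whether imbalance techniques actually help.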

    A study of feature extraction techniques for classifying topics and sentiments from news posts

    Many news channels now have their own Facebook pages on which news posts are released daily. These posts contain temporal opinions about social events that may change over time due to external factors, and they can also be used to monitor significant events happening around the world. As a result, many text mining studies have been conducted in the area of Temporal Sentiment Analysis, one of whose most challenging tasks is detecting and extracting the key features of news posts that arrive continuously over time. Extracting these features is difficult because of the posts' complex properties; moreover, posts about a specific topic may grow or vanish over time, producing imbalanced datasets. This study therefore presents a comparative analysis of feature extraction techniques (TF-IDF, TF, BTO, IG, Chi-square) with three different n-gram features (unigram, bigram, trigram), using SVM as the classifier. The aim is to discover the optimal Feature Extraction Technique (FET) for achieving the best accuracy in both topic and sentiment classification. The analysis is conducted on datasets from three news channels. For topic classification, Chi-square with unigrams proved to be the best FET compared with the other techniques. To overcome the problem of imbalanced data, the study then combined the best FET with oversampling. This improved the classifier's performance, achieving higher accuracies of 93.37%, 92.89%, and 91.92% for BBC, Al-Arabiya, and Al-Jazeera, respectively, than those obtained on the original datasets. The same combination (Chi-square + unigram) was used for sentiment classification and obtained accuracies of 81.87%, 70.01%, and 77.36%. However, testing the optimal FET on unseen, randomly selected news posts yielded relatively low accuracies for both topic and sentiment classification, owing to changes in topics and sentiments over time.
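    The best-performing technique above, Chi-square over unigrams, scores each word by how strongly its presence depends on the class. A minimal sketch follows, assuming the common document-count formulation: for a term and a target class, build the 2×2 table of documents (A = in class with term, B = outside class with term, C = in class without term, D = outside class without term) and compute N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The function name and the toy posts are hypothetical, and the study's exact variant and the subsequent SVM step are not reproduced.

```python
from collections import Counter

def chi_square_scores(docs, labels, target):
    """Chi-square score of every unigram for the class `target`, computed
    from the 2x2 table of document counts (term present/absent x in/out of
    the target class)."""
    n = len(docs)
    doc_terms = [set(d.lower().split()) for d in docs]
    in_class = [y == target for y in labels]
    n_class = sum(in_class)
    df_all = Counter(t for terms in doc_terms for t in terms)
    df_class = Counter(t for terms, c in zip(doc_terms, in_class) if c
                       for t in terms)
    scores = {}
    for term, df in df_all.items():
        a = df_class[term]       # in class, contains term
        b = df - a               # outside class, contains term
        c = n_class - a          # in class, lacks term
        d = n - n_class - b      # outside class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

docs = ["election results announced", "football match results",
        "election campaign rally", "football transfer news"]
labels = ["politics", "sports", "politics", "sports"]
scores = chi_square_scores(docs, labels, "politics")
top = sorted(scores, key=scores.get, reverse=True)[:2]
print(top)  # ['election', 'football']
```

    "results" scores 0 because it occurs equally in both classes, while "football" ranks as high as "election": Chi-square rewards strong negative association too, since never appearing in politics posts is itself informative for the classifier.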