78,678 research outputs found

    Monitoring Public Sentiment of NFL Draft Picks via Machine Learning Techniques

    Sentiment analysis is a topic in natural language processing that seeks to automatically extract positive and negative polarity from text data. Its applications are diverse, ranging from marketing and sales to forum moderation to gauging public opinion. One particularly interesting application area is found in professional sports: fans share a huge volume of opinions, predictions, and reactions online that can be used to monitor public opinion on specific teams, coaches, and players. This paper explores the application of machine-learning-based sentiment analysis to a hand-labeled social media dataset of reactions to National Football League draft picks. The resulting model, called DraftSense, provides information that can be used for future analysis, including attitude towards drafted players, comparison between fan reactions and on-field performance, and comparison between drafted players based on the language used to describe them. Additionally, a labeled dataset for sentiment analysis on professional football will be created for further use.

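    Joint Distribution pada Weighted Majority Vote (WMV) untuk Peningkatan Kinerja Sentiment Analysis Tersupervisi pada Dataset Twitter [Joint Distribution in Weighted Majority Vote (WMV) for Improving the Performance of Supervised Sentiment Analysis on Twitter Datasets]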

    Sentiment analysis is a computational text mining technique based on natural language processing (NLP) that extracts opinions expressed on online platforms, including Twitter, one of the most popular microblogging platforms in Indonesia. Two approaches are commonly used: the machine learning (ML) based approach and the sentiment lexicon (SL) based approach. This research focuses on developing machine-learning-based, also called supervised, sentiment analysis techniques for Twitter datasets. Most sentiment analysis on Indonesian-language Twitter datasets relies on a single machine learning algorithm. This study combines the output of several algorithms (experts) while reducing the misclassification rate by dynamically updating weights using a weighted majority vote (WMV) based on the joint distribution of a Bayesian network. In the first stage, data was collected from Twitter using 3 hashtags related to Covid-19 as experimental data. The performance of the weighted majority vote was then compared extensively with 4 baseline methods: Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes, and a Majority Vote over these three single classifiers. The performance metrics used are precision, recall, F-measure, accuracy, and the Matthews correlation coefficient (MCC). In the experiments, WMV improved sentiment analysis performance on all three dataset topics across the various performance metrics.
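The weighted-majority idea in this abstract can be sketched as follows. This is a minimal illustration that uses the classic multiplicative-penalty update (a wrong expert's weight is multiplied by a factor `beta`), not the paper's Bayesian-network joint-distribution update, whose details the abstract does not give. The function names, the `beta` value, and the toy votes are all assumptions made for the example.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose voters carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def update_weights(predictions, weights, true_label, beta=0.25):
    """Shrink the weight of every expert that predicted the wrong label."""
    return [
        weight * beta if label != true_label else weight
        for label, weight in zip(predictions, weights)
    ]


# Three hypothetical experts; the first is always right, the others always wrong.
weights = [1.0, 1.0, 1.0]
rounds = [
    (["neg", "pos", "pos"], "neg"),
    (["neg", "pos", "pos"], "neg"),
    (["neg", "pos", "pos"], "neg"),
]

decisions = []
for predictions, truth in rounds:
    decisions.append(weighted_majority_vote(predictions, weights))
    weights = update_weights(predictions, weights, truth)

print(decisions)
print(weights)
```

In the first round the two wrong experts outvote the correct one; after their weights are reduced, the ensemble follows the reliable expert. A plain majority vote would have been wrong in every round.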

    Text Analytics Methods for Sentence-level Sentiment Analysis

    Opinions have important effects on decision making. With the explosion of text information online, sentiment analysis, which aims to predict people's opinions about specific entities, has become a popular tool for making sense of large volumes of text. There are multiple approaches to sentence-level sentiment analysis, including machine-learning methods and lexicon-based methods. In this MSc thesis we studied two typical sentiment analysis techniques, AFINN and RNTN, which represent lexicon-based and machine-learning methods, respectively. A lexicon-based method assumes that the sum of the sentiment orientations of the words or phrases in a text predicts its contextual sentiment polarity. AFINN is a word list with sentiment strengths ranging from -5 to +5, constructed to include Internet slang and obscene words. With AFINN, we extract sentiment words from sentences and assign sentiment scores to them. The sentiment of a sentence is the sum of the scores of all its words. The Stanford Sentiment Treebank is a corpus of labeled parse trees that makes it possible to train compositional models with supervised machine learning techniques. Its labels fall into 5 categories: negative, somewhat negative, neutral, somewhat positive and positive. Compared to the standard recursive neural network (RNN) and the Matrix-Vector RNN, the Recursive Neural Tensor Network (RNTN) is a more powerful composition model for computing compositional vector representations of input sentences. Trained on the Stanford Sentiment Treebank, RNTN predicts the sentiment of input sentences from their computed vector representations. Using benchmark datasets that cover diverse data sources, we carry out a thorough comparison between AFINN and RNTN. Our results show that although RNTN is much more complicated than AFINN, it does not perform better than AFINN. To some extent, AFINN is simpler, more generic, and less computationally demanding than RNTN for sentiment analysis.
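The lexicon-based scoring described in this abstract (sum the scores of the words in a sentence) can be sketched in a few lines. The tiny word list below is hypothetical: it uses AFINN's -5 to +5 scale, but the entries and scores are made up for illustration and are not taken from the AFINN list itself. The tokenizer and function names are likewise assumptions.

```python
import re

# Hypothetical mini-lexicon on a -5..+5 scale (not the real AFINN entries).
LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "boring": -2}


def sentence_score(sentence, lexicon=LEXICON):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def polarity(score):
    """Map a summed score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


mixed = sentence_score("The plot was good but the ending was terrible and boring.")
print(mixed, polarity(mixed))
```

A plain sum ignores negation and word order ("not good" still scores as positive), which is the kind of compositional effect that tree-based models such as RNTN are designed to capture.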

    A Generalized Method for Sentiment Analysis across Different Sources

    Sentiment analysis is widely used in a variety of applications, such as online opinion gathering for government policy directives, monitoring of customer and staff satisfaction in corporate bodies, and public tension monitoring in politics and security structures. In recent times, the field has met a new set of challenges, in which new algorithms have to contend with highly unstructured sources of sentiment expression emanating from online social media fora. In this study, a rule- and lexicon-based procedure is proposed together with unsupervised machine learning to implement sentiment analysis with improved generalization ability across different sources. To deal with sources devoid of syntactic and grammatical structure, the approach incorporates a rule-based technique for emoticon detection, word contraction expansion, noise removal, and lexicon-based text preprocessing using lexical features such as part of speech (POS), stop words, and lemmatization for local context analysis. A text is broken into a number of tokens, each representing a sentence, and lexicon-dependent features are then extracted from each token. The features are merged using a combining function for a given text before being used to train a machine learning classifier. The proposed combining functions build on averaging and information gain concepts. Experimental results with different machine learning classifiers indicate that improved performance, with a great deal of generalization capacity across both structured and nonstructured sources, can be realized. The findings show that carefully designed lexical features reinforce the learning process in unsupervised learning more than using word embeddings alone as features. Experimental results on a movie review dataset (recall = 74.9%, precision = 70.9%, F1-score = 72.9%, and accuracy = 72.0%) and on Twitter sample datasets (recall = 93.4%, precision = 89.5%, F1-score = 91.4%, and accuracy = 91.1%) show the efficacy of the proposed approach in comparison with other state-of-the-art research studies.
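As a quick consistency check on the figures reported in this abstract, the F1-score can be recomputed from precision and recall as their harmonic mean. The function name is an assumption, and the inputs are the rounded percentages quoted in the abstract, so a difference of about 0.1 from the reported F1 is to be expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


movie_f1 = f1_score(70.9, 74.9)    # abstract reports 72.9
twitter_f1 = f1_score(89.5, 93.4)  # abstract reports 91.4
print(round(movie_f1, 1), round(twitter_f1, 1))
```

Recomputed from the rounded inputs, the values come out at roughly 72.8 and 91.4, which agrees with the reported scores to within rounding.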

    A Comprehensive Analysis of Approaches for Sentiment Analysis Using Twitter Data on COVID-19 Vaccines

    Sentiment Analysis (SA) has opened the way to analyzing the opinions of the masses without territorial limits. With the advent and growth of social media such as Twitter, Facebook, WhatsApp, and Snapchat, stakeholders and the public often take to expressing their opinions on these platforms and drawing conclusions from them. While social media data are extremely informative and well connected, the major challenge lies in incorporating efficient text classification strategies that not only overcome the unstructured and voluminous nature of the data but also produce the correct polarity of opinions (i.e. positive, negative, and neutral). This paper provides a brief study of various approaches to SA, including Machine Learning, Lexicon Based, and Automatic Approaches. The paper also compares positive, negative, and neutral tweets about the Sputnik V, Moderna, and Covaxin vaccines used for preventive and emergency use against COVID-19.

    Sentiment Analysis using an ensemble of Feature Selection Algorithms

    Sentiment analysis, an active research area in text mining, is commonly used to determine the opinion of a person who has experienced a service or bought a product. It is the process of using computation to identify and categorize opinions expressed in a piece of text. Individuals post their opinions via reviews, tweets, comments or discussions, which constitute unstructured information. Sentiment analysis gives an overall summary of reviews, which helps clients, individuals or organizations in decision making. The primary aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to perform the analysis based on the results. An ensemble approach is a process of combining two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and the Pearson chi-squared statistical test for feature selection. The main contribution of this paper is to test experimentally whether the combined use of careful feature selection and existing classification methodologies can yield better accuracy.
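As a rough illustration of the chi-squared feature-selection step mentioned in this abstract, the sketch below computes the Pearson chi-squared statistic for a 2x2 contingency table of term presence versus class. The counts and the function name are hypothetical, the paper's actual pipeline is not described in the abstract, and the PCA step is not shown.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column: the term carries no signal
    return n * (a * d - b * c) ** 2 / denominator


# A term concentrated in positive documents versus one spread evenly.
informative = chi_squared_2x2(30, 10, 10, 30)
uninformative = chi_squared_2x2(20, 20, 20, 20)
print(informative, uninformative)
```

Terms with a higher statistic are more strongly associated with one class, so a selector would keep the top-scoring terms and discard those near zero.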

    Multimodal Sentiment Analysis of Instagram Using Cross-media Bag-of-words Model

    Instagram, a social media sharing service, has grown in use and popularity in recent years. Photos and videos shared by Instagram users are challenging to mine and analyze. One type of study that can be applied to Instagram data is sentiment analysis, a field that studies and analyzes people's opinions, sentiments, and evaluations. Sentiment analysis applied to Instagram can serve as an analytics tool for business purposes such as user behavior analysis, market intelligence, and user evaluation. This research aimed to analyze the sentiment of Instagram posts by considering two modalities: images and the English text of their captions. The Cross-media Bag-of-Words Model (CBM) was applied, treating text and image features as a single vector representation. These cross-media features were then classified using logistic regression to predict sentiment in three classes: positive, negative, and neutral. Simulation results showed that the combination of unigram text features and 56-length image features achieved the highest accuracy, 87.2%. Keywords: Instagram, sentiment analysis, Cross-media Bag-of-Words Model (CBM), logistic regression, classification

    The Role of Text Pre-processing in Sentiment Analysis

    The great diversity and size of social media data make it challenging to follow the latest trends and to summarise the state of general opinion about products, which creates the need for automated, real-time opinion extraction and mining. Mining online opinion is a form of sentiment analysis that is treated as a difficult text classification task. In this paper, we explore the role of text pre-processing in sentiment analysis, and report experimental results demonstrating that, with appropriate feature selection and representation, the accuracy of sentiment analysis using support vector machines (SVM) may be significantly improved. The level of accuracy achieved is shown to be comparable to that achieved in topic categorisation, although sentiment analysis is considered a much harder problem in the literature.