419,928 research outputs found

    Multimodal Sentiment Analysis of Instagram Using Cross-media Bag-of-words Model

    Instagram, a social media sharing service, has grown in use and popularity in recent years. The photos and videos shared by its users are challenging to mine and analyze. One kind of study that can be applied to Instagram data is sentiment analysis, the field that studies people's opinions, sentiments, and evaluations of something. Applied to Instagram, sentiment analysis can serve as an analytics tool for business purposes such as user behavior analysis, market intelligence, and user evaluation. This research aimed to analyze the sentiment of Instagram posts by considering two modalities: the image and the English text of its caption. The Cross-media Bag-of-Words Model (CBM) was applied, which treats text and image features as a single vector representation. These cross-media features were then classified using logistic regression to predict sentiment in three classes: positive, negative, and neutral. Simulation results showed that the combination of unigram text features and 56-length image features achieved the highest accuracy, 87.2%. Keywords: Instagram, sentiment analysis, Cross-media Bag-of-Words Model (CBM), logistic regression, classification
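
The idea of treating text and image features as one vector, as the abstract above describes for CBM, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vocabulary, and the 4-dimensional image descriptor are all hypothetical, and the logistic regression step is left out.

```python
from collections import Counter

def text_bow(caption, vocabulary):
    """Unigram counts of the caption over a fixed vocabulary."""
    counts = Counter(caption.lower().split())
    return [counts[word] for word in vocabulary]

def cross_media_vector(caption, image_features, vocabulary):
    """Concatenate text and image features into one joint vector."""
    return text_bow(caption, vocabulary) + list(image_features)

# Hypothetical toy inputs; a real system would use a learned vocabulary
# and image features extracted from the photo.
vocabulary = ["happy", "sad", "day"]
image_features = [0.2, 0.0, 0.8, 0.1]

vector = cross_media_vector("Happy happy day", image_features, vocabulary)
print(vector)  # [2, 0, 1, 0.2, 0.0, 0.8, 0.1]
```

A classifier such as logistic regression would then be trained on these joint vectors, so that both modalities contribute to a single decision.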

    Language identification of multilingual posts from Twitter: a case study

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-016-0997-x. This paper describes a method for handling multi-class and multi-label classification problems based on the support vector machine formalism. The method has been applied to the language identification problem on Twitter. The system was evaluated mainly on a Twitter data set developed for the TweetLID workshop. This data set contains bilingual tweets written in the most commonly used Iberian languages (i.e., Spanish, Portuguese, Catalan, Basque, and Galician) as well as English. We address the following problems: (1) social media texts: we propose a suitable tokenization that handles the peculiarities of Twitter; (2) multilingual tweets: since a tweet can belong to more than one language, we need a multi-class and multi-label classifier; (3) similar languages: we study the main confusions among similar languages; and (4) unbalanced classes: we propose a threshold-based strategy to favor classes with less data. We have also studied the use of Wikipedia and the addition of new tweets to enlarge the training data set. Additionally, we have tested our system on the Bergsma corpus, a collection of tweets in nine languages, focusing on confusable languages that use the Cyrillic, Arabic, and Devanagari alphabets. To our knowledge, we obtained the best results published on the TweetLID data set and results in line with the best published on the Bergsma data set. This work has been partially funded by the project ASLP-MULAN: Audio, Speech and Language Processing for Multimedia Analytics (MINECO TIN2014-54288-C4-3-R). Pla Santamaría, F.; Hurtado Oliver, LF. (2016). Language identification of multilingual posts from Twitter: a case study. Knowledge and Information Systems. 51(3):965-989. https://doi.org/10.1007/s10115-016-0997-x
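
A threshold-based multi-label decision of the kind the abstract above mentions could look roughly like the following sketch. The abstract does not give the actual rule, so everything here is an assumption: the function name, the per-class thresholds, the fallback to the top-scoring label, and the example decision values are hypothetical.

```python
def predict_labels(scores, thresholds, default_threshold=0.0):
    """Return every label whose score clears its threshold.

    If no label clears its threshold, fall back to the single
    top-scoring label so that each tweet gets at least one language.
    """
    chosen = [
        label
        for label, score in scores.items()
        if score >= thresholds.get(label, default_threshold)
    ]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)

# Hypothetical per-class decision values for one tweet.
scores = {"es": 0.4, "ca": -0.1, "gl": -0.3, "en": -1.2}
# Hypothetical lower threshold for a class with little training data.
thresholds = {"gl": -0.5}

labels = predict_labels(scores, thresholds)
print(labels)  # ['es', 'gl']
```

Lowering the threshold only for under-represented classes is one way to favor them without retraining the underlying classifiers.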

    Upgrading a Social Media Strategy to Increase Twitter Engagement During the Spring Annual Meeting of the American Society of Regional Anesthesia and Pain Medicine.

    Microblogs known as tweets are a rapid, effective method of information dissemination in health care. Although several medical specialties have described their Twitter conference experiences, Twitter-related data in the fields of anesthesiology and pain medicine are sparse. We therefore analyzed the Twitter content of 2 consecutive spring meetings of the American Society of Regional Anesthesia and Pain Medicine using publicly available online transcripts. We also examined the potential contribution of a targeted social media campaign on Twitter engagement during the conferences. The original Twitter meeting content was largely scientific in nature and created by meeting attendees, the majority of whom were nontrainee physicians. Physician trainees, however, represent an important and increasing minority of Twitter contributors. Physicians not in attendance predominantly contributed via retweeting original content, particularly picture-containing tweets, and thus increased reach to nonattendees. A social media campaign prior to meetings may help increase the reach of conference-related Twitter discussion

    The Democratization of Social Media A Critical Perspective in Technology

    Social media is a part of contemporary technology that is a contentious subject within society. Paradoxically, social media should provide techniques and objects that serve human beings in a positive way, yet it can also dehumanize them, for example through alienation. The main problem is the lack of impact of public policy, which does not involve society in the democratic sphere. This article examines the possibility of democratizing social media within the discourse of the philosophy of technology. I refer to Andrew Feenberg's Critical Theory of Technology (CTT) to open the discourse and to criticize social media. Social media should be changed through a critical view that analyzes the internal contradictions of technocracy, which regards social media merely as a value-free instrument. On the other hand, CTT leads into the discourse of instrumentalization theory, technological rationality, technical code, and the democratization of social media. I conclude by applying CTT to delineate the extant approach to, and considerations for, the democratization of social media in Indonesia through critical thinking participation and emotional education in the public sphere

    Learning Social Image Embedding with Deep Multimodal Attention Networks

    Learning embeddings of social media data with deep models has attracted extensive research interest and driven many applications, such as link prediction, classification, and cross-modal search. However, for social images, which contain both link information and multimodal contents (e.g., text description and visual content), simply employing an embedding learnt from either network structure or data content results in a sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal contents and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-grained relation between image regions and textual words. To leverage the network structure for embedding learning, a novel Siamese-Triplet neural network is proposed to model the links among images. With the joint deep model, the learnt embedding can capture both the multimodal contents and the nonlinear network information. Extensive experiments are conducted to investigate the effectiveness of our approach in the applications of multi-label classification and cross-modal search. Compared to state-of-the-art image embeddings, our proposed DMAN achieves significant improvement in the tasks of multi-label classification and cross-modal search
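
The abstract above does not state the loss used by its Siamese-Triplet network. As a generic illustration only, the standard triplet margin loss pulls an anchor embedding toward a linked (positive) image and pushes it away from an unlinked (negative) one. The margin value and the 2-dimensional embeddings below are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, margin + d(a, p) - d(a, n))."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))

# Hypothetical 2-dimensional embeddings.
anchor = [0.0, 0.0]
linked = [0.0, 1.0]      # distance 1 from the anchor
unlinked = [3.0, 4.0]    # distance 5 from the anchor

loss_satisfied = triplet_loss(anchor, linked, unlinked)   # 1 + 1 - 5 < 0, so 0.0
loss_violated = triplet_loss(anchor, unlinked, linked)    # 1 + 5 - 1 = 5.0
print(loss_satisfied, loss_violated)  # 0.0 5.0
```

The loss is zero once the linked image is closer than the unlinked one by at least the margin, so training effort goes to the triplets that still violate that ordering.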