171 research outputs found

    Semantic Sentiment Analysis of Twitter Data

    The Internet and the proliferation of smart mobile devices have changed the way information is created, shared, and spread: microblogs such as Twitter, weblogs such as LiveJournal, social networks such as Facebook, and instant messengers such as Skype and WhatsApp are now commonly used to share thoughts and opinions about anything in the surrounding world. This has resulted in the proliferation of social media content, thus creating new opportunities to study public opinion at a scale that was never possible before. Naturally, this abundance of data has quickly attracted business and research interest from various fields including marketing, political science, and social studies, among many others, which are interested in questions like these: Do people like the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel about Brexit? Answering these questions requires studying the sentiment of opinions people express in social media, which has given rise to the fast growth of the field of sentiment analysis in social media, with Twitter being especially popular for research due to its scale, representativeness, variety of topics discussed, as well as ease of public access to its messages. Here we present an overview of work on sentiment analysis on Twitter. Comment: Microblog sentiment analysis; Twitter opinion mining; In the Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition. 201

    Media monitoring and information extraction for the highly inflected agglutinative language Hungarian

    The Europe Media Monitor (EMM) is a fully-automatic system that analyses written online news by gathering articles in over 70 languages and by applying text analysis software for currently 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with the Hungarian language is its high degree of inflection and agglutination. We present several experiments where we apply linguistically light-weight methods to deal with inflection and we propose a method to overcome the challenges. We also present detailed frequency lists of Hungarian person and location name suffixes, as found in real-life news texts. These empirical data can be used to draw further conclusions and to improve existing Named Entity Recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web pag

    An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

    End-to-end neural machine translation has overtaken statistical machine translation in terms of translation quality for some language pairs, especially those with large amounts of parallel data. Besides this palpable improvement, neural networks provide several new properties. A single system can be trained to translate between many languages at almost no additional cost other than training time. Furthermore, internal representations learned by the network serve as a new semantic representation of words, or sentences, which, unlike standard word embeddings, are learned in an essentially bilingual or even multilingual context. In view of these properties, the contribution of the present work is two-fold. First, we systematically study the NMT context vectors, i.e. the output of the encoder, and their power as an interlingua representation of a sentence. We assess their quality and effectiveness by measuring similarities across translations, as well as semantically related and semantically unrelated sentence pairs. Second, as an extrinsic evaluation of the first point, we identify parallel sentences in comparable corpora, obtaining F1 = 98.2% on data from a shared task when using only NMT context vectors. Using context vectors jointly with similarity measures, F1 reaches 98.9%. Comment: 11 pages, 4 figure

    Reading News Data

    Information overflow is both inspiring and depressing. What inspires is the easy access to various information and communication resources, which in turn facilitates the exchange of ideas, knowledge and creativity. However, the unfathomable extent of an infinite information space creates a feeling of depression and anxiety. Thanks to digital technology, the speed at which information is generated and distributed significantly exceeds the speed at which it can be perceived. The more the hypertext ocean of the web is filled with content, the harder it is for human senses to ‘tame’ it. In the ambivalent nature of information super-abundance, technological optimism and technological pessimism constantly compete. In fact, the two perspectives on the role of technology in the world of people have always been in conflict, but the conflict has intensified with the evolution and spread of the internet. This text looks at a conditional technological optimism, aiming not at postulating utopian aspirations, but at illustrating how scientific and technical elites do not lose their desire to overcome depressing complexity by seeking bold optimization solutions. The focus of this paper is on an innovative technological system for processing online news, namely the publicly available Europe Media Monitor (EMM). The interest in this monitoring tool is multifaceted. In most general terms it is interesting to trace the technological solutions with which computer science specialists are trying to discipline the information flow. However, EMM is also interesting as a powerful tool for understanding reality on the basis of statistically processed news databases. This study provides examples of how EMM-enabled media content processing options may be used as a basis for further detailed analysis. Of importance are also the social and institutional intentions behind the development of such a system: what are the motives and the uses associated with such an intersection between news and intelligent software

    D8.6 Dissemination, training and exploitation results

    Mauerhofer, C., Rajagopal, K., & Greller, W. (2011). D8.6 Dissemination, training and exploitation results. LTfLL-project. Report on sustainability, dissemination and exploitation of the LTfLL project. The work on this publication has been sponsored by the LTfLL STREP that is funded by the European Commission's 7th Framework Programme. Contract 212578 [http://www.ltfll-project.org

    Sentiment analysis of health care tweets: review of the methods used.

    BACKGROUND: Twitter is a microblogging service where users can send and read short 140-character messages called "tweets." Many unstructured, free-text tweets relating to health care are shared on Twitter, which is becoming a popular area for health care research. Sentiment is a metric commonly used to investigate the positive or negative opinion within these messages. Exploring the methods used for sentiment analysis in Twitter health care research may allow us to better understand the options available for future research in this growing field. OBJECTIVE: The first objective of this study was to understand which tools would be available for sentiment analysis of Twitter health care research, by reviewing existing studies in this area and the methods they used. The second objective was to determine which method would work best in the health care settings, by analyzing how the methods were used to answer specific health care questions, their production, and how their accuracy was analyzed. METHODS: A review of the literature was conducted pertaining to Twitter and health care research, which used a quantitative method of sentiment analysis for the free-text messages (tweets). The study compared the types of tools used in each case and examined methods for tool production, tool training, and analysis of accuracy. RESULTS: A total of 12 papers studying the quantitative measurement of sentiment in the health care setting were found. More than half of these studies produced tools specifically for their research, 4 used open source tools available freely, and 2 used commercially available software. Moreover, 4 out of the 12 tools were trained using a smaller sample of the study's final data. The sentiment method was trained against, on average, 0.45% (2816/627,024) of the total sample data. One of the 12 papers commented on the analysis of accuracy of the tool used. CONCLUSIONS: Multiple methods are used for sentiment analysis of tweets in the health care setting. These range from self-produced basic categorizations to more complex and expensive commercial software. The open source and commercial methods are developed on product reviews and generic social media messages. None of these methods have been extensively tested against a corpus of health care messages to check their accuracy. This study suggests that there is a need for an accurate and tested tool for sentiment analysis of tweets trained using a health care setting-specific corpus of manually annotated tweets first