50 research outputs found

    Mathematical Modeling of Public Opinion using Traditional and Social Media

    With the growth of the internet, text data has become increasingly available to researchers in the form of online newspapers, journals, and blogs. This data presents a unique opportunity to analyze human opinions and behaviors without soliciting the public explicitly. In this research, I use newspaper articles and the social media service Twitter to infer self-reported public opinions and awareness of climate change. Climate change is one of the most important and heavily debated issues of our time, and analyzing large-scale text on this issue reveals insights into self-reported public opinion. First, I examine public discourse on both climate change and energy system vulnerability following two large hurricanes. I apply topic modeling techniques to a corpus of articles about each hurricane to determine how these topics were reported in the post-event news media. Next, I perform sentiment analysis on a large collection of Twitter data using a previously developed tool called the hedonometer. I use this sentiment scoring technique to investigate how the Twitter community reports feeling about climate change. Finally, I generalize the sentiment analysis technique to many other topics of global importance and compare it to more traditional public opinion polling methods. I determine that, since traditional public opinion polls have limited reach and high associated costs, text data from Twitter may be the future of public opinion polling.
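    The sentiment scoring step described above can be illustrated with a minimal sketch of lexicon-based averaging in the style of the hedonometer: each word carries a happiness score on a 1 to 9 scale, near-neutral words are ignored, and a text's score is the frequency-weighted mean of the rest. The lexicon values, the function name `average_happiness`, and the `neutral` and `delta` parameters are assumptions made for illustration; they are not taken from the thesis or from the real hedonometer word list.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> happiness score on a 1-9 scale.
# These values are invented for illustration; they are NOT real word ratings.
TOY_LEXICON = {
    "love": 8.4,
    "hope": 7.4,
    "clean": 6.8,
    "the": 5.0,
    "storm": 3.2,
    "disaster": 1.8,
}


def average_happiness(text, lexicon=TOY_LEXICON, neutral=5.0, delta=1.0):
    """Return the frequency-weighted mean happiness of the scored words in text.

    Words missing from the lexicon, and words whose score lies within
    `delta` of `neutral`, are skipped. Returns None if no word is scored.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        w for w in words
        if w in lexicon and abs(lexicon[w] - neutral) >= delta
    )
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


if __name__ == "__main__":
    print(average_happiness("Love the clean energy, hope for no storm"))
```

    Skipping near-neutral words is a design choice that keeps very frequent but emotionally flat words from pulling every text toward the middle of the scale.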

    Irony and Sarcasm Detection in Twitter: The Role of Affective Content

    Thesis by compendium. Social media platforms such as Twitter offer a face-saving ability that allows users to express themselves freely and spontaneously, employing figurative language devices such as irony to achieve different communication purposes. Dealing with this kind of content is a major challenge for computational linguistics. Irony is closely associated with the indirect expression of feelings, emotions, and evaluations, and interest in detecting irony in social media texts has grown significantly in recent years. In this thesis, we introduce the problem of detecting irony in social media from a computational linguistics perspective. We propose to address this task by focusing, in particular, on the role of affective information in detecting the presence of this figurative language device. To take advantage of the intrinsic subjective value of ironic expressions, we present a novel model, called emotIDM, that detects irony using a wide range of affective features. To characterise an ironic utterance, we used an extensive set of resources covering different facets of affect, from sentiment (positive or negative) to finer-grained emotions. Results show that emotIDM performs competitively across the experiments carried out, validating the effectiveness of the proposed approach. Another objective of the thesis is to investigate the differences among tweets labeled with #irony and #sarcasm. Our aim is to contribute to a less investigated topic in computational linguistics, the separation between irony and sarcasm in social media, again with a special focus on affective features. We also studied a less explored hashtag: #not. We find data-driven evidence of differences among tweets containing these hashtags, suggesting that they are used to refer to different figurative language devices. We identify promising features based on affect-related phenomena for discriminating among different kinds of figurative language devices. We also analyse the role of polarity reversal in tweets containing ironic hashtags, observing that the impact of this phenomenon varies: tweets labeled with #sarcasm often show a full reversal, whereas those tagged with #irony show an attenuation of the polarity. We analyse the impact of irony and sarcasm on sentiment analysis, observing a drop in the performance of NLP systems developed for this task when irony is present. We therefore explored the use of our findings in irony detection to develop an irony-aware sentiment analysis system, assuming that identifying ironic content could help to improve the identification of sentiment polarity. To this end, we incorporated emotIDM as the first stage of a pipeline for determining the polarity of a given Twitter message. We compared our results with the state of the art established by the "Semeval-2015 Task 11" shared task, demonstrating the relevance of combining affective information with features that signal the presence of irony when performing sentiment analysis of figurative language in this kind of social media text. To summarize, we demonstrated the usefulness of exploiting different facets of affective information for dealing with the presence of irony in Twitter.
    Hernández Farias, DI. (2017). Irony and Sarcasm Detection in Twitter: The Role of Affective Content [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90544

    Tuning in to Terrorist Signals


    Explainable Argument Mining


    Understanding and Predicting Online User Behavior and Content Propagation Patterns Based on Data Science Analysis Methods

    Thesis (Ph.D.), Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. 권태경.
    It has become the norm for people to communicate with one another through various online social channels, such as message boards, online social networks, and social media. As these digital communication channels produce a deluge of social data, computational data-driven studies have in turn been spurred to understand human behaviors and communication patterns. As part of such studies, this thesis examines online communications through the following topics: (i) characterizing threaded conversations from content, user, and community perspectives, (ii) characterizing popular and viral image propagation, and (iii) understanding content publishing and sharing patterns. To this end, three large-scale datasets are collected: (i) 0.7 million threaded conversations from 1.5 million users of Reddit, (ii) about 0.33 million images shared by 1 million users of Pinterest, and (iii) 4.2 billion requests for 80 million short URLs created through Bitly. The data-driven analysis of these datasets reveals that content, user behavioral, and topical community factors (e.g., difficulty of texts, portion of reciprocal communications, or discussion-encouraging communities) are highly associated with large, responsive, or viral conversations. Through in-depth analysis of the Pinterest dataset, this thesis shows that the structural virality of an image cascade differentiates large cascades by shape (i.e., broadcast or diffusion), and that factors such as propagation time relate differently to volume and to virality. By modeling the relations among web sites (e.g., twitter.com, amazon.com) that act as content sources and publishing spaces in the Bitly dataset, this thesis finds that they play different roles in publishing short URLs. For example, search engines, online social networks, and computer & electronics sites like newsfeed services are popular spaces for content publishing, while news and streaming services are widely used as content sources. The analysis of content publishing and sharing patterns through URL shortening reveals that users are likely to access different types of content via different websites. For example, adult or malicious content tends to be requested from search engines, shopping content is primarily accessed through online social networks, and news content is usually clicked through computer & electronics websites. This thesis also reports that news or shopping content published through online social networks tends to be requested quickly and virally. Lastly, based on the lessons learned, a learning-based model is proposed to predict whether a conversation or an image cascade will be large or viral; using the initially observed comment or image propagation patterns together with user behavioral and content characteristics, it achieves high performance. By giving insights into (i) how different users interact with others across different content, topics, and communities, (ii) what content is propagated virally and how, and (iii) how different content is published and accessed through different online spaces, this thesis is expected to contribute to better online services such as marketing or novel platform design, and to the development of interpretable models of propagation patterns and cascade size.
    Contents:
    Abstract
    Chapter 1 Introduction
    Chapter 2 Background: 2.1 Reddit, Pinterest, and Bitly (2.1.1 Reddit; 2.1.2 Pinterest; 2.1.3 Bit.ly: A URL Shortening Service); 2.2 Related Works
    Chapter 3 Methodology: 3.1 Data Collection (3.1.1 Reddit; 3.1.2 Pinterest; 3.1.3 Bitly); 3.2 Models (3.2.1 Comment Tree: Threaded Conversation Model; 3.2.2 Pin Tree: Image Cascade Model; 3.2.3 Content-Referrer Graph Model)
    Chapter 4 Analysis on Online Conversations in Reddit: 4.1 Comment Tree Analysis (4.1.1 Content Perspectives; 4.1.2 User Participation in Comment Trees); 4.2 Conversation Patterns across Communities (4.2.1 Conversations in Subreddits; 4.2.2 Content and User Characteristics; 4.2.3 Groups of Subreddits)
    Chapter 5 Analysis on Image Cascade in Pinterest: 5.1 Characteristics of image cascades; 5.2 Are popular images also viral?
    Chapter 6 Analyzing Content Publishing and Sharing Patterns through Bitly: 6.1 Content Sharing Patterns through Bit.ly (6.1.1 URL Shortening Patterns; 6.1.2 URL Request Pattern); 6.2 Content-Referrer Graph (6.2.1 Basic Analysis; 6.2.2 Relations among Domains; 6.2.3 Role of Domains); 6.3 Referrer Analysis (6.3.1 Referrer Preference; 6.3.2 Referrer Responsiveness)
    Chapter 7 Predicting Large/Viral Conversations and Image Cascades: 7.1 Predicting Large/Viral Conversations (7.1.1 Problem Formulation; 7.1.2 Experiment Setup; 7.1.3 Performance Analysis); 7.2 Popular and Viral Image Prediction (7.2.1 Predictive Power of Image Itself; 7.2.2 Predictive Power of Image Meta and Pinner Information; 7.2.3 Predictive Power of Initial Propagation Pattern)
    Chapter 8 Conclusion
    Abstract (in Korean)
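    The abstract above separates broadcast-shaped from diffusion-shaped cascades by their structural virality. As a rough illustration, the sketch below assumes the commonly used definition of structural virality as the mean shortest-path distance over all pairs of nodes in the cascade tree; the thesis's exact definition is not given in this listing, and the function name `structural_virality` and the edge-list input format are hypothetical.

```python
from collections import defaultdict, deque


def structural_virality(edges):
    """Mean pairwise shortest-path distance in a cascade tree.

    `edges` is an iterable of (parent, child) pairs. The cascade is
    assumed to be a single connected tree (an assumption of this sketch).
    """
    # Build an undirected adjacency map, since distances ignore direction.
    adj = defaultdict(set)
    for parent, child in edges:
        adj[parent].add(child)
        adj[child].add(parent)

    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0

    # Breadth-first search from every node; fine for small cascades (O(n^2)).
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())

    # `total` counts each unordered pair twice, so divide by n * (n - 1).
    return total / (n * (n - 1))


if __name__ == "__main__":
    # Same number of nodes, different shapes.
    broadcast = [("r", "a"), ("r", "b"), ("r", "c"), ("r", "d")]
    chain = [("r", "a"), ("a", "b"), ("b", "c"), ("c", "d")]
    print(structural_virality(broadcast), structural_virality(chain))
```

    Under this definition, a star-shaped cascade (one post re-shared directly by many users) scores lower than a chain of the same size, which is how the measure can tell two equally large cascades apart.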