Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets
Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. A number of datasets have been created for metaphor identification; however, the task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for creating a metaphor identification dataset that rapidly achieves broad coverage of the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated English tweets, for which we achieve inter-annotator agreement scores above 0.8, higher than other reported results that did not limit the task. The methodology uses an existing metaphor classifier to assist in identifying and selecting the examples for annotation, in a way that reduces the cognitive load on annotators and enables quick and accurate annotation. We selected a corpus of both general-language tweets and political tweets relating to Brexit, and we compare the resulting corpus across these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphor, which we believe will be invaluable for the development, training and evaluation of approaches to metaphor identification in tweets.
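The abstract reports inter-annotator agreement above 0.8 but does not name the agreement coefficient. As an illustration only, assuming two annotators and Cohen's kappa, the statistic corrects raw agreement for the agreement expected by chance; the labels in the usage line are made-up toy data, not the paper's annotations.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items (assumed coefficient)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout; kappa is formally
        # undefined here, and treating it as perfect agreement is a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels: 1 = metaphorical, 0 = literal (hypothetical data).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.8
```

Here observed agreement is 0.9 and chance agreement is 0.5, giving kappa = (0.9 - 0.5) / (1 - 0.5) = 0.8. With more than two annotators per item, a multi-rater coefficient such as Fleiss' kappa would be the usual substitute.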
Spatial and Temporal Sentiment Analysis of Twitter data
The public use Twitter worldwide to express opinions. This study focuses on the spatio-temporal variation in the sentiment polarity of georeferenced Tweets, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the study tests whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within a 1 km buffer around the boundary of the Curtin Bentley campus in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time periods. A sentiment analysis was then conducted for each zone using the sentiment analyser in the Starlight Visual Information System software. The Feature Manipulation Engine was used to convert non-spatial files into spatial and temporal feature classes. The resulting sentiment polarity patterns were then mapped using Geographic Information Systems (GIS). Several notable patterns emerged. For example, the highest percentage of positive Tweets occurred in the social science area, while the science and engineering and dormitory areas had the highest percentage of negative posts. The number of negative Tweets in the library and science and engineering areas increases as the end of the semester approaches, peaking around the exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory areas. This study provides some insight into the sentiment variation of students and staff on Twitter, which could be useful for university teaching and learning management.
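The per-zone analysis amounts to grouping classified Tweets by spatial zone and time period and computing the share of each polarity in every cell. The sketch below assumes Tweets have already been assigned to a zone and period and labelled "positive", "negative" or "neutral"; the study used the Starlight sentiment analyser and GIS tooling for those steps, which are not reproduced here, and the zone and period names are hypothetical.

```python
from collections import defaultdict


def polarity_shares(tweets):
    """Percentage of each polarity per (spatial zone, time period) cell.

    `tweets` is an iterable of (zone, period, polarity) tuples; the polarity
    labels are assumed to come from an upstream sentiment classifier.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for zone, period, polarity in tweets:
        counts[(zone, period)][polarity] += 1

    shares = {}
    for cell, by_polarity in counts.items():
        total = sum(by_polarity.values())
        # Convert raw counts to percentages of all Tweets in this cell.
        shares[cell] = {p: 100.0 * c / total for p, c in by_polarity.items()}
    return shares


# Hypothetical sample: zone, time period, polarity label.
sample = [
    ("library", "exam period", "negative"),
    ("library", "exam period", "negative"),
    ("library", "exam period", "positive"),
    ("library", "exam period", "neutral"),
    ("social science", "mid semester", "positive"),
]
shares = polarity_shares(sample)
print(shares[("library", "exam period")]["negative"])  # 50.0
```

Reporting percentages rather than raw counts matters here because the zones receive very different Tweet volumes; the abstract itself mixes the two ("number of negative Tweets" versus "percentage of negative Tweets").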
Irony and Sarcasm Detection in Twitter: The Role of Affective Content
Thesis by compendium of publications.
Social media platforms, like Twitter, offer a face-saving ability that allows users to express themselves using figurative language devices such as irony to achieve different communicative purposes. Dealing with such content represents a big challenge for computational linguistics. Irony is closely associated with the indirect expression of feelings, emotions and evaluations. Interest in detecting irony in social media texts has grown significantly in recent years.
In this thesis, we introduce the problem of detecting irony in social media from a computational linguistics perspective. We propose to address this task by focusing, in particular, on the role of affective information in detecting this figurative language device.
To take advantage of the intrinsic subjectivity of ironic expressions, we present a novel model, called emotIDM, for detecting irony based on a wide range of affective features. To characterise an ironic utterance, we used an extensive set of resources covering different facets of affect, from sentiment to finer-grained emotions. Results show that emotIDM performs competitively across the experiments carried out, validating the effectiveness of the proposed approach.
Another objective of the thesis is to investigate the differences among tweets labelled with #irony and #sarcasm. Our aim is to contribute to a less investigated topic in computational linguistics, the separation between irony and sarcasm in social media, again with a special focus on affective features. We also studied a less explored hashtag: #not. We find data-driven evidence of differences among tweets containing these hashtags, suggesting that they are used to refer to different figurative language devices.
We identify promising features based on affect-related phenomena for discriminating among different kinds of figurative language devices. We also analyse the role of polarity reversal in tweets containing ironic hashtags, observing that the impact of this phenomenon varies by hashtag: tweets labelled with #sarcasm often show a full reversal, whereas those tagged with #irony show an attenuation of polarity.
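The contrast between full reversal and attenuation can be made concrete by comparing a tweet's literal (surface) polarity score with its intended polarity. This is a minimal sketch, not the thesis's procedure: the [-1, 1] score scale, the function name `polarity_shift` and the tolerance `tol` are all illustrative assumptions.

```python
def polarity_shift(literal, intended, tol=0.1):
    """Classify how a tweet's intended polarity relates to its literal polarity.

    Both scores are assumed to lie in [-1, 1]; `tol` is a hypothetical
    threshold below which a score is treated as neutral.
    """
    if abs(literal) <= tol:
        return "no literal polarity"
    if literal * intended < 0 and abs(intended) > tol:
        # Opposite signs with a non-neutral intended score: the meaning flips.
        return "full reversal"
    if abs(intended) < abs(literal) - tol:
        # Same sign (or near neutral) but clearly weaker intensity.
        return "attenuation"
    return "preserved"


print(polarity_shift(0.8, -0.7))  # full reversal
print(polarity_shift(0.8, 0.3))   # attenuation
```

Under this framing, the reported finding would correspond to #sarcasm tweets falling mostly in the "full reversal" bucket and #irony tweets more often in "attenuation".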
We analyse the impact of irony and sarcasm on sentiment analysis, observing a drop in the performance of NLP systems developed for this task when irony is present. We therefore explored using our irony detection findings to develop an irony-aware sentiment analysis system, on the assumption that identifying ironic content could help to identify sentiment polarity correctly. To this end, we incorporated emotIDM into a pipeline for determining the polarity of a given Twitter message.
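One way to picture the irony-aware pipeline is a two-stage function in which an irony detector runs first and, when it fires, adjusts the output of a standard polarity classifier. In the thesis the first stage is emotIDM; here both stages are hypothetical placeholder callables, and the adjustment rule (ironic "positive" becomes "negative") is a simplifying assumption, not the thesis's method.

```python
from typing import Callable

Label = str  # assumed label set: "positive", "negative" or "neutral"


def irony_aware_polarity(
    text: str,
    detect_irony: Callable[[str], bool],
    classify_polarity: Callable[[str], Label],
) -> Label:
    """Two-stage polarity prediction: irony detection first, then sentiment."""
    ironic = detect_irony(text)         # stage 1 (emotIDM's role in the thesis)
    literal = classify_polarity(text)   # stage 2: ordinary sentiment classifier
    if ironic and literal == "positive":
        # Assumed rule: ironic praise is read as criticism. Real systems would
        # learn how irony shifts or attenuates polarity rather than hard-code it.
        return "negative"
    return literal


# Toy stand-ins for the two stages (hypothetical).
toy_irony = lambda t: "#not" in t or "#sarcasm" in t
toy_polarity = lambda t: "positive" if "love" in t else "neutral"
print(irony_aware_polarity("I love waiting in queues #not", toy_irony, toy_polarity))  # negative
```

Passing the two stages in as callables keeps the sketch agnostic about how either model is implemented, so a trained irony classifier could replace the toy hashtag check without changing the pipeline.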
We compared our results with the state of the art established by the "SemEval-2015 Task 11" shared task, demonstrating the relevance of considering affective information together with features that signal the presence of irony when performing sentiment analysis of figurative language in social media texts. To summarize, we demonstrated the usefulness of exploiting different facets of affective information for dealing with the presence of irony on Twitter.
Hernández Farias, DI. (2017). Irony and Sarcasm Detection in Twitter: The Role of Affective Content [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90544
European Handbook of Crowdsourced Geographic Information
This book focuses on the study of the remarkable new source of geographic information that has become available in the form of user-generated content accessible over the Internet through mobile and Web applications. The exploitation, integration and application of these sources, termed volunteered geographic information (VGI) or crowdsourced geographic information (CGI), offer scientists an unprecedented opportunity to conduct research on a variety of topics at multiple scales and for diversified objectives. The Handbook is organized in five parts, addressing the fundamental questions: What motivates citizens to provide such information in the public domain, and what factors govern or predict its validity? What methods might be used to validate such information? Can VGI be framed within the larger domain of sensor networks, in which inert and static sensors are replaced by, or combined with, intelligent and mobile humans equipped with sensing devices? What limitations are imposed on VGI by differential access to broadband Internet, mobile phones and other communication technologies, and by concerns over privacy? How do VGI and crowdsourcing enable innovative applications that benefit human society?
Chapters examine how crowdsourcing techniques and methods, and the VGI phenomenon, have motivated a multidisciplinary research community to identify both fields of application and quality criteria that depend on the use of VGI. Beyond tools for harvesting and storing these data, research has paid considerable attention to these information resources, in an age when information and participation are among the most important drivers of development.
The collection raises open questions and points to new research directions, in addition to presenting each author's findings. Despite rapid progress in VGI research, this Handbook also shows that there are technical, social, political and methodological challenges that require further study.
The IMPED Model of Information Quality
This paper introduces a model for detecting low-quality information, which we refer to as the Index of Measured-diversity, Partisan-certainty, Ephemerality, and Domain (IMPED). The model posits that low-quality information is characterized by ephemerality, as opposed to quality content, which is designed for permanence. The IMPED model leverages linguistic and temporal patterns in the content of social media messages and linked webpages to estimate a parametric survival model, and from it the likelihood that the content will be removed from the Internet. We review the limitations of current approaches to detecting problematic content, including misinformation and false news, which are largely based on fact-checking and machine learning, and detail the requirements for a successful implementation of the IMPED model. The paper concludes with a review of examples taken from the 2018 election cycle and of the model's performance in identifying low-quality information as a proxy for problematic content.
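The abstract does not specify which parametric family IMPED uses. As a minimal illustration of the survival-model idea, assuming an exponential model, a removal rate can be estimated from how long each linked page was observed, where pages still online when observation ends are right-censored. The function names and data below are hypothetical.

```python
import math


def fit_exponential_removal_rate(durations, removed):
    """Maximum-likelihood removal rate under an assumed exponential survival model.

    durations: days each item was observed (until removal or end of study).
    removed:   True if the item was taken down, False if still online
               (right-censored).
    For the exponential model the MLE is (number of removals) / (total exposure).
    """
    if len(durations) != len(removed) or not durations:
        raise ValueError("need matching, non-empty inputs")
    events = sum(removed)
    exposure = sum(durations)
    if exposure <= 0:
        raise ValueError("total observation time must be positive")
    return events / exposure


def prob_removed_by(t, rate):
    """P(removed within t days) = 1 - S(t) = 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical observations: two pages removed, two still online at the end.
rate = fit_exponential_removal_rate([10, 20, 30, 40], [True, True, False, False])
print(rate)                                  # 0.02
print(round(prob_removed_by(30, rate), 3))   # 0.451
```

Counting censored pages in the exposure but not in the events is what distinguishes a survival estimate from simply averaging the lifetimes of removed pages, which would overstate how quickly content disappears. A model like IMPED would additionally let the rate depend on linguistic and temporal covariates.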
- …