16 research outputs found
Sentiment Lexicon Adaptation with Context and Semantics for the Social Web
Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the public's feelings towards policies, brands, businesses, etc. General-purpose sentiment lexicons have been used to compute sentiment from social streams, since they are simple and effective. They calculate the overall sentiment of a text by using a general collection of words with predetermined sentiment orientation and strength. However, a word's sentiment often varies with the context in which it appears, and new words might be encountered that are not covered by the lexicon, particularly in social media environments where content emerges and changes rapidly and constantly. In this paper, we propose a lexicon adaptation approach that uses contextual as well as semantic information extracted from DBPedia to update the words' weighted sentiment orientations and to add new words to the lexicon. We evaluate our approach on three different Twitter datasets, and show that enriching the lexicon with contextual and semantic information improves sentiment computation by 3.4% in average accuracy and by 2.8% in average F1 measure.
Multidimensional opinion mining from social data
The popularity and importance of social media are increasing as people use it for various types of social interaction across multiple channels. This thesis focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, such as textual, visual, and audio. Mining people’s social opinions from social sources, such as social media platforms and newswire comment sections, is a valuable business asset that can be utilised in many ways and in multiple domains, such as Politics, Finance, and Government. The main objective of this research is to investigate how a multidimensional approach to Social Opinion Mining affects fine-grained opinion search and summarisation at an aspect-based level, and whether such a multidimensional approach outperforms single-dimension approaches in an extrinsic human evaluation conducted in a real-world context: the Malta Government Budget, where five social opinion dimensions are taken into consideration, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. This human evaluation determines whether the multidimensional opinion summarisation results provide added value to potential end-users, such as policy-makers and decision-takers, thereby providing a nuanced voice to the general public on their social opinions on topics of national importance. Results obtained indicate that a more fine-grained aspect-based opinion summary based on the combined dimensions of subjectivity, sentiment polarity, emotion, and sarcasm or irony is more informative and more useful than one based on sentiment polarity only. This research contributes towards the advancement of intelligent search and information retrieval from social data and impacts entities utilising Social Opinion Mining results towards effective policy formulation, policy-making, decision-making, and decision-taking at a strategic level.
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, sentiment analysis research into extracting opinions from these posts lags behind that for the English language. This is largely attributed to the challenges of processing the morphologically complex Arabic language and the scarcity of Arabic NLP tools and resources. The task becomes harder still when analysing dialectal Arabic, which does not abide by the formal grammatical structure. Based on the semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, the intent of this research is to use a hybrid approach, integrating linguistic and machine learning methods, for sentiment classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets focusing on the unemployment domain was collected and annotated manually. The tweets cover different Arabic dialects used in Saudi Arabia, for which a comprehensive Arabic sentiment lexicon was constructed. This approach to sentiment analysis also integrated a novel light stemming mechanism for improved stemming of Saudi dialectal Arabic. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve the accuracy of the classifications. These techniques were then applied to assess analytical performance across social media channels, which are vulnerable to semantic and colloquial variation.
Finally, this study presented a new hybrid approach to sentiment analysis in which domain knowledge is used in two ways to combine computational linguistics and machine learning: the first method integrates the problem domain's semantic knowledge base into the machine learning training feature set, while the second uses the outcome of the lexicon-based sentiment classification in the training of the machine learning methods. By integrating these techniques into a single, hybridised solution, a greater degree of accuracy and consistency was achieved than by applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectal Arabic text.
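The multi-factor lexicon scoring and the second hybrid method described in this abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the toy English lexicon, the modifier lists, and the functions score, classify and hybrid_features are hypothetical, the thesis's actual Arabic resources and algorithm are not given in the abstract, and supplications are left out.

```python
# Toy resources, hypothetical: stand-ins for the Arabic lexicon and factor lists.
LEXICON = {"good": 1.0, "bad": -1.0, "hopeful": 0.5, "jobless": -1.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "never"}
EMOJI = {"🙂": 1.0, "😢": -1.0}


def score(tokens):
    """Sum lexicon and emoji values, applying pending negation and intensifiers."""
    total = 0.0
    negate = False
    boost = 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
            continue
        value = LEXICON.get(tok, EMOJI.get(tok))
        if value is None:
            continue  # unknown token: pending modifiers carry over to the next word
        value *= boost
        if negate:
            value = -value
        total += value
        negate, boost = False, 1.0  # modifiers are consumed by one sentiment word
    return total


def classify(tokens):
    """Map the numeric score to a polarity label."""
    s = score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def hybrid_features(tokens, vocabulary):
    """Bag-of-words counts with the lexicon score appended as one extra feature."""
    counts = [float(tokens.count(word)) for word in vocabulary]
    return counts + [score(tokens)]
```

A vector returned by hybrid_features could be passed to any standard classifier, so the learner sees the lexicon-based outcome alongside the surface features.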
Automated Assessment of the Aftermath of Typhoons Using Social Media Texts
Disasters are one of the major threats to economies and human societies, causing substantial losses of human lives, property and infrastructure. Understanding, preventing and reducing such disasters has been a persistent endeavour, and the popularization of social media offers new opportunities to enhance disaster management through crowd-sourcing. However, social media data is also characterized by its undue brevity, intense noise, and informality of language. The existing literature either has not completely addressed these disadvantages or devotes vast manual effort to tackling them.
The major focus of this research is on constructing a holistic framework to exploit social media data in typhoon damage assessment. The scope of this research covers data collection, relevance classification, location extraction and damage assessment while assorted approaches are utilized to overcome the disadvantages of social media data. Moreover, a semi-supervised or unsupervised approach is prioritized in forming the framework to minimize manual intervention.
In data collection, a query expansion strategy is adopted to optimize the search recall of typhoon-relevant information retrieval. Multiple filtering strategies are developed to screen the keywords and maintain relevance to the search topics as the keywords are updated. A classifier based on a convolutional neural network is presented for relevance classification, with hashtags and word clusters as extra input channels to augment the information. In location extraction, a model is constructed by integrating Bidirectional Long Short-Term Memory and Conditional Random Fields. Feature noise correction layers and label smoothing are leveraged to handle the noisy training data. Finally, a multi-instance multi-label classifier identifies the damage relations in four categories, and the damage categories of a message are integrated with the damage description score to obtain a damage severity score for the message.
A case study is conducted to verify the effectiveness of the framework. The outcomes indicate that the approaches and models developed in this study significantly improve the classification of social media texts, especially under the framework of semi-supervised or unsupervised learning. Moreover, the results of damage assessment from social media data are remarkably consistent with the official statistics, which demonstrates the practicality of the proposed damage scoring scheme.
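The label smoothing mentioned above for handling noisy training data can be sketched in Python as follows. This is the standard uniform formulation, assumed here for illustration; the abstract does not say which variant or smoothing strength the thesis uses, and the function names and the value epsilon = 0.1 are hypothetical.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: y' = (1 - epsilon) * y + epsilon / K."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# A possibly mislabelled token tag over four classes (assumed example).
hard = [0.0, 1.0, 0.0, 0.0]
soft = smooth_labels(hard, epsilon=0.1)

# A model that confidently disagrees with the annotated label.
predicted = [0.90, 0.05, 0.03, 0.02]
loss_hard = cross_entropy(hard, predicted)
loss_soft = cross_entropy(soft, predicted)
```

With the smoothed target, a confident prediction that disagrees with a possibly wrong annotation is penalised less than it would be under the hard one-hot target, which is why the technique is commonly paired with noisy labels.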
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
First steps in the study of cyber-psycho-cognitive operations
Dissertation (Master's), Universidade de Brasília, Instituto de Relações Internacionais, Programa de Pós-Graduação em Relações Internacionais, 2019. This is an analysis of the ICT-based mechanisms involved in the articulation of lifeworlds that are strategically oriented to foster, prevent or undermine the development of psycho-cognitive conditions adequate for the construction or sustainability of an authority’s or a political action’s rational legitimacy. While grounding the theory in a historical context, the application of Foucauldian “archeological” instruments to the study of the political narratives giving birth to and springing from “Russiagate” also served to validate the premised convergence and incorporation of common agenda-setting trends and practices typical of traditional psychological operations.
However, the effects of both the commercial availability of deep-learning ICTs and the cognition-based structuration afforded by their ubiquity and economic centrality set this “dispositif” apart, so that it deserves a unique conceptualization and research framework. This study is a contribution to that endeavor.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the interplay between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
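The mutual-information criterion behind Brown clustering can be illustrated with a short Python sketch. It only computes the average mutual information between adjacent classes for a fixed word-to-class assignment, which is the quantity the greedy merging procedure tries to keep high; it is not a full clustering implementation and is not taken from the paper. The function name, the toy corpus and the class assignments are assumptions.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent word classes."""
    classes = [word_to_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    n = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (c1, c2), count in bigrams.items():
        left[c1] += count   # class seen in first position of a bigram
        right[c2] += count  # class seen in second position of a bigram
    ami = 0.0
    for (c1, c2), count in bigrams.items():
        p_joint = count / n
        ami += p_joint * math.log2(p_joint / ((left[c1] / n) * (right[c2] / n)))
    return ami


# Toy corpus (assumed): two words that strictly alternate.
corpus = "a b a b a b".split()
two_classes = average_mutual_information(corpus, {"a": 0, "b": 1})
one_class = average_mutual_information(corpus, {"a": 0, "b": 0})
```

Collapsing every word into one class discards all information about neighbouring classes, while keeping the two alternating words apart preserves it. The number of classes that remain after merging is the setting whose common default values the abstract calls into question.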
On Measuring Social Dynamics of Online Social Media
Due to the complex nature of human behaviour and to our inability to directly measure thoughts and feelings, social psychology has long struggled for empirical grounding for its theories and models. Traditional techniques involving groups of people in controlled environments are limited to small numbers and may not be a good analogue for real social interactions in natural settings due to their controlled and artificial nature. Their application as a foundation for simulation of social processes suffers similarly.
The proliferation of online social media offers new opportunities to observe social phenomena “in the wild” that have only just begun to be realised. To date, analysis of social media data has been largely focussed on specific, commercially relevant goals (such as sentiment analysis) that are of limited use to social psychology, and the dynamics critical to an understanding of social processes are rarely addressed or even present in collected data.
This thesis addresses such shortfalls by presenting: (i) a novel data collection strategy and system for rich dynamic data from communities operating on Twitter; (ii) a data set encompassing longitudinal dynamic information over two and a half years from the online pro-ana (pro-anorexia) movement; and (iii) two approaches to identifying active social psychological processes in collections of online text and network metadata: an approach linking traditional psychometric studies with topic models, and an algorithm combining community detection in user networks with topic models of the social media text they generate, enabling identification of community-specific topic usage.