Nowcasting user behaviour with social media and smart devices on a longitudinal basis: from macro- to micro-level modelling

Author: Tsakalidis Adam
Publication date: 01/09/2018

The adoption of social media and smart devices by millions of users worldwide over the last decade has resulted in an unprecedented opportunity for NLP and the social sciences. Users publish their thoughts and opinions on everyday issues through social media platforms, while they record their digital traces through their smart devices. Mining these rich resources offers new opportunities for sensing real-world events and indices (e.g., political preference, mental health indices) in a longitudinal fashion, at either the macro (population) or the micro (user) level. The current project aims at developing approaches to "nowcast" (predict the current state of) such indices at both levels of granularity. First, we build natural language resources for the static tasks of sentiment analysis, emotion disclosure and sarcasm detection over user-generated content. These are important for opinion monitoring on a large scale. Second, we propose a general approach that leverages textual data derived from generic social media streams to nowcast political indices at the macro level. Third, we leverage temporally sensitive and asynchronous information to nowcast the political stance of social media users at the micro level using multiple kernel learning. We then focus further on micro-level modelling to account for heterogeneous data sources, such as information derived from users' smartphones, SMS and social media messages, in order to nowcast time-varying mental health indices of a small cohort of users on a longitudinal basis. Finally, we present the challenges faced when applying such micro-level approaches in a real-world setting and propose directions for future research.

Warwick Research Archives Portal Repository

Multilingual sentiment analysis in social media.

Author: San Vicente Roncal Iñaki
Publication date: 01/01/2019

252 p. This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing analysis only for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system covering Basque, English, French and Spanish. The thesis addresses the following challenges to build such a system:
- Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
- Analysing social media (specifically Twitter): tweets pose several challenges for understanding and extracting opinions. Language identification and microtext normalization are addressed.
- Researching the state of the art in polarity classification, and developing a supervised classifier that is tested against well-known social media benchmarks.
- Developing a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Sentiment Analysis for the Low-Resourced Latinised Arabic "Arabizi"

Author: Tobaili Taha
Publication date: 02/11/2020

The expansion of digital communication mediums from private mobile messaging into the public sphere through social media presented an opportunity for data science research and industry to mine the generated big data for artificial information extraction. A popular information extraction task is sentiment analysis, which aims at extracting the polarity of opinions (positive, negative, or neutral) from written natural language. This science has helped organisations better understand the public's opinion towards events, news, public figures, and products. However, sentiment analysis has advanced for the English language ahead of Arabic. While sentiment analysis for Arabic is developing in the literature of Natural Language Processing (NLP), a popular variety of Arabic, Arabizi, has been overlooked. Arabizi is an informal transcription of spoken dialectal Arabic in Latin script used for social texting. It is known to be common among Arab youth, yet it is overlooked in efforts on Arabic sentiment analysis because of its linguistic complexities. Like Arabic, Arabizi is rich in inflectional morphology, but it is also code-switched with English or French and transcribed without adhering to a standard orthography. The rich morphology, inconsistent orthography, and code-switching compound one another and multiply the lexical sparsity of the language: each Arabizi word can be spelled in many ways, in addition to the mixing of other languages within the same textual context. The resulting high degree of lexical sparsity defies the very basics of sentiment analysis, the classification of positive and negative words. Arabizi also faces a severe shortage of the data resources required to set out any sentiment analysis approach. In this thesis, we tackle this gap by conducting research on sentiment analysis for Arabizi.
We addressed the sparsity challenge by harvesting Arabizi data from multilingual social media text using deep learning to build Arabizi resources for sentiment analysis. We developed six new morphologically and orthographically rich Arabizi sentiment lexicons and set the baseline for Arabizi sentiment analysis on social media.

Open Research Online (The Open University)

Adaptive sentiment analysis

Author: Mudinas Andrius

Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when they are targeted at a specific domain and writing style, they do not usually work well with texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly-unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts.

Birkbeck Institutional Research Online

Multilingual sentiment analysis in social media.

Author: San Vicente Roncal Iñaki
Publication date: 11/03/2019

Archivo Digital para la Docencia y la Investigación

On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages

Author: Agarwal
Arora
Baccianella
Bae
Bakliwal
Batista
Bird
Boiy
Brill
Brooke
Brown
Campos
Carter
Choi
Costa-jussà
Devitt
Feldman
Foster
Gamon
Gimpel
Greene
Guo
Hall
Han
He
Heerschop
Jiang
Joshi
Kaufmann
Kennedy
Kumar
Li
Liu
Martínez-Cámara
Miller
Mitchell
Moilanen
Montejo-Ráez
Nakagawa
Nivre
Nivre
Pak
Pang
Pang
Pang
Pennebaker
Perea-Ortega
Petrov
Platt
Ramírez-Esparza
Shaikh
Sidorov
Socher
Taboada
Taulé
Thelwall
Thelwall
Thelwall
Trnavac
Turney
Turney
Vilares
Vilares
Vilares
Villena-Román
Villena-Román
Wang
Wu
Zhang
Publication venue: 'Wiley'

Crossref

A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

Author: Khan Shah Khalid
Naseem Usman
Prasad Mukesh
Razzak Imran
Publication date: 28/10/2020

Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their power of expression, from the classical to modern-day state-of-the-art word representation language models (LMs). We describe a variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. Finally, this survey briefly discusses the commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics and the applications of these word embeddings in different NLP tasks.

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

ResearchOnline at James Cook University

Data analytics 2016: proceedings of the fifth international conference on data analytics

Author: Bhulai Sandjai
Semanjski Ivana
Publication venue: The International Academy, Research and Industry Association
Publication date: 01/01/2016

VU Research Portal

Ghent University Academic Bibliography

Graph-based approaches for semi-supervised and cross-domain sentiment analysis

Author: Ponomareva Natalia
Publication venue: University of Wolverhampton
Publication date: 01/01/2014

A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. The rapid development of Internet technologies has resulted in a sharp increase in the number of Internet users who create content online. User-generated content often represents people's opinions, thoughts, speculations and sentiments and is a valuable source of information for companies, organisations and individual users. This has led to the emergence of the field of sentiment analysis, which deals with the automatic extraction and classification of sentiments expressed in texts. Sentiment analysis has been intensively researched over the last ten years, but there are still many issues to be addressed. One of the main problems is the lack of labelled data necessary to carry out precise supervised sentiment classification. In response, research has moved towards developing semi-supervised and cross-domain techniques. Semi-supervised approaches still need some labelled data and their effectiveness is largely determined by the amount of these data, whereas cross-domain approaches usually perform poorly if training data are very different from test data. The majority of research on sentiment classification deals with the binary classification problem, although for many practical applications this rather coarse sentiment scale is not sufficient. Therefore, it is crucial to design methods which are able to perform accurate multiclass sentiment classification.
The aims of this thesis are to address the problem of limited availability of data in sentiment analysis and to advance research in semi-supervised and cross-domain approaches for sentiment classification, considering both binary and multiclass sentiment scales. We adopt graph-based learning as our main method and explore the most popular and widely used graph-based algorithm, label propagation. We investigate various ways of designing sentiment graphs and propose a new similarity measure which is unsupervised, easy to compute, does not require deep linguistic analysis and, most importantly, provides a good estimate for sentiment similarity as proved by intrinsic and extrinsic evaluations. The main contribution of this thesis is the development and evaluation of a graph-based sentiment analysis system that a) can cope with the challenges of limited data availability by using semi-supervised and cross-domain approaches, b) is able to perform multiclass classification, and c) achieves highly accurate results which are superior to those of most state-of-the-art semi-supervised and cross-domain systems. We systematically analyse and compare semi-supervised and cross-domain approaches in the graph-based framework and propose recommendations for selecting the most pertinent learning approach given the data available. Our recommendations are based on two domain characteristics, domain similarity and domain complexity, which were shown to have a significant impact on semi-supervised and cross-domain performance.
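
For readers unfamiliar with label propagation, the graph-based algorithm named in the abstract above, the following is a minimal sketch of the general idea only. The function name, the toy similarity matrix and the clamping scheme are illustrative assumptions. They are not the thesis's similarity measure, graph design or implementation.

```python
def propagate_labels(weights, seeds, n_classes, n_iter=50):
    """Generic label propagation on a weighted document graph.

    weights: n x n list of non-negative similarities (hypothetical values).
    seeds:   dict mapping a labelled node index to its class index.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # Unlabelled nodes start with a uniform class distribution.
    scores = [[1.0 / n_classes] * n_classes for _ in range(n)]
    # Labelled (seed) nodes start with a one-hot distribution.
    for node, cls in seeds.items():
        scores[node] = [1.0 if c == cls else 0.0 for c in range(n_classes)]

    for _ in range(n_iter):
        updated = []
        for i in range(n):
            neighbours = [j for j in range(n) if j != i]
            total = sum(weights[i][j] for j in neighbours)
            if i in seeds or total == 0:
                # Seeds are clamped; isolated nodes keep their scores.
                updated.append(scores[i])
                continue
            # Each node takes the similarity-weighted average of its neighbours.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in neighbours) / total
                for c in range(n_classes)
            ])
        scores = updated

    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy graph: node 0 is labelled negative (0), node 3 is labelled positive (1).
toy_weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]
predicted = propagate_labels(toy_weights, seeds={0: 0, 3: 1}, n_classes=2)
```

In this toy graph, node 1 is most similar to the negative seed and node 2 to the positive seed, so each unlabelled node inherits the label of its closest labelled neighbour. The same routine handles more than two classes by raising `n_classes`.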

Wolverhampton Intellectual Repository and E-theses