More blogging features for author identification

Author: Ahmed Amr
Mohtasseb Haytham
Publication date: 01/01/2009

In this paper we present a novel improvement in authorship identification for personal blogs. The improvement comes from a hybrid collection of linguistic features that best capture the style of users in diary blogs. The feature sets comprise LIWC, with its psychological background, a collection of syntactic and part-of-speech (POS) features, and misspelling-error features. Furthermore, we analyze the contribution of each feature set to the final result and compare the outcomes of different combinations of the selected feature sets. Our new categorization of misspelled words, which are mapped into numerical features, noticeably enhances the classification results. The paper also confirms the best ranges of several parameters that affect the final result of authorship identification, such as the number of authors, the number of words in each post, and the number of documents/posts for each author/user. The results and evaluation show that the utilized features are compact, while their performance is highly comparable with that of other, much larger feature sets.

University of Lincoln Institutional Repository

CiteSeerX

Edge Hill University Research Information Repository

Classification of the Stance in Online Debates Using the Dependency Relations Feature

Author: Aung Sint Sint
Kyaw Thiri
Publication venue: 'International Journal of Computer Engineering and Applications'
Publication date: 20/07/2020

Online discussion forums offer Internet users a medium for discussions about current political debates. The debate is a system of claims regarding interactivity and representation. Users make claims in an online discussion with superior content to support their position. Factual accuracy and emotional appeal are critical attributes used to convince readers. A key challenge in debate forums is to identify the participants’ stances, which are inter-dependent and inter-connected. This research work aims to construct a classifier that takes the linguistic features of the posts as input and outputs a predicted stance label for each post. Three types of features (Lexical, Dependency, and Morphology) are used to detect the stance of the posts. Lexical features such as cue words are employed as surface features, and the deep features include dependency and morphology features. A Multinomial Naïve Bayes classifier is used to build the stance classification model, and the Chi-Square method is used to select a good feature set. The performance of the stance classification system is evaluated in terms of accuracy. The system labels each post's stance as for or against by analyzing the surface and deep features that capture the content of the post.
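
The Chi-Square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-class problem, tokenized posts, and the textbook 2x2 chi-square statistic on term presence versus stance label. The function name `chi_square_scores` and the toy posts are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its 2x2
    (term present/absent) x (class A/class B) contingency table.

    Assumption: exactly two stance labels, e.g. "for" and "against".
    """
    n = len(docs)
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first = classes[0]
    n_first = sum(1 for y in labels if y == first)
    n_second = n - n_first

    # Document frequency of each term within each class.
    df_first, df_second = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_first if y == first else df_second
        for term in set(tokens):
            target[term] += 1

    scores = {}
    for term in set(df_first) | set(df_second):
        a = df_first[term]      # term present, first class
        b = df_second[term]     # term present, second class
        c = n_first - a         # term absent, first class
        d = n_second - b        # term absent, second class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy, made-up posts: "support"/"oppose" track the label, "tax"/"law" do not.
docs = [["support", "tax"], ["support", "law"],
        ["oppose", "tax"], ["oppose", "law"]]
labels = ["for", "for", "against", "against"]
scores = chi_square_scores(docs, labels)
```

Terms with the highest scores would be kept as features for the classifier; terms whose presence is independent of the label score zero.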

International Journal of Computer (IJC - Global Society of Scientific Research and Researchers, GSSRR)

Lightme: Analysing Language in Internet Support Groups for Mental Health

Author: Ferraro Gabriela
Gee Brendan Loo
Ji Shenjia
Salvador-Carulla Luis
Publication date: 02/07/2020

Background: Assisting moderators to triage harmful posts in Internet Support Groups is relevant to ensuring their safe use. Automated text classification methods that analyse the language expressed in posts of online forums are a promising solution. Methods: Natural Language Processing and Machine Learning technologies were used to build a triage post classifier using a dataset from the Reachout mental health forum for young people. Results: When compared with the state of the art, a solution based mainly on features from lexical resources achieved the best classification performance for crisis posts (52%), which is the most severe class. Six salient linguistic characteristics were found when analysing the crisis posts: 1) posts expressing hopelessness, 2) short posts expressing concise negative emotional responses, 3) long posts expressing variations of emotions, 4) posts expressing dissatisfaction with available health services, 5) posts utilising storytelling, and 6) posts expressing users seeking advice from peers during a crisis. Conclusion: It is possible to build a competitive triage classifier using features derived only from the textual content of the post. Further research is needed to translate our quantitative and qualitative findings into features, as this may improve overall performance.

arXiv.org e-Print Archive

PubMed Central

University of Canberra Research Repository

Argumentation Mining in User-Generated Web Discourse

Author: Gurevych Iryna
Habernal Ivan
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task. Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

arXiv.org e-Print Archive

TUbiblio

Crossref

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Collective emotions online and their influence on community life

Author: A Chmiel
A Czaplicka
A Kappas
A Tumasjan
A-L Barabási
AJ Gerber
Anna Chmiel
Arvid Kappas
Attila Szolnoki
B Kujawski
B Pang
BA Huberman
C Castellano
C Darwin
C Macdonald
F Radicchi
F Schweitzer
F Sebastiani
G Paltoglou
Georgios Paltoglou
H Rheingold
J Posner
J Suler
J Walther
J-P Onnela
Janusz A. Hołyst
Julian Sienkiewicz
Kevan Buckley
LA Feldman
M Gamon
M Mitrović
M Skowron
M Szell
M Taboada
Mike Thelwall
NH Frijda
P Krapivsky
P Sobkowicz
PJ Lang
PS Dodds
R Reisenzein
RB Zajonc
Riloff E
RIM Dunbar
S Gobron
SH Hemenover
T Wilson
W James
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/07/2011

E-communities, social groups interacting online, have recently become an object of interdisciplinary research. As with face-to-face meetings, Internet exchanges may not only include factual information but also emotional information - how participants feel about the subject discussed or other group members. Emotions are known to be important in affecting interaction partners in offline communication in many ways. Could emotions in Internet exchanges affect others and systematically influence quantitative and qualitative aspects of the trajectory of e-communities? The development of automatic sentiment analysis has made large scale emotion detection and analysis possible using text messages collected from the web. It is not clear if emotions in e-communities primarily derive from individual group members' personalities or if they result from intra-group interactions, and whether they influence group activities. We show the collective character of affective phenomena on a large scale as observed in 4 million posts downloaded from Blogs, Digg and BBC forums. To test whether the emotions of a community member may influence the emotions of others, posts were grouped into clusters of messages with similar emotional valences. The frequency of long clusters was much higher than it would be if emotions occurred at random. Distributions for cluster lengths can be explained by preferential processes because conditional probabilities for consecutive messages grow as a power law with cluster length. For BBC forum threads, average discussion lengths were higher for larger values of absolute average emotional valence in the first ten comments and the average amount of emotion in messages fell during discussions. Our results prove that collective emotional states can be created and modulated via Internet communication and that emotional expressiveness is the fuel that sustains some e-communities. Comment: 23 pages including Supporting Information, accepted to PLoS ON
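
The grouping of posts into clusters of consecutive messages with the same emotional valence can be sketched as a run-length count. This is only an illustration under assumptions not stated in the abstract: each message is assumed to carry a discrete valence label (-1, 0, or 1), and a cluster is taken to be a maximal run of identical labels within one thread. The function name `cluster_length_counts` and the sample thread are hypothetical.

```python
from collections import Counter
from itertools import groupby


def cluster_length_counts(valences):
    """Count maximal runs of identical consecutive valence labels.

    Returns a Counter keyed by (label, run_length).
    """
    counts = Counter()
    for label, run in groupby(valences):
        counts[(label, sum(1 for _ in run))] += 1
    return counts


# Made-up thread: three negative posts, one neutral, two positive, one negative.
thread = [-1, -1, -1, 0, 1, 1, -1]
counts = cluster_length_counts(thread)
```

Comparing these counts with the counts obtained after randomly shuffling the same labels is one simple way to check whether long clusters occur more often than chance would predict.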

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

What Goes Around Comes Around: Learning Sentiments in Online Medical Forums

Author: A Chmiel
A Esposito
A Weichselbraun
ACR Tsai
C Chew
C Hung
CA Smith
D Osman
DJ Stanley
E Cambria
E Mantikou
E Sillence
G Jurman
H Baayen
Janyce Wiebe
JW Pennebaker
M Skowron
M Taboada
M Thelwall
Marina Sokolova
Michael Oakes
N Zillen
P Biyani
P Dodds
P Ekman
Q Cai
R Xia
S Malik
T Nichols
T Pantasri
Victoria Bobicev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/04/2015

Currently 19%-28% of Internet users participate in online health discussions. A 2011 survey of the US population estimated that 59% of all adults have looked online for information about health topics such as a specific disease or treatment. Although empirical evidence strongly supports the importance of emotions in health-related messages, there are few studies of the relationship between subjective language and online discussions of personal health. In this work, we study sentiments expressed on online medical forums. As well as considering the predominant sentiments expressed in individual posts, we analyze sequences of sentiments in online discussions. Individual posts are classified into one of five categories. We identified three categories as sentimental (encouragement, gratitude, confusion) and two categories as neutral (facts, endorsement). 1438 messages from 130 threads were annotated manually by two annotators with a strong inter-annotator agreement (Fleiss kappa = 0.737 and 0.763 for posts in sequence and separate posts respectively). The annotated posts were used to analyse sentiments in consecutive posts. In four multi-class classification problems, we assessed HealthAffect, a domain-specific affective lexicon, as well as general sentiment lexicons in their ability to represent messages in sentiment recognition.
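
The inter-annotator agreement figure quoted above is a Fleiss kappa. A minimal sketch of the standard formula follows, assuming every item is rated by the same number of annotators; the function name `fleiss_kappa` and the toy rating table are hypothetical and are not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    annotators who assigned item i to category j.

    Assumption: every row sums to the same number of annotators (>= 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])

    # Share of all assignments that fell into each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_cats)]

    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in ratings]
    p_observed = sum(p_item) / n_items

    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy table: 4 posts, 2 annotators, 2 categories; they disagree on the last post.
toy = [[2, 0], [2, 0], [0, 2], [1, 1]]
kappa = fleiss_kappa(toy)  # 7/15, roughly 0.467
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.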

Crossref

Wolverhampton Intellectual Repository and E-theses

Mining online diaries for blogger identification

Author: Ahmed Amr
Mohtasseb Haytham
Publication date: 01/01/2009

In this paper, we present an investigation of authorship identification on personal blogs or diaries, whose text properties differ from those of other types of text such as essays, emails, or articles. The investigation utilizes a couple of intuitive feature sets and studies various parameters that affect the identification performance. Many studies have addressed the problem of authorship identification in manually collected corpora, but only a few have utilized real data from existing blogs. The complexity of the language in personal blogs makes identifying the corresponding author a motivating problem. The main contribution of this work is at least threefold. Firstly, we utilize the LIWC and MRC feature sets together, which have been developed with a psychology background, for the first time for authorship identification on personal blogs. Secondly, we analyze the effect of various parameters and feature sets on the identification performance. These include the number of authors in the data corpus, the post size or word count, and the number of posts for each author. Finally, we study applying authorship identification over a limited set of users who share common personality attributes. This analysis is motivated by the lack of standard or solid recommendations in the literature for such a task, especially in the domain of personal blogs. The results and evaluation show that the utilized features are compact while their performance is highly comparable with that of other, larger feature sets. The analysis also confirmed the most effective parameters, their ranges in the data corpus, and the usefulness of the common-users classifier in improving performance on the author identification task.

University of Lincoln Institutional Repository

CiteSeerX

Directory of Open Access Journals

Edge Hill University Research Information Repository

Proceedings of the LREC 2020 workshop on Resources and Techniques for User and Author Profiling in Abusive Language (ResT-UP 2020)

Author: di Buono Maria Pia
Manna Raffaele
Monti Johanna
Pascucci Antonio
Sara Tonelli
Valerio Basile
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Università degli Studi di Napoli L'Orientale: CINECA IRIS