432 research outputs found

    On the Promotion of the Social Web Intelligence

    Given the ever-growing information generated through various online social outlets, analytical research on social media has intensified in the past few years from all walks of life. In particular, works on social Web intelligence foster and benefit from the wisdom of the crowds and attempt to derive actionable information from such data. In the form of collective intelligence, crowds gather together and contribute to solving problems that may be difficult or impossible to solve by individuals and single computers. In addition, the consumer insight revealed from social footprints can be leveraged to build powerful business intelligence tools, enabling efficient and effective decision-making processes. This dissertation is broadly concerned with the intelligence that can emerge from the social Web platforms. In particular, the two phenomena of social privacy and online persuasion are identified as the two pillars of the social Web intelligence, studying which is essential in the promotion and advancement of both collective and business intelligence. The first part of the dissertation is focused on the phenomenon of social privacy. This work is mainly motivated by the privacy dichotomy problem. Users often face difficulties specifying privacy policies that are consistent with their actual privacy concerns and attitudes. As such, before making use of social data, it is imperative to employ multiple safeguards beyond the current privacy settings of users. As a possible solution, we utilize user social footprints to detect their privacy preferences automatically. An unsupervised collaborative filtering approach is proposed to characterize the attributes of publicly available accounts that are intended to be private. Unlike the majority of earlier studies, a variety of social data types is taken into account, including the social context, the published content, as well as the profile attributes of users. 
Our approach can provide support in making an informed decision whether to exploit one's publicly available data to draw intelligence. With the aim of gaining insight into the strategies behind online persuasion, the second part of the dissertation studies written comments in online deliberations. Specifically, we explore different dimensions of the language, the temporal aspects of the communication, as well as the attributes of the participating users to understand what makes people change their beliefs. In addition, we investigate the factors that are perceived to be the reasons behind persuasion by the users. We link our findings to traditional persuasion research, hoping to uncover when and how they apply to online persuasion. A set of rhetorical relations is known to be of importance in persuasive discourse. We further study the automatic identification and disambiguation of such rhetorical relations, aiming to take a step closer towards automatic analysis of online persuasion. Finally, a small proof of concept tool is presented, showing the value of our persuasion and rhetoric studies.

    Beyond The Wall Street Journal: Anchoring and Comparing Discourse Signals across Genres

    Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals and that signaling information is indicative of genres. While several corpora exist with discourse relation signaling information such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018), they both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. Thus, this paper adapts the signal identification and anchoring scheme (Liu and Zeldes, 2019) to three more genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset. Comment: 10 pages. In Proceedings of 7th Workshop on Discourse Relation Parsing and Treebanking (DISRPT) at NAACL-HLT 2019, Minneapolis, M

    Weak and strong discourse markers in speech, chat and writing: Do signals compensate for ambiguity in explicit relations?

    Ambiguity in discourse is pervasive, yet mechanisms of production and processing suggest that it tends to be compensated for in context. The present study sets out to analyze the combination of discourse markers (such as but or moreover) with other discourse signals (such as semantic relations or punctuation marks) across three genres (discussion, chat, and essay). The presence of discourse signals is expected to vary with the ambiguity of the discourse marker and with the genre. This analysis complements recent approaches to discourse signalling by zooming in on the different types of discourse markers with which other signals combine. The corpus annotation study uncovered three categories of marker strength (weak, intermediate, and strong), thus refining the concept of “explicitness.” Statistical modeling reveals that weak discourse markers are more often compensated for than intermediate and strong markers, and that this compensation is not affected by genre variation.

    ChangeMyView Through Concessions: Do Concessions Increase Persuasion?

    In Discourse Studies, concessions are considered among those argumentative strategies that increase persuasion. We aim to empirically test this hypothesis by calculating the distribution of argumentative concessions in persuasive vs. non-persuasive comments from the ChangeMyView subreddit. This constitutes a challenging task since concessions do not always bear an argumentative role and are expressed through polysemous lexical markers. Drawing from a theoretically-informed typology of concessions, we first conduct a crowdsourcing task to label a set of polysemous lexical markers as introducing an argumentative concession relation or not. Second, we present a self-training method to automatically identify argumentative concessions using linguistically motivated features. While we achieve a moderate F1 of 57.4% via the self-training method, our subsequent error analysis highlights that the self-training method is able to generalize and identify other types of concessions that are argumentative, but were not considered in the annotation guidelines. Our findings from the manual labeling and the classification experiments indicate that the type of argumentative concessions we investigated is almost equally likely to be used in winning and losing arguments. While this result seems to contradict theoretical assumptions, we provide some reasons related to the ChangeMyView subreddit.

    Measuring the coherence of normal and aphasic discourse production in Chinese using rhetorical structure theory (RST)

    The study investigated the difference in discourse coherence between healthy speakers and speakers with anomic aphasia using Rhetorical Structure Theory (RST). The effect of genre types on coherence and potential factors contributing to the differences were also examined. Fifteen native Cantonese participants with anomic aphasia and their controls, matched in age, education, and gender, participated. Sixty language samples were obtained using the story-telling and sequential description tasks of the Cantonese AphasiaBank protocol. Twenty naïve listeners provided subjective ratings on the coherence, completeness, correctness of order, and clarity of each speech sample. Results demonstrated that the control group showed significantly higher production fluency, a higher total number of discourse units, and fewer errors than the aphasia group. Controls used a richer set of relations than the aphasic group, particularly those to describe settings, to express causality, and to elaborate. The aphasic group tended to omit more essential information content and was rated with significantly lower coherence and clarity than controls. The findings suggested that speakers with anomic aphasia had a reduced proportion of essential information content, a lower degree of elaboration, and more structural disruptions than the controls, which may have contributed to the reduced overall discourse coherence. (Published or final version. Speech and Hearing Sciences. Bachelor of Science in Speech and Hearing Science)

    Can human association norm evaluate latent semantic analysis?

    This paper presents a comparison of a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm and lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.

    Evaluation in Discourse: a Corpus-Based Study

    This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence, or the document level is a well-established and relatively straightforward task, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.