540 research outputs found

    A latent variable model for viewpoint discovery from threaded forum posts

    Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a principled generative latent variable model which captures three important factors: viewpoint-specific topic preference, user identity and user interactions. Evaluation results show that our model clearly outperforms a number of baseline models in terms of both clustering posts based on viewpoints and clustering users with different viewpoints.

    Viewpoint Discovery and Understanding in Social Networks

    The Web has evolved into a dominant platform where everyone has the opportunity to express their opinions, to interact with other users, and to debate emerging events happening around the world. On the one hand, this has enabled the presence of different viewpoints and opinions about a - usually controversial - topic (like Brexit), but at the same time, it has led to phenomena like media bias, echo chambers and filter bubbles, where users are exposed to only one point of view on the same topic. Therefore, there is a need for methods that are able to detect and explain the different viewpoints. In this paper, we propose a graph partitioning method that exploits social interactions to enable the discovery of different communities (representing different viewpoints) discussing a controversial topic in a social network like Twitter. To explain the discovered viewpoints, we describe a method, called Iterative Rank Difference (IRD), which allows detecting descriptive terms that characterize the different viewpoints as well as understanding how a specific term is related to a viewpoint (by detecting other related descriptive terms). The results of an experimental evaluation showed that our approach outperforms state-of-the-art methods on viewpoint discovery, while a qualitative analysis of the proposed IRD method on three different controversial topics showed that IRD provides comprehensive and deep representations of the different viewpoints.

    Mining user viewpoints in online discussions

    Modeling Interaction Features for Debate Side Clustering

    Online discussion forums are popular social media platforms for users to express their opinions and discuss controversial issues with each other. To automatically identify the sides/stances of posts or users from textual content in forums is an important task to help mine online opinions. To tackle the task, it is important to exploit user posts that implicitly contain support and dispute (interaction) information. The challenge we face is how to mine such interaction information from the content of posts and how to use them to help identify stances. This paper proposes a two-stage solution based on latent variable models: an interaction feature identification stage to mine interaction features from structured debate posts with known sides and reply intentions; and a clustering stage to incorporate interaction features and model the interplay between interactions and sides for debate side clustering. Empirical evaluation shows that the learned interaction features provide good insights into user interactions and that with these features our debate side model shows significant improvement over other baseline methods. Copyright is held by the owner/author(s).

    Argumentation Mining in User-Generated Web Discourse

    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task. Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

    Topic modelling of Finnish Internet discussion forums as a tool for trend identification and marketing applications

    The increasing availability of public discussion text data on the Internet motivates the study of methods to identify current themes and trends. Being able to extract and summarize relevant information from public data in real time gives rise to competitive advantage and to applications in a company's marketing actions. This thesis presents a method of topic modelling and trend identification to extract information from Finnish Internet discussion forums. The development of text analytics, and especially topic modelling techniques, is reviewed and suitable methods are identified from the literature. The Latent Dirichlet Allocation topic model and the Dynamic Topic Model are applied to find underlying topics in the Internet discussion forum data. The collection of discussion data with web scraping and the text preprocessing methods are presented. Trends are identified with a method derived from outlier detection. Real world events, such as the news about the Finnish army vegetarian meal day and the Helsinki summit of presidents Trump and Putin, were identified in an unsupervised manner. Applications for marketing are considered, e.g. automatic search engine advert keyword generation and website content recommendation. Future prospects for further improving the developed topical trend identification method are proposed. These include the use of more complex topic models, an extensive framework for tuning trend identification parameters, and studying the use of more domain-specific text data sources such as blogs, social media feeds or customer feedback.

    Online discussions through the lens of interaction patterns

    Computer-mediated communication is arguably prevailing over face-to-face. However, many of the subtleties that make in-person communication personal, cues such as an ironic tone of voice or an effortless posture, are inherently impossible to render through a screen. The context vanishes from the conversation - what is left is therefore mostly text, enlivened by occasional multimedia. At least, this seems the dominant opinion of both industry and academia, that recently focused considerable resources on a deeper understanding of natural and visual language. I argue instead that richer cues are missing from online interaction only because current applications do not acknowledge them -- indeed, communication online is already infused with nonverbal codes, and the effort needed to leverage them is well worth the amount of information they carry. This dissertation therefore focuses on what is left out of the traditional definition of content: I refer to these aspects of communication as content-agnostic. Specifically, this dissertation makes three contributions. First, I formalize what constitutes content-agnostic information in computer-mediated communication, and prove content-agnostic information is as personal to each user as its offline counterpart. For this reason, I choose as a venue of research the web forum, a supposedly text-based, impersonal communication environment, and show that it is possible to attribute a message to the corresponding author solely on the basis of its content-agnostic features -- in other words, without looking at the content of the message at all. Next, I display how abundant and how varied is the content-agnostic information that lies untapped in current applications. To this end, I analyze the content-agnostic aspects of one type of interaction, the quote, and draw conclusions on how these may support discussion, signal user status, mark relationships between users, and characterize the discussion forum as a community. One interesting implication is that discussion platforms may not need to introduce new features for supporting social signals, and conversely social networks may better integrate discussion by enhancing its content-agnostic qualities. Finally, I demonstrate how content-agnostic information reveals user behavior. I focus specifically on trolls, malicious users that disrupt communities through deceptive or manipulative actions. In fact, the language of trolls blends in with that of civil users in heated discussions, which makes collecting irrefutable evidence of trolling difficult even for human moderators. Nonetheless, I show that a combination of content-agnostic and linguistic features sets apart discussions that will eventually be trolled, and reactions to trolling posts. This provides evidence of how content-agnostic information can offer a point of view on user behavior that is at the same time different from, and complementary to, that offered by the actual content of the contribution. Popular up and coming platforms, such as Snapchat, Tumblr, or Yik Yak, are increasingly abandoning persistent, threaded, text-based discussion, in favor of ephemeral, loosely structured, mixed-media content. Although the results of this dissertation are mostly drawn from discussion forums, its research frame and methods should apply directly to these other venues, and to a broad range of communication paradigms. Also, this is but a preliminary step towards a fuller understanding of what additional cues can or should complement content to overcome the limitations of computer-mediated communication.

    Using Natural Language Processing to Mine Multiple Perspectives from Social Media and Scientific Literature.

    This thesis studies how Natural Language Processing techniques can be used to mine perspectives from textual data. The first part of the thesis focuses on analyzing the text exchanged by people who participate in discussions on social media sites. We particularly focus on threaded discussions of ideological and political topics. The goal is to identify the different viewpoints that the discussants have with respect to the discussion topic. We use subjectivity and sentiment analysis techniques to identify the attitudes that the participants carry toward one another and toward the different aspects of the discussion topic. This involves identifying opinion expressions and their polarities, and identifying the targets of opinion. We use this information to represent discussions in one of two representations: discussant attitude vectors or signed attitude networks. We use data mining and network analysis techniques to analyze these representations to detect rifts in discussion groups and study how the discussants split into subgroups with contrasting opinions. In the second part of the thesis, we use linguistic analysis to mine scholars' perspectives from scientific literature through the lens of citations. We analyze the text adjacent to reference anchors in scientific articles as a means to identify researchers' viewpoints toward previously published work. We propose methods for identifying, extracting, and cleaning citation text. We analyze this text to identify the purpose (author's intention) and polarity (author's sentiment) of citation. Finally, we present several applications that can benefit from this analysis such as generating multi-perspective summaries of scientific articles and predicting future prominence of publications. PHD. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/99934/1/amjbara_1.pd

    Community, Conversation, and Conflict: a Study of Deliberation and Moderation in a Collaborative Political Weblog

    Concerns about the feasibility of the Internet as an appropriate venue for deliberation have emerged based on the adverse effects of depersonalization, anonymity, and lack of accountability on the part of online discussants. As in face-to-face communication, participants in online conversations are best situated to determine for themselves what type of communication is appropriate. Earlier research on Usenet groups was not optimistic, but community-administered moderation may provide a valuable tool for online political discussion groups who wish to support and enforce deliberative communication among a diverse or disagreeing membership. This research examines individual comments and their rating and moderation within a week-long Pie Fight discussion about community ownership and values in the Daily Kos political blog. Specific components of deliberation were identified and a content analysis was conducted for each. Salient issues included community reputation, agreement and disagreement, meta-communication, and appropriate expression of emotion, humor, and profanity. Data subsets were analyzed in conjunction with the comment ratings given by community members to determine what types of interaction received the most attention, and how the community used the comment ratings system to promote or demote specific comment types. The use of middle versus high or low ratings, the value of varied ratings format, and the use of moderation as a low-impact means of expressing dissent were also explored. The Daily Kos community members effectively used both comments and ratings to mediate conflict, assert their desired kind of community, demonstrate a deliberative self-concept, and support specific conditions of deliberation. The moderation system was used to sanction uncivil or unproductive communication, as intended, and was also shown to facilitate deliberation of disagreement rather than creating an echo chamber of opinion