3,018 research outputs found

    Document controversy classification based on the Wikipedia category structure

    Dispute and controversy are part of our culture and cannot be avoided on the Internet (where they become more anonymous). There have been many studies on controversy, especially on social networks such as Wikipedia. This free online encyclopedia has become a very popular data source among researchers studying behavior or natural language processing. This paper presents a method that uses the category structure of Wikipedia to determine the controversy of a single article. This is the first part of a proposed system for classifying the topic controversy score of any given text

    Automatically Characterizing Product and Process Incentives in Collective Intelligence

    Social media facilitate interaction and information dissemination among an unprecedented number of participants. Why do users contribute, and why do they contribute to a specific venue? Does the information they receive cover all relevant points of view, or is it biased? The substantial and increasing importance of online communication makes these questions more pressing, but also puts answers within reach of automated methods. I investigate scalable algorithms for understanding two classes of incentives which arise in collective intelligence processes. Product incentives exist when contributors have a stake in the information delivered to other users. I investigate product-relevant user behavior changes, algorithms for characterizing the topics and points of view presented in peer-produced content, and the results of a field experiment with a prediction market framework having associated product incentives. Process incentives exist when users find contributing to be intrinsically rewarding. Algorithms which are aware of process incentives predict the effect of feedback on where users will make contributions, and can learn about the structure of a conversation by observing when users choose to participate in it. Learning from large-scale social interactions allows us to monitor the quality of information and the health of venues, but also provides fresh insights into human behavior

    Data science methods for the analysis of controversial social media discussions

    Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate-speech and “trolling”, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development. This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. For this, we develop a feature-based model to describe key elements of a discussion, such as its salient topics, the level of activity from users, the sentiments it expresses, and the user feedback it receives. Initially, we build our feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. 
Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus. In each of these cases, our framework provides insight into how user behavior can shape a community’s individual definition of controversy and its overall identity.

    First Women, Second Sex: Gender Bias in Wikipedia

    Contributing to history has never been as easy as it is today. Anyone with access to the Web is able to play a part on Wikipedia, an open and free encyclopedia. Wikipedia, available in many languages, is one of the most visited websites in the world and arguably one of the primary sources of knowledge on the Web. However, not everyone is contributing to Wikipedia from a diversity point of view; several groups are severely underrepresented. One of those groups is women, who make up approximately 16% of the current contributor community, meaning that most of the content is written by men. In addition, although there are specific guidelines of verifiability, notability, and neutral point of view that must be adhered to by Wikipedia content, these guidelines are supervised and enforced by men. In this paper, we propose that gender bias is not about participation and representation only, but also about characterization of women. We approach the analysis of gender bias by defining a methodology for comparing the characterizations of men and women in biographies in three aspects: meta-data, language, and network structure. Our results show that, indeed, there are differences in characterization and structure. Some of these differences are reflected from the off-line world documented by Wikipedia, but other differences can be attributed to gender bias in Wikipedia content. We contextualize these differences in feminist theory and discuss their implications for Wikipedia policy. Comment: 10 pages, ACM style. Author's version of a paper to be presented at ACM Hypertext 201

    The relationship of (perceived) epistemic cognition to interaction with resources on the internet

    Information seeking and processing are key literacy practices. However, they are activities that students, across a range of ages, struggle with. These information seeking processes can be viewed through the lens of epistemic cognition: beliefs regarding the source, justification, complexity, and certainty of knowledge. In the research reported in this article we build on established research in this area, which has typically used self-report psychometric and behavior data, and information seeking tasks involving closed-document sets. We take a novel approach in applying established self-report measures to a large-scale, naturalistic, study environment, pointing to the potential of analysis of dialogue, web-navigation – including sites visited – and other trace data, to support more traditional self-report mechanisms. Our analysis suggests that prior work demonstrating relationships between self-report indicators is not paralleled in investigation of the hypothesized relationships between self-report and trace-indicators. However, there are clear epistemic features of this trace data. The article thus demonstrates the potential of behavioral learning analytic data in understanding how epistemic cognition is brought to bear in rich information seeking and processing tasks

    Next-Generation Media: The Global Shift

    For over a decade the Aspen Institute Communications and Society Program has convened its CEO-level Forum on Communications and Society (FOCAS) to address specific issues relating to the impact of communications media on societal institutions and values. These small, invitation-only roundtables have addressed educational, democratic, and international issues with the aim of making recommendations to policy-makers, businesses and other institutions to improve our society through policies and actions in the information and communications sectors. In the summer of 2006 the forum took a different turn. It is clear there is a revolution affecting every media business, every consumer or user of media, and every institution affected by media. In a word, everyone. FOCAS sought to define the paradigm changes underway in the media, and to identify some of the significant repercussions of those changes on society. "Next Generation Media" was a three-day meeting among leaders from new media (e.g., Google, craigslist, and Second Life) and mainstream media (e.g., The New York Times and Time), from business, government, academia and the non-profit sector, all seeking a broad picture of where the digital revolution is taking us. This report of the meeting, concisely and deftly written by Richard Adler, a longtime consultant in the field, weaves insights and anecdotes from the roundtable into a coherent document supplemented with his own research and data to form an accessible, coherent treatment of this very topical subject. The specific goals of the 2006 forum were to examine the profound changes ahead for the media industries, advertisers, consumers and users in the new attention economy; to understand how the development and delivery of content are creating new business models for commercial and non-commercial media; and to assess the impact of these developments on global relations, citizenship and leadership. The report thus examines the growth of the Internet and its effect
on a rapidly changing topic: the impact of new media on politics, business, society, culture, and governments the world over. The report also sheds light on how traditional media will need to adapt to face the competition of the next generation media. Beginning, as the Forum did, with data from Jeff Cole's Center for the Digital Future at the University of Southern California, Adler documents the increasing popularity of the Internet for information, entertainment and communication. Users are increasingly generating and contributing content to the web and connecting to social networks. They are posting comments, uploading pictures, sharing videos, blogging and vlogging, chatting through instant messages or voice over Internet (VoIP), or emailing friends, business colleagues, neighbors and even strangers. As Cole observes, "Traditional media informed people but didn't empower them." New media do. The report describes three of the Internet's most successful ventures -- Wikipedia, Second Life, and craigslist. Wikipedia is a prime example of how an Internet platform allows its users to generate content and consume it. As a result of "wiki" software technology anyone can contribute or edit existing information free of cost. Second Life, a virtual world, sells virtual real estate where subscribers, in avatar form, can conduct conversations, go to lectures, even create a business. Craigslist, a predominantly free online classified site with listings in every major city in the United States, has become so popular that it is posing a significant threat to newspapers as it competes with their classified ad revenues. As a result of these and other new media phenomena, not the least being Google and Yahoo, print publications are wrestling with new business models that could entail fundamentally restructuring the way they operate. For instance, reporters are now expected to report a story on multiple media platforms and discuss them online with readers.
Newspaper publisher Gannett is exploring the incorporation of user-generated news or "citizen-journalism" into its news pages. In an era of abundant choices marketers have an even greater challenge to figure out how best to appeal to consumers. The report explores how marketers, e.g., of Hollywood movies or pomegranate juice, are moving from traditional or mainstream media to viral and other marketing techniques. For much of the world, the mobile phone rather than the computer is the most important communications device. Users depend on their phones to send and receive messages, pictures, and download information rather than just talk. In developing countries mobile phones are having an exceptional impact, penetrating regions which are not being serviced by land lines. Thus we are seeing new uses daily for this increased connectivity, from reporting election results in emerging democracies to opposing authoritarian governments in order to bring about new democracies. Meanwhile, the report discusses the need for the United States to develop a new form of public diplomacy rather than the traditional top-down approach to communicating to foreign citizens. This topic has been a recurring theme at FOCAS conferences the past few years, this year calling for more citizen diplomacy -- that is, more person-to-person contact across borders through uses of the new media. Indeed, Peter Hirshberg suggested that American leaders should listen more to the outside world to effectively manage what he called "Brand America." Finally, after acknowledging the detrimental effects that new technologies can bring about, the report discusses what role those technologies could play in expanding freedom and opportunity for the next generation. As a conclusion, FOCAS co-chair Marc Nathanson proposed adding a ninth goal to the United Nations Millennium Goals, namely, "to provide access to appropriate new technologies.

    When in doubt ask the crowd: leveraging collective intelligence for improving event detection and machine learning

    [no abstract]