    Growing Story Forest Online from Massive Breaking News

    We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has requirements distinct from those of previous studies on topic detection and tracking (TDT) and event timeline or graph generation: we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. To solve these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events and connects related events in growing trees to tell evolving stories. We conducted an extensive evaluation, including detailed pilot user experience studies, based on 60 GB of real-world Chinese news data; our ideas are not language-dependent and can easily be extended to other languages. Compared with multiple existing algorithm frameworks, the results demonstrate Story Forest's superior ability to accurately identify events and organize news text into a logical structure that appeals to human readers. Comment: Accepted by CIKM 2017, 9 pages
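
The abstract does not say how Story Forest decides where a new event goes. As a minimal sketch of the append-only constraint alone (a new event may extend a story, but previously formed structure is never rearranged), the code below assumes each event is summarized by a keyword set, compares events by Jaccard similarity, and uses a hypothetical attachment threshold. `StoryForest`, `EventNode`, `jaccard`, and `threshold` are illustrative names, not the system's API, and the similarity measure is a stand-in.

```python
from dataclasses import dataclass, field


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Keyword-set overlap; a stand-in similarity, not the paper's measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class EventNode:
    keywords: frozenset[str]
    children: list["EventNode"] = field(default_factory=list)

    def iter_nodes(self):
        # Depth-first traversal over this node and all of its descendants.
        yield self
        for child in self.children:
            yield from child.iter_nodes()


@dataclass
class StoryForest:
    # Hypothetical minimum similarity for joining an existing story.
    threshold: float = 0.3
    roots: list[EventNode] = field(default_factory=list)

    def add_event(self, keywords: set[str]) -> EventNode:
        """Attach a new event without restructuring existing trees."""
        new = EventNode(frozenset(keywords))
        best_parent, best_score = None, 0.0
        for root in self.roots:
            for node in root.iter_nodes():
                score = jaccard(node.keywords, new.keywords)
                if score > best_score:
                    best_parent, best_score = node, score
        if best_parent is not None and best_score >= self.threshold:
            best_parent.children.append(new)  # append-only: old nodes never move
        else:
            self.roots.append(new)  # no close match: start a new story tree
        return new


forest = StoryForest()
forest.add_event({"earthquake", "sichuan", "rescue"})
forest.add_event({"earthquake", "sichuan", "aftershock"})
forest.add_event({"election", "vote"})
print(len(forest.roots))  # 2
```

Because `add_event` only ever appends a child or a new root, a reader who saw a story earlier sees the same branches later, which is the consistency property the abstract emphasizes; the trade-off is that an early mis-attachment is never corrected.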

    Satirical News Detection and Analysis using Attention Mechanism and Linguistic Features

    Satirical news is considered to be entertainment, but it is potentially deceptive and harmful. Despite the genre cues embedded in an article, not everyone recognizes them, and some readers therefore take satirical news to be true. We observe that satirical cues are often concentrated in certain paragraphs rather than spread across the whole document. Existing work considers only document-level features to detect satire, which may be limiting. We use paragraph-level linguistic features to reveal satire by combining a neural network with an attention mechanism. We investigate the difference between paragraph-level and document-level features and analyze both on a large satirical news dataset. The evaluation shows that the proposed model detects satirical news effectively and reveals which features are important at which level. Comment: EMNLP 2017, 11 pages
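
The abstract does not give the model's architecture. One common way to let a network weigh some paragraphs more than others is attention pooling; the sketch below assumes each paragraph has already been converted into a numeric feature vector and scores it against a query vector (hypothetical here; a trained model would learn it). The resulting weights indicate which paragraphs drive the document representation, the kind of paragraph-level signal the authors describe. All names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_paragraphs(
    paragraphs: list[list[float]], query: list[float]
) -> tuple[list[float], list[float]]:
    """Pool paragraph feature vectors into one document vector.

    Each paragraph is scored by a dot product with the query vector, the
    scores are softmax-normalized into attention weights, and the document
    vector is the attention-weighted sum of the paragraph vectors.
    """
    scores = [sum(p_d * q_d for p_d, q_d in zip(p, query)) for p in paragraphs]
    weights = softmax(scores)
    dim = len(paragraphs[0])
    doc = [sum(w * p[d] for w, p in zip(weights, paragraphs)) for d in range(dim)]
    return doc, weights


paragraphs = [[0.0, 1.0], [3.0, 0.0], [0.0, 0.5]]
doc, weights = attend_paragraphs(paragraphs, query=[1.0, 0.0])
print(max(range(len(weights)), key=weights.__getitem__))  # 1
```

A downstream classifier would consume `doc`, while `weights` can be inspected to see which paragraph carried the satirical cue.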

    Who Contributes to the Knowledge Sharing Economy?

    Information sharing dynamics of social networks rely on a small set of influencers to effectively reach a large audience. Our recent results and observations demonstrate that the shape and identity of this elite, especially those contributing original content, are difficult to predict. Information acquisition is often cited as an example of a public good. However, this emerging and powerful theory has yet to offer provable qualitative insights into how users specialize into active and passive participants. This paper bridges, for the first time, the theory of public goods and the analysis of diffusion in social media. We introduce a non-linear model of perishable public goods, leveraging new observations about the sharing of media sources. The primary contribution of this work is to show that shelf time, which characterizes the rate at which content gets renewed, is a critical factor in audience participation. Using our model, we prove a fundamental dichotomy in information diffusion: while short-lived content has simple and predictable diffusion, long-lived content exhibits complex specialization. This dichotomy occurs even when all information seekers are ex ante identical, and it could be a contributing factor to the difficulty of predicting social network participation and evolution. Comment: 15 pages in ACM Conference on Online Social Networks 201

    Minimizing Polarization and Disagreement in Social Networks

    The rise of social media and online social networks has been a disruptive force in society. Opinions are increasingly shaped by interactions on online social media, and social phenomena including disagreement and polarization are now tightly woven into everyday life. In this work we initiate the study of the following question: given n agents, each with its own initial opinion that reflects its core value on a topic, and an opinion dynamics model, what is the structure of a social network that minimizes polarization and disagreement simultaneously? This question is central to recommender systems: should a recommender system prefer a link suggestion between two online users with similar mindsets in order to keep disagreement low, or between two users with different opinions in order to expose each to the other's viewpoint of the world and decrease overall levels of polarization? Our contributions include a mathematical formalization of this question as an optimization problem and an exact, time-efficient algorithm. We also prove that there always exists a network with O(n/ε²) edges that is a (1+ε) approximation to the optimum. For a fixed graph, we additionally show how to optimize our objective function over the agents' innate opinions in polynomial time. We perform an empirical study of our proposed methods on synthetic and real-world data, which verifies their value as mining tools for better understanding the trade-off between disagreement and polarization. We find considerable room to reduce both polarization and disagreement in real-world networks; for instance, on a Reddit network where users exchange comments on politics, our methods achieve a roughly 60,000-fold reduction in polarization and disagreement. Comment: 19 pages (accepted, WWW 2018)
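
The abstract names neither its opinion dynamics model nor the exact definitions of the two quantities. A minimal sketch, assuming the Friedkin-Johnsen model and commonly used definitions (polarization as the sum of squared deviations of equilibrium opinions from their mean; disagreement as the edge-weighted sum of squared opinion differences), computes both for a given graph. `fj_equilibrium`, `polarization`, and `disagreement` are hypothetical helper names, and the equilibrium is found by simple iteration rather than any method from the paper.

```python
def fj_equilibrium(
    s: list[float], edges: dict[tuple[int, int], float],
    iters: int = 1000, tol: float = 1e-12,
) -> list[float]:
    """Iterate Friedkin-Johnsen updates until opinions stop changing.

    s: innate opinions, one per agent.
    edges: maps an undirected pair (i, j) to a nonnegative weight w_ij.
    """
    n = len(s)
    nbrs: list[list[tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    z = list(s)
    for _ in range(iters):
        # Each agent averages its innate opinion with neighbours' current ones.
        new = [
            (s[i] + sum(w * z[j] for j, w in nbrs[i]))
            / (1 + sum(w for _, w in nbrs[i]))
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, z)) < tol
        z = new
        if converged:
            break
    return z


def polarization(z: list[float]) -> float:
    # Sum of squared deviations from the mean opinion.
    mean = sum(z) / len(z)
    return sum((zi - mean) ** 2 for zi in z)


def disagreement(z: list[float], edges: dict[tuple[int, int], float]) -> float:
    # Edge-weighted sum of squared opinion differences.
    return sum(w * (z[i] - z[j]) ** 2 for (i, j), w in edges.items())


s = [1.0, -1.0]
edges = {(0, 1): 1.0}
z = fj_equilibrium(s, edges)
print(round(polarization(z), 4), round(disagreement(z, edges), 4))  # 0.2222 0.4444
```

With no edge, the opinions stay at their innate values, so polarization is 2 and disagreement is 0. Adding the single edge pulls the opinions to ±1/3, lowering polarization to 2/9 while raising disagreement to 4/9: a small instance of the trade-off a link recommender faces.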

    A graph-based approach for learner-tailored teaching of Korean grammar constructions
