Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted an extensive evaluation on 60
GB of real-world Chinese news data and through detailed pilot user experience
studies, although our ideas are not language-dependent and can easily be
extended to other languages. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.
Comment: Accepted by CIKM 2017, 9 pages
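The abstract's second requirement, growing event structures online without restructuring what was already formed, can be illustrated with a generic single-pass clustering sketch. This is not the Story Forest method itself: the keyword-set representation, the Jaccard similarity, the `threshold` value, and the names `Event` and `assign_online` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A cluster of documents, summarized by the union of their keywords."""
    keywords: set = field(default_factory=set)
    doc_ids: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_online(events, doc_id, doc_keywords, threshold=0.3):
    """Attach one incoming document to the most similar event, or open a new one.

    Earlier assignments are never revisited, so existing events only grow.
    The threshold is a hypothetical tuning parameter.
    """
    best, best_sim = None, 0.0
    for event in events:
        sim = jaccard(event.keywords, doc_keywords)
        if sim > best_sim:
            best, best_sim = event, sim
    if best is None or best_sim < threshold:
        best = Event()
        events.append(best)
    best.keywords |= doc_keywords
    best.doc_ids.append(doc_id)
    return best


# Made-up stream of (document id, keyword set) pairs, processed in arrival order.
events = []
stream = [
    ("d1", {"earthquake", "sichuan", "rescue"}),
    ("d2", {"earthquake", "sichuan", "aftershock"}),
    ("d3", {"election", "vote", "parliament"}),
]
for doc_id, keywords in stream:
    assign_online(events, doc_id, keywords)

grouping = [event.doc_ids for event in events]
```

Because each document is placed once and never moved, the grouping a reader saw earlier stays stable as new documents arrive, which is the consistency property the abstract asks for.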
Satirical News Detection and Analysis using Attention Mechanism and Linguistic Features
Satirical news is considered to be entertainment, but it is potentially
deceptive and harmful. Although the genre is embedded in the article, not
everyone recognizes the satirical cues, and some readers therefore believe the
news to be true.
We observe that satirical cues are often reflected in certain paragraphs rather
than the whole document. Existing work considers only document-level features
to detect satire, which can be limiting. We consider paragraph-level
linguistic features to unveil the satire by incorporating a neural network and
an attention mechanism. We investigate the difference between paragraph-level
features and document-level features, and analyze them on a large satirical
news dataset. The evaluation shows that the proposed model detects satirical
news effectively and reveals what features are important at which level.
Comment: EMNLP 2017, 11 pages
Who Contributes to the Knowledge Sharing Economy?
Information sharing dynamics of social networks rely on a small set of
influencers to effectively reach a large audience. Our recent results and
observations demonstrate that the shape and identity of this elite, especially
those contributing \emph{original} content, is difficult to predict.
Information acquisition is often cited as an example of a public good. However,
this emerging and powerful theory has yet to provably offer qualitative
insights on how specialization of users into active and passive participants
occurs.
This paper bridges, for the first time, the theory of public goods and the
analysis of diffusion in social media. We introduce a non-linear model of
\emph{perishable} public goods, leveraging new observations about sharing of
media sources. The primary contribution of this work is to show that
\emph{shelf time}, which characterizes the rate at which content gets renewed,
is a critical factor in audience participation. Our model proves a fundamental
\emph{dichotomy} in information diffusion: While short-lived content has simple
and predictable diffusion, long-lived content has complex specialization. This
occurs even when all information seekers are \emph{ex ante} identical and could
be a contributing factor to the difficulty of predicting social network
participation and evolution.
Comment: 15 pages in ACM Conference on Online Social Networks 201
Minimizing Polarization and Disagreement in Social Networks
The rise of social media and online social networks has been a disruptive
force in society. Opinions are increasingly shaped by interactions on online
social media, and social phenomena including disagreement and polarization are
now tightly woven into everyday life. In this work we initiate the study of the
following question: given agents, each with its own initial opinion that
reflects its core value on a topic, and an opinion dynamics model, what is the
structure of a social network that minimizes {\em polarization} and {\em
disagreement} simultaneously?
This question is central to recommender systems: should a recommender system
prefer a link suggestion between two online users with similar mindsets in
order to keep disagreement low, or between two users with different opinions in
order to expose each to the other's viewpoint of the world, and decrease
overall levels of polarization? Our contributions include a mathematical
formalization of this question as an optimization problem and an exact,
time-efficient algorithm. We also prove that there always exists a network with
edges that is a approximation to the optimum.
For a fixed graph, we additionally show how to optimize our objective function
over the agents' innate opinions in polynomial time.
We perform an empirical study of our proposed methods on synthetic and
real-world data that verify their value as mining tools to better understand
the trade-off between disagreement and polarization. We find that there is
considerable room to reduce both polarization and disagreement in real-world
networks; for instance, on a Reddit network where users exchange comments on
politics, our methods achieve a -fold reduction in polarization
and disagreement.
Comment: 19 pages (accepted, WWW 2018)
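The trade-off posed in this abstract (link similar users to keep disagreement low, or link opposing users to lower polarization) can be made concrete with a small sketch. The abstract does not name its opinion dynamics model, so the sketch assumes a Friedkin-Johnsen-style averaging rule. It also assumes polarization is the squared deviation of expressed opinions from their mean and disagreement is the weighted squared difference across edges. All function names and the four-agent example are hypothetical.

```python
def equilibrium_opinions(weights, innate, iters=500):
    """Iterate z_i <- (s_i + sum_j w_ij z_j) / (1 + sum_j w_ij) toward a fixed point.

    weights is a symmetric adjacency matrix (list of lists); innate holds the
    innate opinions s_i. The update rule is an assumed model.
    """
    n = len(innate)
    z = list(innate)
    for _ in range(iters):
        z = [
            (innate[i] + sum(weights[i][j] * z[j] for j in range(n)))
            / (1.0 + sum(weights[i]))
            for i in range(n)
        ]
    return z


def polarization(z):
    """Sum of squared deviations of expressed opinions from their mean."""
    mean = sum(z) / len(z)
    return sum((x - mean) ** 2 for x in z)


def disagreement(weights, z):
    """Weighted squared opinion difference, summed once per undirected edge."""
    n = len(z)
    return sum(
        weights[i][j] * (z[i] - z[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def graph(n, edges):
    """Build a symmetric unit-weight adjacency matrix from an edge list."""
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w[i][j] = w[j][i] = 1.0
    return w


# Four agents: two on each side of a topic (made-up innate opinions).
innate = [-1.0, -0.8, 0.8, 1.0]
similar = graph(4, [(0, 1), (2, 3)])   # links between like-minded agents
opposing = graph(4, [(0, 3), (1, 2)])  # links between agents who differ

z_sim = equilibrium_opinions(similar, innate)
z_opp = equilibrium_opinions(opposing, innate)

pol_sim, dis_sim = polarization(z_sim), disagreement(similar, z_sim)
pol_opp, dis_opp = polarization(z_opp), disagreement(opposing, z_opp)
```

Under these assumptions, linking like-minded agents gives lower disagreement but higher polarization than linking opposing agents, which is why the two quantities have to be optimized jointly rather than one at a time.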