Two Tales of the World: Comparison of Widely Used World News Datasets GDELT and EventRegistry
In this work, we compare GDELT and Event Registry, which both monitor news
articles worldwide and provide big data to researchers, in terms of scale, news
sources, and news geography. We found significant differences in scale and news
sources, but surprisingly, we observed high similarity in news geography
between the two datasets.
Comment: To appear in ICWSM'1
Indirect Match Highlights Detection with Deep Convolutional Neural Networks
Highlights in a sports video are usually defined as actions that stimulate
excitement or attract the attention of the audience. Considerable effort is
spent on designing techniques that find highlights automatically, in order to
automate the otherwise manual editing process. Most of the state-of-the-art
approaches try to solve the problem by training a classifier using the
information extracted from the TV-like framing of players on the game
pitch, learning to detect game actions that are labeled by human observers
according to their perception of a highlight. Obviously, this labeling is long
and expensive work. In this paper, we reverse the paradigm: instead of looking at
the gameplay, inferring what could be exciting for the audience, we directly
analyze the audience behavior, which we assume is triggered by events happening
during the game. We apply a deep 3D Convolutional Neural Network (3D-CNN) to
extract visual features from cropped video recordings of the supporters
attending the event. Outputs of the crops belonging to the same frame are
then accumulated to produce a value indicating the Highlight Likelihood (HL),
which is then used to discriminate between positive samples (i.e., when a
highlight occurs) and negative samples (i.e., standard play or time-outs). Experimental
results on a public dataset of ice-hockey matches demonstrate the effectiveness
of our method and promote further research in this new exciting direction.
Comment: "Social Signal Processing and Beyond" workshop, in conjunction with
ICIAP 201
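The accumulation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how crop outputs are accumulated or how the HL value is turned into a decision, so the per-frame mean, the fixed threshold, and the names `highlight_likelihood` and `classify` are all assumptions made for illustration. The per-crop scores stand in for 3D-CNN outputs and are made-up numbers.

```python
from collections import defaultdict
from statistics import mean


def highlight_likelihood(crop_scores):
    """Accumulate per-crop scores into one value per frame.

    crop_scores: iterable of (frame_id, score) pairs, where each score is
    a hypothetical stand-in for the network output on one audience crop.
    Assumption: accumulation is a plain mean over crops of the same frame.
    """
    per_frame = defaultdict(list)
    for frame_id, score in crop_scores:
        per_frame[frame_id].append(score)
    return {frame_id: mean(scores) for frame_id, scores in per_frame.items()}


def classify(hl_by_frame, threshold=0.5):
    """Label a frame as a highlight when its HL reaches the threshold.

    Assumption: a fixed threshold; the abstract does not specify the rule.
    """
    return {frame_id: hl >= threshold for frame_id, hl in hl_by_frame.items()}


# Made-up scores: frame 0 has two crops, frame 1 has three.
crop_scores = [(0, 0.1), (0, 0.3), (1, 0.9), (1, 0.7), (1, 0.8)]
hl = highlight_likelihood(crop_scores)
labels = classify(hl)
```

With these illustrative inputs, frame 0 falls below the threshold and frame 1 is labeled a highlight. A sum or a maximum over crops would be equally plausible readings of "accumulated".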
A Dynamic Embedding Model of the Media Landscape
Information about world events is disseminated through a wide variety of news
channels, each with specific considerations in the choice of their reporting.
Although the multiplicity of these outlets should ensure a variety of
viewpoints, recent reports suggest that the rising concentration of media
ownership may void this assumption. This observation motivates the study of the
impact of ownership on the global media landscape and its influence on the
coverage the actual viewer receives. To this end, the selection of reported
events has been shown to be informative about the high-level structure of the
news ecosystem. However, existing methods offer only a static view of an
inherently dynamic system, yielding underperforming statistical models and
hindering our understanding of the media landscape as a whole.
In this work, we present a dynamic embedding method that learns to capture
the decision process of individual news sources in their selection of reported
events while also enabling the systematic detection of large-scale
transformations in the media landscape over prolonged periods of time. In an
experiment covering over 580M real-world event mentions, we show that our
approach outperforms static embedding methods in predictive terms. We demonstrate the
potential of the method for news monitoring applications and investigative
journalism by shedding light on important changes in programming induced by
mergers and acquisitions, policy changes, or network-wide content diffusion.
These findings offer evidence of strong content convergence trends inside large
broadcasting groups, influencing the news ecosystem in a time of increasing
media ownership concentration.
Selection Bias in News Coverage: Learning it, Fighting it
News entities must select and filter the coverage they broadcast through
their respective channels since the set of world events is too large to be
treated exhaustively. The subjective nature of this filtering induces biases
due to, among other things, resource constraints, editorial guidelines,
ideological affinities, or even the fragmented nature of the information at a
journalist's disposal. The magnitude and direction of these biases are,
however, largely unknown. The absence of ground truth, the sheer size of the
event space, and the lack of an exhaustive set of absolute features to measure
make it difficult to observe the bias directly, to characterize the nature of
the leaning, and to factor it out to ensure neutral coverage of the news. In this
work, we introduce a methodology to capture the latent structure of media's
decision process on a large scale. Our contribution is multi-fold. First, we
show media coverage to be predictable using personalization techniques, and
evaluate our approach on a large set of events collected from the GDELT
database. We then show that a personalized and parametrized approach not only
exhibits higher accuracy in coverage prediction, but also provides an
interpretable representation of the selection bias. Last, we propose a method
able to select a set of sources by leveraging the latent representation. These
selected sources provide more diverse and egalitarian coverage, all while
retaining the most actively covered events.