82 research outputs found
Two Tales of the World: Comparison of Widely Used World News Datasets GDELT and EventRegistry
In this work, we compare GDELT and Event Registry, two datasets that monitor
news articles worldwide and provide big data to researchers, in terms of scale,
news sources, and news geography. We found significant differences in scale and
news sources but, surprisingly, high similarity in news geography between the
two datasets. Comment: To appear in ICWSM'1
A Dynamic Embedding Model of the Media Landscape
Information about world events is disseminated through a wide variety of news
channels, each with specific considerations in the choice of their reporting.
Although the multiplicity of these outlets should ensure a variety of
viewpoints, recent reports suggest that the rising concentration of media
ownership may void this assumption. This observation motivates the study of the
impact of ownership on the global media landscape and its influence on the
coverage the actual viewer receives. To this end, the selection of reported
events has been shown to be informative about the high-level structure of the
news ecosystem. However, existing methods only provide a static view into an
inherently dynamic system, providing underperforming statistical models and
hindering our understanding of the media landscape as a whole.
In this work, we present a dynamic embedding method that learns to capture
the decision process of individual news sources in their selection of reported
events while also enabling the systematic detection of large-scale
transformations in the media landscape over prolonged periods of time. In an
experiment covering over 580M real-world event mentions, we show our approach
to outperform static embedding methods in predictive terms. We demonstrate the
potential of the method for news monitoring applications and investigative
journalism by shedding light on important changes in programming induced by
mergers and acquisitions, policy changes, or network-wide content diffusion.
These findings offer evidence of strong content convergence trends inside large
broadcasting groups, influencing the news ecosystem in a time of increasing
media ownership concentration.
Comparing Events Coverage in Online News and Social Media: The Case of Climate Change
Social media is becoming increasingly integrated into the distribution and consumption of news. How is news in social media different from mainstream news? This paper presents a comparative analysis covering a span of 17 months and hundreds of news events, using a method that combines automatic and manual annotations. We focus on climate change, a topic that is frequently present in the news through a number of arguments, from current practices and causes (e.g. fracking, CO2 emissions) to consequences and solutions (e.g. extreme weather, electric cars). The coverage that these different aspects receive often depends on how they are framed, typically by mainstream media. Yet evidence suggests a gap between what the news media publishes online and what the general public shares in social media. Through the analysis of a series of events, including awareness campaigns, natural disasters, governmental meetings and publications, among others, we uncover differences in the triggers, actions, and news values that are prevalent in the two types of media. This methodology can be extended to other important topics present in the news.
Creating an Agglomerative Clustering Approach Using GDELT
GDELT is a large-scale, continuously updated databank that provides a real-time picture of global news as files that anyone can download and use. However, this data is of low granularity, and each source of data does not provide much information on its own. This thesis attempts to leverage the large amount of data available by using a hierarchical agglomerative clustering method to identify news articles that report on the same real-life event. To do this, the thesis also explores whether the GDELT data is granular enough to be used without extensive preprocessing, and whether a distance metric for the clustering algorithm can be created. The findings show promising results when assessed qualitatively, but the quantitative measures are not yet optimized. Inherent flaws in GDELT and in clustering algorithms are hurdles to overcome before the real potential of GDELT’s data can be unleashed; this thesis explores some of these difficulties and makes recommendations for how to circumvent them in future work. Master's thesis in Information Science (Masteroppgave i informasjonsvitenskap), INFO390, MASV-INF
D6.1 Report on the specifications and architecture of the EMT platform
This deliverable aims to provide a first view of the design principles of the EUMigraTool that will be developed within the ITFLOWS project. The EUMigraTool (EMT for short) is a software platform that will integrate all the knowledge created within the ITFLOWS project. It will provide relevant stakeholders with a set of tools for running simulations and predictions on various aspects of migration, ranging from the number of people expected to leave a certain region within selected countries of origin for the EU, to potential challenges when migrant populations arrive in EU territories.
Lifting the veil on the use of big data news repositories: A documentation and critical discussion of a protest event analysis
This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how this type of freely available dataset requires significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.
Conspiracy in the Time of Corona: Automatic detection of Emerging Covid-19 Conspiracy Theories in Social Media and the News
Rumors and conspiracy theories thrive in environments of low confidence and low trust. Consequently, it is not surprising that ones related to the Covid-19 pandemic are proliferating given the lack of scientific consensus on the virus’s spread and containment, or on the long term social and economic ramifications of the pandemic. Among the stories currently circulating are ones suggesting that the 5G telecommunication network activates the virus, that the pandemic is a hoax perpetrated by a global cabal, that the virus is a bio-weapon released deliberately by the Chinese, or that Bill Gates is using it as cover to launch a broad vaccination program to facilitate a global surveillance regime. While some may be quick to dismiss these stories as having little impact on real-world behavior, recent events including the destruction of cell phone towers, racially fueled attacks against Asian Americans, demonstrations espousing resistance to public health orders, and wide-scale defiance of scientifically sound public mandates such as those to wear masks and practice social distancing, countermand such conclusions. Inspired by narrative theory, we crawl social media sites and news reports and, through the application of automated machine-learning methods, discover the underlying narrative frameworks supporting the generation of rumors and conspiracy theories. We show how the various narrative frameworks fueling these stories rely on the alignment of otherwise disparate domains of knowledge, and consider how they attach to the broader reporting on the pandemic. These alignments and attachments, which can be monitored in near real-time, may be useful for identifying areas in the news that are particularly vulnerable to reinterpretation by conspiracy theorists. Understanding the dynamics of storytelling on social media and the narrative frameworks that provide the generative basis for these stories may also be helpful for devising methods to disrupt their spread.