A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines. Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
A Hybrid Neural Network for Stock Price Direction Forecasting
The volatility of stock markets makes them notoriously difficult to predict and is the reason that many investors sell out at the wrong time. Contrary to the efficient market hypothesis (EMH) and the random walk theory, contributions to the study of machine learning models for stock price forecasting have shown evidence of stock market predictability with varying degrees of success. Contemporary approaches have sought to use a hybrid of a convolutional neural network (CNN) for its feature extraction capabilities and a long short-term memory (LSTM) neural network for its time series prediction. This comparative study aims to determine the predictability of stock price movements by using a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) neural network, a standalone LSTM neural network, a random forest model, and a support vector machine (SVM) model. Specifically, the study seeks to explore the predictive ability using stock price data, technical indicators, and foreign exchange (FX) rates transformed into deterministic trend signals as features for a hybrid CNN-LSTM neural network. This paper additionally considered including news article sentiment scores relating to stocks as part of the training dataset, but no significant correlation was found. In this study, the predictive ability is the accuracy of predicting the direction a stock price moves, not the actual price.
The experiment results suggest that a hybrid CNN-LSTM model can achieve around 60% accuracy for stock trend prediction when trained with deterministic trend signals. This accuracy is higher than that of the LSTM, random forest, and SVM models. On this basis, one can conclude that the hybrid neural network model is superior to standalone LSTM, random forest, and SVM for stock price trend prediction.
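As a rough illustration of the feature setup the study describes, the following Python sketch turns a raw price series into deterministic trend signals and up/down direction labels. The function names and the moving-average window are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch: derive trend signals and direction labels
# from a price series. Window size is an assumed parameter.

def trend_signals(prices, window=3):
    """Label each step +1 (up) or -1 (down) from a moving-average slope."""
    signals = []
    for i in range(window, len(prices)):
        avg_now = sum(prices[i - window + 1 : i + 1]) / window
        avg_prev = sum(prices[i - window : i]) / window
        signals.append(1 if avg_now >= avg_prev else -1)
    return signals

def direction_labels(prices):
    """Target variable: did the price move up (1) or down (0) next step?"""
    return [1 if b > a else 0 for a, b in zip(prices, prices[1:])]

prices = [10.0, 10.2, 10.1, 10.4, 10.6, 10.3, 10.5]
print(trend_signals(prices))     # [1, 1, 1, 1]
print(direction_labels(prices))  # [1, 0, 1, 1, 0, 1]
```

In a setup like the one described, the smoothed signals would feed the CNN-LSTM as features, while the direction labels serve as the binary prediction target.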
Profiling the news spreading barriers using news headlines
News headlines can be a good data source for detecting the news spreading
barriers in news media, which may be useful in many real-world applications. In
this paper, we utilize semantic knowledge through the inference-based model
COMET and sentiments of news headlines for barrier classification. We consider
five barriers including cultural, economic, political, linguistic, and
geographical, and different types of news headlines including health, sports,
science, recreation, games, homes, society, shopping, computers, and business.
To that end, we collect and label the news headlines automatically for the
barriers using the metadata of news publishers. Then, we utilize the extracted
commonsense inferences and sentiments as features to detect the news spreading
barriers. We compare our approach to the classical text classification methods,
deep learning, and transformer-based methods. The results show that the proposed approach, using inference-based semantic knowledge and sentiment, outperforms the usual methods (the average F1-score over the ten categories improves from 0.41, 0.39, 0.59, and 0.59 to 0.47, 0.55, 0.70, and 0.76 for the cultural, economic, political, and geographical barriers, respectively) for classifying the news-spreading barriers. Comment: arXiv admin note: substantial text overlap with arXiv:2304.0816
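A minimal sketch of the automatic labeling step described above: barrier labels are assigned by comparing news-publisher metadata against a reader profile. The metadata fields and the matching rule here are illustrative assumptions, not the paper's actual scheme.

```python
# Hypothetical sketch of metadata-based barrier labeling.
# Field names ("culture", "economy_tier", ...) are assumed for illustration.

def label_barrier(article_meta, reader_profile):
    """Mark each barrier a headline would have to cross to reach a reader."""
    barriers = []
    if article_meta["culture"] != reader_profile["culture"]:
        barriers.append("cultural")
    if article_meta["economy_tier"] != reader_profile["economy_tier"]:
        barriers.append("economic")
    if article_meta["political_bloc"] != reader_profile["political_bloc"]:
        barriers.append("political")
    if article_meta["language"] != reader_profile["language"]:
        barriers.append("linguistic")
    if article_meta["region"] != reader_profile["region"]:
        barriers.append("geographical")
    return barriers

meta = {"culture": "western", "economy_tier": "high", "political_bloc": "eu",
        "language": "en", "region": "europe"}
reader = {"culture": "western", "economy_tier": "low", "political_bloc": "eu",
          "language": "en", "region": "asia"}
print(label_barrier(meta, reader))  # ['economic', 'geographical']
```

The labeled headlines would then be featurized with commonsense inferences and sentiment scores before classification.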
Automatic caption generation for news images
This thesis is concerned with the task of automatically generating captions for images,
which is important for many image-related applications. Automatic description generation
for video frames would help security authorities manage and utilize large volumes
of monitoring data more efficiently. Image search engines could potentially benefit
from image description in supporting more accurate and targeted queries for end
users. Importantly, generating image descriptions would aid blind or partially sighted
people who cannot access visual information in the same way as sighted people can.
However, previous work has relied on fine-grained resources, manually created for specific
domains and applications. In this thesis, we explore the feasibility of automatic
caption generation for news images in a knowledge-lean way. We depart from previous
work, as we learn a model of caption generation from publicly available data that
has not been explicitly labelled for our task. The model consists of two components,
namely extracting image content and rendering it in natural language.
Specifically, we exploit data resources where images and their textual descriptions
co-occur naturally. We present a new dataset consisting of news articles, images, and
their captions that we acquired from the BBC News website. Rather than laboriously
annotating images with keywords, we simply treat the captions as the labels. We show
that it is possible to learn the visual and textual correspondence under such noisy conditions
by extending an existing generative annotation model (Lavrenko et al., 2003).
We also find that the accompanying news documents substantially complement the
extraction of the image content. In order to provide a better modelling and representation
of image content, we propose a probabilistic image annotation model that exploits
the synergy between visual and textual modalities under the assumption that images
and their textual descriptions are generated by a shared set of latent variables (topics).
Using Latent Dirichlet Allocation (Blei and Jordan, 2003), we represent visual and
textual modalities jointly as a probability distribution over a set of topics. Our model
takes these topic distributions into account while finding the most likely keywords for
an image and its associated document.
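The keyword-selection idea described above can be sketched as marginalizing word-given-topic probabilities over the joint topic distribution of an image and its document. The numbers below are illustrative toy values, not learned model parameters.

```python
# Toy sketch of topic-based annotation:
# p(w | image, doc) = sum_k p(k | image, doc) * p(w | k)

def keyword_scores(topic_dist, word_given_topic):
    """Score each candidate keyword under the topic distribution."""
    words = next(iter(word_given_topic.values())).keys()
    return {
        w: sum(topic_dist[k] * word_given_topic[k][w] for k in topic_dist)
        for w in words
    }

# Two latent topics assumed inferred jointly from visual and textual features.
topic_dist = {"politics": 0.7, "sport": 0.3}
word_given_topic = {
    "politics": {"minister": 0.5, "goal": 0.1, "parliament": 0.4},
    "sport":    {"minister": 0.1, "goal": 0.7, "parliament": 0.2},
}

scores = keyword_scores(topic_dist, word_given_topic)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # minister 0.38
```

Because the topic distribution is shared across modalities, a word strongly associated with the dominant topic is ranked highest even if the image alone is ambiguous.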
The availability of news documents in our dataset allows us to perform the caption
generation task in a fashion akin to text summarization, save for one important difference:
our model is not solely based on text but uses the image in order to select content
from the document that should be present in the caption. We propose both extractive
and abstractive caption generation models to render the extracted image content
in natural language without relying on rich knowledge resources, sentence-templates or grammars. The backbone for both approaches is our topic-based image annotation
model. Our extractive models examine how to best select sentences that overlap in
content with our image annotation model. We modify an existing abstractive headline
generation model to our scenario by incorporating visual information. Our own
model operates over image description keywords and document phrases by taking dependency
and word order constraints into account. Experimental results show that both
approaches can generate human-readable captions for news images. Our phrase-based
abstractive model manages to yield captions as informative as those written by the
BBC journalists.
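The extractive side of the approach can be illustrated with a minimal Python sketch: choose the document sentence that overlaps most with the keywords proposed by the image annotation model. The plain word-overlap scoring here is an assumption for illustration, not the thesis's exact scoring function.

```python
# Minimal sketch of extractive caption selection by keyword overlap.
import string

def select_caption(sentences, keywords):
    """Return the sentence sharing the most words with the image keywords."""
    keys = {k.lower() for k in keywords}
    def overlap(sentence):
        tokens = {w.strip(string.punctuation).lower() for w in sentence.split()}
        return len(tokens & keys)
    return max(sentences, key=overlap)

sentences = [
    "The minister spoke to reporters outside parliament.",
    "Markets closed higher on Friday.",
]
keywords = {"minister", "parliament"}
print(select_caption(sentences, keywords))
```

In the thesis's setting, the keywords would come from the topic-based annotation model rather than being given by hand.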
Explaining the distribution of implicit means of misrepresentation: A case study on Italian immigration discourse
This study analyzes Fillmore's frames in a large corpus of Italian news headlines concerning migrations, dating from 2013 to 2021 and taken from newspapers of diverse ideological stances. Our goal is to assess whether, how, and why migrants' representation varies over time and across ideological stances. Our approach combines corpus-assisted critical discourse analysis with cognitive linguistics. We present a new methodology that exploits SOCIOFILLMORE, a tool integrating a novel Natural Language Processing model for automatic frame annotation into a web-based user interface for exploring frame-annotated corpora. In our corpus, the frequency distribution of frames varies over time according to detectable contextual factors. Across political stances, instead, the most frequent frames remain more constant: both right-wing and left-wing news providers contribute to reifying migrants into non-agentive entities. Further, in the religious (Christian) press migrants are given a more humanizing depiction, but they still often appear in non-agentive roles. The distributions of frames can be explained by the fact that the latter act as indirect, routinized, and implicit means of (mis)representation. We suggest that framing entails inferential operations that take place unconsciously and can therefore escape the cognitive screening not only of those who receive discourse, but also of those who (re)produce it.
A genre analysis on the roles of rhetorical structures and discourse markers in Malaysian newspaper reports
There are a number of issues related to the rhetorical structures of newspaper reports. From the ESP perspective, students of journalism courses still lack knowledge of the appropriate rhetorical structures of newspaper reports. Understanding the various forms, functions and positions of the linguistic structure of a genre is crucial to overcoming these issues and achieving the communicative purposes embedded in the genre. However, there are limited studies on newspaper reports' structural linguistic patterns from the aspect of rhetorical structures (moves) and the use of discourse markers (DMs). Thus, there is a need to analyse how the rhetorical structures (moves) and the discourse markers are used in a newspaper report. Using genre theory, this study analysed the frequency, patterns and functional relations of moves and discourse markers in a corpus of online newspaper articles. The corpus consisted of ninety articles from the crime, politics and environmental news published in The Star newspaper. A corpus-based approach was used to code and calculate the frequency and patterns of the moves and discourse markers. The functions of the moves and the discourse markers were examined to see the relations within the moves. The data indicated a nine-move structure, with five optional moves and four obligatory moves. A hybrid and cyclical pattern emerged from the distributional patterns. Different grammatical word classes of discourse markers existed in the newspaper reports. Based on the distributional patterns, the discourse markers occurred in the initial, middle and end positions of sentences. Most of these discourse markers were found to serve a number of roles, most frequently concession, evaluation, reason, and elaboration. The findings of this study can be useful for journalists, curriculum designers, translators, learners and instructors. The study revealed the importance of rhetorical moves and DMs in specific genres for writers. Further analyses indicated the need for future research on various patterns of moves and DMs based on large-scale genre studies, on other sub-genres, and on comparisons of newspaper reports across countries.
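One corpus-based step described above, tallying where discourse markers occur within sentences (initial, middle, end), can be sketched in a few lines of Python. The marker inventory below is a small illustrative assumption; the study's own list is far larger.

```python
# Illustrative sketch: count discourse-marker positions in sentences.
# MARKERS is a toy inventory assumed for demonstration.

MARKERS = {"however", "therefore", "moreover", "thus"}

def marker_positions(sentences):
    """Tally marker occurrences by sentence position."""
    counts = {"initial": 0, "middle": 0, "end": 0}
    for s in sentences:
        words = [w.strip(".,;").lower() for w in s.split()]
        for i, w in enumerate(words):
            if w in MARKERS:
                if i == 0:
                    counts["initial"] += 1
                elif i == len(words) - 1:
                    counts["end"] += 1
                else:
                    counts["middle"] += 1
    return counts

sents = [
    "However, the suspect denied the charges.",
    "The bill was, however, rejected.",
    "The verdict stands, therefore.",
]
print(marker_positions(sents))  # {'initial': 1, 'middle': 1, 'end': 1}
```

A real analysis would also record the functional role (concession, evaluation, reason, elaboration) alongside each position.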
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxies for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis, in particular we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators.
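A hashtag-based baseline of the kind the neural classifier is compared against can be sketched as follows: a tweet is labeled by which side's known hashtags it contains. The hashtag lists here are illustrative assumptions, not the study's actual seed lists.

```python
# Sketch of a hashtag-based stance baseline. The hashtag sets below
# are invented for illustration only.

PRO_RUSSIAN = {"#kievshotdownmh17", "#ukrainianliesmh17"}
PRO_UKRAINIAN = {"#russiainvadedukraine", "#putinsplane"}

def hashtag_label(tweet):
    """Assign a stance label from hashtag membership counts."""
    tags = {w.lower() for w in tweet.split() if w.startswith("#")}
    ru, ua = len(tags & PRO_RUSSIAN), len(tags & PRO_UKRAINIAN)
    if ru > ua:
        return "pro-russian"
    if ua > ru:
        return "pro-ukrainian"
    return "neutral"

print(hashtag_label("Evidence mounts #PutinsPlane"))  # pro-ukrainian
```

Such a baseline has high precision only on tweets that reuse the seed hashtags, which is exactly the coverage gap a trained classifier aims to close.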
Argumentative zoning information extraction from scientific text
Let me tell you, writing a thesis is not always a barrel of laughs—and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications develope