
    A Survey on Event-based News Narrative Extraction

    Narratives are fundamental to our understanding of the world, providing us with a natural structure for knowledge representation over time. Computational narrative extraction is a subfield of artificial intelligence that makes heavy use of information retrieval and natural language processing techniques. Despite the importance of computational narrative extraction, relatively little scholarly work exists that synthesizes previous research and charts future research directions in the area. This article focuses on extracting news narratives from an event-centric perspective; extracting narratives from news data has multiple applications in understanding the evolving information landscape. This survey presents an extensive study of research in the area of event-based news narrative extraction: we screened over 900 articles, which yielded 54 relevant articles. These articles are synthesized and organized by representation model, extraction criteria, and evaluation approach. Based on the reviewed studies, we identify recent trends, open challenges, and potential research lines.
    Comment: 37 pages, 3 figures, to be published in the journal ACM Computing Surveys (CSUR)

    A Hybrid Neural Network for Stock Price Direction Forecasting

    The volatility of stock markets makes them notoriously difficult to predict and is the reason that many investors sell out at the wrong time. Contrary to the efficient market hypothesis (EMH) and the random walk theory, studies of machine learning models for stock price forecasting have shown evidence of stock market predictability, with varying degrees of success. Contemporary approaches have sought to use a hybrid of a convolutional neural network (CNN), for its feature extraction capabilities, and a long short-term memory (LSTM) neural network, for its time-series prediction. This comparative study aims to determine the predictability of stock price movements using a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) neural network, a standalone LSTM neural network, a random forest model, and a support vector machine (SVM) model. Specifically, the study explores the predictive ability of a hybrid CNN-LSTM neural network using stock price data, technical indicators, and foreign exchange (FX) rates transformed into deterministic trend signals as features. The study additionally considered including news article sentiment scores relating to the stocks in the training dataset, but no significant correlation was found. Here, predictive ability means the accuracy of predicting the direction a stock price moves, not the actual price. The experimental results suggest that a hybrid CNN-LSTM model can achieve around 60% accuracy for stock trend prediction when trained with deterministic trend signals. This accuracy is higher than that of the LSTM, random forest, and SVM models. On this basis, one can conclude that the hybrid neural network model is superior to the standalone LSTM, random forest, and SVM for stock price trend prediction.
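
    A minimal sketch of the kind of CNN-LSTM hybrid the abstract describes, in Keras. The window length, feature count, and layer sizes are illustrative assumptions, not the paper's configuration; the input is assumed to be sliding windows of per-day feature vectors (prices, technical indicators, FX trend signals) with a binary up/down label.

        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout

        WINDOW = 30      # days of history per sample (assumed)
        N_FEATURES = 8   # prices, technical indicators, FX trend signals (assumed)

        model = Sequential([
            # CNN front end extracts local patterns from each feature window
            Conv1D(64, kernel_size=3, activation="relu",
                   input_shape=(WINDOW, N_FEATURES)),
            MaxPooling1D(pool_size=2),
            # LSTM models the temporal ordering of the extracted features
            LSTM(50),
            Dropout(0.2),
            # Sigmoid output: estimated probability that the price moves up
            Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])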

    Profiling the news spreading barriers using news headlines

    News headlines can be a good data source for detecting news-spreading barriers in news media, which may be useful in many real-world applications. In this paper, we utilize semantic knowledge from the inference-based model COMET, together with the sentiments of news headlines, for barrier classification. We consider five barriers (cultural, economic, political, linguistic, and geographical) and news headlines from ten categories: health, sports, science, recreation, games, homes, society, shopping, computers, and business. To that end, we collect the news headlines and label them automatically for the barriers using the metadata of news publishers. We then use the extracted commonsense inferences and sentiments as features to detect the news-spreading barriers, and compare our approach to classical text classification methods, deep learning methods, and transformer-based methods. The results show that the proposed approach, using inference-based semantic knowledge and sentiment, outperforms the usual methods for classifying news-spreading barriers: the average F1-score across the ten categories improves from 0.41, 0.39, 0.59, and 0.59 to 0.47, 0.55, 0.70, and 0.76 for the cultural, economic, political, and geographical barriers, respectively.
    Comment: arXiv admin note: substantial text overlap with arXiv:2304.0816
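
    A minimal sketch of how commonsense inferences and sentiment could be combined as classifier features, assuming the COMET inferences have already been extracted as text for each headline. VADER (via NLTK) stands in for whatever sentiment model the paper used, and logistic regression for the classifier; all names and toy data below are illustrative, not the paper's pipeline.

        import numpy as np
        from scipy.sparse import hstack, csr_matrix
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

        # Toy inputs: headlines, their pre-extracted COMET inferences, and barrier labels
        headlines = ["Government passes new border law", "Local team wins championship final"]
        inferences = ["to control immigration", "to celebrate a victory"]
        labels = ["political", "cultural"]

        def build_features(headlines, inferences, vectorizer, sia):
            # Concatenate each headline with its commonsense inferences, then
            # append the headline's compound sentiment score as an extra feature.
            texts = [h + " " + i for h, i in zip(headlines, inferences)]
            X_text = vectorizer.transform(texts)
            sent = np.array([[sia.polarity_scores(h)["compound"]] for h in headlines])
            return hstack([X_text, csr_matrix(sent)])

        vectorizer = TfidfVectorizer().fit(h + " " + i for h, i in zip(headlines, inferences))
        sia = SentimentIntensityAnalyzer()
        clf = LogisticRegression(max_iter=1000)
        clf.fit(build_features(headlines, inferences, vectorizer, sia), labels)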

    Automatic caption generation for news images

    This thesis is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Automatic description generation for video frames would help security authorities manage and utilize large volumes of monitoring data more efficiently. Image search engines could benefit from image descriptions by supporting more accurate and targeted queries for end users. Importantly, generating image descriptions would aid blind or partially sighted people who cannot access visual information in the same way as sighted people can. However, previous work has relied on fine-grained resources, manually created for specific domains and applications.

    In this thesis, we explore the feasibility of automatic caption generation for news images in a knowledge-lean way. We depart from previous work in that we learn a model of caption generation from publicly available data that has not been explicitly labelled for our task. The model consists of two components, namely extracting image content and rendering it in natural language. Specifically, we exploit data resources where images and their textual descriptions co-occur naturally. We present a new dataset consisting of news articles, images, and their captions that we acquired from the BBC News website. Rather than laboriously annotating images with keywords, we simply treat the captions as the labels. We show that it is possible to learn the visual and textual correspondence under such noisy conditions by extending an existing generative annotation model (Lavrenko et al., 2003). We also find that the accompanying news documents substantially complement the extraction of the image content.

    To provide better modelling and representation of image content, we propose a probabilistic image annotation model that exploits the synergy between the visual and textual modalities, under the assumption that images and their textual descriptions are generated by a shared set of latent variables (topics). Using Latent Dirichlet Allocation (Blei and Jordan, 2003), we represent the visual and textual modalities jointly as a probability distribution over a set of topics. Our model takes these topic distributions into account while finding the most likely keywords for an image and its associated document.

    The availability of news documents in our dataset allows us to perform the caption generation task in a fashion akin to text summarization, save for one important difference: our model is not solely based on text but uses the image to select content from the document that should be present in the caption. We propose both extractive and abstractive caption generation models that render the extracted image content in natural language without relying on rich knowledge resources, sentence templates, or grammars. The backbone of both approaches is our topic-based image annotation model. Our extractive models examine how best to select sentences that overlap in content with our image annotation model. We adapt an existing abstractive headline generation model to our scenario by incorporating visual information; our own model operates over image description keywords and document phrases, taking dependency and word order constraints into account. Experimental results show that both approaches can generate human-readable captions for news images. Our phrase-based abstractive model manages to yield captions as informative as those written by the BBC journalists.
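
    The shared-latent-topic idea can be illustrated with off-the-shelf LDA, assuming the visual features have already been quantized into discrete "visterm" tokens so that text words and visual words live in one vocabulary. This gensim sketch only illustrates the joint representation; the thesis develops its own probabilistic annotation model.

        from gensim import corpora, models

        # Each training item mixes caption/document words with quantized visual
        # tokens ("visterms"), so one LDA model learns topics shared by both
        # modalities. Toy documents for illustration:
        docs = [
            ["election", "minister", "vote", "vis_12", "vis_7", "vis_12"],
            ["flood", "rescue", "river", "vis_3", "vis_3", "vis_9"],
        ]
        dictionary = corpora.Dictionary(docs)
        corpus = [dictionary.doc2bow(d) for d in docs]
        lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

        # At annotation time, infer the topic mixture of a new image's visterms
        # and rank candidate text keywords by their probability under it.
        new_image = dictionary.doc2bow(["vis_12", "vis_7"])
        topic_mix = lda.get_document_topics(new_image, minimum_probability=0.0)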

    Explaining the distribution of implicit means of misrepresentation: A case study on Italian immigration discourse

    This study analyzes Fillmore's frames in a large corpus of Italian news headlines concerning migration, dating from 2013 to 2021 and taken from newspapers of diverse ideological stances. Our goal is to assess whether, how, and why the representation of migrants varies over time and across ideological stances. Our approach combines corpus-assisted critical discourse analysis with cognitive linguistics. We present a new methodology that exploits SOCIOFILLMORE, a tool integrating a novel natural language processing model for automatic frame annotation into a web-based user interface for exploring frame-annotated corpora. In our corpus, the frequency distribution of frames varies over time according to detectable contextual factors. Across political stances, by contrast, the most frequent frames remain more constant: both right-wing and left-wing news providers contribute to reifying migrants into non-agentive entities. Furthermore, the religious (Christian) press gives migrants a more humanizing depiction, but they still often appear in non-agentive roles. The distributions of frames can be explained by the fact that frames act as indirect, routinized, and implicit means of (mis)representation. We suggest that framing entails inferential operations that take place unconsciously and can therefore escape the cognitive screening not only of those who receive discourse, but also of those who (re)produce it.
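
    In sketch form, the distributional analysis described here reduces to counting frame annotations per year and per stance. A toy pandas version follows, with column names and values as assumptions (the actual annotations come from SOCIOFILLMORE):

        import pandas as pd

        # One row per headline: its annotated frame, publication year, and the
        # outlet's ideological stance (toy values for illustration).
        df = pd.DataFrame({
            "frame":  ["Arriving", "Quantified_mass", "Arriving", "Rescuing"],
            "year":   [2015, 2015, 2020, 2020],
            "stance": ["left", "right", "religious", "left"],
        })

        # Relative frame frequencies over time and across stances
        by_year = df.groupby("year")["frame"].value_counts(normalize=True)
        by_stance = df.groupby("stance")["frame"].value_counts(normalize=True)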

    A genre analysis on the roles of rhetorical structures and discourse markers in Malaysian newspaper reports

    There are a number of issues related to the rhetorical structures of newspaper reports. From the ESP perspective, students in journalism courses still lack knowledge of the appropriate rhetorical structures of newspaper reports. Understanding the various forms, functions, and positions of a genre's linguistic structures is crucial for overcoming these issues and achieving the communicative purposes embedded in the genre. However, there are limited studies on the structural linguistic patterns of newspaper reports in terms of rhetorical structures (moves) and the use of discourse markers (DMs). Thus, there is a need to analyse how moves and DMs are used in newspaper reports. Using genre theory, this study analysed the frequency, patterns, and functional relations of moves and DMs in a corpus of online newspaper articles. The corpus consisted of ninety articles of crime, politics, and environmental news published in The Star newspaper. A corpus-based approach was used to code and calculate the frequency and patterns of the moves and DMs, and their functions were examined to identify the relations within the moves. The data revealed a nine-move structure, with five optional moves and four obligatory moves, and a hybrid, cyclical pattern emerged from the distributional patterns. Different grammatical word classes of DMs were present in the newspaper reports and, based on the distributional patterns, occurred at the beginning, middle, and end of sentences. Most of these DMs served several roles, with concession, evaluation, reason, and elaboration occurring most frequently. The findings of this study can be useful for journalists, curriculum designers, translators, learners, and instructors. The study's implications reveal the importance of rhetorical moves and DMs in specific genres for writers. Further analyses indicated the need for future research on various patterns of moves and DMs based on large-scale genre studies, on other sub-genres, and on comparisons of newspaper reports across countries.

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enables not only the fast sharing of information, but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or have used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
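
    A minimal sketch of a hashtag-based baseline of the kind the classifier is compared against: a tweet is labeled by which side's hashtags it contains. The seed hashtag sets here are illustrative assumptions, not the study's actual lists.

        PRO_RUSSIAN = {"#mh17truth", "#kievshotdownmh17"}       # assumed seed set
        PRO_UKRAINIAN = {"#russiakillsmh17", "#putinsmissile"}  # assumed seed set

        def hashtag_label(tweet: str) -> str:
            # Collect normalized hashtags, then label by which seed set matches
            tags = {tok.lower().rstrip(".,!?") for tok in tweet.split()
                    if tok.startswith("#")}
            if tags & PRO_RUSSIAN and not tags & PRO_UKRAINIAN:
                return "pro-russian"
            if tags & PRO_UKRAINIAN and not tags & PRO_RUSSIAN:
                return "pro-ukrainian"
            return "unknown"

        print(hashtag_label("New evidence! #MH17truth"))  # -> pro-russian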

    Argumentative zoning information extraction from scientific text

    Let me tell you, writing a thesis is not always a barrel of laughs—and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications developed