159 research outputs found

Readers and Reading in the First World War

Author: Edmund G. C. King
Francesca Benatti
Shafquat Towheed
The Stanford Natural Language Processing Group
University of Sheffield
Publication venue: 'Modern Humanities Research Association'
Publication date: 01/01/2015

This essay consists of three individually authored and interlinked sections. In ‘A Digital Humanities Approach’, Francesca Benatti looks at datasets and databases (including the UK Reading Experience Database) and shows how a systematic, macro-analytical use of digital humanities tools and resources might yield answers to some key questions about reading in the First World War. In ‘Reading behind the Wire in the First World War’ Edmund G. C. King scrutinizes the reading practices and preferences of Allied prisoners of war in Mainz, showing that reading circumscribed by the contingencies of a prison camp created a unique literary community, whose legacy can be traced through their literary output after the war. In ‘Book-hunger in Salonika’, Shafquat Towheed examines the record of a single reader in a specific and fairly static frontline, and argues that in the case of the Salonika campaign, reading communities emerged in close proximity to existing centres of print culture. The focus of this essay moves from the general to the particular, from the scoping of large datasets, to the analyses of identified readers within a specific geographical and temporal space. The authors engage with the wider issues and problems of recovering, interpreting, visualizing, narrating, and representing readers in the First World War.

Crossref

Open Research Online

Text content and task performance in the evaluation of a natural language generation system

Author: Gatt Albert
International Conference RANLP - 2009 /Recent Advances in Natural Language Processing
Portet Francois
Publication venue: Ontotext
Publication date: 01/01/2009

An important question in the evaluation of Natural Language Generation systems concerns the relationship between textual characteristics and task performance. If the results of task-based evaluation can be correlated to properties of the text, there are better prospects for improving the system. The present paper investigates this relationship by focusing on the outcomes of a task-based evaluation of a system that generates summaries of patient data, attempting to correlate these with the results of an analysis of the system’s texts, compared to a set of gold standard human-authored summaries.

OAR@UM

ABDN at SemEval-2018 Task 10 : recognising discriminative attributes using context embeddings and WordNet

Author: Mao Rui
Chen G.
Li Ruizhe
Lin Chenghua
Sub Natural Language Processing
Natural Language Processing
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018

This paper describes the system that we submitted for SemEval-2018 task 10: capturing discriminative attributes. Our system is built upon a simple idea of measuring the attribute word’s similarity with each of the two semantically similar words, based on an extended word embedding method and WordNet. Instead of computing the similarities between the attribute and semantically similar words by using standard word embeddings, we propose a novel method that combines word and context embeddings, which can better measure similarities. Our model is simple and effective, achieving an average F1 score of 0.62 on the test set.

Crossref

Utrecht University Repository

White Rose Research Online

Evaluating algorithms for the generation of referring expressions : going beyond toy domains

Author: Gatt Albert
International Conference RANLP - 2007/Recent Advances in Natural Language Processing
van Deemter Kees
van der Sluis Ielka
Publication venue: Xerox Research Centre Europe
Publication date: 01/01/2007

We describe a corpus-based evaluation methodology, applied to a number of classic algorithms in the generation of referring expressions. Following up on earlier work involving very simple domains, this paper deals with the issues associated with domains that contain ‘real-life’ objects of some complexity. Results indicate that state-of-the-art algorithms perform very differently when applied to a complex domain. Moreover, if a version of the Incremental Algorithm is used, then it becomes of huge importance to select a good preference order. These results should contribute to a growing debate on the evaluation of NLG systems, arguing in favour of carefully constructed, balanced, and semantically transparent corpora.

OAR@UM

How we do things with words: Analyzing text as social and cultural data

Author: Dedeo Simon
Eisenstein Jacob
Liakata Maria
Mimno David
Natural Language Processing
Nguyen Dong
Sub Natural Language Processing
Tromble Rebekah
Winters Jane
Publication venue: 'Frontiers Media SA'
Publication date: 02/07/2019

In this article we describe our experiences with computational text analysis. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of best practices for working with thick social and cultural concepts. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that will resonate for many. And this leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis that involves social and cultural concepts, and the more we are able to bridge these divides, the more fruitful we believe our work will be.

arXiv.org e-Print Archive

Utrecht University Repository

Generating Media Background Checks for Automated Source Critical Reasoning

Author: Schlichtkrull M
Publication venue: The 2024 Conference on Empirical Methods in Natural Language Processing
Publication date: 12/11/2024

Not everything on the internet is true. This unfortunate fact requires both humans and models to perform complex reasoning about credibility when working with retrieved information. In NLP, this problem has seen little attention. Indeed, retrieval-augmented models are not typically expected to distrust retrieved documents. Human experts overcome the challenge by gathering signals about the context, reliability, and tendency of source documents - that is, they perform source criticism. We propose a novel NLP task focused on finding and summarising such signals. We introduce a new dataset of 6,709 "media background checks" derived from Media Bias / Fact Check, a volunteer-run website documenting media bias. We test open-source and closed-source LLM baselines with and without retrieval on this dataset, finding that retrieval greatly improves performance. We furthermore carry out human evaluation, demonstrating that 1) media background checks are helpful for humans, and 2) media background checks are helpful for retrieval-augmented models.

Queen Mary Research Online

Efficient Vision-Language pre-training via domain-specific learning for human activities

Author: Bulat A
Guerrero R
Martinez B
Ouali Y
Tzimiropoulos G
Publication venue: Empirical Methods in Natural Language Processing
Publication date: 24/10/2024

Current Vision-Language (VL) models owe their success to large-scale pre-training on web-collected data, which in turn requires high-capacity architectures and large compute resources for training. We posit that when the downstream tasks are known in advance, which is in practice common, the pretraining process can be aligned to the downstream domain, leading to more efficient and accurate models, while shortening the pretraining step. To this end, we introduce a domain-aligned pretraining strategy that, without additional data collection, improves the accuracy on a domain of interest, herein, that of human activities, while largely preserving the generalist knowledge. At the core of our approach stands a new LLM-based method that, provided with a simple set of concept seeds, produces a concept hierarchy with high coverage of the target domain. The concept hierarchy is used to filter a large-scale web-crawled dataset and, then, enhance the resulting instances with targeted synthetic labels. We study in depth how to train such approaches and their resulting behavior. We further show generalization to video-based data by introducing a fast adaptation approach for transitioning from a static (image) model to a dynamic one (i.e. with temporal modeling). On the domain of interest, our approach significantly outperforms models trained on up to 60× more samples and between 10-100× shorter training schedules for image retrieval, video retrieval and action recognition. Code will be released.

Queen Mary Research Online

Are Embedded Potatoes Still Vegetables? On the Limitations of WordNet Embeddings for Lexical Semantics

Author: Cheng X
Emerson G
Schlichtkrull M
Publication venue: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Publication date: 06/12/2023

Queen Mary Research Online