159 research outputs found

    Readers and Reading in the First World War

    Get PDF
    This essay consists of three individually authored and interlinked sections. In ‘A Digital Humanities Approach’, Francesca Benatti looks at datasets and databases (including the UK Reading Experience Database) and shows how a systematic, macro-analytical use of digital humanities tools and resources might yield answers to some key questions about reading in the First World War. In ‘Reading behind the Wire in the First World War’ Edmund G. C. King scrutinizes the reading practices and preferences of Allied prisoners of war in Mainz, showing that reading circumscribed by the contingencies of a prison camp created an unique literary community, whose legacy can be traced through their literary output after the war. In ‘Book-hunger in Salonika’, Shafquat Towheed examines the record of a single reader in a specific and fairly static frontline, and argues that in the case of the Salonika campaign, reading communities emerged in close proximity to existing centres of print culture. The focus of this essay moves from the general to the particular, from the scoping of large datasets, to the analyses of identified readers within a specific geographical and temporal space. The authors engage with the wider issues and problems of recovering, interpreting, visualizing, narrating, and representing readers in the First World War

    Text content and task performance in the evaluation of a natural language generation system

    Get PDF
    An important question in the evaluation of Natural Language Generation systems concerns the relationship between textual characteristics and task performance. If the results of task-based evaluation can be correlated to properties of the text, there are better prospects for improving the system. The present paper investigates this relationship by focusing on the outcomes of a task-based evaluation of a system that generates summaries of patient data, attempting to correlate these with the results of an analysis of the system’s texts, compared to a set of gold standard human-authored summaries.peer-reviewe

    ABDN at SemEval-2018 Task 10 : recognising discriminative attributes using context embeddings and WordNet

    Get PDF
    This paper describes the system that we submitted for SemEval-2018 task 10: capturing discriminative attributes. Our system is built upon a simple idea of measuring the attribute word’s similarity with each of the two semantically similar words, based on an extended word embedding method and WordNet. Instead of computing the similarities between the attribute and semantically similar words by using standard word embeddings, we propose a novel method that combines word and context embeddings which can better measure similarities. Our model is simple and effective, which achieves an average F1 score of 0.62 on the test set

    Evaluating algorithms for the generation of referring expressions : going beyond toy domains

    Get PDF
    We describe a corpus-based evaluation methodology, applied to a number of classic algorithms in the generation of referring expressions. Following up on earlier work involving very simple domains, this paper deals with the issues associated with domains that contain ‘real-life’ objects of some complexity. Results indicate that state of the art algorithms perform very differently when applied to a complex domain. Moreover, if a version of the Incremental Algorithm is used then it becomes of huge importance to select a good preference order. These results should contribute to a growing debate on the evaluation of nlg systems, arguing in favour of carefully constructed balanced and semantically transparent corpora.peer-reviewe

    How we do things with words: Analyzing text as social and cultural data

    Get PDF
    In this article we describe our experiences with computational text analysis. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of best practices for working with thick social and cultural concepts. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that will resonate for many. And this leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis that involves social and cultural concepts, and the more we are able to bridge these divides, the more fruitful we believe our work will be

    Generating Media Background Checks for Automated Source Critical Reasoning

    Get PDF
    Not everything on the internet is true. This unfortunate fact requires both humans and models to perform complex reasoning about credibility when working with retrieved information. In NLP, this problem has seen little attention. Indeed, retrieval-augmented models are not typically expected to distrust retrieved documents. Human experts overcome the challenge by gathering signals about the context, reliability, and tendency of source documents - that is, they perform source criticism. We propose a novel NLP task focused on finding and summarising such signals. We introduce a new dataset of 6,709 "media background checks" derived from Media Bias / Fact Check, a volunteer-run website documenting media bias. We test open-source and closed-source LLM baselines with and without retrieval on this dataset, finding that retrieval greatly improves performance. We furthermore carry out human evaluation, demonstrating that 1) media background checks are helpful for humans, and 2) media background checks are helpful for retrieval-augmented models

    Efficient Vision-Language pre-training via domain-specific learning for human activities

    Get PDF
    Current Vision-Language (VL) models owe their success to large-scale pre-training on web-collected data, which in turn requires high-capacity architectures and large compute resources for training. We posit that when the downstream tasks are known in advance, which is in practice common, the pretraining process can be aligned to the downstream domain, leading to more efficient and accurate models, while shortening the pretraining step. To this end, we introduce a domain-aligned pretraining strategy that, without additional data collection, improves the accuracy on a domain of interest, herein, that of human activities, while largely preserving the generalist knowledge. At the core of our approach stands a new LLM-based method that, provided with a simple set of concept seeds, produces a concept hierarchy with high coverage of the target domain.The concept hierarchy is used to filter a large-scale web-crawled dataset and, then, enhance the resulting instances with targeted synthetic labels. We study in depth how to train such approaches and their resulting behavior. We further show generalization to video-based data by introducing a fast adaptation approach for transitioning from a static (image) model to a dynamic one (i.e. with temporal modeling). On the domain of interest, our approach significantly outperforms models trained on up to 60× more samples and between 10-100× shorter training schedules for image retrieval, video retrieval and action recognition. Code will be released
    corecore