Search CORE

1,214 research outputs found

Unsupervised Timeline Generation for Wikipedia History Articles

Author: Bauer S
Teufel SH
Publication venue: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Publication date: 30/12/2016

This paper presents a generic approach to content selection for creating timelines from individual history articles for which no external information about the same topic is available. This scenario is in contrast to existing works on timeline generation, which require the presence of a large corpus of news articles. To identify salient events in a given history article, we exploit lexical cues about the article's subject area, as well as time expressions that are syntactically attached to an event word. We also test different methods of ensuring timeline coverage of the entire historical time span described. Our best-performing method outperforms a new unsupervised baseline and an improved version of an existing supervised approach. We see our work as a step towards more semantically motivated approaches to single-document summarisation.

Apollo (Cambridge)
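
The abstract mentions "ensuring timeline coverage of the entire historical time span" without spelling out a method. One simple way to illustrate the idea is binned selection: split the article's time span into equal slices and take the best-scoring event from each slice before falling back to pure score order. This is a minimal sketch under assumptions, not the paper's method: the `Event` type, its `score` field (a stand-in for whatever salience scorer is used), and the `bins` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    text: str
    year: int
    score: float  # hypothetical salience score from some upstream scorer


def select_with_coverage(events, n, bins):
    """Pick n events, preferring the best event from each of `bins` equal slices of the time span."""
    if not events or n <= 0:
        return []
    lo = min(e.year for e in events)
    hi = max(e.year for e in events)
    width = (hi - lo) / bins or 1  # avoid division by zero when all events share a year
    best = {}
    for e in events:
        b = min(int((e.year - lo) / width), bins - 1)  # clamp the final year into the last bin
        if b not in best or e.score > best[b].score:
            best[b] = e
    # Take the top-scoring bin representatives first, then fill up with the remaining best events.
    chosen = sorted(best.values(), key=lambda e: e.score, reverse=True)[:n]
    rest = sorted((e for e in events if e not in chosen), key=lambda e: e.score, reverse=True)
    chosen += rest[: n - len(chosen)]
    return sorted(chosen, key=lambda e: e.year)  # timelines are read chronologically
```

For example, with events a (1000, 0.9), b (1005, 0.8), c (1050, 0.6) and d (1100, 0.5), `select_with_coverage(events, 2, 2)` returns a and c, whereas ranking by score alone would return the two early events a and b.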

Analyzing User Interaction Logs of an Educational Visualization System to Understand How Students Generate Insights

Author: Mukabak Kuatbek
Publication venue: Graduate School of UNIST
Publication date: 01/08/2018

Department of Computer Science and Engineering

Visual analytics systems have become popular in many domains. Recently, a visual analytics tool, VAiRoma, was designed for the educational domain to help students learn in a history class. However, how users interact with such systems is still not well understood. In an educational domain, it is important to know how users gain insights. This may give us an opportunity to understand the user's learning style, so that we can design better visualization tools in the future. In this thesis, I analyze the interaction logs of an educational visualization system, VAiRoma, in order to explore how users generate insights via the system. The results show that users performed more exploratory interactions at the initial stages of their insight generation path. In the middle of the path, users mostly read textual information. Toward the end, they attempted to demonstrate their understanding of what they had learnt by creating an annotation. Insight generation paths also show cyclic behavior: in 38% of cases, users cancelled "create an annotation" during the annotation creation process and went back to reading textual information.

ScholarWorks@UNIST
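
The 38% figure above is a rate computed over interaction logs. As a rough illustration of how such a rate can be derived, the sketch below counts annotation attempts that were cancelled and immediately followed by reading. It assumes the log is an ordered list of action labels; the labels `start_annotation`, `cancel_annotation` and `read_text` are hypothetical, not VAiRoma's actual event names.

```python
def cancel_then_read_rate(actions):
    """Share of started annotations that were cancelled and immediately followed by reading.

    `actions` is an ordered list of hypothetical action labels from one or more sessions.
    """
    started = 0
    cancelled_then_read = 0
    for i, action in enumerate(actions):
        if action == "start_annotation":
            started += 1
        elif (action == "cancel_annotation"
              and i + 1 < len(actions)
              and actions[i + 1] == "read_text"):
            cancelled_then_read += 1
    return cancelled_then_read / started if started else 0.0
```

Requiring the very next action to be `read_text` is a strict reading of "went back to read"; a looser definition could allow a few intervening actions.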

Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

Author: Fafalios Pavlos
Iosifidis Vasileios
Ntoutsi Eirini
Stefanidis Kostas
Publication date: 24/10/2018

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks such as Twitter and Facebook can be seen as a comprehensive documentation of our society, and thus meaningful analysis methods over such archived data are of immense value for sociologists, historians and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives, and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, such as popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of four years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.
Comment: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018)

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University
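
The abstract names four aspects (popularity, attitude, controversiality, connectedness) without giving formulas. The sketch below shows one plausible per-period instantiation, assuming each post is a `(period, entities, sentiment)` tuple with a signed sentiment value. The definitions are illustrative assumptions, not the paper's measures: popularity is the share of posts mentioning the entity, attitude is mean sentiment, controversiality is 1 when positive and negative mentions are balanced and 0 when one side dominates, and connectedness is approximated by the most frequent co-mentioned entities.

```python
from collections import Counter, defaultdict


def entity_profile(posts, entity):
    """Per-period aspect scores for `entity` (illustrative definitions, not the paper's).

    posts: iterable of (period, set_of_entities, sentiment) tuples.
    """
    totals = Counter()                 # posts per period
    mentions = defaultdict(list)       # sentiments of posts mentioning the entity
    co_mentions = defaultdict(Counter) # other entities mentioned alongside it
    for period, entities, sentiment in posts:
        totals[period] += 1
        if entity in entities:
            mentions[period].append(sentiment)
            co_mentions[period].update(e for e in entities if e != entity)
    profile = {}
    for period in sorted(totals):
        s = mentions[period]
        pos = sum(1 for x in s if x > 0)
        neg = sum(1 for x in s if x < 0)
        profile[period] = {
            "popularity": len(s) / totals[period],
            "attitude": sum(s) / len(s) if s else 0.0,
            "controversiality": 1 - abs(pos - neg) / (pos + neg) if pos + neg else 0.0,
            "top_related": co_mentions[period].most_common(3),
        }
    return profile
```

With toy data such as `("2015-01", {"PM", "FinMin"}, 1)`, `("2015-01", {"PM"}, -1)`, `("2015-01", {"Other"}, 0)` and `("2015-02", {"PM"}, 1)`, the entity `"PM"` is maximally controversial in 2015-01 (one positive, one negative mention) and not controversial in 2015-02.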

A Controllable Model of Grounded Response Generation

Author: Brockett Chris
Dolan Bill
Galley Michel
Gao Jianfeng
Gao Xiang
Hajishirzi Hannaneh
Koncel-Kedziorski Rik
Ostendorf Mari
Quirk Chris
Wu Zeqiu
Zhang Yizhe
Publication date: 18/05/2021

Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process, often resulting in uninteresting responses. Attempts to boost informativeness alone come at the expense of factual accuracy, as attested by pretrained language models' propensity to "hallucinate" facts. While this may be mitigated by access to background knowledge, there is scant guarantee of relevance and informativeness in generated responses. We propose a framework that we call controllable grounded response generation (CGRG), in which lexical control phrases are either provided by a user or automatically extracted by a control phrase predictor from dialogue context and grounding knowledge. Quantitative and qualitative results show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
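
In CGRG, control phrases come either from the user or from a learned control phrase predictor. To make the data flow concrete, the sketch below swaps in a deliberately naive, non-learned heuristic: it ranks grounding bigrams by how many of their words also occur in the dialogue context, then keeps only the grounding sentences that contain a selected phrase. Everything here (`predict_control_phrases`, `select_grounding`, the tiny stopword list) is hypothetical and is not the paper's predictor or its inductive attention mechanism.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "was", "to",
             "it", "i", "you", "what", "about"}


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def predict_control_phrases(context, grounding, k=3):
    """Naive stand-in predictor: score grounding bigrams by word overlap with the context."""
    ctx = set(tokens(context))
    candidates = Counter()
    for sentence in grounding:
        toks = tokens(sentence)
        for a, b in zip(toks, toks[1:]):
            if a in STOPWORDS or b in STOPWORDS:
                continue
            score = (a in ctx) + (b in ctx)
            if score:
                candidates[f"{a} {b}"] += score
    return [phrase for phrase, _ in candidates.most_common(k)]


def select_grounding(grounding, phrases):
    """Keep grounding sentences containing any control phrase as a whole-token match."""
    padded = [f" {' '.join(tokens(s))} " for s in grounding]
    return [s for s, p in zip(grounding, padded) if any(f" {ph} " in p for ph in phrases)]
```

For the context "Have you heard about the Mars rover?", the top phrase is "mars rover", and only grounding sentences containing that exact bigram are kept. A learned predictor would replace the overlap score while leaving this select-then-condition pipeline unchanged.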

Content Selection for Timeline Generation from Single History Articles

Author: Bauer Sandro Mario
Publication venue: University of Cambridge
Publication date: 09/11/2017

This thesis investigates the problem of content selection for timeline generation from single history articles. While the task of timeline generation has been addressed before, most previous approaches assume the existence of a large corpus of history articles from the same era. They exploit the fact that salient information is likely to be mentioned multiple times in such corpora. However, large resources of this kind are only available for historical events that happened in the most recent decades. In this thesis, I present approaches which can be used to create history timelines for any historical period, even for eras such as the Middle Ages, for which no large corpora of supplementary text exist. The thesis first presents a system that selects relevant historical figures in a given article, a task which is substantially easier than full timeline generation. I show that a supervised approach which uses linguistic, structural and semantic features outperforms a competitive baseline on this task. Based on the observations made in this initial study, I then develop approaches for timeline generation. I find that an unsupervised approach that takes into account the article's subject area outperforms several supervised and unsupervised baselines. A main focus of this thesis is the development of evaluation methodologies and resources, as no suitable corpora existed when work began. For the initial experiment on important historical figures, I construct a corpus of existing timelines and textual articles, and devise a method for evaluating algorithms based on this resource. For timeline generation, I present a comprehensive evaluation methodology which is based on the interpretation of the task as a special form of single-document summarisation. This methodology scores algorithms based on meaning units rather than surface similarity. 
Unlike previous evaluation methods for summarisation that are based on semantic units, my evaluation method does not require any manual annotation of system timelines. Once an evaluation resource has been created, which involves only annotation of the input texts, new timeline generation algorithms can be tested at no cost. This crucial advantage should make my new evaluation methodology attractive for the evaluation of general single-document summaries beyond timelines. I also present an evaluation resource which is based on this methodology. It was constructed using gold-standard timelines elicited from 30 human timeline writers, and has been made publicly available. This thesis concentrates on the content selection stage of timeline generation, and leaves the surface realisation step for future work. However, my evaluation methodology is designed in such a way that it can in principle also quantify the degree to which surface realisation is successful.

Apollo (Cambridge)
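
The thesis abstract explains why no manual annotation of system output is needed: meaning units are annotated once on the input text, so any system that selects spans of that text inherits the annotation automatically. The sketch below illustrates that idea with a pyramid-style score, where each unit is weighted by how many gold timelines include it. The weighting and normalisation are assumptions chosen for illustration, not necessarily the thesis's exact formula; `span_to_units` is a hypothetical mapping from input spans to unit IDs.

```python
from collections import Counter


def unit_weights(gold_timelines):
    """Weight each meaning unit by the number of gold timelines that contain it."""
    weights = Counter()
    for timeline in gold_timelines:
        weights.update(set(timeline))
    return weights


def pyramid_style_score(selected_spans, span_to_units, gold_timelines):
    """Score a system selection via units annotated on the input text (no output annotation)."""
    weights = unit_weights(gold_timelines)
    units = set()
    for span in selected_spans:
        units.update(span_to_units.get(span, ()))
    if not units:
        return 0.0
    achieved = sum(weights[u] for u in units)
    # Best achievable total for a selection covering the same number of units.
    ideal = sum(sorted(weights.values(), reverse=True)[: len(units)])
    return achieved / ideal if ideal else 0.0
```

With three gold timelines `{"u1", "u2"}`, `{"u1", "u3"}` and `{"u1", "u2"}`, selecting the spans covering u1 and u3 scores 4/5 = 0.8, while selecting u1 and u2 scores 1.0. Scoring a new algorithm only requires its selected spans, which is the "at no cost" property described above.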

Improving Searchability of Automatically Transcribed Lectures Through Dynamic Language Modelling

Author: Marquard Stephen
Publication venue: 'Division of Chemical Information and Computer Sciences'
Publication date: 13/08/2016

Recording university lectures through lecture capture systems is increasingly common. However, a single continuous audio recording is often unhelpful for users, who may wish to navigate quickly to a particular part of a lecture, or locate a specific lecture within a set of recordings. A transcript of the recording can enable faster navigation and searching. Automatic speech recognition (ASR) technologies may be used to create automated transcripts, to avoid the significant time and cost involved in manual transcription. Low accuracy of ASR-generated transcripts may however limit their usefulness. In particular, ASR systems optimized for general speech recognition may not recognize the many technical or discipline-specific words occurring in university lectures. To improve the usefulness of ASR transcripts for the purposes of information retrieval (search) and navigating within recordings, the lexicon and language model used by the ASR engine may be dynamically adapted for the topic of each lecture. A prototype is presented which uses the English Wikipedia as a semantically dense, large language corpus to generate a custom lexicon and language model for each lecture from a small set of keywords. Two strategies for extracting a topic-specific subset of Wikipedia articles are investigated: a naïve crawler which follows all article links from a set of seed articles produced by a Wikipedia search from the initial keywords, and a refinement which follows only links to articles sufficiently similar to the parent article. Pairwise article similarity is computed from a pre-computed vector space model of Wikipedia article term scores generated using latent semantic indexing. The CMU Sphinx4 ASR engine is used to generate transcripts from thirteen recorded lectures from Open Yale Courses, using the English HUB4 language model as a reference and the two topic-specific language models generated for each lecture from Wikipedia.

Cape Town University OpenUCT
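
The two crawling strategies in the abstract differ only in whether a link is followed unconditionally or only when the child article is similar enough to its parent. The sketch below captures both in one breadth-first crawler. It assumes an in-memory link graph and pre-computed dense vectors (standing in for the LSI article vectors); the `threshold` and `max_pages` parameters and the toy article graph are hypothetical, since the abstract does not give the actual similarity cut-off or crawl limits.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros or missing)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def crawl(seeds, links, vectors, threshold=None, max_pages=1000):
    """Breadth-first crawl from seed articles.

    threshold=None follows every link (the naive crawler); otherwise a link is
    followed only if cosine(parent, child) >= threshold (the refined crawler).
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds while keeping their order
    seen = set(queue)
    collected = []
    while queue and len(collected) < max_pages:
        page = queue.popleft()
        collected.append(page)
        for child in links.get(page, ()):
            if child in seen:
                continue
            if threshold is not None and cosine(vectors.get(page, ()), vectors.get(child, ())) < threshold:
                continue  # not similar enough to this parent; another parent may still reach it
            seen.add(child)
            queue.append(child)
    return collected
```

A child rejected via one parent is not marked as seen, so it can still be collected through a different, more similar parent. On a toy graph where "Entropy" links to "Thermodynamics", "Information theory" and "Film", a threshold of 0.7 prunes the off-topic "Film" branch that the naive crawler would follow.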