
    USFD at KBP 2011: Entity Linking, Slot Filling and Temporal Bounding

    This paper describes the University of Sheffield's entry in the 2011 TAC KBP entity linking and slot filling tasks. We participated in the monolingual entity linking task, the monolingual slot filling task, and the temporal slot filling tasks. Our aim was to build a framework for experimentation with knowledge base population; we built this framework and applied it to multiple KBP tasks. We demonstrate that the framework is effective and suitable for collaborative development, as well as useful in a teaching environment. Finally, we present results that, while very modest, improve on our 2010 attempt by an order of magnitude. Comment: Proc. Text Analysis Conference (2011).

    On the Feasibility of Automated Detection of Allusive Text Reuse

    The detection of allusive text reuse is particularly challenging because of the sparse evidence on which allusive references rely: commonly few or no shared words. Arguably, lexical semantics can help, since uncovering semantic relations between words can increase the evidence supporting an allusion and alleviate the lexical sparsity. A further obstacle is the lack of benchmark corpora for evaluation, largely due to the highly interpretative character of the annotation process. In this paper, we aim to elucidate the feasibility of automated allusion detection. We approach the problem from an Information Retrieval perspective, in which referencing texts act as queries and referenced texts as the relevant documents to be retrieved, and we estimate the difficulty of compiling a benchmark corpus through a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent integrating lexical semantic information derived from distributional models and ontologies can help retrieve cases of allusive reuse. The results show that (i) despite low agreement scores, manual queries considerably improve retrieval performance compared with a windowing approach, and (ii) retrieval performance can be moderately boosted with distributional semantics.
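The abstract does not specify how the windowing baseline works. As a minimal sketch, assuming fixed-size token windows of the referencing text serve as queries and each candidate text is scored by bag-of-words cosine similarity against its best-matching window, it might look like this. The function names and the `size`/`stride` defaults are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def windows(tokens, size, stride):
    """Yield fixed-size token windows over the referencing text (hypothetical parameters)."""
    if len(tokens) <= size:
        # Short texts become a single window.
        yield tokens
        return
    for start in range(0, len(tokens) - size + 1, stride):
        yield tokens[start:start + size]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_windows(query_tokens, documents, size=5, stride=2):
    """Rank candidate documents by the score of their best-matching query window."""
    scores = {}
    for doc_id, doc_tokens in documents.items():
        doc_vec = Counter(doc_tokens)
        scores[doc_id] = max(
            cosine(Counter(w), doc_vec) for w in windows(query_tokens, size, stride)
        )
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {"a": "the sea is wine dark".split(), "b": "a red apple".split()}
ranking = rank_by_windows("over the wine dark sea".split(), docs)
```

Taking the maximum over windows, rather than the average, keeps a single well-matching window from being diluted by the many windows that share no words with the referenced text.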

    Recognizing and organizing opinions expressed in the world press

    Journal article. Tomorrow's question answering systems will need the ability to process information about beliefs, opinions, and evaluations: the perspective of an agent. Answers to many simple factual questions, even yes/no questions, are affected by the perspective of the information source. For example, a questioner asking question (1) might be interested to know that, in general, sources in European and North American governments tend to answer "no" to question (1), while sources in African governments tend to answer "yes."

    Annotation Studio: multimedia text annotation for students

    Annotation Studio will be a web-based application that actively engages students in interpreting literary texts and other humanities documents. While strengthening students' new media literacies, this open-source web application will develop traditional humanistic skills, including close reading, textual analysis, persuasive writing, and critical thinking. Initial features will include: 1) easy-to-use annotation tools that facilitate linking and comparing primary texts with multimedia source, variation, and adaptation documents; 2) sharable collections of multimedia materials prepared by faculty and student users; 3) multiple filtering and display mechanisms for texts, written annotations, and multimedia annotations; 4) collaboration functionality; and 5) multimedia composition tools. Products of the start-up phase will include a working prototype, feedback from students and instructors, and a white paper summarizing lessons learned.

    Using term clouds to represent segment-level semantic content of podcasts

    Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without an interface that provides semantically annotated jump points signaling where the user should listen in. Creating time-aligned metadata with human annotators is prohibitively expensive, which motivates the investigation of segment-level semantic content representations based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to give users a structured representation of the semantic content of podcast episodes. Each episode is visualized as a series of sub-episode segments, each represented by a term cloud derived from an ASR transcript. The quality of the segment-level term clouds is measured quantitatively, and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of TextTiling techniques to speech so that segments can be generated as part of a fully automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable with human-selected segment boundaries.
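The abstract does not say how terms are weighted within each segment's cloud. A minimal sketch, assuming TF-IDF weighting computed across the segments of a single episode's ASR transcript, could select each segment's cloud terms as follows. The function name `term_clouds`, the parameter `k`, and the whitespace tokenization are hypothetical choices for illustration.

```python
import math
from collections import Counter


def term_clouds(segments, k=3, stopwords=frozenset()):
    """Return the top-k (term, weight) pairs per segment, weighted by TF-IDF across segments."""
    # Lowercase and whitespace-tokenize each segment, dropping stopwords.
    tokenized = [
        [t.lower() for t in seg.split() if t.lower() not in stopwords]
        for seg in segments
    ]
    n = len(tokenized)
    # Document frequency: number of segments containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    clouds = []
    for toks in tokenized:
        tf = Counter(toks)
        weights = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        # Keep only positive weights, then sort by weight (ties broken alphabetically).
        ranked = sorted(
            ((t, w) for t, w in weights.items() if w > 0),
            key=lambda kv: (-kv[1], kv[0]),
        )
        clouds.append(ranked[:k])
    return clouds


segments = [
    "the market fell and the market closed",
    "the team won the match",
    "the weather was cold",
]
clouds = term_clouds(segments)
```

Terms that occur in every segment receive an IDF of zero and drop out, which keeps episode-wide words from dominating every cloud.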