USFD at KBP 2011: Entity Linking, Slot Filling and Temporal Bounding
This paper describes the University of Sheffield's entry in the 2011 TAC KBP
entity linking and slot filling tasks. We chose to participate in the
monolingual entity linking task, the monolingual slot filling task, and the
temporal slot filling task. We set out to build a framework for
experimentation with knowledge base population; this framework was created and
applied to multiple KBP tasks. We demonstrated that our proposed framework is
effective and suitable for collaborative development efforts, as well as useful
in a teaching environment. Finally, we present results that, while very modest,
represent an order-of-magnitude improvement over our 2010 attempt. Comment: Proc. Text Analysis Conference (2011).
On the Feasibility of Automated Detection of Allusive Text Reuse
The detection of allusive text reuse is particularly challenging due to the
sparse evidence on which allusive references rely---commonly based on few or
no shared words. Arguably, lexical semantics can be drawn upon, since
uncovering semantic relations between words has the potential to increase the
support underlying the allusion and alleviate the lexical sparsity. A further
obstacle is the lack of evaluation benchmark corpora, largely due to the highly
interpretative character of the annotation process. In the present paper, we
aim to elucidate the feasibility of automated allusion detection. We approach
the matter from an Information Retrieval perspective in which referencing texts
act as queries and referenced texts as relevant documents to be retrieved, and
estimate the difficulty of benchmark corpus compilation by a novel
inter-annotator agreement study on query segmentation. Furthermore, we
investigate to what extent the integration of lexical semantic information
derived from distributional models and ontologies can aid retrieving cases of
allusive reuse. The results show that (i) despite low agreement scores, using
manual queries considerably improves retrieval performance with respect to a
windowing approach, and that (ii) retrieval performance can be moderately
boosted with distributional semantics.
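The IR framing described in this abstract---referencing passages as queries, referenced passages as documents, with distributional semantics bridging the lexical gap---can be sketched roughly as follows. This is not the paper's system: the tiny embedding table, the example texts, and the mean-vector ranking are invented here purely for illustration.

```python
# Toy sketch of allusion retrieval as an IR task: rank candidate
# "referenced" documents by cosine similarity between averaged word
# vectors of query and document. The embedding table below is a made-up
# illustration, not real distributional data.
import math

EMB = {
    "wrath": [1.0, 0.1], "anger": [0.9, 0.1], "sing": [0.1, 1.0],
    "goddess": [0.2, 0.8], "muse": [0.1, 0.9], "ship": [0.1, 0.6],
}

def embed(text):
    # Average the vectors of in-vocabulary tokens; zero vector if none.
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return [0.0, 0.0]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, documents):
    # Documents sorted by similarity to the query, best first.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = ["sing goddess the wrath", "the ship sailed on"]
best = rank("anger of achilles", docs)[0]
```

Note how "anger" retrieves the "wrath" passage despite zero lexical overlap with it---the kind of support a distributional model can add when an allusion shares few or no surface words with its source.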
Recognizing and organizing opinions expressed in the world press
Tomorrow's question answering systems will need the ability to process information about beliefs, opinions, and evaluations---the perspective of an agent. Answers to many simple factual questions---even yes/no questions---are affected by the perspective of the information source. For example, a questioner asking question (1) might be interested to know that, in general, sources in European and North American governments tend to answer "no" to question (1), while sources in African governments tend to answer "yes."
Annotation Studio: multimedia text annotation for students
Annotation Studio will be a web-based application that actively engages students in interpreting literary texts and other humanities documents. While strengthening students' new media literacies, this open source web application will develop traditional humanistic skills including close reading, textual analysis, persuasive writing, and critical thinking. Initial features will include: 1) easy-to-use annotation tools that facilitate linking and comparing primary texts with multimedia source, variation, and adaptation documents; 2) sharable collections of multimedia materials prepared by faculty and student users; 3) multiple filtering and display mechanisms for texts, written annotations, and multimedia annotations; 4) collaboration functionality; and 5) multimedia composition tools. Products of the start-up phase will include a working prototype, feedback from students and instructors, and a white paper summarizing lessons learned.
Using term clouds to represent segment-level semantic content of podcasts
Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without the support of an interface providing semantically annotated jump points that signal the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, motivating the investigation of representations of segment-level semantic content based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from an ASR-generated transcript. The quality of segment-level term clouds is measured quantitatively, and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to generate segments as part of a completely automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable with human-selected segment boundaries.