Rhetorical relations for information retrieval
Typically, every part in most coherent text has some plausible reason for its
presence, some function that it performs to the overall semantics of the text.
Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts
of a text are linked to each other. Knowledge about this so-called discourse
structure has been applied successfully to several natural language processing
tasks. This work studies the use of rhetorical relations for Information
Retrieval (IR): Is there a correlation between certain rhetorical relations and
retrieval performance? Can knowledge about a document's rhetorical relations be
useful to IR? We present a language model modification that considers
rhetorical relations when estimating the relevance of a document to a query.
Empirical evaluation of different versions of our model on TREC settings shows
that certain rhetorical relations can benefit retrieval effectiveness notably
(> 10% in mean average precision over a state-of-the-art baseline).
Centering, Anaphora Resolution, and Discourse Structure
Centering was formulated as a model of the relationship between attentional
state, the form of referring expressions, and the coherence of an utterance
within a discourse segment (Grosz, Joshi and Weinstein, 1986; Grosz, Joshi and
Weinstein, 1995). In this chapter, I argue that the restriction of centering to
operating within a discourse segment should be abandoned in order to integrate
centering with a model of global discourse structure. The within-segment
restriction causes three problems. The first problem is that centers are often
continued over discourse segment boundaries with pronominal referring
expressions whose form is identical to those that occur within a discourse
segment. The second problem is that recent work has shown that listeners
perceive segment boundaries at various levels of granularity. If centering
models a universal processing phenomenon, it is implausible that each listener
is using a different centering algorithm. The third problem is that even for
utterances within a discourse segment, there are strong contrasts between
utterances whose adjacent utterance within a segment is hierarchically recent
and those whose adjacent utterance within a segment is linearly recent. This
chapter argues that these problems can be eliminated by replacing Grosz and
Sidner's stack model of attentional state with an alternate model, the cache
model. I show how the cache model is easily integrated with the centering
algorithm, and provide several types of data from naturally occurring
discourses that support the proposed integrated model. Future work should
provide additional support for these claims with an examination of a larger
corpus of naturally occurring discourses.
Comment: 35 pages, uses elsart12, lingmacros, named, psfi
Intelligent indexing of crime scene photographs
The Scene of Crime Information System's automatic image-indexing prototype goes beyond extracting keywords and syntactic relations from captions. The semantic information it gathers gives investigators an intuitive, accurate way to search a database of cases for specific photographic evidence. Intelligent, automatic indexing and retrieval of crime scene photographs is one of the main functions of SOCIS, our research prototype developed within the Scene of Crime Information System project. The prototype, now in its final development and evaluation phase, applies advanced natural language processing techniques to text-based image indexing and retrieval to tackle crime investigation needs effectively and efficiently.
Applying Science Models for Search
The paper proposes three different kinds of science models as value-added
services that are integrated in the retrieval process to enhance retrieval
quality. The paper discusses the approaches Search Term Recommendation,
Bradfordizing and Author Centrality on a general level and addresses
implementation issues of the models within a real-life retrieval environment.
Comment: 14 pages, 3 figures, ISI 201
Foreground and background text in retrieval
Our hypothesis is that certain clauses have foreground functions in text,
while other clauses have background functions and that these functions are
expressed or reflected in the syntactic structure of the clause.
Presumably these clauses will have differing utility for automatic
approaches to text understanding; a summarization system might want to
utilize background clauses to capture commonalities between numbers of
documents while an indexing system might use foreground clauses in order to
capture specific characteristics of a certain document.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56].
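The entropy idea in the first model above can be illustrated with a minimal sketch. It assumes the discourse entities have already been extracted into an ordered list, and it uses the plain empirical (plug-in) Shannon entropy over entity n-grams; the abstract does not state which estimator the authors use, so the function name, the default n=2, and the sample entity sequences are all hypothetical.

```python
import math
from collections import Counter


def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical entity n-gram distribution.

    `entities` is an ordered list of discourse entities (assumed to be
    extracted beforehand, e.g. sentence subjects and objects).
    """
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0  # too few entities to form a single n-gram
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical entity sequences: one keeps returning to the same entities,
# the other introduces a new entity at every step.
focused = ["model", "entropy", "model", "entropy", "model", "entropy"]
drifting = ["model", "entropy", "graph", "query", "ranking", "corpus"]

focused_score = entity_ngram_entropy(focused)
drifting_score = entity_ngram_entropy(drifting)
```

Under the abstract's reasoning, the sequence with the higher score (here the drifting one, where every bigram is distinct) would be treated as less coherent.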
Upbeat and quirky with a bit of a build: Interpretive repertories in creative music search
Pre-existing commercial music is widely used to accompany moving images in films, TV commercials and computer games. This process is known as music synchronisation. Professionals are employed by rights holders and film makers to perform creative music searches on large catalogues to find appropriate pieces of music for synchronisation. This paper discusses a Discourse Analysis of thirty interview texts related to the process. Coded examples are presented and discussed. Four interpretive repertoires are identified: the Musical Repertoire, the Soundtrack Repertoire, the Business Repertoire and the Cultural Repertoire. These ways of talking about music are adopted by all of the community regardless of their interest as Music Owner or Music User.
Music is shown to have multi-variate and sometimes conflicting meanings within this community which are dynamic and negotiated. This is related to a theoretical feedback model of communication and meaning making which proposes that Owners and Users employ their own and shared ways of talking and thinking about music and its context to determine musical meaning. The value to the music information retrieval community is to inform system design from a user information needs perspective.