University of Edinburgh. College of Science and Engineering. School of Informatics.
Institute for Communicating and Collaborative Systems
Abstract
"All right, but apart from the sanitation, the medicine, education, wine,
public order, irrigation, roads, the fresh-water system and public health,
what have the Romans ever done for us?"
(Monty Python, The Life of Brian)
Alternative phrases identify selected elements from a set and subject them
to particular scrutiny with respect to the sentence's predicate. For instance, in
the above example, sanitation, medicine, etc. are all identified as elements in
the set of "things the Romans have done for us" that should not be included in
the response to the question. They are alternative responses to the desired ones.
Alternative phrases come in a variety of constructions and perform a variety
of tasks: excluding elements (apart from), expressing preference for particular
elements (especially), and simply identifying representative examples (such as).
Not a great deal of work has been done on alternative phrases in general.
Hearst (1992) used a pattern-matching analysis of certain alternative phrases
to learn hyponyms from unannotated corpora. Also, a few examples from a
subset of alternative phrases, called exceptive phrases, have been studied, most
recently, by von Fintel (1993) and Hoeksema (1995). But not all constructions
are amenable to pattern-matching techniques, and the work on exceptive phrases
focuses on some very specific semantic points. The focus of this thesis is to present
a general program for analyzing a wide variety of alternative phrases including
their presuppositional and anaphoric properties.
I perform my analyses in Combinatory Categorial Grammar, a lexicalized
formalism. The semantic aspects of the analysis benefit greatly from the concept
of alternative sets, sets of propositions that differ in one or more arguments
(Karttunen and Peters, 1979; Rooth, 1985, 1992; Prevost and Steedman, 1994;
Steedman, 2000a). In addition, elegant solutions are made possible by separating
the semantics into assertion and presupposition (Stalnaker, 1974; Karttunen and
Peters, 1979; Stone and Doran, 1997; Stone and Webber, 1998; Webber et al.,
1999b), with each performing quite different tasks.
My second goal is to demonstrate the practicality and importance of this analysis to real systems. Although it is relevant to many practical applications,
I will focus primarily on natural language information retrieval (NLIR) as a case
study. In such a domain, queries like "Where can I find other web browsers than
Netscape for download?" and "Where can I find shoes made by Buffalino, such
as the Bushwackers?" are often observed. I review several techniques for NLIR
and demonstrate that implementations of those techniques perform poorly on
such queries. I show that understanding alternative phrases can enable simple
techniques which greatly improve precision.
To bridge the gap between these goals, I present Grok, a modular natural
language system. Several general NLP issues necessary to support my linguistic
analysis are discussed: anaphora resolution, processing of presuppositions, interface
to knowledge representation, and the creation of a wide-coverage lexicon.
Special attention is paid to the lexicon, which is a combination of a hand-built
and an acquired lexicon.