CLiFF Notes: Research in the Language Information and Computation Laboratory of The University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLIFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science, Psychology, and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. With 48 individual contributors and six projects represented, this is the largest LINC Lab collection to date, and the most diverse.
Instance-based natural language generation
In recent years, ranking approaches to Natural Language Generation have become increasingly popular. They abandon the idea of generation as a deterministic decision-making process in favour of approaches that combine overgeneration with ranking at some stage in processing. In this thesis, we investigate the use of instance-based ranking methods for surface realization in Natural Language Generation. Our approach to instance-based Natural Language Generation employs two basic components: a rule system that generates a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. The instance-based ranker uses information retrieval methods to rank output candidates. Our approach is corpus-based in that it uses a treebank (a subset of the Penn Treebank II containing management succession texts) in combination with manual semantic markup to automatically produce a generation grammar. Furthermore, the corpus is also used by the instance-based ranker. The semantic annotation of a test portion of the compiled subcorpus serves as input to the generator. In this thesis, we develop an efficient search technique for identifying the optimal candidate based on the A* algorithm, detail the annotation scheme and grammar construction algorithm, and show how a Rete-based production system can be used for efficient candidate generation. Furthermore, we examine the output of the generator and discuss issues like input coverage (completeness), fluency and faithfulness that are relevant to surface generation in general.
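The overgenerate-and-rank idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the ranker's features or similarity measure, so cosine similarity over unigram and bigram counts is used here as an assumed stand-in for "information retrieval methods". The function names (`features`, `cosine`, `rank_candidates`) and the example sentences are hypothetical, and the candidates are written by hand rather than produced by a rule system.

```python
import math
from collections import Counter


def features(sentence, n=2):
    """Bag of unigrams and n-grams for a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(tokens + grams)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(candidates, corpus):
    """Score each candidate by its best match among the corpus instances.

    Assumption: 'similarity to examples' is taken to mean the maximum
    cosine similarity to any single training sentence.
    """
    corpus_vecs = [features(s) for s in corpus]
    scored = []
    for cand in candidates:
        vec = features(cand)
        best = max((cosine(vec, cv) for cv in corpus_vecs), default=0.0)
        scored.append((best, cand))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical training instances and realization candidates.
corpus = [
    "Smith was named president of the company",
    "Jones succeeds Brown as chairman",
]
candidates = [
    "president the firm of named was Miller",
    "Miller was named president of the firm",
]

ranked = rank_candidates(candidates, corpus)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because bigrams are part of the feature set, the scrambled candidate shares only single words with the corpus and scores lower than the fluent one, which is the effect an instance-based ranker relies on.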
Using Classification To Generate Text
The IDAS natural-language generation system uses a KL-ONE type classifier to perform content determination, surface realisation, and part of text planning. Generation-by-classification allows IDAS to use a single representation and reasoning component for both domain and linguistic knowledge, which is difficult for systems based on unification or systemic generation techniques.