83 research outputs found
Approximate text generation from non-hierarchical representations in a declarative framework
This thesis is on Natural Language Generation. It describes a linguistic realisation
system that translates the semantic information encoded in a conceptual graph into an
English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and an approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem
where one is not pre-committed to a choice of the syntactically prominent elements in
the initial semantics. We show clearly how the semantic structure is declaratively related to linguistically motivated syntactic representation â we use D-Tree Grammars
which stem from work on Tree-Adjoining Grammars. The declarative specification of
the mapping between semantics and syntax allows for different processing strategies
to be exploited. A number of generation strategies have been considered: a pure top-down strategy and a chart-based generation technique which allows partially successful
computations to be reused in other branches of the search space. Having a generator
with increased paraphrasing power as a consequence of using non-hierarchical input
and approximate matching raises the question of whether certain 'better' paraphrases can be
generated before others. We investigate preference-based processing in the context of
generation.
Integrated supertagging and parsing
EuroMatrixPlus project funded by the European Commission, 7th Framework Programme.
Parsing is the task of assigning syntactic or semantic structure to a natural language
sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar
(CCG; Steedman 2000). CCG allows incremental processing, which is essential
for speech recognition and some machine translation models, and it can build semantic
structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing
task by assigning lexical types to words in a sentence using a sequence model. It has
emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran,
2007) by reducing the parser's search space. This has been very successful and it is the
central theme of this thesis.
We begin with an analysis of how efficiency is traded for accuracy in supertagging.
Pruning the search space by supertagging is inherently approximate; as a contrast,
our analysis also includes A*, a classic exact search technique. Interestingly,
we find that combining the two methods improves efficiency but we also demonstrate
that excessive pruning by a supertagger significantly lowers the upper bound on accuracy
of a CCG parser.
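The effect of pruning on the accuracy upper bound can be illustrated with a minimal sketch. It assumes a relative-threshold pruning scheme in which a category survives if its probability is at least `beta` times that of the best category for the word; the abstract does not say which pruning scheme the thesis uses, and the function names, probabilities, and gold categories below are hypothetical.

```python
def prune_supertags(tag_probs, beta):
    """For each word, keep the categories whose probability is at least
    beta times the probability of that word's best category.
    (Assumed pruning rule, chosen only for illustration.)"""
    pruned = []
    for dist in tag_probs:
        best = max(dist.values())
        pruned.append({cat: p for cat, p in dist.items() if p >= beta * best})
    return pruned


def gold_reachable(pruned, gold):
    """True if every gold category survived pruning, i.e. the parser
    can still build the correct analysis from what is left."""
    return all(g in kept for kept, g in zip(pruned, gold))


# Hypothetical supertagger output for a two-word sentence.
tag_probs = [
    {"NP": 0.9, "N": 0.1},
    {"(S\\NP)/NP": 0.6, "S\\NP": 0.35, "N": 0.05},
]
gold = ["NP", "S\\NP"]

loose = gold_reachable(prune_supertags(tag_probs, 0.1), gold)
tight = gold_reachable(prune_supertags(tag_probs, 0.75), gold)
print(loose, tight)
```

With the loose threshold the gold category of the second word survives; with the tight threshold it is pruned, so no parser, however good, can recover the correct analysis for this sentence.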
Inspired by this analysis, we design a single integrated model with both supertagging
and parsing features, rather than separating them into distinct models chained
together in a pipeline. To overcome the resulting complexity, we experiment with both
loopy belief propagation and dual decomposition approaches to inference, the first empirical
comparison of these algorithms that we are aware of on a structured natural
language processing problem.
Finally, we address training the integrated model. We adopt the idea of optimising
directly for a task-specific metric, as is common in other areas such as statistical
machine translation. We demonstrate how a novel dynamic programming algorithm
enables us to optimise for F-measure, our task-specific evaluation metric, and experiment
with approximations, which prove to be excellent substitutes.
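For reference, F-measure over dependencies can be sketched as below. This shows only the evaluation metric, not the thesis's dynamic programming algorithm for optimising it; the tuple layout `(head, dependent, label)` and the example dependencies are assumptions made for illustration.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over two collections of
    dependencies, compared as sets."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def unlabelled(deps):
    """Drop the label so only head and dependent are compared."""
    return {(head, dep) for head, dep, _label in deps}


# Hypothetical dependencies in the assumed (head, dependent, label) layout.
gold_deps = [(2, 1, "subj"), (2, 3, "obj"), (3, 4, "mod"), (0, 2, "root")]
pred_deps = [(2, 1, "subj"), (2, 3, "mod"), (0, 2, "root")]

labelled_f = f_measure(pred_deps, gold_deps)
unlabelled_f = f_measure(unlabelled(pred_deps), unlabelled(gold_deps))
print(round(labelled_f, 3), round(unlabelled_f, 3))
```

The unlabelled score is never lower than the labelled one, since a dependency with the wrong label can still have the correct head and dependent.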
Each of the presented methods improves over the state-of-the-art in CCG parsing.
Moreover, the improvements are additive, achieving a labelled/unlabelled dependency
F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and
87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task
to date. Our techniques are general and we expect them to apply to other parsing problems,
including lexicalised tree adjoining grammar and context-free grammar parsing.
Two characterisation results of multiple context-free grammars and their application to parsing
In the first part of this thesis, a Chomsky-Schützenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm.
AXEL: A framework to deal with ambiguity in three-noun compounds
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 6/12/2010.
Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to address this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which in turn can be translated into computational programs. A limitation towards a computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation could be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means to marry the GLT and Cognitive Linguistics in a novel rapprochement between the two.
This bond between opposing theories provided the means to design a computational template (the AXEL System) by realising syntax and semantics at software levels. An instance of the AXEL system was created using a Design Research approach. Planned iterations were involved in the development to improve artefact performance. These iterations boosted performance, which accounted for the degree of association of meanings in three-noun compounds.
This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns on ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that these patterns can assist audiences of linguists to review lexical knowledge from a software-based viewpoint.
Following linguistic awareness, the second result advocated for the adoption of improved resources by decreasing electronic space of Sense Enumerative Lexicons (SELs). The AXEL system deployed the generation of "at the moment of use" interpretations, optimising the way the space is needed for lexical storage.
Finally, this research introduced a subsystem of metrics to characterise an ambiguous degree of association of three-noun compounds, enabling ranking methods. Weighing methods delivered mechanisms of classification of meanings towards Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in understanding studies of Lexical Semantics via software tools.
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie