Learning for semantic parsing using statistical syntactic parsing techniques
Natural language understanding is a sub-field of natural language processing that builds automated systems to understand natural language. It is such an ambitious task that it is sometimes referred to as an AI-complete problem, implying that its difficulty is equivalent to solving the central problem of artificial intelligence: making computers as intelligent as people. Despite its complexity, natural language understanding remains a fundamental problem in natural language processing, in terms of both its theoretical and its empirical importance. In recent years, startling progress has been made on natural language processing tasks at many levels, providing great opportunities for deeper natural language understanding.

In this thesis, we focus on the task of semantic parsing, which maps a natural language sentence into a complete, formal meaning representation in a meaning representation language. We present two novel state-of-the-art learned syntax-based semantic parsers using statistical syntactic parsing techniques, motivated by two reasons. First, syntax-based semantic parsing is theoretically well-founded in computational semantics. Second, adopting a syntax-based approach allows us to directly leverage the enormous progress made in statistical syntactic parsing.

The first semantic parser, Scissor, adopts an integrated syntactic-semantic parsing approach, in which a statistical syntactic parser is augmented with semantic parameters to produce a semantically-augmented parse tree (SAPT). This integrated approach makes both syntactic and semantic information available at parsing time, yielding an accurate combined syntactic-semantic analysis. The performance of Scissor is further improved by using discriminative reranking to incorporate non-local features. The second semantic parser, SynSem, exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation.
This pipeline approach allows semantic parsing to conveniently leverage the most recent progress in statistical syntactic parsing. We report experimental results on two real applications: an interpreter for coaching instructions in robotic soccer and a natural-language database interface. They show that the improvement of Scissor and SynSem over other systems comes mainly on long sentences, where knowledge of syntax, given in the form of annotated SAPTs or syntactic parses from an existing parser, helps semantic composition. SynSem also significantly improves results with limited training data, and is shown to be robust to syntactic errors.
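As a toy illustration of the input/output relationship that semantic parsing establishes (not of the learned Scissor or SynSem systems themselves), the sketch below maps a question onto a GeoQuery-style logical form with a hand-written rule table; the rules and predicate names are invented for illustration.

```python
# Toy semantic parsing sketch: map a natural-language question onto a
# formal meaning representation. Real semantic parsers learn this
# mapping from annotated data; this only shows the target mapping.

RULES = {
    "capital of": lambda arg: f"capital({arg})",
    "population of": lambda arg: f"population({arg})",
}

def parse(question: str) -> str:
    q = question.lower().rstrip("?")
    for phrase, build in RULES.items():
        if phrase in q:
            # everything after the key phrase is treated as the argument
            arg = q.split(phrase, 1)[1].strip().replace(" ", "_")
            return f"answer({build(arg)})"
    raise ValueError(f"no rule matches: {question}")

print(parse("What is the capital of Texas?"))  # answer(capital(texas))
```

A learned parser replaces the hand-written table with rules induced from sentence/meaning-representation pairs, composed along a syntactic analysis.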
Exploiting multi-word units in statistical parsing and generation
Syntactic parsing is an important prerequisite for many natural language processing (NLP) applications. The task refers to the process of generating the tree of syntactic nodes with associated phrase category labels corresponding to a sentence.
Our objective is to improve upon statistical models for syntactic parsing by leveraging multi-word units (MWUs) such as named entities and other classes of multi-word expressions. Multi-word units are phrases that are lexically, syntactically and/or semantically
idiosyncratic in that they are to at least some degree
non-compositional. If such units are identified prior to, or as part of, the parsing process their boundaries can be exploited as islands of certainty within the very large (and often highly ambiguous) search space. Luckily, certain types of MWUs can be readily identified in an automatic fashion (using a variety of techniques) to a near-human
level of accuracy.
We carry out a number of experiments which integrate knowledge about different classes of MWUs into several commonly deployed parsing architectures. In a supplementary set of experiments, we attempt to exploit these units in the converse operation to statistical parsing: statistical generation (in our case, surface realisation from Lexical-Functional Grammar f-structures). We show that, by exploiting knowledge about MWUs, certain classes of parsing and generation decisions are more accurately resolved. This translates into improvements in overall parsing and generation results which, although modest, are demonstrably significant.
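The "islands of certainty" idea can be sketched as a preprocessing step: MWUs identified before parsing are collapsed into single tokens, so the parser never considers analyses that split them. The MWU lexicon below is invented for illustration; real systems would use named-entity recognisers and multi-word-expression lists.

```python
# Collapse known multi-word units into single tokens before parsing,
# so their boundaries act as fixed islands in the search space.

MWUS = {("new", "york", "city"), ("kick", "the", "bucket")}
MAX_LEN = max(len(m) for m in MWUS)

def collapse_mwus(tokens):
    out, i = [], 0
    while i < len(tokens):
        # greedily try the longest MWU starting at position i
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in MWUS:
                out.append("_".join(tokens[i:i + n]))  # one pre-parsed unit
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(collapse_mwus("He moved to New York City last year".split()))
# ['He', 'moved', 'to', 'New_York_City', 'last', 'year']
```

Because the collapsed unit enters the parser as a single lexical item, the highly ambiguous internal structure of the MWU is removed from the search space.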
Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources
Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in wide-coverage grammars automatically obtained from treebanks. In particular, recent years have seen a move
towards acquiring deep (LFG, HPSG and CCG) resources that can represent information absent from simple CFG-type structured treebanks and which are considered to produce more language-neutral linguistic representations, such
as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of attention in the first efforts towards acquiring treebank-based deep-grammar resources, followed by treatments of, for example, German, Japanese, Chinese and Spanish. However, to date no comparable large-scale automatically acquired deep-grammar resources have been obtained for French. The goal of the research presented in this thesis is to develop, implement, and evaluate treebank-based deep-grammar acquisition techniques for French. Along the way towards achieving this goal, this thesis presents the derivation of a new treebank for French from the Paris 7 Treebank, the Modified French Treebank, a cleaner, more coherent treebank with several transformed structures and new linguistic analyses. Statistical parsers trained on this data outperform those trained on the original Paris 7 Treebank, which has five times the amount of data. The Modified French Treebank is the data source used for the development of treebank-based automatic deep-grammar acquisition for LFG parsing resources
for French, based on an f-structure annotation algorithm for this treebank. LFG CFG-based parsing architectures are then extended and tested, achieving a competitive best f-score of 86.73% for all features. The CFG-based parsing architectures are then complemented with an alternative dependency-based statistical parsing approach, obviating the CFG-based parsing step, and instead directly
parsing strings into f-structures.
Harmonic analysis of music using combinatory categorial grammar
FP7 grant 249520 (GRAMPLUS)
Various patterns of the organization of Western tonal music exhibit hierarchical structure,
among them the harmonic progressions underlying melodies and the metre underlying
rhythmic patterns. Recognizing these structures is an important part of unconscious
human cognitive processing of music. Since the prosody and syntax of natural
languages are commonly analysed with similar hierarchical structures, it is reasonable
to expect that the techniques used to identify these structures automatically in natural
language might also be applied to the automatic interpretation of music.
In natural language processing (NLP), analysing the syntactic structure of a sentence
is prerequisite to semantic interpretation. The analysis is made difficult by the
high degree of ambiguity in even moderately long sentences. In music, a similar sort of
structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as
key identification and score transcription. These and other tasks depend on harmonic
and rhythmic analyses. There is a long history of applying linguistic analysis techniques
to musical analysis. In recent years, statistical modelling, in particular in the
form of probabilistic models, has become ubiquitous in NLP for large-scale practical
analysis of language. The focus of the present work is the application of statistical
parsing to automatic harmonic analysis of music.
This thesis demonstrates that statistical parsing techniques, adapted from NLP with
little modification, can be successfully applied to recovering the harmonic structure
underlying music. It shows first how a type of formal grammar based on one used
for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be
used to analyse the hierarchical structure of chord sequences. I introduce a formal
language similar to first-order predicate logic to express the hierarchical tonal harmonic
relationships between chords. The syntactic grammar formalism then serves as
a mechanism to map an unstructured chord sequence onto its structured analysis.
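As a toy illustration of mapping an unstructured chord sequence onto a structured analysis (the lexicon, category names, and output term language here are invented, not the thesis's actual CCG categories or logical forms), a ii-V-I cadence can be rendered as a nested "leads onto" term:

```python
# Toy grammar-based harmonic analysis: assign each chord a harmonic
# function, then build a right-branching structure in which each
# non-final chord leads onto the analysis of the rest, ending at the
# tonic. All names are illustrative only.

CATEGORIES = {  # chord -> harmonic function (toy lexicon, key of C)
    "Dm7":   "ii",
    "G7":    "V",
    "Cmaj7": "I",
}

def analyse(chords):
    funcs = [CATEGORIES[c] for c in chords]
    term = funcs[-1]                 # the final tonic anchors the analysis
    for f in reversed(funcs[:-1]):   # earlier chords wrap around it
        term = f"onto({f}, {term})"
    return term

print(analyse(["Dm7", "G7", "Cmaj7"]))  # onto(ii, onto(V, I))
```

The grammar formalism in the thesis plays the same mediating role, but derives such structures compositionally from CCG categories rather than from a fixed right-branching rule.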
In NLP, the high degree of ambiguity of the analysis means that a parser must
consider a huge number of possible structures. Chart parsing provides an efficient
mechanism to explore them. Statistical models allow the parser to use information
about structures seen before in a training corpus to eliminate improbable interpretations
early on in the process and to rank the final analyses by plausibility. To apply the
same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord
sequences annotated by hand with harmonic analyses is constructed. Two statistical
parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures. The experiments show that parsing using a statistical
model of syntactic derivations is more successful than a Markovian baseline
model at recovering harmonic structure. In addition, the practical technique of statistical
supertagging serves to speed up parsing without any loss in accuracy.
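The chart-parsing mechanism described above can be sketched as a minimal probabilistic CKY parser: the chart shares subanalyses between competing structures, and rule probabilities let the parser rank (or prune) low-scoring constituents. The toy grammar and lexicon below are invented for illustration.

```python
# Minimal probabilistic CKY chart parser: fills a chart of spans,
# keeping the best probability for each category over each span.

from collections import defaultdict

GRAMMAR = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"):  [("VP", 1.0)],
    ("D", "N"):   [("NP", 0.6)],
}
LEXICON = {  # word -> [(category, probability)]
    "the": [("D", 1.0)], "dog": [("N", 0.7)],
    "cat": [("N", 0.3)], "saw": [("V", 1.0)],
}

def cky(words):
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {category: best probability}
    for i, w in enumerate(words):
        for cat, p in LEXICON[w]:
            chart[i, i + 1][cat] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for lcat, lp in chart[i, k].items():
                    for rcat, rp in chart[k, j].items():
                        for parent, pr in GRAMMAR.get((lcat, rcat), []):
                            p = lp * rp * pr
                            if p > chart[i, j].get(parent, 0.0):
                                chart[i, j][parent] = p
    return chart[0, n]  # analyses covering the whole input, ranked by probability

print(cky("the dog saw the cat".split()))
```

A supertagger in this setting would restrict the lexical categories assigned at the bottom of the chart before parsing begins, which is why it speeds parsing up without changing the search over structures.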
This approach to recovering harmonic structure can be extended to the analysis of
performance data symbolically represented as notes. Experiments using some simple
proof-of-concept extensions of the above parsing models demonstrate one probabilistic
approach to this. The results reported provide a baseline for future work on the task of
harmonic analysis of performances.
A Robust Parser-Interpreter for Jazz Chord Sequences
Hierarchical structure similar to that associated with prosody and syntax in language can be identified in the rhythmic and harmonic progressions that underlie Western tonal music. Analysing such musical structure resembles natural language parsing: it requires the derivation of an underlying interpretation from an unstructured sequence of highly ambiguous elements, in the case of music, the notes. The task here is not merely to decide whether the sequence is grammatical, but rather to decide which among a large number of analyses it has. An analysis of this sort is a part of the cognitive processing performed by listeners familiar with a musical idiom, whether musically trained or not. Our focus is on the analysis of the structure of expectations and resolutions created by harmonic progressions. Building on previous work, we define a theory of tonal harmonic progression, which plays a role analogous to semantics in language. Our parser uses a formal grammar of jazz chord sequences, of a kind widely used for natural language processing (NLP), to map music, in the form of chord sequences used by performers, onto a representation of the structured relationships between chords. It uses statistical modelling techniques used for wide-coverage parsing in NLP to make practical parsing feasible in the face of considerable ambiguity in the grammar. Using machine learning over a small corpus of jazz chord sequences annotated with harmonic analyses, we show that grammar-based musical interpretation using simple statistical parsing models is more accurate than a baseline HMM. The experiment demonstrates that statistical techniques adapted from NLP can be profitably applied to the analysis of harmonic structure.