
    From surface dependencies towards deeper semantic representations

    In the past, a divide could be seen between 'deep' parsers on the one hand, which construct a semantic representation from their input but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have broader coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation, such that the output can be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive the functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived f-structures are of better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already constitute a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena, such as coordination, apposition and control structures; in these areas they are too "shallow" to be used directly for semantic interpretation. In this paper, we adopt an approach similar to that of Cahill et al. (2002), using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach on German data. A major focus of our discussion is the treatment of coordination and other potentially underspecified structures in the dependency input.
    F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate-argument structure and other dependency relations, such as subject, predicate and adjunct, as well as further semantic information such as the semantic type of an adjunct (e.g. directional). F-structure is normally captured as a recursive attribute-value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms, as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore consider f-structures a suitable target representation for automatic syntactic analysis in a larger pipeline mapping text to interpretation.
    In this paper, we report on the conversion from dependency structures to f-structures. First, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and the conversion of Versley (2005). Second, we start from tokenized text to evaluate the combined process of automatic parsing (using the parser of Foth and Menzel (2006)) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z, which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004). In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures; our evaluation is presented in section 4; and finally, section 5 summarises our results and gives an overview of remaining problems.
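
    As a rough illustration of the kind of conversion this abstract describes (not the authors' actual system), the sketch below turns labelled dependency edges into a nested attribute-value matrix, the data structure f-structures are usually rendered as. The edge labels (SUBJ, OBJ, SPEC, ADJUNCT) and the example sentence are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: labelled dependency triples -> nested attribute-value
# matrix (the usual rendering of an LFG f-structure). Hypothetical
# labels and example; the paper's converter handles far more (notably
# coordination and other underspecified structures).

def dependencies_to_fstructure(tokens, edges):
    """tokens: {index: lemma}; edges: (head_idx, relation, dep_idx) triples.
    Returns the f-structure of the root token as nested dicts."""
    fs = {i: {"PRED": lemma} for i, lemma in tokens.items()}
    dependents = set()
    for head, rel, dep in edges:
        if rel == "ADJUNCT":
            # adjuncts are conventionally collected in a set-valued attribute
            fs[head].setdefault("ADJUNCT", []).append(fs[dep])
        else:
            fs[head][rel] = fs[dep]
        dependents.add(dep)
    root = next(i for i in tokens if i not in dependents)
    return fs[root]

# "Anna sieht den Mann" (Anna sees the man), as flat dependencies
tokens = {1: "Anna", 2: "sehen", 3: "der", 4: "Mann"}
edges = [(2, "SUBJ", 1), (2, "OBJ", 4), (4, "SPEC", 3)]
print(dependencies_to_fstructure(tokens, edges))
# {'PRED': 'sehen', 'SUBJ': {'PRED': 'Anna'},
#  'OBJ': {'PRED': 'Mann', 'SPEC': {'PRED': 'der'}}}
```

    Because the result is a plain recursive structure, it maps directly onto the directed-graph view of f-structure mentioned in the abstract.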

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage parsers with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments of Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find the baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep, constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent result of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant improvement of 3.66% over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
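
    The two evaluation ingredients named above, dependency-triple f-score and the Approximate Randomization Test (Noreen 1989), are standard enough to sketch. The per-sentence data layout below is an assumption for illustration; the actual gold standards (PARC 700, CBS 500, DCU 105) differ in detail.

```python
# Hedged sketch of dependency-based evaluation plus an approximate
# randomization significance test. Not the authors' code; the data
# layout (one set of dependency triples per sentence) is assumed.
import random

def f_score(system, gold):
    """system/gold: lists of sets of dependency triples, one per sentence."""
    tp = sum(len(s & g) for s, g in zip(system, gold))
    p = tp / sum(len(s) for s in system)   # precision
    r = tp / sum(len(g) for g in gold)     # recall
    return 2 * p * r / (p + r)

def approx_randomization(sys_a, sys_b, gold, trials=10_000):
    """p-value for the null that systems A and B are interchangeable:
    randomly swap each sentence's output between the systems and count
    how often the f-score gap is at least as large as the observed one."""
    observed = abs(f_score(sys_a, gold) - f_score(sys_b, gold))
    hits = 0
    for _ in range(trials):
        a, b = [], []
        for out_a, out_b in zip(sys_a, sys_b):
            if random.random() < 0.5:
                out_a, out_b = out_b, out_a
            a.append(out_a)
            b.append(out_b)
        if abs(f_score(a, gold) - f_score(b, gold)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

    A small p-value from approx_randomization licenses claims like the "statistically significant improvement of 2.18%" reported in the abstract.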

    Improving language mapping in clinical fMRI through assessment of grammar.

    Introduction: Brain surgery in the language-dominant hemisphere remains challenging due to unintended post-surgical language deficits, despite the use of pre-surgical functional magnetic resonance imaging (fMRI) and intraoperative cortical stimulation. Moreover, patients are often advised against surgery if the accompanying risk to language appears too high. While standard fMRI language-mapping protocols may have relatively good predictive value at the group level, they remain sub-optimal at the individual level. The standard tests typically assess lexico-semantic aspects of language, and they do not accurately reflect the complexity of language, in either comprehension or production, at the sentence level. Among patients with left-hemisphere language dominance, we assessed which tests are best at activating language areas in the brain.
    Method: Using pre-operative fMRI, we compared grammar tests (items testing word order in actives and passives, wh-subject and object questions, relativized subject and object clauses, and past-tense marking) with standard tests (object naming, auditory and visual responsive naming). Twenty-five surgical candidates (13 females) participated in this study; sixteen patients presented with a brain tumor and nine with epilepsy. All participants underwent two pre-operative fMRI protocols: one with the CYCLE-N grammar tests and a second with the standard fMRI tests. fMRI activations during performance of both protocols were compared at the group level as well as in individual candidates.
    Results: The grammar tests generated a larger volume of activation in the left hemisphere (left/right angular gyrus, right anterior/posterior superior temporal gyrus) and identified additional language regions not shown by the standard tests (e.g., left anterior/posterior supramarginal gyrus). The standard tests produced more activation in left BA 47. Ten participants had more robust activations in the left hemisphere on the grammar tests and two on the standard tests. The grammar tests also elicited substantial activations in the right hemisphere and thus proved superior at identifying both right- and left-hemisphere contributions to language processing.
    Conclusion: The grammar tests may be an important addition to standard pre-operative fMRI testing.
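
    The abstract compares volumes of activation between hemispheres. One common way to quantify such comparisons, sketched below purely as an illustration (the paper does not publish its analysis code), is to count suprathreshold voxels per hemisphere and form a lateralization index LI = (L - R) / (L + R).

```python
# Illustrative only: suprathreshold voxel counts per hemisphere and the
# lateralization index. Assumes a statistical map already aligned so
# that the lower half of the first axis is the left hemisphere, which
# depends on the image orientation convention in a real pipeline.
import numpy as np

def lateralization_index(stat_map: np.ndarray, threshold: float) -> float:
    """stat_map: 3-D array of voxel statistics (first axis splits L/R)."""
    mid = stat_map.shape[0] // 2
    active = stat_map > threshold
    left = int(active[:mid].sum())    # suprathreshold voxels, left half
    right = int(active[mid:].sum())   # suprathreshold voxels, right half
    if left + right == 0:
        return 0.0
    return (left - right) / (left + right)

# LI near +1 -> strongly left-lateralized; near -1 -> right-lateralized.
rng = np.random.default_rng(0)
demo = rng.normal(size=(64, 64, 40))  # stand-in for a real statistical map
print(round(lateralization_index(demo, 2.5), 2))
```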

    The Unsupervised Acquisition of a Lexicon from Continuous Speech

    We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27-page technical report.
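
    The MDL principle behind this abstract can be shown in a toy form: the preferred lexicon is the one minimizing the bits needed to encode the lexicon itself plus the bits needed to encode the corpus as a sequence of lexicon entries. The greedy longest-match segmenter and the uniform per-character cost below are simplifying assumptions; the paper uses a hierarchical representation and models speech, not just text.

```python
# Toy MDL sketch: total description length = lexicon cost + corpus cost.
# Assumed simplifications: ~27-symbol alphabet, fixed-cost lexicon
# indices, greedy longest-match segmentation.
import math

def description_length(corpus_words, lexicon):
    # lexicon cost: spell out every entry character by character
    lex_bits = sum(len(w) for w in lexicon) * math.log2(27)
    # corpus cost: one lexicon index per token
    corpus_bits = len(corpus_words) * math.log2(max(len(lexicon), 2))
    return lex_bits + corpus_bits

def segment(text, lexicon):
    """Greedy longest-match segmentation of unsegmented text."""
    out, i = [], 0
    while i < len(text):
        match = next((w for w in sorted(lexicon, key=len, reverse=True)
                      if text.startswith(w, i)), text[i])
        out.append(match)
        i += len(match)
    return out

text = "thedogthedogran"
for lexicon in ({"t", "h", "e", "d", "o", "g", "r", "a", "n"},
                {"the", "dog", "ran"}):
    words = segment(text, lexicon)
    print(sorted(lexicon), round(description_length(words, lexicon), 1))
# The multi-word lexicon yields the shorter total description, which is
# exactly the signal an MDL learner exploits when inducing a lexicon.
```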