
    Evaluating Parsers with Dependency Constraints

    Many syntactic parsers now score over 90% on English in-domain evaluation, but the remaining errors have been challenging to address and difficult to quantify. Standard parsing metrics provide a consistent basis for comparison between parsers, but do not illuminate what errors remain to be addressed. This thesis develops a constraint-based evaluation for dependency and Combinatory Categorial Grammar (CCG) parsers to address this deficiency. We examine constrained and cascading impact, which represent the direct and indirect effects of errors on parsing accuracy. This distinguishes errors that are the underlying source of problems in a parse from those that are merely a consequence of them. Kummerfeld et al. (2012) propose a static post-parsing analysis that categorises groups of errors into abstract classes, but this cannot account for cascading changes that result from repairing errors, or for limitations that may prevent the parser from applying a repair. In contrast, our technique is based on enforcing the presence of certain dependencies during parsing, whilst allowing the parser to choose the remainder of the analysis according to its grammar and model. We draw constraints for this process from gold-standard annotated corpora, grouping them into abstract error classes such as NP attachment, PP attachment, and clause attachment. By applying constraints from each error class in turn, we can examine how parsers respond when forced to correctly analyse each class. We show how to apply dependency constraints in three parsers: the graph-based MSTParser (McDonald and Pereira, 2006) and the transition-based ZPar (Zhang and Clark, 2011b) dependency parsers, and the C&C CCG parser (Clark and Curran, 2007b). Each is widely used and influential in the field, and each generates some form of predicate-argument dependencies. We compare the parsers, identifying common sources of error and differences in how errors are distributed between constrained and cascading impact.
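One plausible way to enforce a dependency during decoding in a first-order graph-based parser such as MSTParser is to leave the model's arc scores untouched except for the constrained dependent, whose arcs from every other head are set to negative infinity; the decoder then chooses the rest of the tree freely. The sketch below assumes a projective, arc-factored model decoded with Eisner's algorithm. The helper names `constrain` and `eisner` and the toy scores are illustrative assumptions, not MSTParser's actual code.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def constrain(scores: Sequence[Sequence[float]],
              required: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Hypothetical helper: copy the arc-score matrix and forbid every head
    except the required one for each constrained dependent.
    scores[h][d] scores the arc h -> d; index 0 is ROOT."""
    masked = [list(row) for row in scores]
    for head, dep in required:
        for h in range(len(masked)):
            if h != head:
                masked[h][dep] = NEG_INF
    return masked


def eisner(scores: Sequence[Sequence[float]]) -> List[int]:
    """First-order projective decoding (Eisner's algorithm).
    Returns heads[d] for each token d, with heads[0] = -1 for ROOT."""
    n = len(scores)
    # Direction index 0: span headed at its right end; 1: headed at its left end.
    comp = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    inc = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    inc_bp = [[-1] * n for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete spans: join two complete halves and add an arc s <-> t.
            best, arg = NEG_INF, -1
            for r in range(s, t):
                v = comp[s][r][1] + comp[r + 1][t][0]
                if v > best:
                    best, arg = v, r
            inc[s][t][0] = best + scores[t][s]  # arc t -> s
            inc[s][t][1] = best + scores[s][t]  # arc s -> t
            inc_bp[s][t] = arg
            # Complete spans headed at t.
            best, arg = NEG_INF, -1
            for r in range(s, t):
                v = comp[s][r][0] + inc[r][t][0]
                if v > best:
                    best, arg = v, r
            comp[s][t][0], comp_bp[s][t][0] = best, arg
            # Complete spans headed at s.
            best, arg = NEG_INF, -1
            for r in range(s + 1, t + 1):
                v = inc[s][r][1] + comp[r][t][1]
                if v > best:
                    best, arg = v, r
            comp[s][t][1], comp_bp[s][t][1] = best, arg

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, complete: bool) -> None:
        # Follow back-pointers from the best full span to recover each arc.
        if s == t:
            return
        if complete:
            r = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads
```

With toy scores where ROOT→w2, w2→w1 and w2→w3 each score 10, w1→w2 scores 1, and every other arc scores 0, the unconstrained parse is heads [-1, 2, 0, 2]. Forcing the single arc w1→w3 yields [-1, 0, 1, 1]: under projectivity w2 must now sit beneath w1, so two further heads change. That knock-on change is the kind of indirect, cascading effect the evaluation is designed to separate from the directly constrained error.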
Our work allows us to contrast the implementation of each parser and how each responds to constraint application. Using our analysis, we experiment with new features for dependency parsing, which encode the frequency of proposed arcs in large-scale corpora derived from scanned books. These features are inspired by, and extend, the work of Bansal and Klein (2011). We target these features at the most notable errors, and show how they address some, but not all, of the difficult attachments across newswire and web text. CCG parsing is particularly challenging, as different derivations do not always generate different dependencies. We develop dependency hashing to address semantically redundant parses in n-best CCG parsing, and demonstrate its necessity and effectiveness. Dependency hashing substantially improves the diversity of n-best CCG parses, and improves a CCG reranker when used to create training and test data. We show the intricacies of applying constraints to C&C, and describe instances where applying constraints causes the parser to produce a worse analysis. These results illustrate how algorithms that are relatively straightforward for constituency and dependency parsers are non-trivial to implement in CCG. This work has explored dependencies as constraints in dependency and CCG parsing. We have shown how dependency hashing can efficiently eliminate semantically redundant CCG n-best parses, and presented a new evaluation framework based on enforcing the presence of dependencies in the output of the parser. By otherwise allowing the parser to proceed as it normally would, we avoid the assumptions inherent in other work. We hope this work will provide insights into the remaining errors in parsing and help target efforts to address those errors, creating better syntactic analysis for downstream applications.
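The abstract describes dependency hashing only at a high level. As a minimal sketch, assuming each parse's dependencies are available as a set of (head word, argument slot, dependent word) triples, semantically redundant derivations can be filtered by hashing that set and keeping only the first, highest-scoring parse for each distinct key. This version runs after the fact on an n-best list; it does not show how the thesis integrates the check into the C&C parser, and the names below are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical dependency representation: (head word, argument slot, dependent word).
Dependency = Tuple[str, int, str]
# (model score, derivation label, set of dependencies the derivation produces)
Parse = Tuple[float, str, FrozenSet[Dependency]]


def dedupe_nbest(parses: Iterable[Parse], n: int) -> List[Parse]:
    """Keep at most n parses, dropping any whose dependency set has already
    been produced by a higher-scoring parse."""
    if n <= 0:
        return []
    seen = set()  # frozensets are hashable, so membership tests hash the whole dependency set
    kept: List[Parse] = []
    for parse in sorted(parses, key=lambda p: p[0], reverse=True):
        deps = parse[2]
        if deps in seen:
            continue  # same dependencies as a better derivation: semantically redundant
        seen.add(deps)
        kept.append(parse)
        if len(kept) == n:
            break
    return kept
```

For "John saw Mary", a plain application derivation and a type-raising plus composition derivation yield the same two dependencies, so only the higher-scoring one survives. Applied after parsing, this can return fewer than n parses when the list contained duplicates.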

    Graphical Models with Structured Factors, Neural Factors, and Approximation-aware Training

    This thesis broadens the space of rich yet practical models for structured prediction. We introduce a general framework for modeling with four ingredients: (1) latent variables, (2) structural constraints, (3) learned (neural) feature representations of the inputs, and (4) training that takes the approximations made during inference into account. The thesis builds up to this framework through an empirical study of three NLP tasks (semantic role labeling, relation extraction, and dependency parsing), obtaining state-of-the-art results on the first two. We apply the resulting graphical models with structured and neural factors, and approximation-aware learning, to jointly model part-of-speech tags, a syntactic dependency parse, and semantic roles in a low-resource setting where the syntax is unobserved. We present an alternative view of these models as neural networks with a topology inspired by inference on graphical models that encode our intuitions about the data.

    Combined distributional and logical semantics

    Understanding natural language sentences requires interpreting words and combining the meanings of words into the meanings of sentences. Despite much work on lexical and compositional semantics individually, existing approaches are unlikely to offer a complete solution. This thesis introduces a new approach, which combines the benefits of distributional lexical semantics and logical compositional semantics. Linguistic theories of compositional semantics have shown how logical forms can be built for sentences, and how to represent semantic operators such as negation, quantifiers and modals. However, computational implementations of such theories have shown poor performance on applications, mainly due to a reliance on incomplete hand-built ontologies for the meanings of content words. Conversely, distributional semantics has been shown to be effective in learning the representations of content words based on collocations in large unlabelled corpora, but there are major outstanding challenges in representing function words and building representations for sentences. I introduce a new model which captures the main advantages of logical and distributional approaches. The proposal closely follows formal semantics, except for changing the definitions of content words. In traditional formal semantics, each word would express a different symbol. Instead, I allow multiple words to express the same symbol, corresponding to underlying concepts. For example, both the verb write and the noun author can be made to express the same relation. These symbols can be learnt by clustering based on distributional statistics; for example, write and author will share many similar arguments. Crucially, the clustering means that the representations are symbolic, so they can easily be incorporated into standard logical approaches. The simple model proves insufficient, and I develop several extensions.
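To make the clustering idea concrete, the sketch below represents each predicate by counts of the argument pairs it was observed with, measures cosine similarity between those count vectors, and merges predicates whose similarity meets a threshold using union-find. This single-link thresholding is a simple stand-in chosen for illustration, not the clustering algorithm used in the thesis; the predicate names, counts and threshold are invented.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors: Dict[str, Counter], threshold: float) -> List[List[str]]:
    """Single-link clustering: predicates are merged whenever any pair
    across two groups has similarity >= threshold."""
    parent = {p: p for p in vectors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for p, q in combinations(sorted(vectors), 2):
        if cosine(vectors[p], vectors[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: Dict[str, set] = {}
    for p in vectors:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Invented argument-pair counts: write and author share their arguments, buy does not.
vectors = {
    "write": Counter({("Orwell", "1984"): 3, ("Austen", "Emma"): 2}),
    "author": Counter({("Orwell", "1984"): 2, ("Austen", "Emma"): 2}),
    "buy": Counter({("Alice", "car"): 1, ("Bob", "house"): 1}),
}
```

Because write and author occur with the same argument pairs, `cluster(vectors, 0.5)` groups them into one symbol and leaves buy on its own, giving [["author", "write"], ["buy"]].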
I develop an unsupervised probabilistic model of ambiguity, and show how this model can be built into compositional derivations to produce a distribution over logical forms. The flat clustering approach does not model relations between concepts, for example that buying implies owning. To address this, I show how to build graph structures over the clusters, which allows such inferences. I also explore whether the abstract concepts can be generalized cross-lingually, for example by mapping the French verb écrire to the same cluster as the English verb write. The systems developed show good performance on question answering and entailment tasks, and are capable of both sophisticated multi-sentence inferences involving quantifiers and subtle reasoning about lexical semantics. These results show that distributional and formal logical semantics are not mutually exclusive, and that a combined model can be built that captures the advantages of each.
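A minimal sketch of inference over a graph of clusters, assuming each directed edge means "entails": a premise cluster entails a hypothesis cluster if the hypothesis is reachable from it. The edges below are hand-written for illustration, `entails` is a hypothetical helper, and how the thesis actually learns the graph is not shown.

```python
from collections import deque
from typing import Dict, List


def entails(graph: Dict[str, List[str]], premise: str, hypothesis: str) -> bool:
    """Breadth-first search: True if hypothesis is reachable from premise
    by following directed entailment edges (entailment is transitive)."""
    seen = {premise}
    queue = deque([premise])
    while queue:
        node = queue.popleft()
        if node == hypothesis:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical edges between cluster labels: buying implies owning, owning implies having.
graph = {"buy": ["own"], "own": ["have"], "inherit": ["own"]}
```

Here `entails(graph, "buy", "have")` holds by chaining two edges, while `entails(graph, "own", "buy")` does not, since the edges are directional.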