
    A corpus study of creating rule-based enhanced universal dependencies for German

    In this paper, we present a first attempt at enriching German Universal Dependencies (UD) treebanks with enhanced dependencies. Similar to the converter for English (Schuster and Manning, 2016), we develop a rule-based system for deriving enhanced dependencies from the basic layer, covering three linguistic phenomena: relative clauses, coordination, and raising/control. For quality control, we manually correct or validate a set of 196 sentences, finding that around 90% of the added relations are correct. Our data analysis reveals that difficulties arise mainly from inconsistencies in the basic-layer annotations. We show that the English system is in general applicable to German data, but that adapting it to the particularities of the German treebanks and language increases precision and recall by up to 10%. Comparing the application of our converter on gold-standard dependencies vs. automatic parses, we find that F1 drops by around 10% in the latter setting due to error propagation. Finally, an enhanced UD parser trained on a converted treebank performs poorly when evaluated against our annotations, indicating that more work remains to be done to create gold-standard enhanced German treebanks.
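The rule-based enhancement described above can be illustrated with a toy sketch. This is not the authors' converter: the triple encoding, the `propagate_conjuncts` helper, and the example sentence are assumptions made for demonstration, showing only the general idea behind one enhancement type, copying a governor's relation onto conjoined dependents.

```python
# Illustrative sketch of one enhancement rule: conjunct propagation.
# Dependencies are (head_index, dep_index, label) triples; index 0 is the root.
# All names here are assumptions, not the paper's implementation.

def propagate_conjuncts(deps):
    """Return an enhanced layer: every `conj` dependent is additionally
    attached to its first conjunct's governor with that conjunct's relation."""
    enhanced = list(deps)
    by_dep = {d: (h, l) for h, d, l in deps}  # dependent -> (head, label)
    for head, dep, label in deps:
        if label == "conj":
            gov, gov_label = by_dep.get(head, (None, None))
            if gov is not None and gov_label != "conj":
                enhanced.append((gov, dep, gov_label))
    return enhanced

# "Kim and Lee sleep": sleep(4) -> Kim(1) nsubj; Kim(1) -> Lee(3) conj.
basic = [(0, 4, "root"), (4, 1, "nsubj"), (1, 3, "conj"), (3, 2, "cc")]
enhanced = propagate_conjuncts(basic)  # adds (4, 3, "nsubj")
```

The enhanced layer makes "Lee" a subject of "sleep" directly, which is exactly the kind of relation the basic layer leaves implicit.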

    Cognitive constraints and island effects

    Competence-based theories of island effects play a central role in generative grammar, yet the graded nature of many syntactic islands has never been properly accounted for. Categorical syntactic accounts of island effects have persisted in spite of a wealth of data suggesting that island effects are not categorical in nature and that nonstructural manipulations that leave island structures intact can radically alter judgments of island violations. We argue here, building on work by Paul Deane, Robert Kluender, and others, that processing factors have the potential to account for this otherwise unexplained variation in acceptability judgments. We report the results of self-paced reading experiments and controlled acceptability studies that explore the relationship between processing costs and judgments of acceptability. In each of the three self-paced reading studies, the data indicate that the processing cost of different types of island violations can be significantly reduced, to a degree comparable to that of nonisland filler-gap constructions, by manipulating a single nonstructural factor. Moreover, this reduction in processing cost is accompanied by significant improvements in acceptability. This evidence favors the hypothesis that island-violating constructions involve numerous processing pressures that aggregate to drive processing difficulty above a threshold, resulting in unacceptability. We examine the implications of these findings for the grammar of filler-gap dependencies.

    The Radical Unacceptability Hypothesis: Accounting for Unacceptability without Universal Constraints

    The Radical Unacceptability Hypothesis (RUH) has been proposed as a way of explaining the unacceptability of extraction from islands and frozen structures. This hypothesis explicitly assumes a distinction between unacceptability due to violations of local well-formedness conditions (conditions on constituency, constituent order, and morphological form) and unacceptability due to extra-grammatical factors. We explore the RUH with respect to classical islands, and extend it to a broader range of phenomena, including freezing, A′ chain interactions, zero-relative clauses, topic islands, weak crossover, extraction from subjects and parasitic gaps, and sensitivity to information structure. The picture that emerges is consistent with the RUH, and suggests more generally that the unacceptability of extraction from otherwise well-formed configurations reflects non-syntactic factors, not principles of grammar.

    The achievements of Generative Syntax : a time chart and some reflections

    In May 2015, a group of eminent linguists met in Athens to debate the road ahead for generative grammar. There was a lot of discussion, and the linguists expressed the intention to draw up a list of achievements of generative grammar, for the benefit of other linguists and of the field in general. The list has been sketched, and it is rather interesting, as it presents a general picture of the results that is very 'past-heavy'. In this paper I reproduce the list and discuss the reasons why it looks the way it does.

    On Internal Merge


    Deep learning for extracting protein-protein interactions from biomedical literature

    State-of-the-art methods for protein-protein interaction (PPI) extraction are primarily feature-based or kernel-based, leveraging lexical and syntactic information, but how to incorporate such knowledge into recent deep learning methods remains an open question. In this paper, we propose a multichannel dependency-based convolutional neural network model (McDepCNN). It applies one channel to the embedding vector of each word in the sentence, and another channel to the embedding vector of the head of the corresponding word. Therefore, the model can use richer information obtained from the different channels. Experiments on two public benchmark datasets, AIMed and BioInfer, demonstrate that McDepCNN compares favorably to the state-of-the-art rich-feature and single-kernel based methods. In addition, McDepCNN achieves a 24.4% relative improvement in F1-score over the state-of-the-art methods on cross-corpus evaluation and a 12% improvement in F1-score over kernel-based methods on "difficult" instances. These results suggest that McDepCNN generalizes more easily across corpora, and is capable of capturing long-distance features in the sentences. (Accepted for publication in Proceedings of the 2017 Workshop on Biomedical Natural Language Processing; 10 pages, 2 figures, 6 tables.)
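The two-channel input described above can be sketched as follows. This is not the McDepCNN implementation: the `two_channel_input` function, the array shapes, and the embedding table are illustrative assumptions, showing only how a word channel and a head-word channel might be stacked before convolution.

```python
# Illustrative sketch (not the authors' code): build a two-channel input
# where channel 0 holds each word's own embedding and channel 1 the
# embedding of that word's syntactic head. Dimensions are assumed.
import numpy as np

def two_channel_input(tokens, heads, emb, dim=4):
    """tokens: list of words; heads: head index per token (-1 for root);
    emb: word -> vector table. Returns an array of shape (2, n, dim)."""
    word_ch = np.stack([emb[t] for t in tokens])
    head_ch = np.stack([emb[tokens[h]] if h >= 0 else np.zeros(dim)
                        for h in heads])
    return np.stack([word_ch, head_ch])
```

A convolution over such an input can then see, at every position, both the word itself and its governor, which is the "richer information" the abstract refers to.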

    Cross-lingual transfer parsing for low-resourced languages: an Irish case study

    We present a study of cross-lingual direct transfer parsing for the Irish language. First, we discuss the mapping of the annotation scheme of the Irish Dependency Treebank to a universal dependency scheme. We explain our dependency label mapping choices and the structural changes required in the Irish Dependency Treebank. We then experiment with the universally annotated treebanks of ten languages from four language family groups to assess which languages are the most useful for cross-lingual parsing of Irish, using these treebanks to train delexicalised parsing models which are then applied to sentences from the Irish Dependency Treebank. The best results are achieved when using Indonesian, a language from the Austronesian language family.
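Delexicalised transfer parsing, as described above, rests on stripping lexical material so that only categorical features (POS tags, dependency structure) remain, allowing a model trained on one language to parse another. A minimal sketch, assuming 10-column CoNLL-U-style input; `delexicalise` is a hypothetical helper, not the study's code:

```python
# Sketch of delexicalisation: blank out the FORM and LEMMA columns of a
# CoNLL-U token row so only POS tags and dependency structure remain.
# Column layout follows the 10-column CoNLL-U convention.

def delexicalise(conllu_line):
    cols = conllu_line.split("\t")
    cols[1] = "_"   # FORM
    cols[2] = "_"   # LEMMA
    return "\t".join(cols)

row = "1\tRinne\trinne\tVERB\t_\t_\t0\troot\t_\t_"
delex = delexicalise(row)   # "1\t_\t_\tVERB\t_\t_\t0\troot\t_\t_"
```

Because the surviving columns (universal POS and dependency labels) are shared across treebanks, a parser trained on such rows in Indonesian can, in principle, be applied directly to Irish.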

    How tight is your language? A semantic typology based on Mutual Information

    Languages differ in the degree of semantic flexibility of their syntactic roles. For example, English and Indonesian are considered more flexible with regard to the semantics of subjects, whereas German and Japanese are less flexible. In Hawkins' classification, more flexible languages are said to have a loose fit, and less flexible ones a tight fit. This classification has been based on manual inspection of example sentences. The present paper proposes a new, quantitative approach to deriving measures of looseness and tightness from corpora. We use corpora of online news from the Leipzig Corpora Collection in thirty typologically and genealogically diverse languages and parse them syntactically with Universal Dependencies annotation software. Next, we compute Mutual Information scores for each language using the matrices of lexical lemmas and four syntactic dependencies (intransitive subjects, transitive subjects, objects, and obliques). The new approach allows us not only to reproduce the results of previous investigations, but also to extend the typology to new languages. We also demonstrate that verb-final languages tend to have a tighter relationship between lexemes and syntactic roles, which helps language users recognize thematic roles early during comprehension.
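The Mutual Information measure the abstract refers to can be sketched with the standard formula applied to (lemma, role) co-occurrence counts. The function name and the toy data below are assumptions for illustration, not the paper's pipeline; a tighter language would show higher MI because lemmas predict their syntactic roles more strongly.

```python
# Sketch of lemma-role Mutual Information over (lemma, role) pairs:
# MI = sum over (l, r) of p(l, r) * log2( p(l, r) / (p(l) * p(r)) ).
import math
from collections import Counter

def mutual_information(pairs):
    """pairs: list of (lemma, role) tokens observed in a corpus."""
    n = len(pairs)
    joint = Counter(pairs)
    lemmas = Counter(l for l, _ in pairs)
    roles = Counter(r for _, r in pairs)
    mi = 0.0
    for (l, r), c in joint.items():
        p_lr = c / n
        mi += p_lr * math.log2(p_lr / ((lemmas[l] / n) * (roles[r] / n)))
    return mi

# Perfectly predictable ("tight") toy corpus: each lemma fixes its role.
mutual_information([("dog", "nsubj"), ("bone", "obj")])   # → 1.0
```

With lemmas and roles statistically independent (a maximally "loose" fit), the same function returns 0.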

    Head-Driven Phrase Structure Grammar

    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature-value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single, relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).