Syntactic Nuclei in Dependency Parsing -- A Multilingual Exploration
Standard models for syntactic dependency parsing take words to be the
elementary units that enter into dependency relations. In this paper, we
investigate whether there are any benefits from enriching these models with the
more abstract notion of nucleus proposed by Tesnière. We do this by showing
how the concept of nucleus can be defined in the framework of Universal
Dependencies and how we can use composition functions to make a
transition-based dependency parser aware of this concept. Experiments on 12
languages show that nucleus composition gives small but significant
improvements in parsing accuracy. Further analysis reveals that the improvement
mainly concerns a small number of dependency relations, including nominal
modifiers, relations of coordination, main predicates, and direct objects. Comment: Accepted at EACL-202
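To make the notion of a nucleus concrete, the following minimal sketch groups each function word with the word it depends on in a Universal Dependencies tree. The set of functional relations, the tuple layout, and the name `group_nuclei` are assumptions made for illustration; the abstract does not list them, and the composition functions used inside the parser are not shown here.

```python
# Assumption: dependents attached by these UD relations are function words
# that join the nucleus of their head. The abstract does not list the set.
FUNCTIONAL_RELATIONS = {"aux", "case", "cc", "clf", "cop", "det", "mark"}


def is_functional(deprel):
    """True if the relation (ignoring any subtype after ':') is functional."""
    return deprel.split(":")[0] in FUNCTIONAL_RELATIONS


def group_nuclei(tokens):
    """Group token ids into nuclei.

    tokens: list of (id, form, head, deprel) with 1-based ids; head 0 is the root.
    Returns a dict mapping each core word id to the sorted ids of its nucleus.
    Assumes every function word attaches directly to a non-function word.
    """
    nuclei = {tid: [tid] for tid, _, _, rel in tokens if not is_functional(rel)}
    for tid, _, head, rel in tokens:
        if is_functional(rel):
            nuclei[head].append(tid)
    return {core: sorted(ids) for core, ids in nuclei.items()}


# Hypothetical example sentence: "She has slept in the house"
SENTENCE = [
    (1, "She", 3, "nsubj"),
    (2, "has", 3, "aux"),
    (3, "slept", 0, "root"),
    (4, "in", 6, "case"),
    (5, "the", 6, "det"),
    (6, "house", 3, "obl"),
]

if __name__ == "__main__":
    for core, members in group_nuclei(SENTENCE).items():
        print(core, members)
```

In this toy sentence, "has slept" and "in the house" each form one nucleus, while "She" is a nucleus on its own.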
Statistical dependency parsing of Turkish
This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free-constituent order language with complex agglutinative inflectional and derivational morphology and presents interesting challenges for statistical parsing, as dependency relations generally hold between “portions” of words called inflectional groups. We have explored statistical models that use different representational units for parsing. We have used the Turkish Dependency Treebank to train and test our parser but have limited this initial exploration to the subset of treebank sentences with only left-to-right non-crossing dependency links. Our results indicate that the best accuracy in terms of the dependency relations between inflectional groups is obtained when we use inflectional groups as units in parsing, and when contexts around the dependent are employed.
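The restriction to sentences with only left-to-right non-crossing links can be illustrated with a small filter. This is a sketch under assumptions: "left-to-right" is read here as every dependent preceding its head, each sentence is given as a list of 1-based head positions with 0 for the root, and the function names are hypothetical rather than taken from the paper.

```python
def is_left_to_right(heads):
    """True if every non-root unit precedes its head.

    heads[i] is the 1-based position of the head of unit i + 1; 0 marks the root.
    """
    return all(h == 0 or h > i for i, h in enumerate(heads, start=1))


def has_crossing_links(heads):
    """True if any two dependency links cross when drawn above the sentence."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads, start=1) if h != 0]
    # Two arcs (a, b) and (c, d) cross when exactly one endpoint of the second
    # lies strictly inside the first: a < c < b < d.
    return any(a < c < b < d for a, b in arcs for c, d in arcs)


def keep_sentence(heads):
    """Filter used in this sketch: left-to-right links only, and none crossing."""
    return is_left_to_right(heads) and not has_crossing_links(heads)


if __name__ == "__main__":
    print(keep_sentence([2, 3, 0]))     # chain of rightward links
    print(keep_sentence([3, 4, 0, 3]))  # contains crossing and leftward links
```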
The incremental use of morphological information and lexicalization in data-driven dependency parsing
Typological diversity among the natural languages of the world poses interesting challenges for the models and algorithms used in syntactic parsing. In this paper, we apply a data-driven dependency parser to Turkish, a language characterized by rich morphology and flexible constituent order, and study the effect of employing varying amounts of morpholexical information on parsing performance. The investigations show that accuracy can be improved by using representations based on inflectional groups rather than word forms, confirming earlier studies. In addition, lexicalization and the use of rich morphological features are found to have a positive effect. By combining all these techniques, we obtain the highest reported accuracy for parsing the Turkish Treebank.
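As an illustration of what a representation based on inflectional groups looks like, the sketch below splits a morphological analysis string into its groups. The string format is an assumption: it follows a common notation for Turkish analyses in which `^DB` marks a derivation boundary and `+` separates tags. The abstract does not specify the format, and `split_inflectional_groups` is a hypothetical helper.

```python
def split_inflectional_groups(analysis):
    """Split an analysis string into its root and a list of inflectional groups.

    Assumed format: root+Tag+Tag^DB+Tag+Tag..., where ^DB is a derivation
    boundary and each stretch between boundaries is one inflectional group.
    """
    first, *derived = analysis.split("^DB")
    root, *first_tags = first.split("+")
    groups = [first_tags] + [part.strip("+").split("+") for part in derived]
    return root, groups


# Illustrative analysis in the assumed notation.
EXAMPLE = "sağlam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos^DB+Noun+Inf+A3sg+Pnon+Nom"

if __name__ == "__main__":
    root, groups = split_inflectional_groups(EXAMPLE)
    print(root)
    for number, tags in enumerate(groups, start=1):
        print(number, tags)
```

Under this representation a single word form contributes several parsing units, one per group, instead of one.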
UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation
Conference name: the 24th Meeting of the Special Interest Group on Discourse and Dialogue. Conference place: Prague, Czechia. Session period: 2023/09/11-15. Organizer: Association for Computational Linguistics. Affiliations: National Institute for Japanese Language and Linguistics; Tohoku University; Megagon Labs, Tokyo, Recruit Co., Ltd.
In this study, we have developed Universal Dependencies (UD) resources for spoken Japanese in the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large corpus of spoken language that encompasses various everyday conversations in Japanese, and includes word delimitation and part-of-speech annotation. We have newly annotated Long Word Unit delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries, for CEJC. The Japanese UD resources were constructed from the CEJC in accordance with hand-maintained conversion rules, using two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues pertaining to the construction of UD in the CEJC by comparing it with the written Japanese corpus and evaluating UD parsing accuracy.
Rule-based dependency analysis of Turkish sentences
This paper presents the performance obtained by parsing Turkish sentences with a rule-based dependency analysis method. The study contains the first rule-based results on the whole of the ODTÜ-Sabancı Treebank (METU-Sabancı Treebank), which is used as test data. The parsing algorithm and the rule structures are described in detail. The results can serve as a basis for future work on dependency analysis of Turkish.
Tagging and parsing with cascaded Markov models: automation of corpus annotation
This thesis presents new techniques for parsing natural language. They are based on Markov Models, which are commonly used in part-of-speech tagging for sequential processing on the word level. We show that Markov Models can be successfully applied to other levels of syntactic processing. First, two classification tasks are handled: the assignment of grammatical functions and the labeling of non-terminal nodes. Then, Markov Models are used to recognize hierarchical syntactic structures. Each layer of a structure is represented by a separate Markov Model. The output of a lower layer is passed as input to a higher layer, hence the name: Cascaded Markov Models. Instead of simple symbols, the states emit partial context-free structures. The new techniques are applied to corpus annotation and partial parsing and are evaluated using corpora of different languages and domains.
Starting from Markov models as used in part-of-speech tagging, this thesis presents methods that also apply Markov models successfully at further levels of syntactic processing. This covers classification tasks such as assigning grammatical functions and determining the categories of non-terminal nodes, as well as assigning hierarchical syntactic structures with Markov models. The latter is done by representing each layer of a syntactic structure with its own Markov model, which gives the method its name: Cascaded Markov Models. Their states emit partial context-free structures instead of atomic symbols. The methods are used in corpus annotation and partial parsing and are evaluated on several corpora.
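The word-level starting point mentioned above, a Markov model used for part-of-speech tagging, can be sketched with standard Viterbi decoding. This shows a single layer only: states emit plain words rather than partial context-free structures, and the cascade that feeds one layer's output into the next is not shown. The tag set and all probabilities are made-up toy values, and the function name `viterbi` is chosen for illustration.

```python
import math


def viterbi(words, tags, start, trans, emit, floor=1e-12):
    """Return the most likely tag sequence under a first-order Markov model."""

    def log_p(p):
        # Floor zero probabilities so that log() is always defined.
        return math.log(max(p, floor))

    # Each column maps tag -> (best log score, best previous tag).
    columns = [{t: (log_p(start.get(t, 0.0)) + log_p(emit[t].get(words[0], 0.0)), None)
                for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, columns[-1][p][0] + log_p(trans[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_p(emit[t].get(word, 0.0)), prev)
        columns.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model with made-up probabilities (assumption, for illustration only).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.5},
}

if __name__ == "__main__":
    print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```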
Automatic Acquisition of Lexical-Functional Grammar Resources from a Japanese Dependency Corpus
PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200