610 research outputs found
Statistical parsing of morphologically rich languages (SPMRL): what, how and whither
The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations
Initial Experiments on Russian to Kazakh SMT
We present our initial experiments on Russian to Kazakh phrase-based
statistical machine translation. Following a common approach to SMT between
morphologically rich languages, we employ morphological processing techniques.
Namely, for our initial experiments, we perform source-side lemmatization. Given
a rather humble-sized parallel corpus at hand, we also put some effort in data
cleaning and investigate the impact of data quality vs. quantity trade off on the
overall performance. Although our experiments mostly focus on source side preprocessing we achieve a substantial, statistically significant improvement over the
baseline that operates on raw, unprocessed data
Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
International audienceThis paper reports on the first shared task on statistical parsing of morphologically rich lan- guages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the eval- uation metrics for parsing MRLs given dif- ferent representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios
A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing
We present a novel neural network model that learns POS tagging and
graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to
learn feature representations shared for both POS tagging and dependency
parsing tasks, thus handling the feature-engineering problem. Our extensive
experiments, on 19 languages from the Universal Dependencies project, show that
our model outperforms the state-of-the-art neural network-based
Stack-propagation model for joint POS tagging and transition-based dependency
parsing, resulting in a new state of the art. Our code is open-source and
available together with pre-trained models at:
https://github.com/datquocnguyen/jPTDPComment: v2: also include universal POS tagging, UAS and LAS accuracies w.r.t
gold-standard segmentation on Universal Dependencies 2.0 - CoNLL 2017 shared
task test data; in CoNLL 201
- …