Cross-lingual transfer parsing for low-resourced languages: an Irish case study
We present a study of cross-lingual direct transfer parsing for the Irish language. First, we
discuss the mapping of the annotation scheme of the Irish Dependency Treebank to a universal dependency scheme. We explain our dependency label mapping choices and the structural changes
required in the Irish Dependency Treebank. We then experiment with the universally annotated
treebanks of ten languages from four language family groups to assess which languages are the
most useful for cross-lingual parsing of Irish. We use these treebanks to train delexicalised parsing models, which are then applied to sentences from the Irish Dependency Treebank. The best
results are achieved when using Indonesian, a language from the Austronesian language family.
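Delexicalised transfer works because the model never sees word forms, only part-of-speech tags and tree structure, which a universal annotation scheme makes comparable across languages. The sketch below shows one way to delexicalise a treebank before training, assuming the data is in the Universal Dependencies CoNLL-U format (10 tab-separated columns). The choice to also blank XPOS, FEATS and MISC is an assumption made here for illustration, not something the abstract specifies; the function names are hypothetical.

```python
# CoNLL-U column indices (0-based) of the fields that carry language-specific
# lexical information: FORM, LEMMA, XPOS, FEATS, MISC.
# UPOS (3), HEAD (6) and DEPREL (7) are kept, since they are what transfers.
LEXICAL_COLUMNS = (1, 2, 4, 5, 9)


def delexicalise_line(line: str) -> str:
    """Blank out the lexical fields of one CoNLL-U line."""
    if not line.strip() or line.startswith("#"):
        return line  # comments and sentence-separating blank lines pass through
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 CoNLL-U columns, got {len(cols)}")
    for idx in LEXICAL_COLUMNS:
        cols[idx] = "_"
    return "\t".join(cols)


def delexicalise(text: str) -> str:
    """Delexicalise a whole CoNLL-U document given as a string."""
    return "\n".join(delexicalise_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    sample = "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_"
    print(delexicalise(sample))
```

A parser trained on the output of `delexicalise` for a source language (for example Indonesian) can then be applied to target-language sentences that have been delexicalised the same way.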
Greek Dependency Treebank (GDT)
70K words. Non-validated sentence segmentation; non-validated POS tagging; manual annotation of syntactic dependencies and dependency labels; manual annotation of semantic roles; manual annotation of events based on a shallow domain-specific ontology (only for a 31K-word subset of the GDT).
Sastavljanje Hrvatske ovisnosne banke stabala: početne etape (Building the Croatian Dependency Treebank: Initial Stages)
The paper presents work in progress on building the Croatian Dependency Treebank.
Its design principles, procedures, and the pilot corpus used are described. Perspectives
for further development of the Croatian Dependency Treebank are presented at the end.
Active learning and the Irish treebank
We report on our ongoing work in developing the Irish Dependency Treebank, describe the results of two inter-annotator agreement (IAA) studies, demonstrate improvements in annotation consistency which have a knock-on effect on parsing accuracy, and present the final set of dependency labels. We then investigate the extent to which active learning can play a role in treebank and parser development by comparing an active learning bootstrapping approach to a passive approach in which sentences are chosen at random for manual revision. We show that active learning outperforms passive learning, but when annotation effort is taken into account, it is not clear how much of an advantage the active learning approach has. Finally, we present results which suggest that adding automatic parses to the training data along with manually revised parses in an active learning setup does not greatly affect parsing accuracy.
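The contrast between active and passive selection can be sketched as two interchangeable selection functions plugged into one bootstrapping round. This is a minimal illustration only: the abstract does not say which selection criterion was used, so the per-sentence `confidence` score (lower means the current parser is less sure) is a hypothetical stand-in, as are all function names below.

```python
import random
from typing import Callable, List, Sequence, Tuple

Selector = Callable[[Sequence[str], int], List[str]]


def select_active(pool: Sequence[str], confidence: Callable[[str], float], k: int) -> List[str]:
    """Active selection: the k sentences the current parser is least confident about."""
    return sorted(pool, key=confidence)[:k]


def select_passive(pool: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Passive baseline: k sentences drawn uniformly at random."""
    return random.Random(seed).sample(list(pool), k)


def bootstrap_round(pool: Sequence[str], labelled: List[str],
                    select: Selector, k: int) -> Tuple[List[str], List[str]]:
    """One round: choose k sentences, send them for manual revision, grow the training set.

    Manual revision is not modelled here; the chosen sentences are simply moved
    from the unlabelled pool to the labelled set.
    """
    chosen = select(pool, k)
    remaining = [s for s in pool if s not in chosen]
    return remaining, labelled + chosen
```

For example, `bootstrap_round(pool, [], lambda p, k: select_active(p, scores.get, k), 2)` runs one active round, and swapping in `select_passive` gives the random baseline; the parser would be retrained and the pool rescored between rounds.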
Universal Dependencies Parsing for Colloquial Singaporean English
Singlish can be interesting to the ACL community both linguistically as a
major creole based on English, and computationally for information extraction
and sentiment analysis of regional social media. We investigate dependency
parsing of Singlish by constructing a dependency treebank under the Universal
Dependencies scheme, and then training a neural network model by integrating
English syntactic knowledge into a state-of-the-art parser trained on the
Singlish treebank. Results show that English knowledge can lead to 25% relative
error reduction, resulting in a parser with 84.47% accuracy. To the best of our
knowledge, we are the first to use neural stacking to improve cross-lingual
dependency parsing on low-resource languages. We make both our annotation and
parser available for further research.
Comment: Accepted by ACL 201
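Relative error reduction measures how much of the remaining error is removed, not how many accuracy points are gained. The helpers below make that arithmetic explicit, assuming accuracies are given in percent. Inverting the formula shows what baseline a 25% relative reduction ending at 84.47% would imply if the 25% figure were exact. It is reported as a rounded number, so the result is only indicative and is not the paper's baseline. The function names are illustrative.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed; accuracies are in percent."""
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (base_err - new_err) / base_err


def implied_baseline(new_acc: float, rer: float) -> float:
    """Baseline accuracy implied by a final accuracy and a relative error reduction.

    Uses new_err = (1 - rer) * base_err, so base_err = new_err / (1 - rer).
    """
    return 100.0 - (100.0 - new_acc) / (1.0 - rer)


if __name__ == "__main__":
    # Going from 80% to 85% accuracy removes a quarter of the errors (20 -> 15).
    print(relative_error_reduction(80.0, 85.0))
    # If the 25% were exact, 84.47% would imply a baseline of about 79.29%.
    print(round(implied_baseline(84.47, 0.25), 2))
```

The same 25% relative reduction would be a much smaller absolute gain on a parser that was already highly accurate, which is why the two numbers are reported together.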
The risks of mixing dependency lengths from sequences of different length
Mixing dependency lengths from sequences of different length is a common
practice in language research. However, the empirical distribution of
dependency lengths of sentences of the same length differs from that of
sentences of varying length, and the distribution of dependency lengths depends
on sentence length for real sentences and also under the null hypothesis that
dependencies connect vertices located in random positions of the sequence. This
suggests that certain results, such as the distribution of syntactic dependency
lengths mixing dependencies from sentences of varying length, could be a mere
consequence of that mixing. Furthermore, differences in the global averages of
dependency length (mixing lengths from sentences of varying length) for two
different languages do not simply imply a priori that one language optimizes
dependency lengths better than the other because those differences could be due
to differences in the distribution of sentence lengths and other factors.
Comment: Language and referencing have been improved; Eqs. 7, 11, B7 and B8 have
been corrected.
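The mixing effect can be seen even under the null hypothesis alone. If the two ends of a dependency are two distinct positions chosen uniformly at random in a sentence of n words, the expected dependency length is (n + 1)/3, so it grows with sentence length. The sketch below (function names are illustrative) computes dependency lengths from a head array and the pooled null-model average for a corpus. Two corpora that both follow the same random baseline get different global averages purely because their sentence-length distributions differ. It assumes each sentence of n words contributes n - 1 dependencies, as in a tree.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence


def dependency_lengths(heads: Sequence[int]) -> List[int]:
    """Lengths |i - head(i)| for a sentence.

    heads[i] is the 1-based position of the head of word i + 1; 0 marks the root,
    which has no incoming dependency and therefore no length.
    """
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def expected_random_length(n: int) -> Fraction:
    """Expected |i - j| for two distinct positions drawn uniformly from 1..n."""
    return Fraction(n + 1, 3)


def brute_force_random_length(n: int) -> Fraction:
    """Same quantity computed by enumerating all position pairs (a sanity check)."""
    pairs = list(combinations(range(1, n + 1), 2))
    return Fraction(sum(j - i for i, j in pairs), len(pairs))


def mixed_random_average(sentence_lengths: Sequence[int]) -> Fraction:
    """Global mean dependency length under the null model, pooling all sentences."""
    total = sum((n - 1) * expected_random_length(n) for n in sentence_lengths)
    count = sum(n - 1 for n in sentence_lengths)
    return total / count


if __name__ == "__main__":
    # Same null model, different sentence-length mixes, different global means.
    print(mixed_random_average([10, 10]))  # two 10-word sentences
    print(mixed_random_average([5, 20]))   # one short and one long sentence
```

Here the second corpus has the larger pooled average (141/23, about 6.13, against 11/3, about 3.67) even though neither is "optimising" anything. That is the kind of confound the abstract warns about when global averages are compared across languages.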