Investigating multilingual dependency parsing

Authors: Johansson, Richard; Nugues, Pierre
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2006

In this paper, we describe a system for the CoNLL-X shared task of multilingual dependency parsing. It uses a baseline Nivre’s parser (Nivre, 2003) that first identifies the parse actions and then labels the dependency arcs. These two steps are implemented as SVM classifiers using LIBSVM. Features take into account the static context as well as relations dynamically built during parsing. We experimented with two main additions to our implementation of Nivre’s parser: N-best search and bidirectional parsing. We trained the parser in both left-right and right-left directions and we combined the results. To construct a single-head, rooted, and cycle-free tree, we applied the Chu-Liu/Edmonds optimization algorithm. We ran the same algorithm with the same parameters on all the languages.

Crossref

Lund University Publications

Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

Authors: McDonald, Ryan; Täckström, Oscar; Uszkoreit, Jakob
Publication date: 01/01/2012

It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
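
The "relative error" reduction quoted in this abstract can be read as the drop in error rate divided by the baseline error rate. The sketch below illustrates that arithmetic only; the function name and the accuracy values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the baseline error removed by the new system.

    Both arguments are accuracies in [0, 1]; the values used below are
    illustrative assumptions, not results reported in the paper.
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical example: accuracy rises from 60% to 65%,
# so the error falls from 40% to 35%.
reduction = relative_error_reduction(0.60, 0.65)
print(f"relative error reduction: {reduction:.1%}")
```

Under this reading, a gain of a few accuracy points can correspond to a noticeably larger relative figure when the baseline error is small.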

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Polyglot: Distributed Word Representations for Multilingual NLP

Authors: Al-Rfou, Rami; Perozzi, Bryan; Skiena, Steven
Publication date: 27/06/2014

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-the-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.
Comment: 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'201

arXiv.org e-Print Archive

CiteSeerX

Parsing Thai Social Data: A New Challenge for Thai NLP

Authors: Chalothorn, Tawunrat; Khampingyot, Borirat; Maharattamalai, Nattasit; Singkul, Sattaya; Taerungruang, Supawat
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/03/2020

Dependency parsing (DP) is a task that analyzes text for syntactic structure and the relationships between words. DP is widely used to improve natural language processing (NLP) applications in many languages, such as English. Previous work on DP is generally applicable to formally written language. However, it does not apply to informal language such as that used on social networks. Therefore, DP has to be researched and explored with such social network data. In this paper, we explore and identify a DP model that is suitable for Thai social network data. After that, we identify the appropriate linguistic unit to use as input. The results showed that the transition-based model called the improved Elkared dependency parser outperformed the others, with a UAS of 81.42%.
Comment: 7 pages, 8 figures, to be published in The 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2019)
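
The UAS (unlabeled attachment score) reported in this abstract is conventionally the fraction of tokens whose predicted head matches the gold-standard head. A minimal sketch of that computation follows; the function name and the toy head indices are hypothetical, and the paper's own evaluation setup (for example, its handling of punctuation) is not specified here.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head.

    Each list holds one head index per token, with 0 standing for the
    artificial root. Arc labels are ignored, which is what "unlabeled" means.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("sequences must have the same length")
    if not gold_heads:
        raise ValueError("need at least one token")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy four-token sentence (illustrative values only): one head is wrong.
gold = [2, 0, 2, 3]
pred = [2, 0, 2, 2]
print(f"UAS = {unlabeled_attachment_score(gold, pred):.2%}")
```

In practice the score is accumulated over all tokens of a test set rather than averaged per sentence, although both conventions appear in the literature.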

arXiv.org e-Print Archive

Crossref