3,004 research outputs found
Cross-language frame semantics transfer in bilingual corpora
Abstract. Recent work on the transfer of semantic information across languages has been recently applied to the development of resources annotated with Frame information for different non-English European languages. These works are based on the assumption that parallel corpora annotated for English can be used to transfer the semantic information to the other target languages. In this paper, a robust method based on a statistical machine translation step augmented with simple rule-based post-processing is presented. It alleviates problems related to preprocessing errors and the complex optimization required by syntax-dependent models of the cross-lingual mapping. Different alignment strategies are here in-vestigated against the Europarl corpus. Results suggest that the quality of the de-rived annotations is surprisingly good and well suited for training semantic role labeling systems.
Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
Many efforts of research are devoted to semantic role labeling (SRL) which is
crucial for natural language understanding. Supervised approaches have achieved
impressing performances when large-scale corpora are available for
resource-rich languages such as English. While for the low-resource languages
with no annotated SRL dataset, it is still challenging to obtain competitive
performances. Cross-lingual SRL is one promising way to address the problem,
which has achieved great advances with the help of model transferring and
annotation projection. In this paper, we propose a novel alternative based on
corpus translation, constructing high-quality training datasets for the target
languages from the source gold-standard SRL annotations. Experimental results
on Universal Proposition Bank show that the translation-based method is highly
effective, and the automatic pseudo datasets can improve the target-language
SRL performances significantly.Comment: Accepted at ACL 202
- …