7 research outputs found
From Masked Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding
The lack of publicly available evaluation data for low-resource languages limits progress in Spoken Language Understanding (SLU). Because key tasks like intent classification and slot filling require abundant training data, it is desirable to reuse existing data in high-resource languages to develop models for low-resource scenarios. We introduce xSID, a new benchmark for cross-lingual (x) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. To tackle the challenge, we propose a joint learning approach that combines English SLU training data with non-English auxiliary tasks built from raw text, syntax, and translation for transfer. We study two setups that differ in the type and language coverage of the pre-trained embeddings. Our results show that jointly learning the main tasks with masked language modeling is effective for slots, while machine translation transfer works best for intent classification.
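The joint learning idea above can be illustrated with a toy sketch: the main SLU losses (intent classification and slot filling) are summed with a weighted non-English auxiliary loss (e.g. masked language modeling or translation). The function name, the weighting scheme, and all loss values here are illustrative assumptions, not the paper's exact formulation.

```python
def joint_loss(intent_loss, slot_loss, aux_loss, aux_weight=1.0):
    """Combine the main SLU losses with a weighted auxiliary-task loss.

    intent_loss, slot_loss: losses for the English main tasks.
    aux_loss: loss for the non-English auxiliary task (e.g. MLM or MT).
    aux_weight: hypothetical scalar controlling auxiliary-task influence.
    """
    return intent_loss + slot_loss + aux_weight * aux_loss


# Example with hypothetical per-batch loss values and equal weighting.
total = joint_loss(intent_loss=1.0, slot_loss=0.5, aux_loss=0.25)
print(total)  # 1.75
```

In practice the auxiliary batches would come from a different (non-English) data stream, but the overall objective is still a weighted sum of this shape.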
Gender Bias in Masked Language Models for Multiple Languages
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, MLMs have also been reported to learn discriminatory biases regarding attributes such as gender and race. Because most studies have focused on MLMs in English, the biases of MLMs in other languages have rarely been investigated. Manual annotation of evaluation data for languages other than English is challenging due to the cost and difficulty of recruiting annotators. Moreover, existing bias evaluation methods require stereotypical sentence pairs consisting of the same context with different attribute words (e.g., He/She is a nurse). We propose the Multilingual Bias Evaluation (MBE) score, which evaluates bias in various languages using only English attribute word lists and parallel corpora between the target language and English, without requiring manually annotated data. We evaluated MLMs in eight languages using the MBE and confirmed that gender-related biases are encoded in MLMs for all of those languages. To evaluate the validity of the MBE, we manually created gender-bias datasets for Japanese and Russian. The results show that the bias scores reported by the MBE correlate significantly with those computed from the manually created datasets and from existing English gender-bias datasets.

Comment: NAACL 202