Whodunit? Learning to Contrast for Authorship Attribution

Author: Ai Bo
Tan Samson
Tan Yugin
Wang Yuchen
Publication date: 10/10/2022

Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset's content and style, but these approaches are dataset-dependent and yield inconsistent performance across corpora. In this work, we propose learning author-specific representations by fine-tuning pre-trained generic language representations with a contrastive objective (Contra-X). We show that Contra-X learns representations that form highly separable clusters for different authors. It advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8% over cross-entropy fine-tuning. However, we find that Contra-X improves overall accuracy at the cost of sacrificing performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to integrate contrastive learning with pre-trained language model fine-tuning for authorship attribution. Comment: camera-ready version, AACL-IJCNLP 202

arXiv.org e-Print Archive

A Survey of Embodied AI: From Simulators to Research Tasks

Author: Duan Jiafei
Tan Cheston
Tan Hui Li
Yu Samson
Zhu Hongyuan
Publication date: 30/09/2021

There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the internet. Instead, they learn through interactions with their environments from an egocentric perception similar to humans. Consequently, there has been substantial growth in the demand for embodied AI simulators to support various embodied AI research tasks. This growing interest in embodied AI is beneficial to the greater pursuit of Artificial General Intelligence (AGI), but there has not been a contemporary and comprehensive survey of this field. This paper aims to provide an encyclopedic survey for the field of embodied AI, from its simulators to its research. By evaluating nine current embodied AI simulators with our proposed seven features, this paper aims to understand the simulators in their provision for use in embodied AI research and their limitations. This paper also surveys the three main research tasks in embodied AI (visual exploration, visual navigation and embodied question answering (QA)), covering the state-of-the-art approaches, evaluation metrics and datasets. Finally, with the new insights revealed through surveying the field, the paper will provide suggestions for simulator-for-task selections and recommendations for the future directions of the field. Comment: Under Review for IEEE TETC

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Robustness of Utilizing Feedback in Embodied Visual Navigation

Author: Duan Jiafei
Tan Cheston
Yu Samson
Zhang Jenny
Publication date: 06/03/2023

This paper presents a framework for training an agent to actively request help in object-goal navigation tasks, with feedback indicating the location of the target object in its field of view. To make the agent more robust in scenarios where a teacher may not always be available, the proposed training curriculum includes a mix of episodes with and without feedback. The results show that this approach improves the agent's performance, even in the absence of feedback. Comment: Accepted at the ICRA Workshop for Communicating Robot Learning across Human-Robot Interaction

arXiv.org e-Print Archive

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations

Author: Joty Shafiq
Kan Min-Yen
Socher Richard
Tan Samson
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore English, etc.). We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular NLP models, e.g., BERT and Transformer, and show that adversarially fine-tuning them for a single epoch significantly improves robustness without sacrificing performance on clean data. Comment: To appear in the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

arXiv.org e-Print Archive

Crossref

Using closely-related language to build an ASR for a very under-resourced language: Iban

Author: Besacier Laurent
Lecouteux Benjamin
Samson Juan Sarah
Tien Ping Tan
Publication venue: HAL CCSD
Publication date: 01/09/2014

This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is mainly spoken in Sarawak, Malaysia. Because no ASR resources existed for Iban, we collected 8 hours of data to begin this study. We employed bootstrapping techniques involving a closely related language to rapidly build and improve an Iban system. First, we used already available data from Malay, a locally dominant language in Malaysia, to bootstrap a grapheme-to-phoneme (G2P) system for the target language. We also built various types of G2Ps, including a grapheme-based and an English G2P, to produce different versions of dictionaries. We tested all of the dictionaries on the Iban ASR to identify the best version. Second, we improved the word error rate (WER) of the baseline GMM system by utilizing subspace Gaussian mixture models (SGMM). To test this, we set two levels of data sparseness on the Iban data: 7 hours and 1 hour of transcribed speech. We investigated cross-lingual SGMM, where the shared parameters were obtained in either a monolingual or a multilingual fashion and then applied to the target language for training. Experiments using out-of-language data, English and Malay, as source languages resulted in lower WERs when Iban data is very limited.

Crossref

Hal - Université Grenoble Alpes

Unimas Institutional Repository

Development of rapid diagnostic methods using nucleic acid based molecular techniques for white spot syndrome virus (WSSV)

Author: Lee Kok Leong
Mohamed Din Mohamed Shariff
Soon Samson Min Ngen
Tan Lee Tung
Publication venue: The Research Unit, Universiti Putra Malaysia
Publication date: 01/01/1999

Universiti Putra Malaysia Institutional Repository