Outlier-Aware Training for Improving Group Accuracy Disparities
Methods addressing spurious correlations such as Just Train Twice (JTT,
arXiv:2107.09044v2) involve reweighting a subset of the training set to
maximize the worst-group accuracy. However, the reweighted set of examples may
potentially contain unlearnable examples that hamper the model's learning. We
propose mitigating this by detecting outliers to the training set and removing
them before reweighting. Our experiments show that our method achieves
competitive or better accuracy compared with JTT and can detect and remove
annotation errors in the subset being reweighted by JTT.
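The filtering step described above might be sketched as follows. This is an illustrative rendition, not the paper's exact procedure: the function name `build_upweight_set`, the loss-quantile outlier rule, and all parameter names are assumptions made for exposition.

```python
# Illustrative sketch of outlier filtering before JTT-style reweighting.
# The loss-quantile rule below is an assumption, not the paper's method.

def build_upweight_set(losses, errors, outlier_quantile=0.95):
    """Select misclassified training examples for upweighting, dropping
    the highest-loss ones as suspected outliers or annotation errors.

    losses: per-example training losses; errors: indices the initial
    model misclassified (JTT's error set).
    """
    error_losses = sorted(losses[i] for i in errors)
    if not error_losses:
        return set()
    # keep only error-set examples at or below the loss quantile cutoff
    cutoff = error_losses[int(outlier_quantile * (len(error_losses) - 1))]
    return {i for i in errors if losses[i] <= cutoff}
```

A JTT-style second training run would then upweight only the returned indices rather than the full error set.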
XFEVER: Exploring Fact Verification across Languages
This paper introduces the Cross-lingual Fact Extraction and VERification
(XFEVER) dataset, designed for benchmarking fact verification models across
different languages. We constructed it by translating the claim and evidence
texts of the Fact Extraction and VERification (FEVER) dataset into six
languages. The training and development sets were translated using machine
translation, whereas the test set includes texts translated by professional
translators and machine-translated texts. Using the XFEVER dataset, two
cross-lingual fact verification scenarios, zero-shot learning and
translate-train learning, are defined, and baseline models for each scenario
are also proposed in this paper. Experimental results show that the
multilingual language model can be used to build fact verification models in
different languages efficiently. However, the performance varies by language
and is somewhat inferior to the English case. We also found that we can
effectively mitigate model miscalibration by considering the prediction
similarity between the English and target languages. The XFEVER dataset, code,
and model checkpoints are available at
https://github.com/nii-yamagishilab/xfever.
Comment: Accepted for an oral presentation at the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023).
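The two scenarios defined above differ only in how training data is assembled; a toy sketch of that distinction follows. The function `training_data` and the `translate` callback are illustrative names, not the paper's API.

```python
# Toy sketch of how the two XFEVER training scenarios assemble data.
# Names here are assumptions for illustration, not the paper's code.

def training_data(scenario, english_pairs, translate):
    """english_pairs: list of (claim, evidence, label) tuples in English.
    translate: callable mapping an English string to the target language."""
    if scenario == "zero-shot":
        # train on English only; rely on the multilingual encoder
        # to transfer to other languages at test time
        return list(english_pairs)
    if scenario == "translate-train":
        # machine-translate claim and evidence into the target language
        return [(translate(c), translate(e), y) for c, e, y in english_pairs]
    raise ValueError(f"unknown scenario: {scenario}")
```

In both cases the same multilingual language model would be fine-tuned on the returned pairs; only the language of the training text changes.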
A Practical Text Summarizer by Paragraph Extraction for Thai
In this paper, we propose a practical approach for extracting the most relevant paragraphs from the original document to form a summary for Thai text. The idea of our approach is to exploit both the local and global properties of paragraphs. The local property can be considered as clusters of significant words within each paragraph, while the global property can be thought of as relations of all paragraphs in a document. These two properties are combined for ranking and extracting summaries. Experimental results on real-world data sets are encouraging.
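A toy rendition of combining a local and a global paragraph score might look like the following. The scoring formulas and the equal weighting are assumptions for illustration; the paper's actual measures differ.

```python
# Toy paragraph ranking combining a local and a global property.
# Both scoring formulas are illustrative assumptions.
import math
from collections import Counter

def paragraph_scores(paragraphs):
    """Local score: average corpus frequency of a paragraph's words
    (a proxy for clusters of significant words). Global score: mean
    cosine similarity to the other paragraphs (a proxy for relations
    among paragraphs). Equal 0.5/0.5 weighting is an assumption."""
    docs = [Counter(p.lower().split()) for p in paragraphs]
    corpus_tf = Counter()
    for d in docs:
        corpus_tf.update(d)

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    scores, n = [], len(docs)
    for i, d in enumerate(docs):
        local = sum(corpus_tf[w] for w in d) / max(len(d), 1)
        glob = sum(cosine(d, docs[j]) for j in range(n) if j != i) / max(n - 1, 1)
        scores.append(0.5 * local + 0.5 * glob)
    return scores
```

A summary would then be formed by extracting the top-ranked paragraphs in their original order.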
A Parallel Learning Algorithm for Text Classification
Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms to automatically classify text need sufficient labeled documents to learn accurately. Applying the Expectation-Maximization (EM) algorithm to this problem is an alternative approach that utilizes a large pool of unlabeled documents to augment the available labeled documents. Unfortunately, the time needed to learn from these large collections of unlabeled documents is too high. This paper introduces a novel parallel learning algorithm for the text classification task. The parallel algorithm is based on the combination of the EM algorithm and the naive Bayes classifier. Our goal is to improve the computational time of the learning and classification process. We studied the performance of our parallel algorithm on a large Linux PC cluster called PIRUN Cluster. We report both timing and accuracy results. These results indicate that the proposed parallel algorithm is capable of handling large document collections.
Refining A Divisive Partitioning Algorithm for Unsupervised Clustering
The Principal Direction Divisive Partitioning (PDDP) algorithm is a fast and scalable clustering algorithm [3]. The basic idea is to recursively split the data set into sub-clusters based on principal direction vectors. However, the PDDP algorithm can yield poor results, especially when cluster structures are not well-separated from one another.
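A one-level split in the PDDP spirit can be sketched as follows. This is a toy illustration, not the implementation cited as [3]; the function name and the sign-based partition rule are assumptions.

```python
# Toy one-level PDDP-style split: partition rows of X by the sign of
# their projection onto the leading principal direction. Illustrative
# sketch, not the cited implementation.
import numpy as np

def pddp_split(X):
    """Return (left_indices, right_indices) for one divisive split."""
    centered = X - X.mean(axis=0)
    # leading right singular vector of the centered data matrix
    # is the principal direction
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    return np.where(proj <= 0)[0], np.where(proj > 0)[0]
```

Recursing on each side until a depth or scatter criterion is met yields the full divisive hierarchy; the refinement discussed here targets the cases where this sign-based split performs poorly on poorly separated clusters.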