53 research outputs found
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
In this paper, we introduce a data-driven approach for Formality-Sensitive
Machine Translation (FSMT) that caters to the unique linguistic properties of
four target languages. Our methodology centers on two core strategies: 1)
language-specific data handling, and 2) synthetic data generation using
large-scale language models and empirical prompt engineering. This approach
demonstrates a considerable improvement over the baseline, highlighting the
effectiveness of data-centric techniques. Our prompt engineering strategy
further improves performance by producing superior synthetic translation
examples.
Comment: Accepted for Data-centric Machine Learning Research (DMLR) Workshop
at ICML 202
Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
We introduce the concept of "Alternative Speech" as a new way to directly
combat hate speech and complement the limitations of counter-narrative. An
alternative speech provides practical alternatives to hate speech in real-world
scenarios by offering speech-level corrections to speakers while considering
the surrounding context and encouraging speakers to reform. Further, an
alternative speech can combat hate speech alongside counter-narratives,
offering a useful tool to address social issues such as racial discrimination
and gender inequality. We propose the new concept and provide detailed
guidelines for constructing the necessary dataset. Through discussion, we
demonstrate that combining alternative speech and counter-narrative can be a
more effective strategy for combating hate speech by complementing the specificity
and guiding capacity of counter-narratives. This paper presents another
perspective for dealing with hate speech, offering viable remedies to
complement the constraints of current approaches to mitigating harmful bias.
Comment: Accepted for The First Workshop on Data-Centric AI (DCAI) at ICDM
202
A Self-Supervised Automatic Post-Editing Data Generation Tool
Data building for automatic post-editing (APE) requires extensive and
expert-level human effort, as it contains an elaborate process that involves
identifying errors in sentences and providing suitable revisions. Hence, we
develop a self-supervised data generation tool, deployable as a web
application, that minimizes human supervision and constructs personalized APE
data from a parallel corpus for several language pairs with English as the
target language. Data-centric APE research can be conducted using this tool,
involving many language pairs that have not been studied thus far owing to the
lack of suitable data.
Comment: Accepted for DataPerf workshop at ICML 202
A Study on the Development of Game-based Mind Wandering Judgment Model in Video Lecture-based Education
Although video lectures are efficient learning materials, they tend to be one-directional, delivered by the lecturer alone. Learning easily degrades into a one-sided activity, a recognized problem of online education, and it is difficult to judge whether learners are actually learning. To overcome these limitations, this paper proposes a minimum learning activity judgment model that automatically determines whether learners are actually learning by judging mind wandering, and reports an experiment verifying its educational effect. The experimental results show that a video lecture class using the minimum learning activity judgment system was effective in improving the academic achievement
- …