
    Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

    In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages. Our methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. This approach demonstrates a considerable improvement over the baseline, highlighting the effectiveness of data-centric techniques. Our prompt engineering strategy further improves performance by producing superior synthetic translation examples. Comment: Accepted for Data-centric Machine Learning Research (DMLR) Workshop at ICML 202

    Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

    We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and to compensate for the limitations of counter-narratives. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and encouraging speakers to reform. Further, an alternative speech can combat hate speech alongside counter-narratives, offering a useful tool to address social issues such as racial discrimination and gender inequality. We propose the new concept and provide detailed guidelines for constructing the necessary dataset. Through discussion, we demonstrate that combining alternative speech and counter-narrative can be a more effective strategy for combating hate speech, because alternative speech complements the specificity and guiding capacity of counter-narratives. This paper presents another perspective for dealing with hate speech, offering viable remedies that address the constraints of current approaches to mitigating harmful bias. Comment: Accepted for The First Workshop on Data-Centric AI (DCAI) at ICDM 202

    A Self-Supervised Automatic Post-Editing Data Generation Tool

    Building data for automatic post-editing (APE) requires extensive, expert-level human effort, as it involves an elaborate process of identifying errors in sentences and providing suitable revisions. Hence, we develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. Data-centric APE research can be conducted using this tool, involving many language pairs that have not been studied thus far owing to the lack of suitable data. Comment: Accepted for DataPerf workshop at ICML 202

    A Study on the Development of Game-based Mind Wandering Judgment Model in Video Lecture-based Education

    Although video lectures are very efficient learning materials, they tend to be delivered unilaterally by the lecturer. Learning can easily degrade into a one-sided process, which has been considered a problem of online education, and it is difficult to judge whether learners are actually learning. Therefore, to overcome the limitations of previous learning materials, this paper proposes a minimum learning activity judgment model that automatically determines, through mind wandering judgment, whether learners are actually learning, and reports an experiment to verify its educational effect. The experimental results show that the video lecture class using the minimum learning activity judgment system was effective in improving academic achievement.