11 research outputs found

    Design and Evaluation of the Corpus of Everyday Japanese Conversation

    Get PDF
    application/pdfNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsGraduate School of Humanities, Chiba UniversityNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsNational Institute for Japanese Language and LinguisticsWe have constructed the Corpus of Everyday Japanese Conversation (CEJC) and published it in March 2022. The CEJC is designed to contain various kinds of everyday conversations in a balanced manner to capture their diversity. The CEJC features not only audio but also video data to facilitate precise understanding of the mechanism of real-life social behavior. The publication of a large-scale corpus of everyday conversations that includes video data is a new approach. The CEJC contains 200 hours of speech, 577 conversations, about 2.4 million words, and a total of 1675 conversants. In this paper, we present an overview of the corpus, including the recording method and devices, structure of the corpus, formats of video and audio files, transcription, and annotations. We then report some results of the evaluation of the CEJC in terms of conversant and conversation attributes. We show that the CEJC includes a good balance of adult conversants in terms of gender and age, as well as a variety of conversations in terms of conversation forms, places, activities, and numbers of conversants.conference pape
    corecore