17 research outputs found

    UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation

    Conference name: the 24th Meeting of the Special Interest Group on Discourse and Dialogue. Conference place: Prague, Czechia. Session period: 2023/09/11-15. Organizer: Association for Computational Linguistics.
    Affiliations: National Institute for Japanese Language and Linguistics; Tohoku University; Megagon Labs, Tokyo, Recruit Co., Ltd.
    In this study, we have developed Universal Dependencies (UD) resources for spoken Japanese in the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large corpus of spoken language that encompasses various everyday conversations in Japanese, and includes word delimitation and part-of-speech annotation. We have newly annotated Long Word Unit delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries, for the CEJC. The UD Japanese resources were constructed by applying hand-maintained conversion rules to the CEJC annotations: two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues pertaining to the construction of UD in the CEJC by comparing it with the written Japanese corpus and evaluating UD parsing accuracy.
    Type: conference paper

    Speech corpora in NINJAL, Japan demonstration of corpus concordance systems : Chunagon and Kotonoha

    Affiliation: National Institute for Japanese Language and Linguistics
    The National Institute for Japanese Language and Linguistics, Japan (NINJAL, Japan) provides a demonstration site at the LPSS 2019 conference. This manuscript presents an overview of the demonstration of three corpora: the Corpus of Spontaneous Japanese, the Corpus of Everyday Japanese Conversation, and the Corpus of Japanese Dialects. NINJAL also demonstrates two concordance systems. The first is "Chunagon (中納言)", a morpheme-based concordance system that was made publicly available in 2011. The second is "Kotonoha", a system released in 2018 and still under development, which enables querying multiple corpora by register type and period.

    Internship report in community pharmacy (Relatório de estágio em farmácia comunitária)

    Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculty of Pharmacy of the University of Coimbra

    Word Delimitation Issues in UD Japanese

    No full text

    Japanese Universal Dependencies based on the NINJAL Long Unit Word (国語研長単位に基づく日本語 Universal Dependencies)

    Affiliations: National Institute for Japanese Language and Linguistics; National Institute for Japanese Language and Linguistics / Tokyo University of Foreign Studies

    Universal Dependencies 2.5

    No full text
    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).