92 research outputs found

    A multimodal multiparty human-robot dialogue corpus for real world interaction

    Get PDF
    Kyoto University / Honda Research Institute Japan Co., Ltd. LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09
    We have developed the MPR multimodal dialogue corpus and describe research activities using the corpus aimed at enabling multiparty human-robot verbal communication in real-world settings. While that is the final goal, the immediate focus of our project and of the corpus is non-verbal communication, especially social signal processing by machines as the foundation of human-machine verbal communication. The MPR corpus stores annotated audio-visual recordings of dialogues between one robot and one or more (up to three) participants. The annotations include speech segments, addressees of speech, transcripts, interaction states, and dialogue act types. Our research on multiparty dialogue management, boredom recognition, response obligation recognition, surprise detection, and repair detection using the corpus is briefly introduced, and an analysis of repair in multiuser situations is presented. It exhibits richer repair behaviors and demands more sophisticated repair handling by machines.
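
    The abstract does not specify a storage format for these annotations; purely as an illustration of what one annotated utterance might look like (speech segment, addressee, transcript, interaction state, dialogue act type), a hypothetical record is sketched below in Python. All field names and values are assumptions, not taken from the MPR corpus itself.

        # Hypothetical record for one annotated utterance in a corpus like MPR.
        # Field names and label sets are illustrative, not those of the actual corpus.
        from dataclasses import dataclass

        @dataclass
        class UtteranceAnnotation:
            speaker_id: str          # e.g. "participant_1" or "robot"
            start_sec: float         # start of the speech segment
            end_sec: float           # end of the speech segment
            addressee: str           # who the speech is addressed to
            transcript: str          # transcription of the segment
            interaction_state: str   # e.g. "engaged", "bystander"
            dialogue_act: str        # dialogue act type label

        example = UtteranceAnnotation(
            speaker_id="participant_1",
            start_sec=12.3,
            end_sec=14.1,
            addressee="robot",
            transcript="Can you show me the menu?",
            interaction_state="engaged",
            dialogue_act="request",
        )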

    Automatic Answerability Evaluation for Question Generation

    Full text link
    Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed for natural language generation (NLG) tasks, are based on measuring the n-gram overlap between the generated and reference text. These simple metrics may be insufficient for more complex tasks, such as question generation (QG), which requires generating questions that are answerable by the reference answers. Developing a more sophisticated automatic evaluation metric thus remains an urgent problem in QG research. This work proposes a Prompting-based Metric on ANswerability (PMAN), a novel automatic evaluation metric that assesses whether the generated questions are answerable by the reference answers for QG tasks. Extensive experiments demonstrate that its evaluation results are reliable and align with human evaluations. We further apply our metric to evaluate the performance of QG models, which shows that our metric complements conventional metrics. Our implementation of a ChatGPT-based QG model achieves state-of-the-art (SOTA) performance in generating answerable questions.
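
    The paper's exact prompt and model settings are not reproduced here; the sketch below only illustrates the general idea of a prompting-based answerability metric: an LLM is asked whether the reference answer actually answers the generated question, and the fraction of YES judgements over a test set serves as the score. The ask_llm function is a placeholder for whichever chat-completion API is used, and the prompt wording is an assumption.

        # Minimal sketch of a prompting-based answerability metric (in the spirit of PMAN).
        # ask_llm is a placeholder; the prompt and model are not taken from the paper.
        def ask_llm(prompt: str) -> str:
            raise NotImplementedError("plug in a chat-completion API here")

        def is_answerable(question: str, reference_answer: str) -> bool:
            prompt = (
                "Question: " + question + "\n"
                "Answer: " + reference_answer + "\n"
                "Does the answer above answer the question? Reply YES or NO."
            )
            return ask_llm(prompt).strip().upper().startswith("YES")

        def answerability_score(pairs: list[tuple[str, str]]) -> float:
            """Fraction of (generated question, reference answer) pairs judged answerable."""
            return sum(is_answerable(q, a) for q, a in pairs) / len(pairs)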

    Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

    Full text link
    Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and has received increasing attention for its application in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local, diverse uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that provides deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieves state-of-the-art (SOTA) performance compared to all baselines.
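
    The loss functions used in Joyful are defined in the paper; as a generic illustration of the intra-view contrastive idea (pulling together utterance representations that share an emotion label and pushing apart those that do not), a supervised InfoNCE-style loss is sketched below. This is not the authors' implementation.

        # Generic supervised contrastive loss sketch (InfoNCE-style); not the Joyful loss itself.
        import torch
        import torch.nn.functional as F

        def supervised_contrastive_loss(features: torch.Tensor,
                                        labels: torch.Tensor,
                                        temperature: float = 0.1) -> torch.Tensor:
            """features: (N, d) utterance representations; labels: (N,) emotion ids."""
            z = F.normalize(features, dim=1)                 # compare in cosine-similarity space
            sim = z @ z.t() / temperature                    # (N, N) similarity matrix
            n = z.size(0)
            self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
            sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-comparisons
            log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
            pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
            pos_counts = pos_mask.sum(dim=1)
            valid = pos_counts > 0                           # anchors with at least one positive
            loss = -(log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
            return loss.mean()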

    Robot-directed speech detection using multimodal semantic confidence based on speech, image, and motion

    Get PDF
    In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.
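
    As a hedged illustration of the fusion step described above (combining speech, image, and motion confidence scores with logistic-regression weights and thresholding the result), a minimal sketch follows. The weights, bias, and threshold are placeholders, not the values learned in the paper.

        # Minimal sketch of fusing three confidence measures with logistic-regression weights.
        # WEIGHTS, BIAS, and the threshold are placeholders, not the trained values.
        import math

        WEIGHTS = {"speech": 1.0, "image": 1.0, "motion": 1.0}  # hypothetical weights
        BIAS = 0.0                                              # hypothetical bias

        def multimodal_semantic_confidence(conf: dict[str, float]) -> float:
            """Logistic combination of speech, image, and motion confidence scores."""
            s = BIAS + sum(WEIGHTS[m] * conf[m] for m in WEIGHTS)
            return 1.0 / (1.0 + math.exp(-s))

        def is_robot_directed(conf: dict[str, float], threshold: float = 0.5) -> bool:
            return multimodal_semantic_confidence(conf) >= threshold

        # Example: an utterance whose interpretation fits the current scene well.
        print(is_robot_directed({"speech": 0.8, "image": 0.7, "motion": 0.6}))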

    Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l) Gene Including Its Transcriptional Start Site

    Get PDF
    We have been investigating the molecular efficacy of electroacupuncture (EA), which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l), in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l, and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.

    Audio-Visual Teaching Aid for Instructing English Stress Timings

    Get PDF
    This study proposed and evaluated an audio-visual teaching aid for teaching the rhythm of spoken English. The teaching aid indicates the stress timing of English through movements of a circle marker on a PC screen. Native Japanese participants practiced English sentences with and without the teaching aid, and their speech was recorded before and after the exercise. Analyses of the recorded speech showed that the teaching aid improved learning of English stress timing.
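
    The abstract gives no implementation details for the visual display; purely as an illustration of the idea (animating a circle marker so that it reaches successive screen positions at the stressed syllables), a small sketch is given below. The function name, screen width, and timing values are all made up.

        # Illustrative sketch only: map stressed-syllable times (seconds) to horizontal
        # marker positions so a circle can be animated left-to-right across the screen.
        def marker_positions(stress_times: list[float], screen_width: int = 800) -> list[tuple[float, int]]:
            """Return (time, x_position) pairs, one per stress beat."""
            if not stress_times:
                return []
            step = screen_width / max(len(stress_times) - 1, 1)
            return [(t, round(i * step)) for i, t in enumerate(stress_times)]

        # Example with made-up stress times for a short sentence.
        print(marker_positions([0.0, 0.6, 1.2, 2.0]))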

    What Was Learned from the Dialogue System Live Competition

    Get PDF
    NTT Media Intelligence Laboratories, NTT Corporation / Kyoto University / The University of Electro-Communications / NTT DOCOMO, INC. / Fujitsu Laboratories, Ltd. / Tohoku University / RIKEN AIP / National Institute for Japanese Language and Linguistics / National Institute for Japanese Language and Linguistics / NTT Communication Science Laboratories, NTT Corporation