A multimodal multiparty human-robot dialogue corpus for real world interaction
Kyoto University / Honda Research Institute Japan Co., Ltd.
LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09
We have developed the MPR multimodal dialogue corpus and describe research activities using the corpus aimed at enabling multiparty human-robot verbal communication in real-world settings. While verbal communication is the final goal, the immediate focus of our project and the corpus is non-verbal communication, especially social signal processing by machines as the foundation of human-machine verbal communication. The MPR corpus stores annotated audio-visual recordings of dialogues between one robot and one or multiple (up to three) participants. The annotations include speech segments, addressees of speech, transcripts, interaction states, and dialogue act types. Our research on multiparty dialogue management, boredom recognition, response obligation recognition, surprise detection, and repair detection using the corpus is briefly introduced, and an analysis of repair in multiuser situations is presented. Multiuser situations exhibit richer repair behaviors and demand more sophisticated repair handling by machines.
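The annotation scheme described above lends itself to a simple per-segment record. The sketch below is a hypothetical data structure for such a corpus; the field names and the helper function are illustrative assumptions, not the actual MPR annotation schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AnnotatedSegment:
    """One annotated speech segment (field names are illustrative, not the MPR schema)."""
    start_sec: float          # segment start time in the recording
    end_sec: float            # segment end time
    speaker_id: str           # the robot or a participant, e.g. "robot", "P1"
    addressee: str            # whom the speech is directed to
    transcript: str           # manual transcription of the utterance
    interaction_state: str    # e.g. "engaged", "leaving", "bystander"
    dialogue_acts: List[str]  # dialogue act type labels

def repair_count_by_party_size(segments: List[AnnotatedSegment],
                               repair_label: str = "repair") -> Dict[str, int]:
    """Count repair-tagged segments and the number of distinct human participants."""
    humans = {s.speaker_id for s in segments if s.speaker_id != "robot"}
    repairs = sum(repair_label in s.dialogue_acts for s in segments)
    return {"participants": len(humans), "repairs": repairs}
```

A grouping like this is one way to compare repair frequency between single-user and multiuser dialogues, in the spirit of the repair analysis the abstract mentions.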
Automatic Answerability Evaluation for Question Generation
Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed
for natural language generation (NLG) tasks, are based on measuring the n-gram
overlap between the generated and reference text. These simple metrics may be
insufficient for more complex tasks, such as question generation (QG), which
requires generating questions that are answerable by the reference answers.
Developing a more sophisticated automatic evaluation metric thus remains
an urgent problem in QG research. This work proposes a Prompting-based Metric
on ANswerability (PMAN), a novel automatic evaluation metric to assess whether
the generated questions are answerable by the reference answers for the QG
tasks. Extensive experiments demonstrate that its evaluation results are
reliable and align with human evaluations. We further apply our metric to
evaluate the performance of QG models, which shows our metric complements
conventional metrics. Our implementation of a ChatGPT-based QG model achieves
state-of-the-art (SOTA) performance in generating answerable questions.
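As a concrete illustration of a prompting-based answerability check, the sketch below scores a batch of generated questions against reference answers using a generic LLM callable. The prompt wording and the YES/NO protocol are assumptions for illustration, not the exact PMAN prompt.

```python
from typing import Callable, List

def answerability_score(questions: List[str],
                        reference_answers: List[str],
                        llm: Callable[[str], str]) -> float:
    """Fraction of generated questions judged answerable by their reference answers.

    `llm` is any callable that maps a prompt string to a model response string;
    the prompt below is a simplified stand-in for the paper's actual prompt.
    """
    answerable = 0
    for question, answer in zip(questions, reference_answers):
        prompt = (
            f"Question: {question}\n"
            f"Candidate answer: {answer}\n"
            "Does the candidate answer correctly answer the question? "
            "Reply with YES or NO."
        )
        verdict = llm(prompt).strip().upper()
        answerable += verdict.startswith("YES")
    return answerable / max(len(questions), 1)
```

Unlike n-gram overlap, a score of this kind is insensitive to surface wording and directly targets whether the reference answer resolves the generated question.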
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition
Multimodal emotion recognition aims to recognize emotions for each utterance
of multiple modalities, which has received increasing attention for its
application in human-machine interaction. Current graph-based methods fail to
simultaneously depict global contextual features and local diverse uni-modal
features in a dialogue. Furthermore, with the number of graph layers
increasing, they easily fall into over-smoothing. In this paper, we propose a
method for joint modality fusion and graph contrastive learning for multimodal
emotion recognition (Joyful), where multimodality fusion, contrastive learning,
and emotion recognition are jointly optimized. Specifically, we first design a
new multimodal fusion mechanism that can provide deep interaction and fusion
between the global contextual and uni-modal specific features. Then, we
introduce a graph contrastive learning framework with inter-view and intra-view
contrastive losses to learn more distinguishable representations for samples
with different sentiments. Extensive experiments on three benchmark datasets
indicate that Joyful achieved state-of-the-art (SOTA) performance compared to
all baselines.
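To make the inter-view contrastive objective concrete, the snippet below sketches an InfoNCE-style loss between node (utterance) embeddings from two augmented graph views, where each node's counterpart in the other view is its positive and all other nodes are negatives. This is the generic recipe in PyTorch, not Joyful's exact loss formulation.

```python
import torch
import torch.nn.functional as F

def inter_view_contrastive_loss(z1: torch.Tensor,
                                z2: torch.Tensor,
                                temperature: float = 0.5) -> torch.Tensor:
    """InfoNCE-style loss for two views of the same utterance nodes.

    z1, z2: (num_nodes, dim) embeddings of the same utterances under two graph
    views; the i-th row of z1 and the i-th row of z2 form a positive pair.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    # Symmetrize over both directions (view 1 -> view 2 and view 2 -> view 1).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Pulling positive pairs together while pushing other utterances apart is the intended mechanism behind the "more distinguishable representations for samples with different sentiments" that the abstract refers to.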
Robot-directed speech detection using multimodal semantic confidence based on speech, image, and motion
In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.
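The weighting step can be pictured as a small logistic-regression fusion over the three per-modality confidences. The sketch below uses scikit-learn with toy numbers; the feature values and training labels are made up for illustration and have no relation to the paper's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [speech_confidence, image_confidence, motion_confidence].
# Labels: 1 = robot-directed utterance, 0 = not robot-directed (toy values only).
X_train = np.array([[0.9, 0.8, 0.7],
                    [0.4, 0.2, 0.3],
                    [0.8, 0.9, 0.6],
                    [0.3, 0.1, 0.2]])
y_train = np.array([1, 0, 1, 0])

fusion = LogisticRegression().fit(X_train, y_train)

def multimodal_confidence(speech: float, image: float, motion: float) -> float:
    """Fused confidence = P(robot-directed | speech, image, motion confidences)."""
    return float(fusion.predict_proba([[speech, image, motion]])[0, 1])

# An utterance whose interpreted action also fits the scene and the motion model
# gets a high fused score and is accepted as robot-directed speech.
print(multimodal_confidence(0.85, 0.75, 0.65))
```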
Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l) Gene Including Its Transcriptional Start Site
We have been investigating the molecular efficacy of electroacupuncture (EA), which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l), in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l, and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.
Audio-Visual Teaching Aid for Instructing English Stress Timings
This study proposed and evaluated an audio-visual teaching aid for teaching the rhythm of spoken English. The teaching aid indicates the stress timing of English through the movements of a circle marker on a PC screen. Native Japanese participants practiced English sentences with and without the teaching aid, and their speech was recorded before and after the practice. Analyses of the recorded speech showed that the teaching aid could improve learning of English stress timing.
What Was Gained from the Dialogue System Live Competition
NTT Media Intelligence Laboratories, NTT Corporation; Kyoto University; The University of Electro-Communications; NTT DOCOMO INC.; Fujitsu Laboratories, LTD.; Tohoku University / RIKEN AIP; National Institute for Japanese Language and Linguistics; National Institute for Japanese Language and Linguistics; NTT Communication Science Laboratories, NTT Corporation
- …