
    Multimodal Resources in Turn-Taking in Semi-Institutional Mandarin Multiparty Interactions

    This study investigates how multimodal resources are used to organize turn-taking during multiparty interactions in a Mandarin talk show. By applying multimodal conversation analysis and interactional linguistics to 5.8 hours of impromptu talk show data, the study reveals that the chair and the other participants orient to their dual roles, both institutional and real-life, to configure a semi-institutional setting. In addition, the participants, i.e., the host and the guests, effectively use multimodal resources, including visible cues, embodied movements, and pragmatic (in)completion, to manage contingencies during turn-taking. The findings contribute to our understanding of the dynamics of turn-taking in semi-institutional settings and shed light on the interplay of multimodal resources in larger group conversations. The research expands the existing literature on multiparty institutional conversations in Mandarin.

    3GPP-Like THz Channel Modeling for Indoor Office and Urban Microcellular Scenarios

    Terahertz (THz) communication is envisioned as a possible technology for the sixth-generation (6G) communication system. THz channel propagation characteristics are the basis for designing and evaluating THz communication systems. In this paper, THz channel measurements at 100 GHz and 132 GHz are conducted in an indoor office scenario and an urban microcellular (UMi) scenario, respectively. Based on the measurements, the 3GPP-like channel parameters are extracted and analyzed. Moreover, the parameter models can be used to simulate the channel impulse response with the geometry-based stochastic model (GBSM). The measurement-based parameter models are then compared with the 3rd Generation Partnership Project (3GPP) channel models. Path loss approaching that of free space is observed in the NLoS scenario. In addition, the cluster numbers are 4 at LoS and 5 at NLoS in the indoor office and 4 at LoS and 3 at NLoS in the UMi, which are far fewer than in the 3GPP models. The multipath components (MPCs) in the THz channel are distributed more simply and more sparsely than in the 3GPP millimeter wave (mm-wave) channel models. Furthermore, the ergodic capacities of the mm-wave and THz channels are evaluated with the proposed THz GBSM implementation framework. The THz measurement model predicts the smallest capacity, indicating that at high carrier frequencies propagation is limited to the single transmission mechanism of reflection, which reduces the cluster numbers and the ergodic capacity. Overall, these results help in understanding and modeling the THz channel and in applying the THz communication technique to 6G. Comment: 13 pages, 12 figures, 3 tables

    From Think Parallel to Think Sequential


    Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering

    Explainable question answering (XQA) aims to answer a given question and provide an explanation of why the answer is selected. Existing XQA methods focus on reasoning over a single knowledge source, e.g., structured knowledge bases or unstructured corpora. However, integrating information from heterogeneous knowledge sources is essential to answer complex questions. In this paper, we propose to leverage question decomposition for heterogeneous knowledge integration, by breaking down a complex question into simpler ones and selecting the appropriate knowledge source for each sub-question. To facilitate reasoning, we propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT). First, we build the Hierarchical Question Decomposition Tree (HQDT) to understand the semantics of a complex question; then, we conduct probabilistic reasoning over the HQDT from root to leaves recursively, to aggregate heterogeneous knowledge at different tree levels and search for the best solution considering the decomposing and answering probabilities. Experiments on the complex QA datasets KQA Pro and Musique show that our framework significantly outperforms SOTA methods, demonstrating the effectiveness of leveraging question decomposition for knowledge integration and of our RoHT framework. Comment: has been accepted by ACL202

    A Study of Pulsation properties of 57 Non-Blazhko effect ab-type RR Lyrae stars with homogeneous metallicities from the LAMOST-Kepler/K2 survey

    Homogeneous metallicities and continuous high-precision light curves play key roles in studying the pulsation properties of RR Lyrae stars. By cross-matching with LAMOST DR6, we have identified 7 and 50 non-Blazhko RRab stars in the Kepler and K2 fields, respectively, which have homogeneous metallicities determined from low-resolution spectra of the LAMOST-Kepler/K2 project. The Fourier Decomposition method is applied to the light curves of these stars provided by the Kepler space-based telescope to determine the fundamental pulsation periods and the pulsation parameters. The calculated amplitude ratios R21 and R31 and the phase differences φ21 and φ31 are consistent with the parameters of the RRab stars in both the Globular Clusters and the Large Magellanic Cloud. We find a linear relationship between the phase differences φ21 and φ31, which is in good agreement with the results in previous literature. As for the amplitude, we find that the amplitude of the primary frequency A1 and the total amplitude Atot follow either a cubic or a linear relationship. For the rise time RT, we find no relation to the period of the fundamental pulsation mode P1, to Atot, or to φ21. However, it might follow a linear relationship with R31. Based on the homogeneous metallicities, we have derived a new calibration formula for the period-φ31-[Fe/H] relationship, which agrees well with previous studies.

    Human Motion Generation: A Survey

    Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey can provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges. Comment: 20 pages, 5 figures

    Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

    Streaming voice conversion (VC) is the task of converting the voice of one person to that of another in real time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to represent speaker-independent information. However, PPGs lack the prosody and vocalization information of the source speaker, and streaming PPGs contain undesired leaked timbre of the source speaker. In this paper, we propose to use intermediate bottleneck features (IBFs) to replace PPGs. VC systems trained with IBFs retain more prosody and vocalization information of the source speaker. Furthermore, we propose a non-streaming teacher guidance (TG) framework that addresses the timbre leakage problem. Experiments show that our proposed IBFs and the TG framework achieve a state-of-the-art streaming VC naturalness of 3.85, a content consistency of 3.77, and a timbre similarity of 3.77 under a future receptive field of 160 ms, significantly outperforming previous streaming VC systems. Comment: The paper has been submitted to ICASSP202