203 research outputs found
Multimodal Resources in Turn-Taking in Semi-Institutional Mandarin Multiparty Interactions
This study investigates how multimodal resources are used to organize turn-taking during multiparty interactions in a Mandarin talk show. Applying multimodal conversation analysis and interactional linguistics to 5.8 hours of impromptu talk show data, the study reveals that the chair and the other participants orient to their dual roles, both institutional and real-life, to configure a semi-institutional setting. In addition, the participants, i.e., the host and the guests, effectively use multimodal resources, including visible cues, embodied movements, and pragmatic (in)completion, to manage contingencies during turn-taking. The findings contribute to our understanding of the dynamics of turn-taking in semi-institutional settings and shed light on the interplay of multimodal resources in larger group conversations. The research expands the existing literature on multiparty institutional conversations in Mandarin.
3GPP-Like THz Channel Modeling for Indoor Office and Urban Microcellular Scenarios
Terahertz (THz) communication is envisioned as a possible technology for
the sixth-generation (6G) communication system. THz channel propagation
characteristics are the basis for designing and evaluating THz communication
systems. In this paper, THz channel measurements at 100 GHz and 132 GHz are
conducted in an indoor office scenario and an urban microcellular (UMi)
scenario, respectively. Based on the measurements, the 3GPP-like channel
parameters are extracted and analyzed. Moreover, the parameter models are
available for the simulation of the channel impulse response by the
geometry-based stochastic model (GBSM). Then, the comparisons between
measurement-based parameter models and 3rd Generation Partnership Project
(3GPP) channel models are investigated. It is observed that the case with path
loss approaching free space exists in the NLoS scenario. In addition, the cluster
numbers are 4 at LoS and 5 at NLoS in the indoor office and 4 at LoS and 3 at
NLoS in the UMi, far fewer than in the 3GPP models. The multipath components (MPCs)
in the THz channel are distributed more simply and more sparsely than in the 3GPP
millimeter wave (mm-wave) channel models. Furthermore, the ergodic capacities of
mm-wave and THz channels are evaluated by the proposed THz GBSM implementation
framework. The THz measurement model predicts the smallest capacity, indicating
that high carrier frequency is limited to the single transmission mechanism of
reflection and results in the reduction of cluster numbers and ergodic
capacity. Generally, these results are helpful for understanding and modeling the THz
channel and applying the THz communication technique to 6G. Comment: 13 pages, 12 figures, 3 tables
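As a point of reference for the observation that NLoS path loss can approach free space, the free-space baseline follows from the Friis equation, FSPL(dB) = 20 log10(4πdf/c). The sketch below is illustrative only: the function name `fspl_db` and the 10 m example distance are assumptions, not values from the paper; only the 100 GHz and 132 GHz carrier frequencies come from the abstract.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def fspl_db(distance_m: float, frequency_hz: float) -> float:
    """Free-space path loss in dB from the Friis equation (isotropic antennas)."""
    if distance_m <= 0 or frequency_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    # FSPL = (4 * pi * d * f / c)^2, expressed in decibels.
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT
    )


if __name__ == "__main__":
    # Carrier frequencies from the abstract; the 10 m distance is a hypothetical example.
    for f_ghz in (100.0, 132.0):
        loss = fspl_db(10.0, f_ghz * 1e9)
        print(f"{f_ghz:.0f} GHz at 10 m: {loss:.2f} dB")
```

A measured path loss can be compared against this baseline at the same distance and frequency; a small excess over the free-space value is what "approaching free space" would look like in such a comparison.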
Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering
Explainable question answering (XQA) aims to answer a given question and
provide an explanation of why the answer is selected. Existing XQA methods focus
on reasoning on a single knowledge source, e.g., structured knowledge bases,
unstructured corpora, etc. However, integrating information from heterogeneous
knowledge sources is essential to answer complex questions. In this paper, we
propose to leverage question decomposing for heterogeneous knowledge
integration, by breaking down a complex question into simpler ones, and
selecting the appropriate knowledge source for each sub-question. To facilitate
reasoning, we propose a novel two-stage XQA framework, Reasoning over
Hierarchical Question Decomposition Tree (RoHT). First, we build the
Hierarchical Question Decomposition Tree (HQDT) to understand the semantics of
a complex question; then, we conduct probabilistic reasoning over HQDT from
root to leaves recursively, to aggregate heterogeneous knowledge at different
tree levels and search for the best solution considering the decomposing and
answering probabilities. The experiments on complex QA datasets KQA Pro and
Musique show that our framework outperforms SOTA methods significantly,
demonstrating the effectiveness of leveraging question decomposing for
knowledge integration and our RoHT framework. Comment: has been accepted by ACL202
A Study of Pulsation properties of 57 Non-Blazhko effect ab-type RR Lyrae stars with homogeneous metallicities from the LAMOST-Kepler/K2 survey
Homogeneous metallicities and continuous high-precision light curves play key
roles in studying the pulsation properties of RR Lyrae stars. By cross-matching
with LAMOST DR6, we have identified 7 and 50 Non-Blazhko RRab stars in the
Kepler and K2 fields, respectively, which have homogeneous metallicities
determined from low-resolution spectra of the LAMOST-Kepler/K2 project. The
Fourier Decomposition method is applied to the light curves of these stars
provided by the Kepler space-based telescope to determine the fundamental
pulsation periods and the pulsation parameters. The calculated amplitude ratios
R21, R31 and the phase differences φ21, φ31 are consistent with
the parameters of the RRab stars in both the Globular Clusters and the Large
Magellanic Cloud. We find a linear relationship between the phase differences
φ21 and φ31, which is in good agreement with the results in previous
literature. As for the amplitude, we find that the amplitude of the primary
frequency A1 and the total amplitude Atot follow either a cubic or linear
relationship. For the rise time RT, we find no correlation with the
period of the fundamental pulsation mode P1, Atot, or φ21. However, it
might follow a linear relationship with R31. Based on the homogeneous
metallicities, we have derived a new calibration formula for the
period-φ31-[Fe/H] relationship, which agrees well with the previous studies.
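The Fourier decomposition step described in this abstract can be sketched as a linear least-squares fit of a harmonic series at a known period. This is a minimal illustration, not the authors' pipeline: it assumes the sine-series convention m(t) = A0 + Σ A_k sin(kωt + φ_k), with R_k1 = A_k / A_1 and phase difference φ_k1 = φ_k − kφ_1 reduced to [0, 2π). The function name `fourier_parameters` and the default of three harmonics are hypothetical choices.

```python
import numpy as np


def fourier_parameters(t, mag, period, n_harmonics=3):
    """Fit a sine series at a fixed period and return amplitude ratios
    and phase differences relative to the first harmonic.

    Assumed model: mag(t) = A0 + sum_k A_k * sin(k * omega * t + phi_k).
    """
    if n_harmonics < 3:
        raise ValueError("need at least 3 harmonics for R31 and phi31")
    t = np.asarray(t, dtype=float)
    mag = np.asarray(mag, dtype=float)
    omega = 2.0 * np.pi / period

    # Design matrix: constant term, then a sin/cos pair per harmonic.
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(k * omega * t))
        columns.append(np.cos(k * omega * t))
    design = np.column_stack(columns)

    coeffs, *_ = np.linalg.lstsq(design, mag, rcond=None)
    sin_coeffs = coeffs[1::2]  # a_k = A_k * cos(phi_k)
    cos_coeffs = coeffs[2::2]  # b_k = A_k * sin(phi_k)

    amplitudes = np.hypot(sin_coeffs, cos_coeffs)
    phases = np.arctan2(cos_coeffs, sin_coeffs)

    two_pi = 2.0 * np.pi
    return {
        "A1": float(amplitudes[0]),
        "R21": float(amplitudes[1] / amplitudes[0]),
        "R31": float(amplitudes[2] / amplitudes[0]),
        "phi21": float(np.mod(phases[1] - 2.0 * phases[0], two_pi)),
        "phi31": float(np.mod(phases[2] - 3.0 * phases[0], two_pi)),
    }
```

Some studies use a cosine series instead, which shifts the phase differences by a fixed offset, so values from this sketch are only comparable with results that use the same convention.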
Human Motion Generation: A Survey
Human motion generation aims to generate natural human pose sequences and
shows immense potential for real-world applications. Substantial progress has
been made recently in motion data collection technologies and generation
methods, laying the foundation for increasing interest in human motion
generation. Most research within this field focuses on generating human motions
based on conditional signals, such as text, audio, and scene contexts. While
significant advancements have been made in recent years, the task continues to
pose challenges due to the intricate nature of human motion and its implicit
relationship with conditional signals. In this survey, we present a
comprehensive literature review of human motion generation, which, to the best
of our knowledge, is the first of its kind in this field. We begin by
introducing the background of human motion and generative models, followed by
an examination of representative methods for three mainstream sub-tasks:
text-conditioned, audio-conditioned, and scene-conditioned human motion
generation. Additionally, we provide an overview of common datasets and
evaluation metrics. Lastly, we discuss open problems and outline potential
future research directions. We hope that this survey can provide the
community with a comprehensive glimpse of this rapidly evolving field and
inspire novel ideas that address the outstanding challenges. Comment: 20 pages, 5 figures
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance
Streaming voice conversion (VC) is the task of converting the voice of one
person to that of another in real time. Previous streaming VC methods use phonetic
posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems
to represent speaker-independent information. However, PPGs lack the prosody
and vocalization information of the source speaker, and streaming PPGs contain
undesired leaked timbre of the source speaker. In this paper, we propose to use
intermediate bottleneck features (IBFs) to replace PPGs. VC systems trained
with IBFs retain more prosody and vocalization information of the source
speaker. Furthermore, we propose a non-streaming teacher guidance (TG)
framework that addresses the timbre leakage problem. Experiments show that our
proposed IBFs and the TG framework achieve a state-of-the-art streaming VC
naturalness of 3.85, a content consistency of 3.77, and a timbre similarity of
3.77 under a future receptive field of 160 ms, significantly outperforming
previous streaming VC systems. Comment: The paper has been submitted to ICASSP202
Deterministic Ray Tracing: A Promising Approach to THz Channel Modeling in 6G Deployment Scenarios
- …