Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
Audio-visual learning has demonstrated promising results in many classical
speech tasks (e.g., speech separation, automatic speech recognition, wake-word
spotting). We believe that introducing the visual modality will also benefit
speaker diarization. To date, Target-Speaker Voice Activity Detection (TS-VAD)
plays an important role in highly accurate speaker diarization. However,
previous TS-VAD models take only audio features and use each speaker's acoustic
footprint to identify that speaker's speech activity, an approach that is easily
degraded by overlapped speech in multi-speaker scenarios. Although visual
information naturally tolerates overlapped speech, it suffers from spatial
occlusion, low resolution, etc. This potential missing-modality problem has
hindered the extension of TS-VAD to an audio-visual approach.
This paper proposes a novel Multi-Input Multi-Output Target-Speaker Voice
Activity Detection (MIMO-TSVAD) framework for speaker diarization. The proposed
method can take audio-visual input and leverage the speaker's acoustic
footprint or lip track to flexibly conduct audio-based, video-based, and
audio-visual speaker diarization in a unified sequence-to-sequence framework.
Experimental results show that the MIMO-TSVAD framework demonstrates
state-of-the-art performance on the VoxConverse, DIHARD-III, and MISP 2022
datasets under corresponding evaluation metrics, obtaining the Diarization
Error Rates (DERs) of 4.18%, 10.10%, and 8.15%, respectively. In addition, it
can perform robustly in heavy lip-missing scenarios.
Comment: Under review of IEEE/ACM Transactions on Audio, Speech, and Language
Processin
LO-Net: Deep Real-time Lidar Odometry
We present a novel deep convolutional network pipeline, LO-Net, for real-time
lidar odometry estimation. Unlike most existing lidar odometry (LO) methods,
which rely on a pipeline of individually designed feature selection, feature
matching, and pose estimation stages, LO-Net can be trained end to end. With a
new mask-weighted geometric constraint loss, LO-Net can effectively learn
feature representation for LO estimation, and can implicitly exploit the
sequential dependencies and dynamics in the data. We also design a scan-to-map
module, which uses the geometric and semantic information learned in LO-Net, to
improve the estimation accuracy. Experiments on benchmark datasets demonstrate
that LO-Net outperforms existing learning-based approaches and achieves
accuracy similar to that of the state-of-the-art geometry-based approach, LOAM.
Holographic entanglement of purification for thermofield double states and thermal quench
We explore the properties of holographic entanglement of purification (EoP)
for two disjoint strips in the Schwarzschild-AdS black brane and the Vaidya-AdS
black brane spacetimes. For two given strips on the same boundary of
Schwarzschild-AdS spacetime, there is an upper bound on the separation beyond
which the holographic EoP always vanishes, no matter how wide the strips are.
When the two strips lie on the two different boundaries of the spacetime,
we find that the holographic EoP exists only when the strips are
wide enough. If the width is finite, the EoP can be nonzero in a finite time
region. For the thermal quench case, we find that the equilibrium time of
holographic EoP is only sensitive to the width of strips, while that of the
holographic mutual information is sensitive not only to the width of strips but
also to their separation.
Comment: 23 pages, 12 figures, major correction of section
Nuclear stopping and sideward-flow correlation from 0.35A to 200A GeV
The correlation between the nuclear stopping and the scale invariant nucleon
sideward flow at energies ranging from those available at the GSI heavy ion
synchrotron (SIS) to those at the CERN Super Proton Synchrotron (SPS) is
studied within ultrarelativistic quantum molecular dynamics (UrQMD). The
universal behaviors of the two experimental observables for various colliding
systems and scale impact parameters are found to be highly correlated with each
other. As there is no phase transition mechanism involved in UrQMD, the
correlation may be broken by a sudden change in the bulk properties of
the nuclear matter, such as the formation of quark-gluon plasma (QGP); such a
breakdown can be employed as a QGP phase transition signal in high-energy heavy
ion collisions. Furthermore, we also point out that the appearance of a breakdown
of the correlation may be a powerful tool for searching for the critical point
on the QCD phase diagram.
Comment: 5 pages, 4 figure