BER: Balanced Error Rate For Speaker Diarization
Diarization error rate (DER) is the primary metric for evaluating diarization
performance, but it faces a dilemma: errors in short utterances or segments tend
to be overwhelmed by those in longer ones. Short segments, e.g., 'yes' or 'no,'
still carry semantic information. DER also overlooks errors for speakers who
talk less. Although the Jaccard error rate (JER) balances errors across
speakers, it suffers from the same dilemma. Considering that duration error,
segment error, and speaker-weighted error together constitute a complete
diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate
speaker diarization. First, we propose a segment-level error rate (SER) that
uses connected sub-graphs and an adaptive IoU threshold to obtain accurate
segment matching. Second, to evaluate diarization in a unified way, we adopt a
speaker-specific harmonic mean between duration and segment errors, followed by
a speaker-weighted average. Third, we analyze our metric with a modularized
system, EEND, and a multi-modal method on real datasets. SER and BER are
publicly available at https://github.com/X-LANCE/BER.
Comment: 5 pages, 2 figures
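The two ingredients named in this abstract, IoU-based segment matching and a per-speaker harmonic mean followed by an average over speakers, can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is at the repository linked above). The function names, the equal weighting of speakers, and the zero-handling rule are assumptions made only for illustration.

```python
def interval_iou(a, b):
    """IoU of two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def harmonic_mean(x, y):
    """Harmonic mean of two non-negative error rates (0 if either is 0)."""
    return 2 * x * y / (x + y) if x > 0 and y > 0 else 0.0


def balanced_error(per_speaker):
    """Average the per-speaker harmonic mean of (duration_error, segment_error).

    Assumption: every speaker gets the same weight, so speakers who talk
    little count as much as speakers who talk a lot.
    """
    scores = [harmonic_mean(d, s) for d, s in per_speaker]
    return sum(scores) / len(scores)


# Hypothetical error rates for two speakers: (duration error, segment error).
print(balanced_error([(0.2, 0.2), (0.5, 0.5)]))
```

Because each speaker contributes one score regardless of talk time, a speaker with only a few short segments influences the result as much as a dominant speaker, which is the balancing behavior the abstract describes.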
Enhancing thermoelectric figure-of-merit by low-dimensional electrical transport in phonon-glass crystals
Low-dimensional electronic and glassy phononic transport are two important
ingredients of highly efficient thermoelectric materials, from which two
branches of the thermoelectric research emerge. One focuses on controlling
electronic transport in the low dimension, while the other on multiscale phonon
engineering in the bulk. Recent work has benefited much from combining these
two approaches, e.g., phonon engineering in low-dimensional materials. Here, we
propose to employ the low-dimensional electronic structure in bulk phonon-glass
crystal as an alternative way to increase the thermoelectric efficiency.
Through first-principles electronic structure calculation and classical
molecular dynamics simulation, we show that the π-π stacking
Bis-Dithienothiophene molecular crystal is a natural candidate for such an
approach. This is determined by the nature of its chemical bonding. Without any
optimization of the material parameter, we obtain a maximum room-temperature
figure of merit, ZT, of […] at optimal doping, thus validating our idea.
Comment: Nano Lett.201
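For readers outside the field, the thermoelectric figure of merit mentioned in this abstract is conventionally defined as below. This is the standard textbook definition, not a formula quoted from the paper, and the symbols are not introduced in the abstract itself: S is the Seebeck coefficient, σ the electrical conductivity, T the absolute temperature, and κ_e and κ_l the electronic and lattice (phonon) thermal conductivities.

```latex
ZT = \frac{S^{2}\,\sigma\,T}{\kappa_{e} + \kappa_{l}}
```

The definition shows why the two ingredients in the abstract work together: low-dimensional electronic transport aims to raise the power factor S²σ, while glass-like phonon transport lowers κ_l.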
Focal Mechanism Solutions of the 2008 Wenchuan earthquake sequence from P-wave polarities and SH/P amplitude ratios: new results and implications
The 2008 Wenchuan earthquake, a major intraplate earthquake with Mw 7.9, occurred on the slowly deforming Longmenshan fault. To better understand the causes of this devastating earthquake, we need knowledge of the regional stress field and the underlying geodynamic processes. Here, we determine focal mechanism solutions (FMSs) of the 2008 Wenchuan earthquake sequence (WES) using both P-wave first-motion polarity data and SH/P amplitude ratio (AR) data. Because P-wave polarities are the more reliable information, they are given priority over SH/P ARs, which are used only when the polarities constrain the FMSs loosely. We collect data from three categories of stations: (1) permanent stations deployed by the China Earthquake Administration (CEA); (2) the Western Sichuan Passive Seismic Array (WSPSA) deployed by the Institute of Geology, CEA; (3) global stations from the Incorporated Research Institutions for Seismology. In total, 129 events with magnitude over Ms 4.0 in the 2008 WES are identified as having well-constrained FMSs. Among them, 83 are well constrained by P-wave polarities alone, as shown by Cai et al. (Earthq Sci 24(1):115–125, 2011), and the rest are newly constrained by incorporating SH/P ARs. Based on the spatial distribution and FMSs of the WES, we draw the following conclusions: (1) the principal compressional directions of most FMSs of the WES are subhorizontal, generally in agreement with the conclusion of Cai et al. (2011) but with a few modifications: the compressional directions are WNW–ESE around Wenchuan and ENE–WSW around Qingchuan, respectively.
The subhorizontal compressional direction along the Longmenshan fault from SW to NE appears to rotate left-laterally, which agrees well with the regional stress field inverted by earlier researchers (e.g., Xu et al., Acta Seismol Sin 30(5), 1987; Acta Geophys Sin 32(6), 1989; Cui et al., Seismol Geol 27(2):234–242, 2005); (2) the FMSs of the events not only reflect the regional stress state of the Longmenshan region but are also clearly controlled by the faults to some extent, as pointed out by Cai et al. (2011) and Yi et al. (Chin J Geophys 55(4):1213–1227, 2012); (3) while the 2008 Wenchuan earthquake and some of its strong aftershocks released most of the elastic energy accumulated on the Longmenshan fault, some other aftershocks seem to have occurred simply to release the elastic energy promptly created by the 2008 Wenchuan earthquake and some of its strong aftershocks; (4) our results further suggest that the Longmenshan fault from Wenchuan to Beichuan was nearly fully destroyed by the 2008 Wenchuan earthquake, and we accordingly propose that great earthquakes are less probable in the middle part of the Longmenshan fault in the near future, although there might be a barrier to the southwest of Wenchuan that warrants attention.
Wenchuan Fault Scientific Drilling Program
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Generating high-quality and person-generic visual dubbing remains a
challenge. Recent innovation has seen the advent of a two-stage paradigm,
decoupling the rendering and lip synchronization process facilitated by
intermediate representation as a conduit. Still, previous methodologies rely on
rough landmarks or are confined to a single speaker, thus limiting their
performance. In this paper, we propose DiffDub: Diffusion-based dubbing. We
first craft the Diffusion auto-encoder by an inpainting renderer incorporating
a mask to delineate editable zones and unaltered regions. This allows for
seamless filling of the lower-face region while preserving the remaining parts.
Throughout our experiments, we encountered several challenges. Primarily, the
semantic encoder lacks robustness, constricting its ability to capture
high-level features. Besides, the modeling ignored facial positioning, causing
mouth or nose jitters across frames. To tackle these issues, we employ
versatile strategies, including data augmentation and supplementary eye
guidance. Moreover, we encapsulated a conformer-based reference encoder and
motion generator fortified by a cross-attention mechanism. This enables our
model to learn person-specific textures with varying references and reduces
reliance on paired audio-visual data. Our experiments show that our approach
outperforms existing methods by considerable margins and delivers seamless,
intelligible videos in person-generic and multilingual scenarios.
Comment: 5 pages, Accepted to ICASSP 202
Comprehensive Molecular Analyses of an SLC Family-Based Model in Stomach Adenocarcinoma
Background: Solute carrier (SLC) family members are crucial in transporting amino acids across membranes. Amino acids are indispensable for both cancer and immune cells. However, the clinical significance of amino acid transporting SLC members in stomach adenocarcinoma (STAD) remains unclear. This study aimed to develop an SLC family-based model to predict the prognosis and the response of STAD patients to immunotherapy.
Methods: A total of 1239 tumor cases were obtained from online databases. The training set (n = 371) consisted of RNA sequencing profiles obtained from The Cancer Genome Atlas (TCGA), while those from Gene Expression Omnibus (GEO) were used as the test set. Subsequently, the clinical characteristics and immune profiles were investigated, and potential immunotherapy response prediction values of the model were assessed.
Results: Based on the TCGA cohort, an SLC family-based model was developed using multivariate Cox analysis. All tumor cases were stratified into high- and low-risk groups considering the SLC model. High-risk patients had a worse overall survival (OS) than low-risk patients, consistent with the results of GEO cohorts. Comprehensive analyses revealed that the high-risk group was correlated with aggressiveness-related pathways, whereas the low-risk group had better T helper cell infiltration and stronger immunotherapy response. Compared to the high-risk group, the low-risk group presented increased PD-L1 and tumor mutation burden.
Conclusion: This SLC family-based model has the potential to predict the prognosis and immunotherapy outcomes of STAD patients. The survival of patients in the low-risk group was greatly prolonged, and the patients may benefit more from immunotherapy.
High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark
The extraction of lakes from remote sensing images is a complex challenge due
to the varied lake shapes and data noise. Current methods rely on multispectral
image datasets, making it challenging to learn lake features accurately from
pixel arrangements. This, in turn, affects model learning and the creation of
accurate segmentation masks. This paper introduces a unified prompt-based
dataset construction approach that provides approximate lake locations using
point, box, and mask prompts. We also propose a two-stage prompt enhancement
framework, LEPrompter, which involves prompt-based and prompt-free stages
during training. The prompt-based stage employs a prompt encoder to extract
prior information, integrating prompt tokens and image embeddings through self-
and cross-attention in the prompt decoder. Prompts are deactivated once the
model is trained to ensure independence during inference, enabling automated
lake extraction. Evaluations on Surface Water and Qinghai-Tibet Plateau Lake
datasets show consistent performance improvements compared to the previous
state-of-the-art method. LEPrompter achieves mIoU scores of 91.48% and 97.43%
on the respective datasets without introducing additional parameters or GFLOPs.
Supplementary materials provide the source code, pre-trained models, and
detailed user studies.
Comment: 8 pages, 7 figures
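The mIoU scores quoted in this abstract are mean intersection-over-union values between predicted and reference segmentation masks. The sketch below shows how such a score is commonly computed for a lake/background mask. It is a generic illustration, not the LEPrompter evaluation code. The function name, the two-class setup, and the choice to average over classes that appear in either mask are assumptions.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two equally sized 2-D lists of class labels.

    Assumption: label 1 marks lake pixels and label 0 marks background;
    classes absent from both masks are skipped rather than counted as 0.
    """
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(flat_p, flat_t) if p == c and t == c)
        union = sum(1 for p, t in zip(flat_p, flat_t) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 masks: one lake pixel is predicted where the reference has background.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(mean_iou(pred, target))
```

In the toy example the background class has IoU 2/3 and the lake class has IoU 1/2, so a single misclassified pixel lowers both per-class scores before they are averaged.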