
    BER: Balanced Error Rate For Speaker Diarization

    DER is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by those in longer ones. Short segments, e.g., 'yes' or 'no', still carry semantic information. In addition, DER overlooks errors for speakers who talk less. Although JER balances speaker errors, it suffers from the same dilemma. Considering that duration error, segment error, and speaker-weighted error together constitute a complete diarization evaluation, we propose a Balanced Error Rate (BER) to evaluate speaker diarization. First, we propose a segment-level error rate (SER) that uses connected sub-graphs and an adaptive IoU threshold to obtain accurate segment matching. Second, to evaluate diarization in a unified way, we adopt a speaker-specific harmonic mean between duration and segment errors, followed by a speaker-weighted average. Third, we analyze our metric via the modularized system, EEND, and the multi-modal method on real datasets. SER and BER are publicly available at https://github.com/X-LANCE/BER. Comment: 5 pages, 2 figure
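The abstract above mentions two building blocks, IoU-based segment matching and a harmonic mean of two error terms. The sketch below illustrates both in their generic form. It is not the paper's SER or BER definition: the function names are hypothetical, and the adaptive threshold, the connected sub-graph matching, and the speaker weighting are not reproduced here.

```python
def segment_iou(a, b):
    """Temporal intersection-over-union of two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def harmonic_mean(x, y):
    """Harmonic mean of two non-negative rates; returns 0.0 when both are zero."""
    return 2 * x * y / (x + y) if (x + y) > 0 else 0.0


# Illustrative values only (assumptions, not taken from the paper).
ref_segment = (0.0, 2.0)
hyp_segment = (1.0, 3.0)
iou = segment_iou(ref_segment, hyp_segment)

duration_error = 0.2
segment_error = 0.4
combined = harmonic_mean(duration_error, segment_error)
```

A harmonic mean stays close to the smaller of its two inputs, so a combined score built this way cannot be dominated by whichever term happens to be larger.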

    Enhancing thermoelectric figure-of-merit by low-dimensional electrical transport in phonon-glass crystals

    Low-dimensional electronic and glassy phononic transport are two important ingredients of highly efficient thermoelectric materials, from which two branches of thermoelectric research emerge. One focuses on controlling electronic transport in low dimensions, while the other focuses on multiscale phonon engineering in the bulk. Recent work has benefited much from combining these two approaches, e.g., phonon engineering in low-dimensional materials. Here, we propose to employ the low-dimensional electronic structure in a bulk phonon-glass crystal as an alternative way to increase the thermoelectric efficiency. Through first-principles electronic structure calculation and classical molecular dynamics simulation, we show that the π-π stacking Bis-Dithienothiophene molecular crystal is a natural candidate for such an approach. This is determined by the nature of its chemical bonding. Without any optimization of the material parameters, we obtain a maximum room-temperature figure of merit, ZT, of 1.48 at optimal doping, thus validating our idea. Comment: Nano Lett.201
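For readers unfamiliar with the figure of merit quoted above, the standard textbook definition is ZT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, T the absolute temperature, and κ the total thermal conductivity. A minimal sketch follows. The function name and the input values are illustrative assumptions, not parameters reported for the Bis-Dithienothiophene crystal.

```python
def figure_of_merit(seebeck_v_per_k, conductivity_s_per_m, temperature_k, kappa_w_per_mk):
    """Dimensionless thermoelectric figure of merit ZT = S^2 * sigma * T / kappa."""
    if kappa_w_per_mk <= 0:
        raise ValueError("thermal conductivity must be positive")
    return seebeck_v_per_k ** 2 * conductivity_s_per_m * temperature_k / kappa_w_per_mk


# Hypothetical inputs chosen only to show the arithmetic:
# S = 200 microvolts/K, sigma = 1e5 S/m, T = 300 K, kappa = 1.0 W/(m K).
zt = figure_of_merit(200e-6, 1e5, 300.0, 1.0)
```

The formula makes the two strategies in the abstract explicit: low-dimensional electronic transport targets the numerator S²σ, while phonon-glass behaviour lowers κ in the denominator.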

    Focal Mechanism Solutions of the 2008 Wenchuan earthquake sequence from P-wave polarities and SH/P amplitude ratios: new results and implications

    The 2008 Wenchuan earthquake, a major intraplate earthquake with Mw 7.9, occurred on the slowly deforming Longmenshan fault. To better understand the causes of this devastating earthquake, we need knowledge of the regional stress field and the underlying geodynamic processes. Here, we determine focal mechanism solutions (FMSs) of the 2008 Wenchuan earthquake sequence (WES) using both P-wave first-motion polarity data and SH/P amplitude ratio (AR) data. Because P-wave polarities are the more reliable information, they are given priority over SH/P AR, which are used only when the polarities constrain the FMSs loosely. We collect data from three categories: (1) permanent stations deployed by the China Earthquake Administration (CEA); (2) the Western Sichuan Passive Seismic Array (WSPSA) deployed by the Institute of Geology, CEA; (3) global stations from the Incorporated Research Institutions for Seismology. In total, 129 events with magnitude over Ms 4.0 in the 2008 WES are identified as having well-constrained FMSs. Among them, 83 are well constrained by P-wave polarities only, as shown by Cai et al. (Earthq Sci 24(1):115–125, 2011), and the rest are newly constrained by incorporating SH/P AR. Based on the spatial distribution and FMSs of the WES, we draw the following conclusions: (1) the principal compressional directions of most FMSs of the WES are subhorizontal, generally in agreement with the conclusion given by Cai et al. (2011) but with a few modifications: the compressional directions are WNW–ESE around Wenchuan and ENE–WSW around Qingchuan. The subhorizontal compressional direction along the Longmenshan fault from SW to NE seems to have a left-lateral rotation, which agrees well with the regional stress field inverted by former researchers (e.g., Xu et al., Acta Seismol Sin 30(5), 1987; Acta Geophys Sin 32(6), 1989; Cui et al., Seismol Geol 27(2):234–242, 2005); (2) the FMSs of the events not only reflect the regional stress state of the Longmenshan region, but are also clearly controlled by the faults to some extent, as pointed out by Cai et al. (2011) and Yi et al. (Chin J Geophys 55(4):1213–1227, 2012); (3) while the 2008 Wenchuan earthquake and some of its strong aftershocks released most of the elastic energy accumulated on the Longmenshan fault, some other aftershocks seem to have occurred simply to release the elastic energy promptly created by the 2008 Wenchuan earthquake and some of its strong aftershocks; (4) our results further suggest that the Longmenshan fault from Wenchuan to Beichuan was nearly fully destroyed by the 2008 Wenchuan earthquake, and we accordingly propose that great earthquakes are less probable in the middle part of the Longmenshan fault in the near future, although there might be a barrier to the southwest of Wenchuan that warrants attention in the near future. Wenchuan Fault Scientific Drilling Progra

    DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

    Generating high-quality and person-generic visual dubbing remains a challenge. Recent innovation has seen the advent of a two-stage paradigm, decoupling the rendering and lip synchronization process facilitated by intermediate representation as a conduit. Still, previous methodologies rely on rough landmarks or are confined to a single speaker, thus limiting their performance. In this paper, we propose DiffDub: Diffusion-based dubbing. We first craft the Diffusion auto-encoder by an inpainting renderer incorporating a mask to delineate editable zones and unaltered regions. This allows for seamless filling of the lower-face region while preserving the remaining parts. Throughout our experiments, we encountered several challenges. Primarily, the semantic encoder lacks robustness, constricting its ability to capture high-level features. Besides, the modeling ignored facial positioning, causing mouth or nose jitters across frames. To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance. Moreover, we encapsulated a conformer-based reference encoder and motion generator fortified by a cross-attention mechanism. This enables our model to learn person-specific textures with varying references and reduces reliance on paired audio-visual data. Our rigorous experiments comprehensively highlight that our ground-breaking approach outpaces existing methods with considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios. Comment: 5 pages, Accepted to ICASSP 202

    Comprehensive Molecular Analyses of an SLC Family-Based Model in Stomach Adenocarcinoma

    Background: Solute carrier (SLC) family members are crucial in transporting amino acids across membranes. Amino acids are indispensable for both cancer and immune cells. However, the clinical significance of amino acid transporting SLC members in stomach adenocarcinoma (STAD) remains unclear. This study aimed to develop an SLC family-based model to predict the prognosis and the response of STAD patients to immunotherapy. Methods: A total of 1239 tumor cases were obtained from online databases. The training set (n = 371) consisted of RNA sequencing profiles obtained from The Cancer Genome Atlas (TCGA), while those from Gene Expression Omnibus (GEO) were used as the test set. Subsequently, the clinical characteristics and immune profiles were investigated, and potential immunotherapy response prediction values of the model were assessed. Results: Based on the TCGA cohort, an SLC family-based model was developed using multivariate Cox analysis. All tumor cases were stratified into high- and low-risk groups considering the SLC model. High-risk patients had a worse overall survival (OS) than low-risk patients, consistent with the results of GEO cohorts. Comprehensive analyses revealed that the high-risk group was correlated with aggressiveness-related pathways, whereas the low-risk group had better T helper cell infiltration and stronger immunotherapy response. Compared to the high-risk group, the low-risk group presented increased PD-L1 and tumor mutation burden. Conclusion: This SLC family-based model has the potential to predict the prognosis and immunotherapy outcomes of STAD patients. The survival of patients in the low-risk group was greatly prolonged, and the patients may benefit more from immunotherapy.

    High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

    The extraction of lakes from remote sensing images is a complex challenge due to the varied lake shapes and data noise. Current methods rely on multispectral image datasets, making it challenging to learn lake features accurately from pixel arrangements. This, in turn, affects model learning and the creation of accurate segmentation masks. This paper introduces a unified prompt-based dataset construction approach that provides approximate lake locations using point, box, and mask prompts. We also propose a two-stage prompt enhancement framework, LEPrompter, which involves prompt-based and prompt-free stages during training. The prompt-based stage employs a prompt encoder to extract prior information, integrating prompt tokens and image embeddings through self- and cross-attention in the prompt decoder. Prompts are deactivated once the model is trained to ensure independence during inference, enabling automated lake extraction. Evaluations on Surface Water and Qinghai-Tibet Plateau Lake datasets show consistent performance improvements compared to the previous state-of-the-art method. LEPrompter achieves mIoU scores of 91.48% and 97.43% on the respective datasets without introducing additional parameters or GFLOPs. Supplementary materials provide the source code, pre-trained models, and detailed user studies. Comment: 8 pages, 7 figure
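The mIoU scores quoted above are mean intersection-over-union values for segmentation masks. The sketch below shows the generic computation for a two-class (background and lake) mask, averaging the per-class IoU. The function name and toy masks are hypothetical, and the paper's exact evaluation protocol (for instance, how scores are aggregated across images) is not stated in the abstract.

```python
def mean_iou(pred, target, classes=(0, 1)):
    """Mean of per-class IoU for two label masks given as equal-sized nested lists."""
    ious = []
    for c in classes:
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks (1 = lake, 0 = background); values are illustrative only.
predicted = [[1, 1, 0],
             [0, 0, 0]]
reference = [[1, 0, 0],
             [0, 0, 0]]
score = mean_iou(predicted, reference)
```

In this toy case the lake class scores 1/2 and the background class 4/5, so one misclassified pixel costs the small lake region far more than the large background.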