351 research outputs found

    High-resolution embedding extractor for speaker diarisation

    Speaker embedding extractors significantly influence the performance of clustering-based speaker diarisation systems. Conventionally, only one embedding is extracted from each speech segment. However, because of the sliding window approach, a segment can easily include two or more speakers owing to speaker change points. This study proposes a novel embedding extractor architecture, referred to as a high-resolution embedding extractor (HEE), which extracts multiple high-resolution embeddings from each speech segment. HEE consists of a feature-map extractor and an enhancer, where the enhancer with the self-attention mechanism is the key to success. The enhancer of HEE replaces the aggregation process; instead of a global pooling layer, the enhancer combines relative information to each frame via attention leveraging the global context. Each extracted dense frame-level embedding can represent a speaker, so multiple speakers can be represented by different frame-level features in each segment. We also propose a training framework that artificially generates mixture data to train the proposed HEE. Through experiments on five evaluation sets, including four public datasets, the proposed HEE demonstrates at least 10% improvement on each evaluation set, except for one dataset, which our analysis suggests contains fewer rapid speaker changes. Comment: 5 pages, 2 figures, 3 tables, submitted to ICASS

    Absolute decision corrupts absolutely: conservative online speaker diarisation

    Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real-time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that cautiously increasing the number of estimated speakers is of paramount importance among many other factors. Thus, our proposed framework includes decreasing the number of speakers by one when the system judges that an increase in the past was faulty. We also adopt dual buffers, checkpoints and centroids, where checkpoints are combined with silhouette coefficients to estimate the number of speakers and centroids represent speakers. We further believe that more than one centroid can be generated from one speaker. Thus we design a clustering-based label matching technique to assign labels in real-time. The resulting system is lightweight yet surprisingly effective. The system demonstrates state-of-the-art performance on the DIHARD 2 and 3 datasets, and is also competitive on the AMI and VoxConverse test sets. Comment: 5 pages, 2 figures, 4 tables, submitted to ICASS
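As a rough illustration of how a silhouette coefficient can help choose the number of speakers, the sketch below scores two candidate labelings of the same embeddings and keeps the better one. It is a generic example, not the paper's algorithm: the 2-D "embeddings", the candidate labelings and the names `mean_silhouette` and `candidates` are all hypothetical, and the checkpoint and centroid buffers described in the abstract are not modelled.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (singleton clusters score 0)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Made-up 2-D "speaker embeddings": two well-separated groups of four points.
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]

# Hypothetical candidate labelings for two and three speakers.
candidates = {
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 2, 2, 1, 1, 1, 1],
}

# Keep the speaker count whose labeling has the highest mean silhouette.
best_k = max(candidates, key=lambda k: mean_silhouette(embeddings, candidates[k]))
print(best_k)
```

Splitting one tight group into two lowers its silhouette scores, so the two-speaker labeling wins here. The silhouette coefficient is undefined for a single cluster, so a one-speaker hypothesis would need separate handling.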

    Disentangled dimensionality reduction for noise-robust speaker diarisation

    The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as noise and reverberation, adversely affecting performance. Our previous work proposed an auto-encoder-based dimensionality reduction module to help remove the redundant information. However, it does not explicitly separate such information and has also been found to be sensitive to hyper-parameter values. To this end, we propose two contributions to overcome these issues: (i) a novel dimensionality reduction framework that can disentangle spurious information from the speaker embeddings; (ii) the use of a speech/non-speech indicator to prevent the speaker code from representing the background noise. Through a range of experiments conducted on four different datasets, our approach consistently demonstrates state-of-the-art performance among models without system fusion. Comment: This paper was submitted to Interspeech202

    Encoder-decoder multimodal speaker change detection

    The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies have addressed the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise the text modality in addition to audio, have shown improved performance. In this study, the proposed model is built upon two main proposals: a novel mechanism for modality fusion and the adoption of an encoder-decoder architecture. Unlike previous MMSCD works that extract speaker embeddings from extremely short audio segments aligned to a single word, we use a speaker embedding extracted from 1.5 s of audio. A transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription. Comment: 5 pages, accepted for presentation at INTERSPEECH 202

    Association between hemoglobin glycation index and cardiometabolic risk factors in Korean pediatric nondiabetic population

    Purpose: The hemoglobin glycation index (HGI) represents the degree of nonenzymatic glycation and has been positively associated with cardiometabolic risk factors (CMRFs) and cardiovascular disease in adults. This study aimed to investigate the association between HGI, components of metabolic syndrome (MS), and alanine aminotransferase (ALT) in a pediatric nondiabetic population. Methods: Data from 3,885 subjects aged 10–18 years from the Korea National Health and Nutrition Examination Survey (2011–2016) were included. HGI was defined as measured glycated hemoglobin (HbA1c) minus predicted HbA1c. Participants were divided into 3 groups according to HGI tertile. Components of MS (abdominal obesity, fasting glucose, triglycerides, high-density lipoprotein cholesterol, and blood pressure), and the proportions of MS, CMRF clustering (≥2 MS components), and elevated ALT were compared among the groups. Results: Body mass index (BMI) z-score, obesity, total cholesterol, ALT, abdominal obesity, elevated triglycerides, and CMRF clustering showed increasing trends from the lowest to the highest HGI tertile. Multiple logistic regression analysis showed the upper HGI tertile was associated with elevated triglycerides (odds ratio, 1.65; 95% confidence interval, 1.18–2.30). Multiple linear regression analysis showed HGI level was significantly associated with BMI z-score, HbA1c, triglycerides, and ALT. When stratified by sex, age group, and BMI category, overweight/obese subjects showed linear HGI trends for presence of CMRF clustering and ALT elevation. Conclusions: HGI was associated with CMRFs in a Korean pediatric population. High HGI might be an independent risk factor for CMRF clustering and ALT elevation in overweight/obese youth. Further studies are required to establish the clinical relevance of HGI for cardiometabolic health in youth.
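The HGI definition in the abstract (measured HbA1c minus predicted HbA1c) and the tertile grouping can be sketched as follows. The abstract does not say how predicted HbA1c was obtained; this sketch assumes a simple least-squares regression of HbA1c on fasting glucose, which is only one plausible choice. The data values and the function names `fit_line`, `hgi_values` and `tertile_groups` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def hgi_values(glucose, hba1c):
    """HGI = measured HbA1c - predicted HbA1c.

    ASSUMPTION: predicted HbA1c comes from regressing HbA1c on fasting
    glucose within the same sample; the abstract does not specify this.
    """
    slope, intercept = fit_line(glucose, hba1c)
    return [measured - (intercept + slope * g)
            for g, measured in zip(glucose, hba1c)]

def tertile_groups(values):
    """Assign each value to tertile 0 (lowest), 1 or 2 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * 3 // len(values)
    return groups

# Made-up illustrative values: fasting glucose (mg/dL) and HbA1c (%).
glucose = [85, 90, 95, 100, 88, 92]
hba1c = [5.2, 5.4, 5.3, 5.7, 5.5, 5.1]

hgi = hgi_values(glucose, hba1c)
groups = tertile_groups(hgi)
for g, m, h, t in zip(glucose, hba1c, hgi, groups):
    print(f"glucose={g} HbA1c={m} HGI={h:+.3f} tertile={t}")
```

Under this assumption HGI is a regression residual, so it averages to zero across the sample, and a positive value means HbA1c is higher than glucose alone would predict.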

    Reduced Dose Intensity FOLFOX-4 as First Line Palliative Chemotherapy in Elderly Patients with Advanced Colorectal Cancer

    To evaluate the toxicity and efficacy of a reduced dose intensity (mini-) FOLFOX-4 regimen as a first-line palliative chemotherapy in elderly patients (≥70 yr of age) with advanced colorectal cancer, data from prospective databases at Seoul National University Bundang Hospital and Seoul Municipal Boramae Hospital were analyzed. A total of 20 patients were enrolled between January 2001 and August 2004, and were treated with oxaliplatin 65 mg/m² on day 1, and with 2-hr infusions of leucovorin 150 mg/m² followed by a 5-FU bolus (300 mg/m²) and 22-hr continuous infusions (450 mg/m²) for 2 consecutive days every 2 weeks until progression, unacceptable toxicity or patient refusal. Sixteen patients were evaluable for response with an overall response rate of 43.8%. Median progression-free survival was 4.8 months (95% CI: 3.0-6.7) and overall survival was 13.5 months (95% CI: 11.1-16.0). The main side effects were anemia and neutropenia, which were observed in 20.8% and 17.7%, respectively, of the total cycles administered. There were no grade 4 toxicities and only one patient suffered from febrile neutropenia. No grade 3 toxicities occurred except for anemia (5.2%) and vomiting (1.0%). In conclusion, the mini-FOLFOX-4 regimen was found to be well tolerated with acceptable toxicity, and to provide a benefit for elderly patients with colorectal cancer.

    Maternal, neonatal, and child health systems under rapid urbanization: a qualitative study in a suburban district in Vietnam

    Background: Vietnam has been successful in increasing access to maternal, neonatal, and child health (MNCH) services in recent decades; however, little is known about whether primary MNCH services have been properly utilized under the recent rapid urbanization. We aimed to examine current MNCH service utilization patterns at a district level. Methods: The study was conducted qualitatively in a rural district named Quốc Oai. Women who had given birth within the previous year and medical staff at various levels participated through 43 individual in-depth interviews and 3 focus group interviews. Results: Primary MNCH services were underutilized due to a failure to meet increased quality needs. Most of the mothers preferred private clinics for antenatal care and the district hospital for delivery due to the better service quality of these facilities compared to that of the commune health stations (CHSs). Mothers had few sociocultural barriers to acquiring service information or utilizing services based on their improved standard of living. A financial burden for some services, including caesarean section, still existed for uninsured mothers, while their insured counterparts had relatively few difficulties. Conclusions: For the improved macro-efficiency of MNCH systems, the government needs to rearrange human resources and/or merge some CHSs to achieve economies of scale and align with service volume distribution across the different levels. This research was financially supported by the JW LEE Center for Global Medicine of Seoul National University College of Medicine, Seoul, South Korea. The Vietnam Health System Strengthening project is part of a collaborative project by the JW LEE Center for Global Medicine of Seoul National University College of Medicine, Seoul, South Korea and University of Hanoi, University of Medicine and Pharmacy of Ho Chi Minh City. The funding source had no role in study design, data collection and analysis, interpretation of data, or preparation of the manuscript.