50 research outputs found

    Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification

    Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances. Conventional speaker recognition systems generalize from a large random sample of speakers, causing recognition to underperform for households drawn from specific cohorts or otherwise exhibiting high confusability. In this work, we propose a graph-based semi-supervised learning approach to improve household-level SID accuracy and robustness with locally adapted graph normalization and multi-signal fusion with multi-view graphs. Unlike other work on household SID, fairness, and signal fusion, this work focuses on speaker label inference (scoring) and provides a simple solution to realize household-specific adaptation and multi-signal fusion without tuning the embeddings or training a fusion network. Experiments on the VoxCeleb dataset demonstrate that our approach consistently improves performance across households with different customer cohorts and degrees of confusability. Comment: To appear in Interspeech 2022. arXiv admin note: text overlap with arXiv:2106.0820

    Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. It is also common practice to use a costly grid search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid search. Our method requires no ground-truth or annotated labels; it learns online from reward signals alone. Specifically, we propose a deep contextual multi-armed bandit-based approach, which combines the representational power of neural networks with the action exploration behavior of Thompson modeling algorithms. We compare our approach to several baselines and show that our deep bandit models also succeed in reducing early cutoff errors while maintaining low latency.
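
The online, reward-driven selection described in this abstract can be illustrated with a minimal contextual-bandit sketch. It is not the paper's model: a per-arm Bayesian linear regression stands in for the deep network, and it assumes that "Thompson modeling" refers to Thompson sampling. The class `LinearThompsonBandit`, the function `run_simulation`, the two-arm toy environment, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


class LinearThompsonBandit:
    """Linear Thompson sampling: one Bayesian linear-regression model per arm.

    Assumption: a linear reward model replaces the deep network of the abstract.
    """

    def __init__(self, n_arms, n_features, noise_var=0.01, prior_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # Gaussian posterior per arm, stored as a precision matrix and the
        # noise-weighted sum of (context * reward).
        self.precision = [np.eye(n_features) / prior_var for _ in range(n_arms)]
        self.weighted_sum = [np.zeros(n_features) for _ in range(n_arms)]

    def posterior(self, arm):
        """Return the posterior mean and covariance of the arm's weights."""
        cov = np.linalg.inv(self.precision[arm])
        cov = (cov + cov.T) / 2.0  # keep the covariance numerically symmetric
        mean = cov @ self.weighted_sum[arm]
        return mean, cov

    def select(self, context):
        """Sample weights for every arm and pick the arm with the best sampled reward."""
        scores = []
        for arm in range(len(self.precision)):
            mean, cov = self.posterior(arm)
            theta = self.rng.multivariate_normal(mean, cov)
            scores.append(float(theta @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Bayesian linear-regression update using only the observed reward."""
        self.precision[arm] += np.outer(context, context) / self.noise_var
        self.weighted_sum[arm] += context * reward / self.noise_var


def run_simulation(n_rounds=300, seed=0):
    """Toy environment: arm k pays about 1 when the context is the k-th unit vector."""
    env_rng = np.random.default_rng(seed + 1)
    true_weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical ground truth
    bandit = LinearThompsonBandit(n_arms=2, n_features=2, seed=seed)
    for _ in range(n_rounds):
        context = np.eye(2)[env_rng.integers(2)]
        arm = bandit.select(context)
        reward = true_weights[arm] @ context + env_rng.normal(0.0, 0.1)
        bandit.update(arm, context, reward)
    return bandit
```

In an endpointing setting, each arm would presumably correspond to one candidate configuration and the context to utterance-level audio features; the reward definition is left open here because the abstract does not specify it.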

    Cross-utterance ASR Rescoring with Graph-based Label Propagation

    We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK dataset demonstrate that our approach consistently improves ASR performance, as well as fairness across speaker groups with different accents. Our approach provides a low-cost solution for mitigating the majoritarian bias of ASR systems, without the need to train new domain- or accent-specific models. Comment: To appear in IEEE ICASSP 202
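
For readers unfamiliar with graph-based label propagation, the following is a minimal sketch of one standard formulation (label spreading on a symmetrically normalized similarity graph, solved in closed form). It is a generic illustration, not the rescoring method of the paper: the function names `rbf_affinity` and `label_spreading`, the RBF similarity, and the parameters `sigma` and `alpha` are assumptions introduced here.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense RBF similarity graph over feature vectors, with no self-loops."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def label_spreading(W, Y, alpha=0.9):
    """Propagate seed labels Y (n_nodes x n_classes) over the graph W.

    Uses S = D^{-1/2} W D^{-1/2} and solves (I - alpha * S) F = Y, which is
    the fixed point (up to a constant factor) of the iteration
    F <- alpha * S @ F + (1 - alpha) * Y.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, Y)
```

A typical use is to build `W` from per-node feature vectors, set one-hot rows in `Y` for the labeled nodes and zero rows for the unlabeled ones, and read each node's predicted class from `label_spreading(W, Y).argmax(axis=1)`.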

    Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

    Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments. Recent advancements in Text-to-Speech (TTS) synthesis-based augmentation for fairer SLU have struggled to accurately capture the unique vocal characteristics of atypical speakers, largely due to insufficient data. To address this issue, we present a novel data augmentation method for atypical speakers by finetuning a TTS model, called Aty-TTS. Aty-TTS models speaker and atypical characteristics via knowledge transfer from a voice conversion model. Then, we use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we have collected a new atypical speech dataset containing intent annotation. Both objective and subjective assessments validate that Aty-TTS is capable of generating high-quality atypical speech. Furthermore, it serves as an effective data augmentation strategy, contributing to fairer SLU systems that can better accommodate individuals with atypical speech patterns. Comment: Accepted at SyntheticData4ML 2023 Ora

    Monitoring of Multi-Aspect Drought Severity and Socio-Economic Status in the Semi-Arid Regions of Eastern Tamil Nadu, India

    A framework was set up to monitor drought in the semi-arid regions of eastern Tamil Nadu, southern India, for the period of 2014–2018 CE with the application of the standardized precipitation index (SPI), the scaled drought-condition index (SDCI), and the standardized water-level index (SWI). The results emphasized that this region had a negative precipitation anomaly and vegetative stress, both of which triggered meteorological and agricultural droughts and caused significant losses in the farming sector. The distributions of extreme and high-level hydrological droughts were at their maximum in 2017 CE. The multi-drought severity index (MDSI), implemented to assess the combined impact and highlight the gradient of affected areas, illustrated that the eastern region (i.e., Jayankondam block) was the most extremely affected, followed by the northern and southern regions (i.e., T.Palur and Andimadam), which were moderately affected by droughts. The extremely affected eastern region is less able to overcome droughts due to its socio-economic vulnerability, with its greater population and household density leading to the over-exploitation of potential resources. Therefore, the focus of this study is on the monitoring of drought severity in micro-administrative units to suggest an appropriate management plan. Hence, the extreme-drought-prone block (Jayankondam) should be given high priority in monitoring and implementing long-term management practices for its conservation and resilience against the effects of severe droughts.

    Solvent volume dependent physical properties and electrocatalytic ability of nebulizer spray deposited CuInGaS2 counter electrode for dye-sensitized solar cells

    CuInGaS2 (CIGS) thin films were coated using the nebulizer spray technique for different solvent volumes (10, 30, 50 and 70 ml) at a substrate temperature of 350 °C. The structural, optical and electrical properties of the prepared CIGS thin films were studied. The CIGS thin films exhibited a tetragonal structure, and the maximum crystallite size was calculated for the film deposited using 50 ml solvent volume. The surface morphology of the CIGS thin films was analyzed by scanning electron microscopy and atomic force microscopy. Electrical parameters of the CIGS thin films such as resistivity, carrier concentration and mobility were examined using the four-probe method and Hall measurements. Electrocatalytic activities of the CIGS films towards the redox couple (I−/I3−) were analyzed by cyclic voltammetry, electrochemical impedance spectroscopy, and Tafel polarization measurements. High photocurrent efficiency was obtained for the CIGS counter electrode prepared using 50 ml solvent volume.

    Solvent volume-driven CuInAlS2 nanoflake counter electrode for effective electrocatalytic tri-iodide reduction in dye-sensitized solar cells

    The influence of solvent volume on the properties of CuInAlS2 (CIAS) thin films deposited using a simple and cost-effective nebulizer spray technique is studied. XRD results show polycrystalline CIAS thin films with a tetragonal structure. SEM images show a nanoflake-like structure on the film surface. Elemental presence and chemical composition were examined by XPS and EDS. The CIAS films deposited at different solvent volumes exhibited p-type semiconducting behavior. Cyclic voltammetry, electrochemical impedance spectroscopy, and Tafel polarization measurements demonstrated that CIAS counter electrodes are capable of tri-iodide reduction. Photocurrent density-voltage measurements for the CIAS CE showed a maximum efficiency of 2.55% with a short-circuit current density of 7.22 mA cm−2.

    Low-cost and eco-friendly nebulizer spray coated CuInAlS2 counter electrode for dye-sensitized solar cells

    CuInAlS2 (CIAS) thin films were deposited at different substrate temperatures by a novel nebulizer spray technique. The polycrystalline CIAS thin film exhibited a tetragonal structure with a preferential orientation along the (1 1 2) plane. Nanoflakes were observed in the surface morphology of the CIAS film. The peak positions of the core-level spectra from XPS analysis confirm the presence of CuInAlS2. The absorbance spectra and optical band gap were obtained from optical measurements. The activation energy, carrier concentration, hole mobility and resistivity were determined by linear four-probe and Hall effect measurements. The CIAS film was used as a counter electrode (CE) in dye-sensitized solar cells (DSSCs) and was characterized by cyclic voltammetry, electrochemical impedance spectroscopy and Tafel measurements. The DSSC fabricated with the CIAS CE achieved a photoconversion efficiency of about 2.55%.

    Facile preparation of hierarchical nanostructured CuInS2 counter electrodes for dye-sensitized solar cells

    CuInS2 (CIS) thin films have been synthesized onto glass substrates for different solvent volumes (10, 30, 50 and 70 ml) by the nebulizer spray technique. The effect of solvent volume on the structural, morphological, compositional, optical and electrical properties of the CIS thin films has been investigated. X-ray diffraction patterns suggest that the obtained CIS films are polycrystalline with a tetragonal structure. The surface morphology of the prepared CIS films depends on the solvent volume. The elemental composition and stoichiometric ratio of the CIS thin films were verified by XPS and EDS. High absorbance with an optical band gap of 1.13 eV was obtained at the higher solvent volume. All the deposited CIS thin films exhibited p-type semiconducting behavior with high electrical conductivity and carrier concentration. CIS thin films deposited onto FTO substrates were used as counter electrodes (CEs) in dye-sensitized solar cells. The CIS CEs showed high electrocatalytic activity and fast charge transfer at the CE/electrolyte interface. The CIS CE prepared using 50 ml solvent volume yielded a high energy conversion efficiency of about 3.25%.