
    On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

    The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training times and computational resource utilization are primarily affected by training parameters such as the batch size. Since noisy and reverberant speech mixtures can have different durations, a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems. Such strategies usually strive for a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size for a more consistent amount of data in each batch. However, the effect of these practices on resource utilization and, more importantly, network performance is not well documented. This paper is an empirical study of the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions. We find that using a small batch size during training improves performance in both conditions for all batching strategies. Moreover, using sorted or bucket batching with a dynamic batch size allows for reduced training time and GPU memory usage while achieving similar performance compared to random batching with a fixed batch size.
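The sorted/bucket batching with a dynamic batch size described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the `max_batch_frames` budget parameter and the utterance-id interface are assumptions for the example.

```python
import random

def bucket_batches(lengths, max_batch_frames):
    """Group variable-length utterances into batches.

    Sort by length so items in a batch have similar duration (minimal
    zero-padding), then cap each batch by a total padded-frame budget
    (dynamic batch size) instead of a fixed item count.
    `lengths` maps utterance id -> number of frames.
    """
    order = sorted(lengths, key=lengths.get)        # sorted batching
    batches, current, current_max = [], [], 0
    for utt in order:
        new_max = max(current_max, lengths[utt])
        # padded cost of the batch if this utterance were added
        if current and new_max * (len(current) + 1) > max_batch_frames:
            batches.append(current)                 # close current batch
            current, new_max = [], lengths[utt]     # start a new one
        current.append(utt)
        current_max = new_max
    if current:
        batches.append(current)
    random.shuffle(batches)  # recover some randomization across epochs
    return batches
```

Shuffling at the batch level rather than the utterance level is what lets this scheme trade full data randomization for reduced padding.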

    Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler

    Diffusion models are a new class of generative models that have recently been applied successfully to speech enhancement. Previous works have demonstrated their superior performance in mismatched conditions compared to state-of-the-art discriminative models. However, this was investigated with a single database for training and another one for testing, which makes the results highly dependent on the particular databases. Moreover, recent developments from the image generation literature remain largely unexplored for speech enhancement. These include several design aspects of diffusion models, such as the noise schedule or the reverse sampler. In this work, we systematically assess the generalization performance of a diffusion-based speech enhancement model by using multiple speech, noise and binaural room impulse response (BRIR) databases to simulate mismatched acoustic conditions. We also experiment with a noise schedule and a sampler that have not been applied to speech enhancement before. We show that the proposed system substantially benefits from using multiple databases for training, and achieves superior performance compared to state-of-the-art discriminative models in both matched and mismatched conditions. We also show that a Heun-based sampler achieves superior performance at a smaller computational cost compared to a sampler commonly used for speech enhancement.
    Comment: Accepted to ICASSP 202
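The Heun-based sampler mentioned above can be sketched as a second-order solver for the diffusion probability-flow ODE. This follows the generic formulation from the image generation literature, not the exact system in the paper; the `denoise(x, sigma)` interface (a network estimating the clean signal at noise level `sigma`) is an assumption for the example.

```python
import numpy as np

def heun_sampler(denoise, x, sigmas):
    """Heun (2nd-order) sampler along a decreasing noise schedule.

    `sigmas` is a decreasing schedule ending at 0. Each step takes an
    Euler predictor step, then averages the derivative at both ends
    (Heun's trapezoidal correction) for second-order accuracy.
    """
    for i in range(len(sigmas) - 1):
        s, s_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, s)) / s              # ODE derivative dx/dsigma
        x_euler = x + (s_next - s) * d           # Euler predictor
        if s_next > 0:
            # Heun corrector: average the slopes at s and s_next
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s) * 0.5 * (d + d_next)
        else:
            x = x_euler                          # final step: plain Euler
    return x
```

The corrector roughly doubles the number of network calls per step, but the higher order typically permits far fewer steps overall, which is the source of the computational savings.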

    Investigating the Design Space of Diffusion Models for Speech Enhancement

    Diffusion models are a new class of generative models that have shown outstanding performance in the image generation literature. As a consequence, studies have attempted to apply diffusion models to other tasks, such as speech enhancement. A popular approach in adapting diffusion models to speech enhancement consists in modelling a progressive transformation between the clean and noisy speech signals. However, one popular diffusion model framework previously proposed in the image generation literature does not account for such a transformation of the system input, which prevents relating the existing diffusion-based speech enhancement systems to the aforementioned framework. To address this, we extend this framework to account for the progressive transformation between the clean and noisy speech signals. This allows us to apply recent developments from the image generation literature, and to systematically investigate design aspects of diffusion models that remain largely unexplored for speech enhancement, such as the neural network preconditioning, the training loss weighting, the stochastic differential equation (SDE), or the amount of stochasticity injected in the reverse process. We show that the performance of previous diffusion-based speech enhancement systems cannot be attributed to the progressive transformation between the clean and noisy speech signals. Moreover, we show that a proper choice of preconditioning, training loss weighting, SDE and sampler makes it possible to outperform a popular diffusion-based speech enhancement system in terms of perceptual metrics while using fewer sampling steps, thus reducing the computational cost by a factor of four.
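The neural network preconditioning investigated above can be illustrated with the widely used formulation from the image generation literature, where input, output, and skip connection are rescaled so the network trains on unit-variance targets at every noise level. This is a generic sketch under that assumption, not the specific choice made in the paper; `F` stands for the raw network and `sigma_data` for the assumed signal standard deviation.

```python
import numpy as np

def precondition(F, x, sigma, sigma_data=0.5):
    """Wrap a raw network F(x, t) into a denoiser via preconditioning.

    c_skip, c_out, c_in and c_noise rescale the skip path, network
    output, network input, and noise-level embedding respectively.
    """
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / np.sqrt(sigma**2 + sigma_data**2)
    c_in = 1.0 / np.sqrt(sigma**2 + sigma_data**2)
    c_noise = np.log(sigma) / 4.0
    return c_skip * x + c_out * F(c_in * x, c_noise)
```

At small sigma the skip term dominates (the denoiser mostly passes the input through), while at large sigma the network output dominates; the loss weighting then compensates for the remaining variance of the target.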

    Bubble merging in breathing DNA as a vicious walker problem in opposite potentials

    We investigate the coalescence of two DNA-bubbles initially located at weak domains and separated by a more stable barrier region in a designed construct of double-stranded DNA. In a continuum Fokker-Planck approach, the characteristic time for bubble coalescence and the corresponding distribution are derived, as well as the distribution of coalescence positions along the barrier. Below the melting temperature, we find a Kramers-type barrier crossing behavior, while at high temperatures, the bubble corners perform drift-diffusion towards coalescence. In the calculations, we map the bubble dynamics on the problem of two vicious walkers in opposite potentials. We also present a discrete master equation approach to the bubble coalescence problem. Numerical evaluation and stochastic simulation of the master equation show excellent agreement with the results from the continuum approach. Given that the coalesced state is thermodynamically stabilized against a state where only one or a few base pairs of the barrier region are re-established, it appears likely that this type of setup could be useful for the quantitative investigation of thermodynamic DNA stability data as well as the rate constants involved in the unzipping and zipping dynamics of DNA, in single molecule fluorescence experiments.
    Comment: 24 pages, 11 figures; substantially extended version of cond-mat/0610752; v2: minor text changes, virtually identical to the published version
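For reference, the Kramers-type barrier crossing behavior mentioned above corresponds, in the standard overdamped limit, to an escape rate of the form (generic symbols, not the paper's notation):

```latex
k \;\simeq\; \frac{\omega_{\min}\,\omega_{\max}}{2\pi\gamma}\,
\exp\!\left(-\frac{\Delta F}{k_{B}T}\right),
```

where $\Delta F$ is the free-energy barrier of the stable region, $\gamma$ the friction coefficient, and $\omega_{\min}$, $\omega_{\max}$ the curvatures at the potential minimum and barrier top. The exponential dependence on $\Delta F / k_{B}T$ is what distinguishes this low-temperature regime from the high-temperature drift-diffusion regime.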

    Race By Hearts

    Part 2: Entertainment for Purpose and Persuasion. In this paper, we explore the qualities of sharing biometric data in real time between athletes, in order to increase two motivational factors for gym-goers: enjoyment and social interaction. We present a novel smartphone application, called Race By Hearts, which enables competition based on heart rate data sharing between users in real time. Through an empirical study conducted in the gym, we show that sharing biometric data in real time can strengthen social relations between participants, increase motivation, and improve the enjoyment of the fitness activity. Nevertheless, we found that introducing competition based on real-time sharing of biometric data can cause exasperation and discouragement for some athletes. Based on our findings from the study, we discuss how technology can facilitate and modify competition in fitness exercises in general.

    A follow-up study of a successful assistive technology for children with ADHD and their families

    Little research on assistive technologies for families of children with attention deficit hyperactivity disorder (ADHD) has investigated the long-term impact after the assistive technology is returned to the researchers. In this paper, we report the outcomes of a follow-up study, conducted four weeks after a field study of 13 children with ADHD and their families who used an assistive technology designed to help establish and change family practices. We show that some of the positive effects on parent frustration levels and conflict levels around morning and bedtime routines that we observed in the first phase of the study continued even after the study period, when the technology was no longer available. We furthermore present insights into family practices in families of children with ADHD, and how these could lead to unexpected challenges and implications related to the adoption, use, and outcome of the assistive technology.