On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
The performance of neural network-based speech enhancement systems is
primarily influenced by the model architecture, whereas training times and
computational resource utilization are primarily affected by training
parameters such as the batch size. Since noisy and reverberant speech mixtures
can have different duration, a batching strategy is required to handle variable
size inputs during training, in particular for state-of-the-art end-to-end
systems. Such strategies usually strive for a compromise between zero-padding and
data randomization, and can be combined with a dynamic batch size for a more
consistent amount of data in each batch. However, the effect of these practices
on resource utilization and, more importantly, network performance is not well
documented. This paper is an empirical study of the effect of different
batching strategies and batch sizes on the training statistics and speech
enhancement performance of a Conv-TasNet, evaluated in both matched and
mismatched conditions. We find that using a small batch size during training
improves performance in both conditions for all batching strategies. Moreover,
using sorted or bucket batching with a dynamic batch size allows for reduced
training time and GPU memory usage while achieving similar performance compared
to random batching with a fixed batch size.
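
To make the batching strategies concrete, here is a minimal sketch of bucket batching with a dynamic batch size; the function name, the duration budget, and the bucket count are illustrative assumptions, not the paper's actual implementation:

```python
import random

def bucket_batches(lengths, batch_seconds, num_buckets=10):
    """Group variable-size inputs into batches with a dynamic batch size.

    Indices are sorted by duration and split into buckets of similar
    lengths; each batch is then filled until its zero-padded duration
    (number of items times the longest item) would exceed a fixed
    budget, keeping the amount of data per batch roughly constant
    while limiting padding.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    step = max(1, len(order) // num_buckets)
    buckets = [order[i:i + step] for i in range(0, len(order), step)]

    batches = []
    for bucket in buckets:
        random.shuffle(bucket)  # retain some data randomization within buckets
        batch, longest = [], 0.0
        for idx in bucket:
            longest = max(longest, lengths[idx])
            if batch and (len(batch) + 1) * longest > batch_seconds:
                batches.append(batch)
                batch, longest = [], lengths[idx]
            batch.append(idx)
        if batch:
            batches.append(batch)
    random.shuffle(batches)  # shuffle batch order between epochs
    return batches
```

With `num_buckets=1` this reduces to random batching with a dynamic batch size, while a large `num_buckets` approaches fully sorted batching; the duration budget replaces a fixed number of items per batch.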
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler
Diffusion models are a new class of generative models that have recently been
applied to speech enhancement successfully. Previous works have demonstrated
their superior performance in mismatched conditions compared to
state-of-the-art discriminative models. However, this was investigated with a
single
database for training and another one for testing, which makes the results
highly dependent on the particular databases. Moreover, recent developments
from the image generation literature remain largely unexplored for speech
enhancement. These include several design aspects of diffusion models, such as
the noise schedule or the reverse sampler. In this work, we systematically
assess the generalization performance of a diffusion-based speech enhancement
model by using multiple speech, noise and binaural room impulse response (BRIR)
databases to simulate mismatched acoustic conditions. We also experiment with a
noise schedule and a sampler that have not been applied to speech enhancement
before. We show that the proposed system substantially benefits from using
multiple databases for training, and achieves superior performance compared to
state-of-the-art discriminative models in both matched and mismatched
conditions. We also show that a Heun-based sampler achieves superior
performance at a smaller computational cost compared to a sampler commonly used
for speech enhancement.
Comment: Accepted to ICASSP 202
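
For orientation, here is a minimal sketch of a Heun-based (second-order) sampler in the style of the image generation literature; the `denoise` function and the `sigmas` schedule are assumed inputs, and the sampler actually used for speech enhancement would additionally condition on the noisy input:

```python
def heun_sample(x, sigmas, denoise):
    """Deterministic Heun (second-order) sampler over a noise schedule.

    `sigmas` is a decreasing sequence of noise levels ending at 0, and
    `denoise(x, sigma)` is the learned denoiser. Each step takes an
    Euler step along the probability flow ODE, then corrects it with
    the slope re-evaluated at the next noise level.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma           # ODE slope at sigma
        x_euler = x + (sigma_next - sigma) * d        # first-order (Euler) step
        if sigma_next > 0:
            d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)  # Heun correction
        else:
            x = x_euler                               # final step: Euler only
    return x
```

The second denoiser evaluation per step is what makes the update second-order accurate, which is why comparable quality can be reached with fewer steps than a first-order Euler sampler.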
Investigating the Design Space of Diffusion Models for Speech Enhancement
Diffusion models are a new class of generative models that have shown
outstanding performance in image generation literature. As a consequence,
studies have attempted to apply diffusion models to other tasks, such as speech
enhancement. A popular approach in adapting diffusion models to speech
enhancement consists in modelling a progressive transformation between the
clean and noisy speech signals. However, one popular diffusion model framework
previously established in the image generation literature does not account for
such a transformation towards the system input, which prevents relating
existing diffusion-based speech enhancement systems to that framework. To
address this, we extend the framework to account
for the progressive transformation between the clean and noisy speech signals.
This allows us to apply recent developments from image generation literature,
and to systematically investigate design aspects of diffusion models that
remain largely unexplored for speech enhancement, such as the neural network
preconditioning, the training loss weighting, the stochastic differential
equation (SDE), or the amount of stochasticity injected in the reverse process.
We show that the performance of previous diffusion-based speech enhancement
systems cannot be attributed to the progressive transformation between the
clean and noisy speech signals. Moreover, we show that a proper choice of
preconditioning, training loss weighting, SDE and sampler allows us to outperform
a popular diffusion-based speech enhancement system in terms of perceptual
metrics while using fewer sampling steps, thus reducing the computational cost
by a factor of four.
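
As a point of reference, one SDE commonly used in diffusion-based speech enhancement (the Ornstein-Uhlenbeck variance-exploding SDE of Richter et al.) makes the progressive transformation explicit: the mean of the process drifts from the clean speech towards the noisy speech while noise is injected. It is shown here for illustration only and is not necessarily the exact SDE investigated in this work:

```latex
% x_t: process state, y: noisy speech, gamma: stiffness,
% sigma_min, sigma_max: noise schedule extremes, w_t: Wiener process
\mathrm{d}x_t = \gamma\,(y - x_t)\,\mathrm{d}t
  + \sigma_{\min} \left( \frac{\sigma_{\max}}{\sigma_{\min}} \right)^{t}
    \sqrt{2 \ln \frac{\sigma_{\max}}{\sigma_{\min}}}\; \mathrm{d}w_t
```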
Bubble merging in breathing DNA as a vicious walker problem in opposite potentials
We investigate the coalescence of two DNA-bubbles initially located at weak
domains and separated by a more stable barrier region in a designed construct
of double-stranded DNA. In a continuum Fokker-Planck approach, the
characteristic time for bubble coalescence and the corresponding distribution
are derived, as well as the distribution of coalescence positions along the
barrier. Below the melting temperature, we find a Kramers-type barrier crossing
behavior, while at high temperatures, the bubble corners perform
drift-diffusion towards coalescence. In the calculations, we map the bubble
dynamics on the problem of two vicious walkers in opposite potentials. We also
present a discrete master equation approach to the bubble coalescence problem.
Numerical evaluation and stochastic simulation of the master equation show
excellent agreement with the results from the continuum approach. Given that
the coalesced state is thermodynamically stabilized against a state where only
one or a few base pairs of the barrier region are re-established, it appears
likely that this type of setup could be useful for the quantitative
investigation of thermodynamic DNA stability data as well as the rate constants
involved in the unzipping and zipping dynamics of DNA, in single molecule
fluorescence experiments.
Comment: 24 pages, 11 figures; substantially extended version of
cond-mat/0610752; v2: minor text changes, virtually identical to the published
version
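
To illustrate the discrete picture, here is a toy Gillespie-style simulation of the master equation for the two approaching bubble corners, modelled as vicious walkers whose steps into the barrier are suppressed by a Boltzmann factor; the positions, rates, and boundary handling are illustrative assumptions rather than the paper's actual model:

```python
import math
import random

def coalescence_time(barrier_len, u, rate=1.0):
    """Simulate two 'vicious walkers' approaching across a barrier.

    The inner corners of the two bubbles sit at lattice positions
    x1 < x2 on a barrier of `barrier_len` base pairs. Steps that open
    a barrier base pair (moving the corners together) are suppressed
    by the Boltzmann factor exp(-u), where u > 0 is the free energy
    cost per base pair below the melting temperature. Returns the
    time at which the walkers meet, i.e. the bubbles coalesce.
    """
    x1, x2, t = 0, barrier_len, 0.0
    bias = math.exp(-u)  # penalty for opening a barrier base pair
    while x2 - x1 > 1:
        # rates: x1 right (opening), x1 left, x2 left (opening), x2 right
        rates = [rate * bias, rate, rate * bias, rate]
        total = sum(rates)
        t += random.expovariate(total)       # Gillespie waiting time
        r = random.uniform(0.0, total)
        if r < rates[0]:
            x1 += 1
        elif r < rates[0] + rates[1]:
            x1 = max(0, x1 - 1)              # reflecting at the left bubble
        elif r < rates[0] + rates[1] + rates[2]:
            x2 -= 1
        else:
            x2 = min(barrier_len, x2 + 1)    # reflecting at the right bubble
    return t
```

Averaging the returned times over many runs gives an estimate of the coalescence-time distribution; for u > 0 (below melting) the walkers move uphill towards each other, reproducing the Kramers-type barrier crossing regime described above.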
Race By Hearts
Part 2: Entertainment for Purpose and Persuasion
In this paper, we explore the qualities of sharing biometric data in real-time between athletes, in order to increase two motivational factors for gym-goers: enjoyment and social interaction. We present a novel smartphone application, called Race By Hearts, which enables competition based on heart rate data sharing between users in real-time. Through an empirical study conducted in the gym, we show that sharing biometric data in real-time can strengthen social relations between participants, increase motivation, and improve the enjoyment of the fitness activity. Nevertheless, we found that introducing competition based on real-time sharing of biometric data can cause exasperation and discouragement for some athletes. Based on our findings from the study, we discuss how technology can facilitate and modify competition in fitness exercises in general.
A follow-up study of a successful assistive technology for children with ADHD and their families
Little research on assistive technologies for families of children with attention deficit hyperactivity disorder (ADHD) has investigated the long-term impact after the assistive technology is returned to the researchers. In this paper, we report the outcomes of a follow-up study, conducted four weeks after a field study of 13 children with ADHD and their families who used an assistive technology designed to help establish and change family practices. We show that some of the positive effects on parent frustration level and conflict level around morning and bedtime routines that we observed in the first phase of the study continued even after the study period, when the technology was no longer available. We furthermore present insights into family practices in families of children with ADHD, and how these could lead to unexpected challenges and implications related to the adoption, use, and outcome of the assistive technology.