1,619 research outputs found

    Timbre transfer using image-to-image denoising diffusion implicit models

    Timbre transfer techniques aim to convert a musical piece played by one instrument so that it sounds as if it were played by another, while preserving as much as possible of its musical content, such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs), which accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems, we formulate the timbre transfer task similarly: we first convert the audio tracks into log mel spectrograms and then condition the generation of the desired timbre spectrogram on the input timbre spectrogram. We perform both one-to-one and many-to-many timbre transfer, by converting audio waveforms containing only single instruments and multiple instruments, respectively. We compare the proposed technique with existing state-of-the-art methods through both listening tests and objective measures in order to demonstrate the effectiveness of the proposed model.

    Synthesis of Soundfields through Irregular Loudspeaker Arrays Based on Convolutional Neural Networks

    Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often unsuitable for home audio systems due to physical space constraints. In this article we propose a deep learning-based technique for soundfield synthesis through more easily deployable irregular loudspeaker arrays, i.e., arrays where the spacing between loudspeakers is not constant. The inputs are the driving signals obtained through a plane wave decomposition-based technique. While these driving signals correctly reproduce the soundfield with a regular array, their performance degrades with irregular setups. Through a Convolutional Neural Network (CNN) we modify the driving signals in order to compensate for the errors in the reproduction of the desired soundfield. Since no ground-truth compensated driving signals are available, we train the model by calculating the loss between the desired soundfield at a number of control points and the one obtained through the driving signals estimated by the network. Numerical results show better reproduction accuracy with respect to both the plane wave decomposition-based technique and the pressure-matching approach.

    Frequency-Sliding Generalized Cross-Correlation: A Sub-band Time Delay Estimation Approach

    The generalized cross-correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation and signal bandwidth. In this paper, a sub-band approach for time delay estimation aimed at improving the performance of the conventional GCC is presented. The proposed method is based on the extraction of multiple GCCs corresponding to different frequency bands of the cross-power spectrum phase in a sliding-window fashion. The major contributions of this paper include: 1) a sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA; 2) a demonstration that this matrix representation is rank one in the ideal noiseless case, a property that is exploited in more adverse scenarios to obtain a more robust and accurate GCC; 3) a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance. An extensive set of experiments is presented to demonstrate the validity of the proposed approach. Comment: Article accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing
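
    For context, the conventional GCC that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration using the phase transform (PHAT) weighting, i.e. the cross-power spectrum phase, and it is not the frequency-sliding method proposed in the paper. The function name `gcc_phat`, its parameters, and the synthetic test signal are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT.

    Hypothetical helper for illustration: a positive result means that
    x2 lags x1. Returns the delay and the cross-correlation over lags.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT: keep only the phase
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift  # the TDOA is the lag of the peak
    return lag / fs, cc

# Synthetic check: white noise delayed by 5 samples (assumed test signal).
fs = 16000
rng = np.random.default_rng(0)
x1 = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau, _ = gcc_phat(x1, x2, fs)
print(f"estimated delay: {tau * fs:.0f} samples")
```

    In noisy or reverberant conditions the peak of this full-band GCC becomes less reliable, which is the limitation the sub-band approach described above is meant to address.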

    Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks

    In this manuscript, we propose a novel method to perform audio inpainting, i.e., the restoration of audio signals with multiple missing parts. Audio inpainting can be interpreted in the context of inverse problems as the task of reconstructing an audio signal from its corrupted observation. For this reason, our method is based on a deep prior approach, a recently proposed technique that has proved effective in solving many inverse problems, including image inpainting. Deep prior allows one to consider the structure of a neural network as an implicit prior and to adopt it as a regularizer. Unlike the classical deep learning paradigm, deep prior performs a single-element training and can thus be applied to corrupted audio signals independently of the available training data sets. In the context of audio inpainting, a network that embeds relevant audio priors can generate a restored version of an audio signal given only its corrupted observation. Our method exploits a time-frequency representation of audio signals and makes use of a multi-resolution convolutional autoencoder that has been enhanced to perform the harmonic convolution operation. Results show that the proposed technique is able to provide a coherent and meaningful reconstruction of the corrupted audio. It is also able to outperform the methods considered for comparison, in its domain of application.

    Reconstruction of Sound Field through Diffusion Models

    Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented reality (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is designed to be conditioned on a limited set of available measurements at different frequencies and to generate the sound field at unknown target locations. The results show that SF-Diff is able to provide accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation. Comment: Accepted for publication at ICASSP 202

    Wave-Based Analysis of Large Nonlinear Photovoltaic Arrays

    In this paper, a novel analysis method based on wave digital (WD) principles is presented. The method is employed for modeling and efficiently simulating large photovoltaic (PV) arrays under partial shading conditions. The WD method allows rapid exploration of the current-voltage curve at the load of the PV array, given the irradiation pattern, the nonlinear PV unit model (e.g., exponential junction model with bypass diode) and the corresponding parameters. The maximum power point can therefore easily be deduced. The main features of the proposed method are the use of a scattering matrix that is able to incorporate any PV array topology and the adoption of independent 1-D nonlinear solvers to handle the constitutive equations of PV units. It is shown that the WD method can be considered as an iterative relaxation method that always converges to the PV array solution. Rigorous proof of convergence and results about the speed of convergence are provided. Compared with standard SPICE-like simulators, the WD method is 35 times faster for PV arrays made of several thousand elements. This paves the way to possible implementations of the method in specialized hardware/software for the real-time control and optimization of complex PV plants. Bernardini, Alberto; Maffezzoni, Paolo; Daniel, Luca; Sarti, Augusto

    HandMonizer: a case study for personalized digital musical instrument design

    The rapid evolution of technology has brought novelty to today’s live music performances. In this context, the development of Digital Musical Instruments (DMIs) has received increasing attention in recent years. In this paper, we present the development of a DMI called Handmonizer, an interactive artist-oriented harmonizer for musical performance adapted to the needs of a specific singer. A key component of our work is the combination of hand motion recognition and audio signal processing to obtain a smoother interaction. We describe the development methodology, and we also focus on our collaboration with the artist to conceptualize and then refine this tool up to the final product. At the end of this paper, we define an evaluation strategy, collecting feedback with a questionnaire addressed to the singer. Our aim in presenting this evaluation strategy is to help other engineers keen to develop cutting-edge technologies by working in partnership with artists. While results are not definitive, we believe that the chosen methodology could be of interest to other DMI researchers. Moreover, the modular nature of the Handmonizer makes it easily adaptable to further developments concerning the Internet of Sounds (IoS) and Networked Music Performances (NMP).

    The effect of humidity on the CO2/N2 separation performance of copolymers based on hard polyimide segments and soft polyether chains: Experimental and modeling

    In this work, we studied two copolymers formed by segments of a rubbery polyether (PPO or PEO) and of a glassy polyimide (BPDA-ODA or BKDA-ODA) suitable for gas separation and CO2 capture. First, we assessed the absorption of water vapor in the materials as a function of relative humidity (R.H.), finding that the humidity uptake of the copolymers lies between those of the corresponding pure homopolymers. Furthermore, we studied the effect of humidity on CO2 and N2 permeability, as well as on CO2/N2 selectivity, up to an R.H. of 75%. The permeability decreases with increasing humidity, while the ideal selectivity remains approximately constant over the entire range of water activity investigated. The humidity-induced decrease of permeability in the copolymers is much smaller than the one observed in polyimides such as Matrimid®, confirming the positive effect of the polyether phase on the membrane performance. Finally, we modeled the humidity-induced decrease of gas solubility, diffusivity and, consequently, permeability, using an approach that combines the free volume theory for diffusion with the LF model for solubility. This model allows us to estimate the extent of competition that the gases undergo with water during sorption in the membranes, as a function of the relative humidity, as well as the expected reduction of free volume due to occupation by water molecules and the consequent reduction of diffusivity. Keywords: CO2 capture, Humid gas permeation, Transport properties in polymeric membranes, Water vapor sorption, Modeling