22 research outputs found

    Synthesis of Soundfields through Irregular Loudspeaker Arrays Based on Convolutional Neural Networks

    Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems due to physical space constraints. In this article we propose a deep learning-based technique for soundfield synthesis through more easily deployable irregular loudspeaker arrays, i.e. arrays where the spacing between loudspeakers is not constant. The inputs are the driving signals obtained through a plane wave decomposition-based technique. While these driving signals correctly reproduce the soundfield with a regular array, their performance degrades with irregular setups. Through a Convolutional Neural Network (CNN) we modify the driving signals in order to compensate for the errors in the reproduction of the desired soundfield. Since no ground-truth compensated driving signals are available, we train the model by calculating the loss between the desired soundfield at a number of control points and the one obtained through the driving signals estimated by the network. Numerical results show better reproduction accuracy with respect to both the plane wave decomposition-based technique and the pressure-matching approach.

    Timbre transfer using image-to-image denoising diffusion implicit models

    Timbre transfer techniques aim at converting the sound of a musical piece played by one instrument into the sound it would have if played by another instrument, while preserving as much as possible of the musical content, such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs), which accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems, we formulate the timbre transfer task similarly, by first converting the audio tracks into log mel spectrograms and then conditioning the generation of the desired timbre spectrogram on the input timbre spectrogram. We perform both one-to-one and many-to-many timbre transfer, by converting audio waveforms containing only single instruments and multiple instruments, respectively. We compare the proposed technique with existing state-of-the-art methods through both listening tests and objective measures in order to demonstrate the effectiveness of the proposed model.

    Frequency-Sliding Generalized Cross-Correlation: A Sub-band Time Delay Estimation Approach

    The generalized cross correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs also play an important role in steered response power (SRP) localization algorithms, where the SRP functional can be written as an accumulation of the GCCs computed from multiple sensor pairs. Unfortunately, the accuracy of TDOA estimates is affected by multiple factors, including noise, reverberation and signal bandwidth. In this paper, a sub-band approach for time delay estimation aimed at improving the performance of the conventional GCC is presented. The proposed method is based on the extraction of multiple GCCs corresponding to different frequency bands of the cross-power spectrum phase in a sliding-window fashion. The major contributions of this paper include: 1) a sub-band GCC representation of the cross-power spectrum phase that, despite having a reduced temporal resolution, provides a more suitable representation for estimating the true TDOA; 2) a demonstration that this matrix representation is rank one in the ideal noiseless case, a property that is exploited in more adverse scenarios to obtain a more robust and accurate GCC; 3) a set of low-rank approximation alternatives for processing the sub-band GCC matrix, leading to better TDOA estimates and source localization performance. An extensive set of experiments is presented to demonstrate the validity of the proposed approach. Comment: Article accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing
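
    The sliding-window idea in the abstract above can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function names (`dft`, `phat_spectrum`, `subband_gcc_lags`), the band width, the hop size and the circular-delay test signal are all hypothetical choices, and the low-rank processing step is not included.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def phat_spectrum(x1, x2):
    """Cross-power spectrum of (x1, x2) with every magnitude normalised to one."""
    spectrum = []
    for a, b in zip(dft(x1), dft(x2)):
        cross = b * a.conjugate()
        spectrum.append(cross / abs(cross) if abs(cross) > 1e-12 else 0j)
    return spectrum


def subband_gcc_lags(x1, x2, band_width, hop):
    """Return the peak lag of the GCC computed from each sliding frequency band."""
    n = len(x1)
    phase = phat_spectrum(x1, x2)
    lags = []
    for start in range(0, n // 2 - band_width + 1, hop):
        # Keep only the bins of the current band and zero the rest.
        banded = [phase[k] if start <= k < start + band_width else 0j for k in range(n)]
        gcc = dft(banded, inverse=True)
        # The band is one-sided, so the sub-band GCC is complex: use its magnitude.
        lags.append(max(range(n), key=lambda t: abs(gcc[t])))
    return lags


# Assumed test case: white noise and a circularly delayed, noiseless copy.
random.seed(0)
n, true_delay = 64, 5
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - true_delay) % n] for t in range(n)]
lags = subband_gcc_lags(x1, x2, band_width=8, hop=4)
print(lags)
```

    In this noiseless toy case every band peaks at the same lag, which is the intuition behind the rank-one property mentioned in the abstract; with noise or reverberation the bands would disagree, which is where the low-rank approximations described there would come in.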

    Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks

    In this manuscript, we propose a novel method to perform audio inpainting, i.e., the restoration of audio signals with multiple missing parts. Audio inpainting can be interpreted in the context of inverse problems as the task of reconstructing an audio signal from its corrupted observation. For this reason, our method is based on a deep prior approach, a recently proposed technique that has proved effective in the solution of many inverse problems, including image inpainting. Deep prior allows one to consider the structure of a neural network as an implicit prior and to adopt it as a regularizer. Unlike the classical deep learning paradigm, deep prior performs a single-element training and thus can be applied to corrupted audio signals independently of the available training data sets. In the context of audio inpainting, a network embodying relevant audio priors can generate a restored version of an audio signal given only its corrupted observation. Our method exploits a time-frequency representation of audio signals and makes use of a multi-resolution convolutional autoencoder that has been enhanced to perform the harmonic convolution operation. Results show that the proposed technique is able to provide a coherent and meaningful reconstruction of the corrupted audio. It is also able to outperform the methods considered for comparison in its domain of application.

    HandMonizer: a case study for personalized digital musical instrument design

    The rapid evolution of technology has brought novelty to today's live music performances. In this context, the development of Digital Musical Instruments (DMIs) has attracted increasing attention in recent years. In this paper, we present the development of a DMI called Handmonizer, an interactive artist-oriented harmonizer for musical performance adapted to the needs of a specific singer. A key component of our work is the combination of hand motion recognition and audio signal processing to obtain a smoother interaction. We describe the development methodology, and we also focus on our collaboration with the artist to conceptualize and then refine this tool up to the final product. At the end of this paper, we define an evaluation strategy, collecting feedback with a questionnaire addressed to the singer. Our aim in presenting this evaluation strategy is to help other engineers keen to develop cutting-edge technologies by working in partnership with artists. While results are not definitive, we believe that the chosen methodology could be of interest to other DMI researchers. Moreover, the modular nature of the Handmonizer makes it easily adaptable to further developments concerning the Internet of Sounds (IoS) and Networked Music Performances (NMP).

    Diversity of Greek meningococcal serogroup B isolates and estimated coverage of the 4CMenB meningococcal vaccine

    BACKGROUND: Serogroup B meningococcal (MenB) isolates currently account for approximately 90% of invasive meningococcal disease (IMD) in Greece, with the ST-162 clonal complex predominating. The potential of a multicomponent meningococcal B vaccine (4CMenB) recently licensed in Europe was investigated in order to determine whether it will cover the MenB strains circulating in Greece. A panel of 148 serogroup B invasive meningococcal strains was characterized by multilocus sequence typing (MLST) and PorA subtyping. Vaccine components were typed by sequencing for factor H-binding protein (fHbp), Neisserial Heparin Binding Antigen (NHBA) and Neisseria adhesin A (NadA). Their expression was explored by the Meningococcal Antigen Typing System (MATS). RESULTS: Global strain coverage predicted by MATS was 89.2% (95% CI 63.5%-98.6%), with 44.6%, 38.5% and 6.1% of strains covered by one, two and three vaccine antigens respectively. NHBA was the antigen responsible for the highest coverage (78.4%), followed by fHbp (52.7%), PorA (8.1%) and NadA (0.7%). The coverage of the major genotypes did not differ significantly. The most prevalent MLST genotype was the ST-162 clonal complex, accounting for 44.6% of the strains in the panel and with a predicted coverage of 86.4%, mainly due to NHBA and fHbp. CONCLUSIONS: 4CMenB has the potential to protect against a significant proportion of Greek invasive MenB strains.
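
    As a quick consistency check on the figures quoted in this abstract, the shares of strains covered by one, two and three antigens should add up to the global MATS coverage. A minimal sketch: the variable names are illustrative, and the numbers are copied from the abstract rather than recomputed from the underlying data.

```python
# Percentages of strains covered by one, two and three vaccine antigens,
# as reported in the abstract.
covered_by_n_antigens = {1: 44.6, 2: 38.5, 3: 6.1}

# Their sum should match the reported global coverage of 89.2%.
global_coverage = round(sum(covered_by_n_antigens.values()), 1)
print(global_coverage)
```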

    Ray space transform interpolation with convolutional autoencoder

    In this paper we propose an algorithm for the reconstruction of the Ray Space Transform (RST) through the use of neural networks. In particular, our aim is to reconstruct the magnitude of the RST acquired from a linear microphone array as if the array were composed of a larger number of microphones. This is useful for applications that need a higher RST resolution when only a limited number of microphones can be used due to practical constraints or physical limitations. The proposed solution leverages recent advancements in deep learning, as it is based on a fully convolutional autoencoder. To validate our method, we show through a simulation campaign that sound source localization improves when using the reconstructed RST instead of the original RST.

    Investigating Networked Music Performances in Pedagogical Scenarios for the InterMUSIC Project

    With the substantial improvement of digital communication networks, Networked Music Performances (NMP) have received great interest from the live music performance and music recording industries. The positive impact of NMP in pedagogical applications, instead, has only been preliminarily explored. Within the InterMUSIC project, we aim to investigate NMP from a pedagogical perspective, which differs considerably from that of music performances, and to develop tools to improve distance learning experiences. In this paper, we introduce a conceptual framework designed to be the foundation for all the experiments conducted in the project. We also present two preliminary experiments that investigate the sense of presence of geographically distant musicians in a distance learning scenario. We discuss the comments provided by the musicians as a set of requirements and guidelines for future experiments.

    Thinkmix: dall'idea alla progettazione dello story network (Thinkmix: from the idea to the design of the story network)

    Thinkmix is a network with social components that lets users create stories collaboratively. The thesis describes all the steps that led to the creation of this network: the idea, the design, the development and the publication. I consider it a successful and complete project that allowed me to fully express the skills acquired during the degree course across its various subjects (not only the reference subject) and to approach topics outside the scope of a bachelor's degree (computer security, marketing, purchasing and managing dedicated servers, and the bureaucracy involved in publishing an iOS application and a website with its own domain).