10 research outputs found

Differentiable Time-Frequency Scattering on GPU

Author: Fazekas George
Han Han
Lagrange Mathieu
Lostanlen Vincent
Muradeli John
Vahidi Cyrus
Wang Changhong
Publication date: 19/07/2022

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds. Comment: 8 pages, 6 figures. Submitted to the International Conference on Digital Audio Effects (DAFX) 202

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

AI (r)evolution -- where are we heading? Thoughts about the future of music and sound technologies in the era of deep learning

Author: Barthet Mathieu
Bindi Giovanni
Bryan-Kinns Nick
Demerlé Nils
Diaz Rodrigo
Genova David
Giavitto Jean-Louis
Golvet Aliénor
Hayes Ben
Huang Jiawen
Liu Lele
Martos Vincent
Nabi Sarah
Pelinski Teresa
Renault Lenny
Roebel Axel
Sarkar Saurjya
Sarmento Pedro
Vahidi Cyrus
Wolstanholme Lewis
Zhang Yixiao
Publication date: 20/09/2023

Artificial Intelligence (AI) technologies such as deep learning are evolving very quickly, bringing many changes to our everyday lives. To explore the future impact and potential of AI in the field of music and sound technologies, a doctoral day was held between Queen Mary University of London (QMUL, UK) and Sciences et Technologies de la Musique et du Son (STMS, France). Prompt questions about current trends in AI and music were generated by academics from QMUL and STMS. Students from the two institutions then debated these questions. This report presents a summary of the student debates on the topics of: Data, Impact, and the Environment; Responsible Innovation and Creative Practice; Creativity and Bias; and From Tools to the Singularity. The students represent the future generation of AI and music researchers. The academics represent the incumbent establishment. The student debates reported here capture visions, dreams, concerns, uncertainties, and contentious issues for the future of AI and music as the establishment is rightfully challenged by the next generation.

arXiv.org e-Print Archive

Vahidi, Cyrus

Author: Vahidi Cyrus
Publication date: 28/03/2024

Tilburg University Repository

Mésostructures : au-delà de la perte spectrale en analyse temps-fréquence différentiable (Mesostructures: beyond spectral loss in differentiable time-frequency analysis)

Author: Fazekas György
Han Han
Lagrange Mathieu
Lostanlen Vincent
Vahidi Cyrus
Wang Changhong
Publication venue: Audio Engineering Society
Publication date: 01/01/2023

Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in recent applications of deep learning to the analysis and synthesis of musical audio. Currently, autoencoders and neural audio synthesizers are only trained and evaluated at the scale of microstructure: i.e., local amplitude variations up to 100 milliseconds or so. In this paper, we formulate and address the problem of mesostructural audio modeling via a composition of a differentiable arpeggiator and time-frequency scattering. We empirically demonstrate that time-frequency scattering serves as a differentiable model of similarity between synthesis parameters that govern mesostructure. By exposing the sensitivity of short-time spectral distances to time alignment, we motivate the need for a time-invariant and multiscale differentiable time-frequency model of similarity at the level of both local spectra and spectrotemporal modulations.

INRIA a CCSD electronic archive server

Differentiable Time-Frequency Scattering On GPU

Author: Fazekas George
Han Han
Lagrange Mathieu
Lostanlen Vincent
Muradeli John
Vahidi Cyrus
Wang Changhong
Publication venue: HAL CCSD
Publication date: 06/09/2022

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.

INRIA a CCSD electronic archive server

Perceptual musical similarity metric learning with graph neural networks

Author: Benetos Emmanouil
Fazekas György
Lagrange Mathieu
Phan Huy
Singh Shubhr
Stowell Dan
Vahidi Cyrus
Publication venue: HAL CCSD
Publication date: 22/10/2023

Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods utilizing Euclidean nearest neighbours over acoustic features show some limitations in retrieving sounds sharing equivalent timbral properties, but potentially generated using a different instrument, playing technique, pitch or dynamic. In this paper, we present a metric learning system designed to approximate human similarity judgments between extended musical playing techniques using graph neural networks. Such structures are natural candidates for solving similarity retrieval tasks, yet have seen little application in modelling perceptual music similarity. We optimize a Graph Convolutional Network (GCN) over acoustic features via a proxy metric learning loss to learn embeddings that reflect perceptual similarities. Specifically, we construct the graph's adjacency matrix from the acoustic data manifold with an example-wise adaptive k-nearest neighbourhood graph: Adaptive Neighbourhood Graph Neural Network (AN-GNN). Our approach achieves 96.4% retrieval accuracy compared to 38.5% with a Euclidean metric and 86.0% with a multilayer perceptron (MLP), while effectively considering retrievals from distinct playing techniques to the query example.

INRIA a CCSD electronic archive server

Differentiable Time-Frequency Scattering on GPU

Author: Fazekas George
Han Han
Lagrange Mathieu
Lostanlen Vincent
Muradeli John
Vahidi Cyrus
Wang Changhong
Publication venue: HAL CCSD
Publication date: 04/09/2022

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds. Comment: 8 pages, 6 figures. Submitted to the International Conference on Digital Audio Effects (DAFX) 2022.

INRIA a CCSD electronic archive server

Differentiable Time-Frequency Scattering on GPU

Author: Fazekas George
Han Han
Lagrange Mathieu
Lostanlen Vincent
Muradeli John
Vahidi Cyrus
Wang Changhong
Publication venue: HAL CCSD
Publication date: 04/09/2022

Hal-Diderot

Perceptual Musical Similarity Metric Learning with Graph Neural Networks

Author: Benetos Emmanouil
Fazekas György
Lagrange Mathieu
Phan Huy
Singh Shubhr
Stowell Dan
Vahidi Cyrus
Publication date: 22/10/2023

Tilburg University Repository