Generative music with stochastic diffusion search
This paper introduces an approach that uses a swarm intelligence algorithm, Stochastic Diffusion Search (SDS) – inspired by a species of ant, Leptothorax acervorum – to generate music from plain text. In this approach, SDS is adapted to 'vocalise' its agents, making their “chit-chat” audible. While the generated music depends on the input text, the algorithm's search capability in locating the words of the input text is reflected in the duration and dynamics of the resulting musical notes. In other words, the generated music depends on the behaviour of the algorithm and the communication between its agents. This novel approach, while staying loyal to the original input text, 'vocalises' it in varying 'flavours' on each run.
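The test-and-diffuse search behaviour described above can be sketched with a minimal Stochastic Diffusion Search in Python. This is an illustrative toy that locates a target word in text, not the paper's music-generation system; the mapping from agent activity to note duration and dynamics is omitted.

```python
import random

def sds_search(text, target, n_agents=50, n_iters=100, seed=0):
    """Minimal Stochastic Diffusion Search: agents locate `target` in `text`.

    Each agent holds a candidate start index (its hypothesis). In the test
    phase an agent checks one randomly chosen character of the target
    against the text at its hypothesis; in the diffusion phase inactive
    agents either copy a randomly polled active agent's hypothesis or
    restart at a random position.
    """
    rng = random.Random(seed)
    span = len(text) - len(target)
    hyps = [rng.randrange(span + 1) for _ in range(n_agents)]
    active = [False] * n_agents
    for _ in range(n_iters):
        # Test phase: partial (one-character) evaluation of each hypothesis.
        for i, h in enumerate(hyps):
            j = rng.randrange(len(target))
            active[i] = text[h + j] == target[j]
        # Diffusion phase: recruitment of inactive agents.
        for i in range(n_agents):
            if not active[i]:
                other = rng.randrange(n_agents)
                hyps[i] = hyps[other] if active[other] else rng.randrange(span + 1)
    return hyps, active
```

Agents at the true location always pass their test and so never move, which is why the population converges on the target word over the iterations.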
A Survey of AI Music Generation Tools and Models
In this work, we provide a comprehensive survey of AI music generation tools,
including both research projects and commercialized applications. To conduct
our analysis, we classified music generation approaches into three categories:
parameter-based, text-based, and visual-based classes. Our survey highlights
the diverse possibilities and functional features of these tools, which cater
to a wide range of users, from regular listeners to professional musicians. We
observed that each tool has its own set of advantages and limitations. As a
result, we have compiled a comprehensive list of these factors that should be
considered during the tool selection process. Moreover, our survey offers
critical insights into the underlying mechanisms and challenges of AI music
generation.
VIP: Incorporating Human Cognitive Biases in a Probabilistic Model of Retweeting
Information spread in social media depends on a number of factors, including
how the site displays information, how users navigate it to find items of
interest, users' tastes, and the 'virality' of information, i.e., its
propensity to be adopted, or retweeted, upon exposure. Probabilistic models can
learn users' tastes from the history of their item adoptions and recommend new
items to users. However, current models ignore cognitive biases that are known
to affect behavior. Specifically, people pay more attention to items at the top
of a list than those in lower positions. As a consequence, items near the top
of a user's social media stream have higher visibility, and are more likely to
be seen and adopted, than those appearing below. Another bias is due to the
item's fitness: some items have a high propensity to spread upon exposure
regardless of the interests of adopting users. We propose a probabilistic model
that incorporates human cognitive biases and personal relevance in the
generative model of information spread. We use the model to predict how
messages containing URLs spread on Twitter. Our work shows that models of user
behavior that account for cognitive factors can better describe and predict
user behavior in social media.
Comment: SBP 201
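The interaction between position-based visibility and item fitness can be illustrated with a toy scoring function. This is a hypothetical sketch, not the paper's actual VIP model: the exponential visibility decay and the noisy-OR combination of relevance and fitness are assumptions chosen for illustration.

```python
import math

def adoption_probability(position, relevance, fitness, decay=0.1):
    """Toy sketch of combining cognitive bias with content appeal.

    position  -- 0-based rank of the item in the user's feed
    relevance -- personal relevance of the item to the user, in [0, 1]
    fitness   -- intrinsic propensity of the item to spread, in [0, 1]

    Visibility decays exponentially with feed position (people attend
    more to items near the top); appeal combines relevance and fitness
    as a noisy-OR, so either pull alone can drive adoption.
    """
    visibility = math.exp(-decay * position)
    appeal = 1.0 - (1.0 - relevance) * (1.0 - fitness)
    return visibility * appeal
```

An item at the top of the feed with maximal relevance is adopted with probability 1 under this toy model, while the same item further down the feed is progressively less likely to be seen and adopted.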
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
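The operator-inference idea behind blind bandwidth extension can be illustrated with a toy estimator: treat the unknown degradation as an ideal lowpass parametrized by a single cutoff frequency, and estimate that cutoff from the observed spectrum. This heuristic is a stand-in for illustration only, not the paper's iterative diffusion-posterior-sampling estimator.

```python
import numpy as np

def infer_cutoff(y, sr, thresh_db=-40.0):
    """Estimate the cutoff of an assumed ideal lowpass degradation.

    Returns the highest frequency (Hz) whose magnitude stays above
    `thresh_db` relative to the spectral peak of `y`, sampled at `sr`.
    """
    spec = np.abs(np.fft.rfft(y))
    spec_db = 20 * np.log10(spec / (spec.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    above = np.nonzero(spec_db > thresh_db)[0]
    return freqs[above[-1]] if len(above) else 0.0
```

On a bandlimited test signal this recovers the highest active frequency; in the blind setting such an estimate would serve as the initial guess that is then refined jointly with the generative prior.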
Information dynamics: patterns of expectation and surprise in the perception of music
This is a postprint of an article submitted for consideration in Connection Science © 2009 [copyright Taylor & Francis]; Connection Science is available online at: http://www.tandfonline.com/openurl?genre=article&issn=0954-0091&volume=21&issue=2-3&spage=8
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
In this work, we define a diffusion-based generative model capable of both
music synthesis and source separation by learning the score of the joint
probability density of sources sharing a context. Alongside the classic total
inference tasks (i.e., generating a mixture, separating the sources), we also
introduce and experiment on the partial generation task of source imputation,
where we generate a subset of the sources given the others (e.g., play a piano
track that goes well with the drums). Additionally, we introduce a novel
inference method for the separation task based on Dirac likelihood functions.
We train our model on Slakh2100, a standard dataset for musical source
separation, provide qualitative results in the generation settings, and
showcase competitive quantitative results in the source separation setting. Our
method is the first example of a single model that can handle both generation
and separation tasks, thus representing a step toward general audio models.
Comment: Demo page: https://gladia-research-group.github.io/multi-source-diffusion-models
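The hard constraint that separated sources must sum to the observed mixture can be sketched as a simple projection step. This is a hedged illustration only: the paper enforces the constraint through a Dirac likelihood inside diffusion sampling, whereas here we show just the projection onto the constraint set.

```python
import numpy as np

def project_to_mixture(sources, mixture):
    """Project per-source estimates onto the set {sources : sum = mixture}.

    The residual between the observed mixture and the current sum of
    source estimates is spread equally across the sources, so the
    projected estimates reconstruct the mixture exactly.
    """
    sources = np.asarray(sources, dtype=float)
    residual = mixture - sources.sum(axis=0)
    return sources + residual / len(sources)
```

Interleaving such a consistency step with sampling from a joint generative model is one simple way to couple generation and separation of the sources sharing a context.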
Unsupervised vocal dereverberation with diffusion-based generative models
Removing reverb from reverberant music is a necessary technique for cleaning up
audio for downstream music manipulations. Reverberation in music falls into two
categories: natural reverb and artificial reverb. Artificial reverb has a
wider diversity than natural reverb due to its various parameter setups and
reverberation types. However, recent supervised dereverberation methods may
fail because they rely on sufficiently diverse and numerous pairs of
reverberant observations and retrieved data for training in order to be
generalizable to unseen observations during inference. To resolve these
problems, we propose an unsupervised method that can remove a general kind of
artificial reverb for music without requiring pairs of data for training. The
proposed method is based on diffusion models, where it initializes the unknown
reverberation operator with a conventional signal processing technique and
simultaneously refines the estimate with the help of diffusion models. We show
through objective and perceptual evaluations that our method outperforms the
current leading vocal dereverberation benchmarks.
Comment: 6 pages, 2 figures, submitted to ICASSP 202
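The notion of a "reverberation operator" and its classical inversion can be sketched as follows. This is a hedged toy: the impulse response and the regularized inverse filter below are illustrative stand-ins for the conventional signal-processing initialization; the paper's contribution is refining such an estimate jointly with a diffusion prior, which is not shown here.

```python
import numpy as np

def apply_reverb(dry, rir):
    """Convolve a dry signal with a room impulse response -- the
    reverberation operator that dereverberation must invert."""
    return np.convolve(dry, rir)

def inverse_filter(wet, rir_estimate, eps=1e-3):
    """Regularized frequency-domain inverse filtering with an
    *estimated* impulse response (a classical initialization step)."""
    n = len(wet)
    H = np.fft.rfft(rir_estimate, n)
    W = np.fft.rfft(wet)
    # Tikhonov-regularized deconvolution: eps guards near-zero bins of H.
    return np.fft.irfft(W * np.conj(H) / (np.abs(H) ** 2 + eps), n)
```

With an accurate operator estimate the dry signal is recovered almost exactly; with an inaccurate one the result is only a rough initialization, which is exactly where a learned generative prior can help.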
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have been attracting lots of attention recently, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we discuss the challenges and
potential future research directions.
Unseen Image Synthesis with Diffusion Models
While the current trend in the generative field is scaling up towards larger
models and more training data for generalized domain representations, we go the
opposite direction in this work by synthesizing unseen domain images without
additional training. We do so via latent sampling and geometric optimization
using pre-trained and frozen Denoising Diffusion Probabilistic Models (DDPMs)
on single-domain datasets. Our key observation is that DDPMs pre-trained even
just on single-domain images are already equipped with sufficient
representation abilities to reconstruct arbitrary images from the inverted
latent encoding following bi-directional deterministic diffusion and denoising
trajectories. This motivates us to investigate the statistical and geometric
behaviors of the Out-Of-Distribution (OOD) samples from unseen image domains in
the latent spaces along the denoising chain. Notably, we theoretically and
empirically show that the inverted OOD samples also establish Gaussians that
are distinguishable from the original In-Domain (ID) samples in the
intermediate latent spaces, which allows us to sample from them directly.
Geometrical domain-specific and model-dependent information of the unseen
subspace (e.g., sample-wise distance and angles) is used to further optimize
the sampled OOD latent encodings from the estimated Gaussian prior. We conduct
extensive analysis and experiments using pre-trained diffusion models (DDPM,
iDDPM) on different datasets (AFHQ, CelebA-HQ, LSUN-Church, and LSUN-Bedroom),
proving the effectiveness of this novel perspective to explore and re-think the
diffusion models' data synthesis generalization ability.
Comment: 28 pages including appendices
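The key observation above, that inverted out-of-distribution latents form their own Gaussian in the intermediate latent space, can be sketched with a fit-then-sample routine. This is a hypothetical stand-in: obtaining the latents requires inverting images through a real pre-trained DDPM, whereas here the latents are simply given vectors.

```python
import numpy as np

def fit_latent_gaussian(latents):
    """Fit a Gaussian (mean and covariance) to a set of latent vectors,
    mirroring the estimated OOD prior in the intermediate latent space."""
    latents = np.asarray(latents, dtype=float)
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    return mean, cov

def sample_latents(mean, cov, n, seed=0):
    """Draw new latents directly from the fitted Gaussian; in the full
    method these would be further optimized with geometric cues
    (sample-wise distances and angles) before denoising."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)
```

Sampling from the fitted Gaussian rather than the model's original prior is what lets the frozen diffusion model synthesize images from the unseen domain without any retraining.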