Generative music with stochastic diffusion search
This paper introduces an approach that uses a swarm intelligence algorithm, Stochastic Diffusion Search (SDS) – inspired by a species of ant, Leptothorax acervorum – to generate music from plain text. In this approach, SDS is adapted to 'vocalise' its agents, making their “chit-chat” audible. While the generated music depends on the input text, the algorithm's search capability in locating the words of the input text is reflected in the duration and dynamics of the resulting musical notes. In other words, the generated music depends on the behaviour of the algorithm and the communication between its agents. This novel approach, while staying loyal to the original input text, 'vocalises' it in varying 'flavours' on each run.
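The test-and-diffuse search behaviour described above can be sketched with a minimal Stochastic Diffusion Search in Python. This is an illustrative toy that locates a target word in text, not the paper's music-generation system; the mapping from agent activity to note duration and dynamics is omitted.

```python
import random

def sds_search(text, target, n_agents=50, n_iters=100, seed=0):
    """Minimal Stochastic Diffusion Search: agents locate `target` in `text`.

    Each agent holds a candidate start index (its hypothesis). In the test
    phase an agent checks one randomly chosen character of the target
    against the text at its hypothesis; in the diffusion phase inactive
    agents either copy a randomly polled active agent's hypothesis or
    restart at a random position.
    """
    rng = random.Random(seed)
    span = len(text) - len(target)
    hyps = [rng.randrange(span + 1) for _ in range(n_agents)]
    active = [False] * n_agents
    for _ in range(n_iters):
        # Test phase: partial (one-character) evaluation of each hypothesis.
        for i, h in enumerate(hyps):
            j = rng.randrange(len(target))
            active[i] = text[h + j] == target[j]
        # Diffusion phase: recruitment of inactive agents.
        for i in range(n_agents):
            if not active[i]:
                other = rng.randrange(n_agents)
                hyps[i] = hyps[other] if active[other] else rng.randrange(span + 1)
    return hyps, active
```

Agents at the true location always pass their test and so never move, which is why the population converges on the target word over the iterations.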
A Survey of AI Music Generation Tools and Models
In this work, we provide a comprehensive survey of AI music generation tools,
including both research projects and commercialized applications. To conduct
our analysis, we classified music generation approaches into three categories:
parameter-based, text-based, and visual-based classes. Our survey highlights
the diverse possibilities and functional features of these tools, which cater
to a wide range of users, from regular listeners to professional musicians. We
observed that each tool has its own set of advantages and limitations. As a
result, we have compiled a comprehensive list of these factors that should be
considered during the tool selection process. Moreover, our survey offers
critical insights into the underlying mechanisms and challenges of AI music
generation.
VIP: Incorporating Human Cognitive Biases in a Probabilistic Model of Retweeting
Information spread in social media depends on a number of factors, including
how the site displays information, how users navigate it to find items of
interest, users' tastes, and the 'virality' of information, i.e., its
propensity to be adopted, or retweeted, upon exposure. Probabilistic models can
learn users' tastes from the history of their item adoptions and recommend new
items to users. However, current models ignore cognitive biases that are known
to affect behavior. Specifically, people pay more attention to items at the top
of a list than those in lower positions. As a consequence, items near the top
of a user's social media stream have higher visibility, and are more likely to
be seen and adopted, than those appearing below. Another bias is due to the
item's fitness: some items have a high propensity to spread upon exposure
regardless of the interests of adopting users. We propose a probabilistic model
that incorporates human cognitive biases and personal relevance in the
generative model of information spread. We use the model to predict how
messages containing URLs spread on Twitter. Our work shows that models of user
behavior that account for cognitive factors can better describe and predict
user behavior in social media.
Comment: SBP 201
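The interaction between position-based visibility and item fitness can be illustrated with a toy scoring function. This is a hypothetical sketch, not the paper's actual VIP model: the exponential visibility decay and the noisy-OR combination of relevance and fitness are assumptions chosen for illustration.

```python
import math

def adoption_probability(position, relevance, fitness, decay=0.1):
    """Toy sketch of combining cognitive bias with content appeal.

    position  -- 0-based rank of the item in the user's feed
    relevance -- personal relevance of the item to the user, in [0, 1]
    fitness   -- intrinsic propensity of the item to spread, in [0, 1]

    Visibility decays exponentially with feed position (people attend
    more to items near the top); appeal combines relevance and fitness
    as a noisy-OR, so either pull alone can drive adoption.
    """
    visibility = math.exp(-decay * position)
    appeal = 1.0 - (1.0 - relevance) * (1.0 - fitness)
    return visibility * appeal
```

An item at the top of the feed with maximal relevance is adopted with probability 1 under this toy model, while the same item further down the feed is progressively less likely to be seen and adopted.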
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
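The operator-inference idea behind blind bandwidth extension can be illustrated with a toy estimator: treat the unknown degradation as an ideal lowpass parametrized by a single cutoff frequency, and estimate that cutoff from the observed spectrum. This heuristic is a stand-in for illustration only, not the paper's iterative diffusion-posterior-sampling estimator.

```python
import numpy as np

def infer_cutoff(y, sr, thresh_db=-40.0):
    """Estimate the cutoff of an assumed ideal lowpass degradation.

    Returns the highest frequency (Hz) whose magnitude stays above
    `thresh_db` relative to the spectral peak of `y`, sampled at `sr`.
    """
    spec = np.abs(np.fft.rfft(y))
    spec_db = 20 * np.log10(spec / (spec.max() + 1e-12) + 1e-12)
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    above = np.nonzero(spec_db > thresh_db)[0]
    return freqs[above[-1]] if len(above) else 0.0
```

On a bandlimited test signal this recovers the highest active frequency; in the blind setting such an estimate would serve as the initial guess that is then refined jointly with the generative prior.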
Information dynamics: patterns of expectation and surprise in the perception of music
This is a postprint of an article submitted for consideration in Connection Science © 2009 [copyright Taylor & Francis]; Connection Science is available online at: http://www.tandfonline.com/openurl?genre=article&issn=0954-0091&volume=21&issue=2-3&spage=8
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
In this work, we define a diffusion-based generative model capable of both
music synthesis and source separation by learning the score of the joint
probability density of sources sharing a context. Alongside the classic total
inference tasks (i.e., generating a mixture, separating the sources), we also
introduce and experiment on the partial generation task of source imputation,
where we generate a subset of the sources given the others (e.g., play a piano
track that goes well with the drums). Additionally, we introduce a novel
inference method for the separation task based on Dirac likelihood functions.
We train our model on Slakh2100, a standard dataset for musical source
separation, provide qualitative results in the generation settings, and
showcase competitive quantitative results in the source separation setting. Our
method is the first example of a single model that can handle both generation
and separation tasks, thus representing a step toward general audio models.
Comment: Demo page: https://gladia-research-group.github.io/multi-source-diffusion-models
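The hard constraint that separated sources must sum to the observed mixture can be sketched as a simple projection step. This is a hedged illustration only: the paper enforces the constraint through a Dirac likelihood inside diffusion sampling, whereas here we show just the projection onto the constraint set.

```python
import numpy as np

def project_to_mixture(sources, mixture):
    """Project per-source estimates onto the set {sources : sum = mixture}.

    The residual between the observed mixture and the current sum of
    source estimates is spread equally across the sources, so the
    projected estimates reconstruct the mixture exactly.
    """
    sources = np.asarray(sources, dtype=float)
    residual = mixture - sources.sum(axis=0)
    return sources + residual / len(sources)
```

Interleaving such a consistency step with sampling from a joint generative model is one simple way to couple generation and separation of the sources sharing a context.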
Unsupervised vocal dereverberation with diffusion-based generative models
Removing reverb from reverberant music is a necessary technique for cleaning up
audio for downstream music manipulations. Reverberation in music falls into two
categories: natural reverb and artificial reverb. Artificial reverb has a
wider diversity than natural reverb due to its various parameter setups and
reverberation types. However, recent supervised dereverberation methods may
fail because they rely on sufficiently diverse and numerous pairs of
reverberant observations and retrieved data for training in order to be
generalizable to unseen observations during inference. To resolve these
problems, we propose an unsupervised method that can remove a general kind of
artificial reverb for music without requiring pairs of data for training. The
proposed method is based on diffusion models, where it initializes the unknown
reverberation operator with a conventional signal processing technique and
simultaneously refines the estimate with the help of diffusion models. We show
through objective and perceptual evaluations that our method outperforms the
current leading vocal dereverberation benchmarks.
Comment: 6 pages, 2 figures, submitted to ICASSP 202
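The notion of a "reverberation operator" and its classical inversion can be sketched as follows. This is a hedged toy: the impulse response and the regularized inverse filter below are illustrative stand-ins for the conventional signal-processing initialization; the paper's contribution is refining such an estimate jointly with a diffusion prior, which is not shown here.

```python
import numpy as np

def apply_reverb(dry, rir):
    """Convolve a dry signal with a room impulse response -- the
    reverberation operator that dereverberation must invert."""
    return np.convolve(dry, rir)

def inverse_filter(wet, rir_estimate, eps=1e-3):
    """Regularized frequency-domain inverse filtering with an
    *estimated* impulse response (a classical initialization step)."""
    n = len(wet)
    H = np.fft.rfft(rir_estimate, n)
    W = np.fft.rfft(wet)
    # Tikhonov-regularized deconvolution: eps guards near-zero bins of H.
    return np.fft.irfft(W * np.conj(H) / (np.abs(H) ** 2 + eps), n)
```

With an accurate operator estimate the dry signal is recovered almost exactly; with an inaccurate one the result is only a rough initialization, which is exactly where a learned generative prior can help.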
AI-generated Content for Various Data Modalities: A Survey
AI-generated content (AIGC) methods aim to produce text, images, videos, 3D
assets, and other media using AI algorithms. Due to its wide range of
applications and the demonstrated potential of recent works, AIGC developments
have been attracting lots of attention recently, and AIGC methods have been
developed for various data modalities, such as image, video, text, 3D shape (as
voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human
avatar (body and head), 3D motion, and audio -- each presenting different
characteristics and challenges. Furthermore, there have also been many
significant developments in cross-modality AIGC methods, where generative
methods can receive conditioning input in one modality and produce outputs in
another. Examples include going from various modalities to image, video, 3D
shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar),
and audio modalities. In this paper, we provide a comprehensive review of AIGC
methods across different data modalities, including both single-modality and
cross-modality methods, highlighting the various challenges, representative
works, and recent technical directions in each setting. We also survey the
representative datasets throughout the modalities, and present comparative
results for various modalities. Moreover, we discuss the challenges and
potential future research directions.
Unseen Image Synthesis with Diffusion Models
While the current trend in the generative field is scaling up towards larger
models and more training data for generalized domain representations, we go the
opposite direction in this work by synthesizing unseen domain images without
additional training. We do so via latent sampling and geometric optimization
using pre-trained and frozen Denoising Diffusion Probabilistic Models (DDPMs)
on single-domain datasets. Our key observation is that DDPMs pre-trained even
just on single-domain images are already equipped with sufficient
representation abilities to reconstruct arbitrary images from the inverted
latent encoding following bi-directional deterministic diffusion and denoising
trajectories. This motivates us to investigate the statistical and geometric
behaviors of the Out-Of-Distribution (OOD) samples from unseen image domains in
the latent spaces along the denoising chain. Notably, we theoretically and
empirically show that the inverted OOD samples also establish Gaussians that
are distinguishable from the original In-Domain (ID) samples in the
intermediate latent spaces, which allows us to sample from them directly.
Geometrical domain-specific and model-dependent information of the unseen
subspace (e.g., sample-wise distance and angles) is used to further optimize
the sampled OOD latent encodings from the estimated Gaussian prior. We conduct
extensive analysis and experiments using pre-trained diffusion models (DDPM,
iDDPM) on different datasets (AFHQ, CelebA-HQ, LSUN-Church, and LSUN-Bedroom),
proving the effectiveness of this novel perspective to explore and re-think the
diffusion models' data synthesis generalization ability.
Comment: 28 pages including appendices
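The key observation above, that inverted out-of-distribution latents form their own Gaussian in the intermediate latent space, can be sketched with a fit-then-sample routine. This is a hypothetical stand-in: obtaining the latents requires inverting images through a real pre-trained DDPM, whereas here the latents are simply given vectors.

```python
import numpy as np

def fit_latent_gaussian(latents):
    """Fit a Gaussian (mean and covariance) to a set of latent vectors,
    mirroring the estimated OOD prior in the intermediate latent space."""
    latents = np.asarray(latents, dtype=float)
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    return mean, cov

def sample_latents(mean, cov, n, seed=0):
    """Draw new latents directly from the fitted Gaussian; in the full
    method these would be further optimized with geometric cues
    (sample-wise distances and angles) before denoising."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)
```

Sampling from the fitted Gaussian rather than the model's original prior is what lets the frozen diffusion model synthesize images from the unseen domain without any retraining.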