MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Generating music has a few notable differences from generating images and
videos. First, music is an art of time, necessitating a temporal model. Second,
music is usually composed of multiple instruments/tracks with their own
temporal dynamics, but collectively they unfold over time interdependently.
Lastly, musical notes are often grouped into chords, arpeggios or melodies in
polyphonic music, so imposing a chronological ordering on notes is
not naturally suitable. In this paper, we propose three models for symbolic
multi-track music generation under the framework of generative adversarial
networks (GANs). The three models, which differ in the underlying assumptions
and accordingly the network architectures, are referred to as the jamming
model, the composer model and the hybrid model. We trained the proposed models
on a dataset of over one hundred thousand bars of rock music and applied them
to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings.
A few intra-track and inter-track objective metrics are also proposed to
evaluate the generative results, in addition to a subjective user study. We
show that our models can generate coherent music of four bars right from
scratch (i.e. without human inputs). We also extend our models to human-AI
cooperative music generation: given a specific track composed by a human, we can
generate four additional tracks to accompany it. All code, the dataset and the
rendered audio samples are available at https://salu133445.github.io/musegan/. Comment: to appear at AAAI 201
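The multi-track piano-roll mentioned in the abstract can be pictured with a minimal sketch. The five track names and the four-bar length come from the abstract; the time resolution (16 steps per bar), the 128-pitch MIDI range, and the helper names `empty_piano_roll`, `add_note` and `active_pitches` are illustrative assumptions, not the authors' implementation.

```python
# Track names and bar count follow the abstract; everything else is assumed.
TRACKS = ["bass", "drums", "guitar", "piano", "strings"]
NUM_BARS = 4          # the abstract mentions four-bar phrases
STEPS_PER_BAR = 16    # assumed time resolution (hypothetical)
NUM_PITCHES = 128     # assumed full MIDI pitch range (hypothetical)


def empty_piano_roll():
    """Return one silent binary time-by-pitch grid per track."""
    total_steps = NUM_BARS * STEPS_PER_BAR
    return {
        track: [[False] * NUM_PITCHES for _ in range(total_steps)]
        for track in TRACKS
    }


def add_note(roll, track, pitch, start, duration):
    """Mark a pitch as sounding from `start` for `duration` time steps."""
    end = min(start + duration, len(roll[track]))
    for step in range(start, end):
        roll[track][step][pitch] = True


def active_pitches(roll, track, step):
    """List the pitches sounding on a track at a given time step."""
    return [p for p, on in enumerate(roll[track][step]) if on]


roll = empty_piano_roll()
# A chord: three pitches share the same time steps, so no single
# chronological ordering of the notes is implied by the representation.
for pitch in (60, 64, 67):
    add_note(roll, "piano", pitch, start=0, duration=4)
print(active_pitches(roll, "piano", 0))
```

Because simultaneous notes occupy the same time step, the grid represents chords without forcing an order on their notes, which is the property the abstract points to.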
Revisiting the problem of audio-based hit song prediction using convolutional neural networks
Being able to predict whether a song can be a hit has important
applications in the music industry. Although it is true that the popularity of
a song can be greatly affected by external factors such as social and
commercial influences, the degree to which audio features computed from musical
signals (which we regard as internal factors) can predict song popularity is an
interesting research question on its own. Motivated by the recent success of
deep learning techniques, we attempt to extend previous work on hit song
prediction by jointly learning the audio features and prediction models using
deep learning. Specifically, we experiment with a convolutional neural network
model that takes the primitive mel-spectrogram as the input for feature
learning, a more advanced JYnet model that uses an external song dataset for
supervised pre-training and auto-tagging, and the combination of these two
models. We also consider the inception model to characterize audio information
in different scales. Our experiments suggest that deep structures are
indeed more accurate than shallow structures in predicting the popularity of
either Chinese or Western Pop songs in Taiwan. We also use the tags predicted
by JYnet to gain insights into the result of different models. Comment: To appear in the proceedings of 2017 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP
Self-Supervised Learning for Speech Enhancement through Synthesis
Modern speech enhancement (SE) networks typically implement noise suppression
through time-frequency masking, latent representation masking, or
discriminative signal prediction. In contrast, some recent works explore SE via
generative speech synthesis, where the system's output is synthesized by a
neural vocoder after an inherently lossy feature-denoising step. In this paper,
we propose a denoising vocoder (DeVo) approach, where a vocoder accepts noisy
representations and learns to directly synthesize clean speech. We leverage
rich representations from self-supervised learning (SSL) speech models to
discover relevant features. We conduct a candidate search across 15 potential
SSL front-ends and subsequently train our vocoder adversarially with the best
SSL configuration. Additionally, we demonstrate a causal version capable of
running on streaming audio with 10ms latency and minimal performance
degradation. Finally, we conduct both objective evaluations and subjective
listening studies to show our system improves objective metrics and outperforms
an existing state-of-the-art SE model subjectively.
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
Speech quality assessment has been a critical component in many voice
communication related applications such as telephony and online conferencing.
Traditional intrusive speech quality assessment requires the clean reference of
the degraded utterance to provide an accurate quality measurement. This
requirement limits the usability of these methods in real-world scenarios. On
the other hand, non-intrusive subjective measurement is the "gold standard"
in evaluating speech quality as human listeners can intrinsically evaluate the
quality of any degraded speech with ease. In this paper, we propose a novel
end-to-end model structure called Convolutional Context-Aware Transformer
(CCAT) network to predict the mean opinion score (MOS) of human raters. We
evaluate our model on three MOS-annotated datasets spanning multiple languages
and distortion types and submit our results to the ConferencingSpeech 2022
Challenge. Our experiments show that CCAT provides promising MOS predictions
compared to current state-of-the-art non-intrusive speech assessment models:
relative to the baseline model on the challenge evaluation test set, the
average Pearson correlation coefficient (PCC) increases from 0.530 to 0.697
and the average RMSE decreases from 0.768 to 0.570.
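The two figures of merit quoted in this abstract, PCC and RMSE between predicted and human-rated MOS, can be computed as in the following minimal sketch. The function names and the sample scores are hypothetical and only illustrate the standard definitions; they are not taken from the paper or the challenge toolkit.

```python
import math


def pearson_correlation(predicted, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(target) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))


# Hypothetical MOS values on a 1-5 scale, for illustration only.
predicted_mos = [3.1, 4.2, 2.0, 3.8]
rated_mos = [3.0, 4.5, 2.4, 3.5]
print(pearson_correlation(predicted_mos, rated_mos))
print(rmse(predicted_mos, rated_mos))
```

A higher PCC means the predictions track the ranking and spread of the human ratings more closely, while a lower RMSE means they are closer in absolute value, so the two metrics are reported together.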
Inscuteable and Staufen Mediate Asymmetric Localization and Segregation of prospero RNA during Drosophila Neuroblast Cell Divisions
When neuroblasts divide, inscuteable acts to coordinate protein localization and mitotic spindle orientation, ensuring that asymmetrically localized determinants like Prospero partition into one progeny. staufen encodes a dsRNA-binding protein implicated in mRNA transport in oocytes. We demonstrate that prospero RNA is also asymmetrically localized and partitioned during neuroblast cell divisions, a process requiring both inscuteable and staufen. Inscuteable and Staufen interact and colocalize with prospero RNA on the apical cortex of interphase neuroblasts. Staufen binds prospero RNA in its 3′UTR. Our findings suggest that Inscuteable nucleates an apical complex and is required for protein localization, spindle orientation, and RNA localization. Stau, as one component of this complex, is required only for RNA localization. Hence staufen also acts zygotically, downstream of inscuteable, to effect aspects of neuroblast asymmetry.
BIOMECHANICAL ANALYSIS DURING COUNTERMOVEMENT JUMP IN CHILDREN AND ADULTS
This study examined the biomechanical characteristics of children and adults during the countermovement jump. Seven children and seven adult males were recruited. A Peak high-speed camera (120 Hz) synchronized with a force plate (600 Hz) was used to record the vertical jumping action. The kinetic parameters were calculated using the inverse dynamics method. Results showed that the children had both immature joint function prior to propulsion and inadequate knee and ankle joint function during propulsion. It is concluded, on the basis of the kinetic analysis, that the children's group lacked a well-formed jumping strategy during vertical jumps. This information may be used in subsequent studies of the countermovement jump: because kinematic analysis alone misses some important information, applying kinetic analysis gives a more complete picture in research on children's movement.
Doping and temperature dependence of electron spectrum and quasiparticle dispersion in doped bilayer cuprates
Within the t-t'-J model, the electron spectrum and quasiparticle dispersion
in doped bilayer cuprates in the normal state are discussed by considering the
bilayer interaction. It is shown that the bilayer interaction splits the
electron spectrum of doped bilayer cuprates into the bonding and antibonding
components around the point. The differentiation between the bonding
and antibonding components is essential, which leads to two main flat bands
around the point below the Fermi energy. In analogy to the doped
single layer cuprates, the lowest energy states in doped bilayer cuprates are
located at the point. Our results also show that the striking
behavior of the electronic structure in doped bilayer cuprates is intriguingly
related to the bilayer interaction together with strong coupling between the
electron quasiparticles and collective magnetic excitations. Comment: 9 pages, 4 figures, updated references, added figures and
discussions, accepted for publication in Phys. Rev.