Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective
- What musical content is to be generated? Examples are: melody, polyphony,
  accompaniment or counterpoint.
- For what destination and for what use? To be performed by a human(s) (in
  the case of a musical score), or by a machine (in the case of an audio
  file).
Representation
- What are the concepts to be manipulated? Examples are: waveform,
  spectrogram, note, chord, meter and beat.
- What format is to be used? Examples are: MIDI, piano roll or text.
- How will the representation be encoded? Examples are: scalar, one-hot or
  many-hot.
Architecture
- What type(s) of deep neural network is (are) to be used? Examples are:
  feedforward network, recurrent network, autoencoder or generative
  adversarial networks.
Challenge
- What are the limitations and open challenges? Examples are: variability,
  interactivity and creativity.
Strategy
- How do we model and control the process of generation? Examples are:
  single-step feedforward, iterative feedforward, sampling or input
  manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose some tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
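The three encodings named under the Representation dimension of the survey above (scalar, one-hot, many-hot) can be illustrated with a minimal sketch. The pitch range and the helper names `scalar`, `one_hot` and `many_hot` are assumptions made for illustration; they are not taken from the survey.

```python
# Hypothetical illustration: the vocabulary and function names are assumptions.
LOW, HIGH = 60, 72          # assumed pitch vocabulary: MIDI notes 60..72 inclusive
SIZE = HIGH - LOW + 1       # number of distinct pitches in the vocabulary


def scalar(pitch):
    """Scalar encoding: one real number, here the pitch scaled to [0, 1]."""
    return (pitch - LOW) / (SIZE - 1)


def one_hot(pitch):
    """One-hot encoding: a vector with a single 1 at the pitch's index."""
    vec = [0] * SIZE
    vec[pitch - LOW] = 1
    return vec


def many_hot(pitches):
    """Many-hot encoding: a single vector with a 1 for every sounding pitch."""
    vec = [0] * SIZE
    for p in pitches:
        vec[p - LOW] = 1
    return vec


if __name__ == "__main__":
    print(scalar(66))              # a single note as one number
    print(one_hot(66))             # a single note as a one-hot vector
    print(many_hot([60, 64, 67]))  # three simultaneous notes as a many-hot vector
```

One-hot suits a single note per time step, while many-hot can represent several simultaneous notes in one vector; the scalar form is the most compact of the three.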
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
Current Challenges and Visions in Music Recommender Systems Research
Music recommender systems (MRS) have experienced a boom in recent years,
thanks to the emergence and success of online streaming services, which
nowadays make available almost all music in the world at the user's fingertip.
While today's MRS considerably help users to find interesting music in these
huge catalogs, MRS research is still facing substantial challenges. In
particular, when it comes to building, incorporating, and evaluating
recommendation strategies that integrate information beyond simple user-item
interactions or content-based descriptors and dig deep into the very essence
of listener needs, preferences, and intentions, MRS research becomes a big
endeavor and related publications are quite sparse.
The purpose of this trends and survey article is twofold. We first identify
and shed light on what we believe are the most pressing challenges MRS research
is facing, from both academic and industry perspectives. We review the state of
the art towards solving these challenges and discuss its limitations. Second,
we detail possible future directions and visions we contemplate for the further
evolution of the field. The article should therefore serve two purposes: giving
the interested reader an overview of current challenges in MRS research and
providing guidance for young researchers by identifying interesting, yet
under-researched, directions in the field.
Handwriting styles: benchmarks and evaluation metrics
Evaluating the style of handwriting generation is a challenging problem,
since it is not well defined. It is a key component in developing systems
that offer more personalized experiences to humans. In this
paper, we propose baseline benchmarks, in order to set anchors to estimate the
relative quality of different handwriting style methods. This will be done
using deep learning techniques, which have shown remarkable results in
different machine learning tasks, learning classification, regression, and most
relevant to our work, generating temporal sequences. We discuss the challenges
associated with evaluating our methods, which are related to the evaluation of
generative models in general. We then propose evaluation metrics, which we find
relevant to this problem, and we discuss how we evaluate the evaluation
metrics. In this study, we use the IRON-OFF dataset. To the best of our
knowledge, no prior work has addressed generating handwriting (either in terms
of methodology or performance metrics), or exploring styles, using this
dataset.
Comment: Submitted to IEEE International Workshop on Deep and Transfer
Learning (DTL 2018)
Deep Cross-Modal Audio-Visual Generation
Cross-modal audio-visual perception has been a long-lasting topic in
psychology and neurology, and various studies have discovered strong
correlations in human perception of auditory and visual stimuli. Despite works
in computational multimodal modeling, the problem of cross-modal audio-visual
generation has not been systematically studied in the literature. In this
paper, we make the first attempt to solve this cross-modal generation problem
leveraging the power of deep generative adversarial training. Specifically, we
use conditional generative adversarial networks to achieve cross-modal
audio-visual generation of musical performances. We explore different encoding
methods for audio and visual signals, and work on two scenarios:
instrument-oriented generation and pose-oriented generation. Being the first to
explore this new problem, we compose two new datasets with pairs of images and
sounds of musical performances of different instruments. Our experiments using
both classification and human evaluations demonstrate that our model has the
ability to generate one modality, i.e., audio/visual, from the other modality,
i.e., visual/audio, to a good extent. Our experiments on various design choices
along with the datasets will facilitate future research in this new problem
space.
Style Transfer and Extraction for the Handwritten Letters Using Deep Learning
How can we learn, transfer and extract handwriting styles using deep neural
networks? This paper explores these questions using a deep conditioned
autoencoder on the IRON-OFF handwriting dataset. We perform three experiments
that systematically explore the quality of our style extraction procedure.
First, we compare our model to handwriting benchmarks using multidimensional
performance metrics. Second, we explore the quality of style transfer, i.e. how
the model performs on new, unseen writers. In both experiments, we improve the
metrics of state-of-the-art methods by a large margin. Lastly, we analyze the
latent space of our model, and we see that it consistently separates writing
styles.
Comment: Accepted in ICAART 201
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also the
attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we devise a taxonomy of deep learning based recommendation models and provide
a comprehensive summary of the state of the art. Finally,
we expand on current trends and provide new perspectives pertaining to this new
exciting development of the field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502