Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes
Recurrent neural networks are nowadays successfully used in an abundance of
applications, going from text, speech and image processing to recommender
systems. Backpropagation through time is the algorithm that is commonly used to
train these networks on specific tasks. Many deep learning frameworks have
their own implementation of training and sampling procedures for recurrent
neural networks, while there are in fact multiple other possibilities to choose
from and other parameters to tune. In existing literature this is very often
overlooked or ignored. In this paper we therefore give an overview of possible
training and sampling schemes for character-level recurrent neural networks to
solve the task of predicting the next token in a given sequence. We test these
different schemes on a variety of datasets, neural network architectures and
parameter settings, and formulate a number of take-home recommendations. The
choice of training and sampling scheme turns out to be subject to a number of
trade-offs, such as training stability, sampling time, model performance and
implementation effort, but is largely independent of the data. Perhaps the most
surprising result is that transferring hidden states for correctly initializing
the model on subsequences often leads to unstable training behavior depending
on the dataset. Comment: 23 pages, 11 figures, 4 tables
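As a rough illustration of what "transferring hidden states" between subsequences means, the sketch below runs a tiny vanilla tanh RNN over a character sequence split into chunks. It covers only the forward pass, not the paper's training schemes, and every name in it (`rnn_step`, `final_state`, the weight matrices, the sizes) is a hypothetical choice made for this example.

```python
import math
import random

def rnn_step(h, char_id, W_xh, W_hh):
    """One vanilla tanh RNN step for a one-hot encoded character."""
    return [
        math.tanh(W_xh[j][char_id] + sum(W_hh[j][k] * h[k] for k in range(len(h))))
        for j in range(len(h))
    ]

def final_state(char_ids, chunk_len, W_xh, W_hh, transfer_state):
    """Process the sequence chunk by chunk and return the last hidden state.

    With transfer_state=True the hidden state is carried from one chunk to
    the next; with transfer_state=False every chunk starts from zeros.
    """
    hidden_size = len(W_hh)
    h = [0.0] * hidden_size
    for start in range(0, len(char_ids), chunk_len):
        if not transfer_state:
            h = [0.0] * hidden_size
        for c in char_ids[start:start + chunk_len]:
            h = rnn_step(h, c, W_xh, W_hh)
    return h

# Toy setup (assumed values, not taken from the paper).
random.seed(0)
VOCAB, HIDDEN = 5, 4
W_xh = [[random.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(HIDDEN)]
W_hh = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
char_ids = [random.randrange(VOCAB) for _ in range(20)]

h_full = final_state(char_ids, len(char_ids), W_xh, W_hh, transfer_state=True)
h_transfer = final_state(char_ids, 5, W_xh, W_hh, transfer_state=True)
h_reset = final_state(char_ids, 5, W_xh, W_hh, transfer_state=False)
```

In this forward-only setting, carrying the state across chunks reproduces the state obtained from the unsplit sequence, whereas resetting it discards everything before the last chunk. The training instability reported in the abstract concerns gradient-based learning and is not modelled here.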
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms have been developed for this task, each using different information sources: textual, temporal, geographic or community features. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet results in a worse F1 measure than the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.
Representation learning for very short texts using weighted word embedding aggregation
Short text messages such as tweets are very noisy and sparse in their use of
vocabulary. Traditional textual representations, such as tf-idf, have
difficulty grasping the semantic meaning of such texts, which is important in
applications such as event detection, opinion mining, news recommendation, etc.
We constructed a method based on semantic word embeddings and frequency
information to arrive at low-dimensional representations for short texts
designed to capture semantic similarity. For this purpose we designed a
weight-based model and a learning procedure based on a novel median-based loss
function. This paper discusses the details of our model and the optimization
methods, together with the experimental results on both Wikipedia and Twitter
data. We find that our method outperforms the baseline approaches in the
experiments, and that it generalizes well on different word embeddings without
retraining. Our method is therefore capable of retaining most of the semantic
information in the text, and is applicable out-of-the-box. Comment: 8 pages, 3 figures, 2 tables, appears in Pattern Recognition Letters
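The idea of combining word embeddings with frequency information can be sketched as a weighted average. The version below assumes that words are ranked by inverse document frequency and that one weight is attached to each rank position; this is an assumption made for illustration, and the toy embeddings, idf values and weights are invented rather than learned as in the paper.

```python
def aggregate(words, embeddings, idf, weights):
    """Weighted mean of word vectors, with weights assigned by idf rank.

    Words are sorted from most to least informative (highest idf first);
    the word at rank j is scaled by weights[j]. Words beyond len(weights)
    are dropped.
    """
    dim = len(next(iter(embeddings.values())))
    ranked = sorted(words, key=lambda w: idf[w], reverse=True)[:len(weights)]
    if not ranked:
        return [0.0] * dim
    total = [0.0] * dim
    for weight, word in zip(weights, ranked):
        for d in range(dim):
            total[d] += weight * embeddings[word][d]
    return [value / len(ranked) for value in total]

# Hypothetical 2-dimensional embeddings and idf scores.
toy_embeddings = {"storm": [1.0, 0.0], "hits": [0.0, 1.0], "the": [1.0, 1.0]}
toy_idf = {"storm": 3.0, "hits": 2.0, "the": 0.1}
toy_weights = [1.0, 0.5, 0.1]  # hand-picked here; learned in the paper's setting

vector = aggregate(["the", "storm", "hits"], toy_embeddings, toy_idf, toy_weights)
```

Because the weights depend only on rank and not on the embedding values, the same weights can in principle be reused with a different embedding table, which is in line with the abstract's claim of generalizing across embeddings without retraining.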
Learning perception and planning with deep active inference
Active inference is a process theory of the brain that states that all living organisms infer actions in order to minimize their (expected) free energy. However, current experiments are limited to predefined, often discrete, state spaces. In this paper we use recent advances in deep learning to learn the state space and approximate the necessary probability distributions to engage in active inference.
Efficiency Evaluation of Character-level RNN Training Schedules
We present four training and prediction schedules for the same
character-level recurrent neural network. The efficiency of these schedules is
tested in terms of model effectiveness as a function of training time and
amount of training data seen. We show that the choice of training and
prediction schedule potentially has a considerable impact on the prediction
effectiveness for a given training budget. Comment: 3 pages, 3 figures
Sigmoidal NMFD : convolutional NMF with saturating activations for drum mixture decomposition
In many types of music, percussion plays an essential role in establishing the rhythm and the groove of the music. Algorithms that can decompose the percussive signal into its constituent components would therefore be very useful, as they would enable many analytical and creative applications. This paper describes a method for the unsupervised decomposition of percussive recordings, building on the non-negative matrix factor deconvolution (NMFD) algorithm. Given a percussive music recording, NMFD discovers a dictionary of time-varying spectral templates and corresponding activation functions, representing its constituent sounds and their positions in the mix. We observe, however, that the activation functions discovered using NMFD do not show the expected impulse-like behavior for percussive instruments. We therefore enforce this behavior by specifying that the activations should take on binary values: either an instrument is hit, or it is not. To this end, we rewrite the activations as the output of a sigmoidal function, multiplied by a per-component amplitude factor. We furthermore define a regularization term that biases the decomposition towards solutions with saturated activations, leading to the desired binary behavior. We evaluate several optimization strategies and techniques that are designed to avoid poor local minima. We show that incentivizing the activations to be binary indeed leads to the desired impulse-like behavior, and that the resulting components are better separated, leading to more interpretable decompositions.
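The reparameterization described above, a sigmoid output scaled by a per-component amplitude, can be sketched in a few lines. The abstract does not give the exact regularization term, so the penalty `s * (1 - s)` used below is only one plausible choice that vanishes for saturated activations; the function names are likewise assumptions of this sketch.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function, written via tanh."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def activations(amplitude, logits):
    """Activation curve of one component: amplitude times a sigmoid in (0, 1)."""
    return [amplitude * sigmoid(g) for g in logits]

def saturation_penalty(logits):
    """Hypothetical regularizer: largest at sigmoid = 0.5, near zero at 0 or 1."""
    return sum(sigmoid(g) * (1.0 - sigmoid(g)) for g in logits)

# Assumed toy values: one component over four time frames.
undecided = [0.0, 0.0, 0.0, 0.0]       # every frame sits at sigmoid = 0.5
saturated = [10.0, -10.0, -10.0, 10.0]  # frames are clearly "hit" or "not hit"

curve = activations(2.0, saturated)
penalty_undecided = saturation_penalty(undecided)
penalty_saturated = saturation_penalty(saturated)
```

Adding such a penalty to the reconstruction loss pushes each frame towards either "hit" or "not hit", while the amplitude factor keeps the overall loudness of the component free.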
Audio-guided Album Cover Art Generation with Genetic Algorithms
Over 60,000 songs are released on Spotify every day, and the competition for
the listener's attention is immense. In that regard, the importance of
captivating and inviting cover art should not be underestimated, because it is
deeply entangled with a song's character and the artist's identity, and remains
one of the most important gateways to lead people to discover music. However,
designing cover art is a highly creative, lengthy and sometimes expensive
process that can be daunting, especially for non-professional artists. For this
reason, we propose a novel deep-learning framework to generate cover art guided
by audio features. Inspired by VQGAN-CLIP, our approach is highly flexible
because individual components can easily be replaced without the need for any
retraining. This paper outlines the architectural details of our models and
discusses the optimization challenges that emerge from them. More specifically,
we will exploit genetic algorithms to overcome bad local minima and adversarial
examples. We find that our framework can generate suitable cover art for most
genres, and that the visual features adapt themselves to audio feature changes.
Given these results, we believe that our framework paves the road for
extensions and more advanced applications in audio-guided visual generation
tasks. Comment: 8 pages, 6 figures, 4 tables
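To make the role of a genetic algorithm in escaping poor local minima more concrete, the following is a generic, minimal sketch of one. It is not the framework described in the abstract: the toy fitness function stands in for whatever score matches an image to audio features, and `evolve`, its parameters and the selection scheme are all assumptions of this example.

```python
import random

def evolve(fitness, dim, pop_size=20, generations=30, sigma=0.3, seed=0):
    """Elitist genetic algorithm over real-valued vectors (maximizes fitness)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, history

def toy_fitness(vector):
    """Stand-in objective: highest when every coordinate equals 0.5."""
    return -sum((x - 0.5) ** 2 for x in vector)

best, history = evolve(toy_fitness, dim=3)
```

Because the parents are kept in every generation, the best fitness never decreases, while mutation and crossover keep exploring candidates that a purely gradient-based search might not reach.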