52,048 research outputs found
Recognition of variations using automatic Schenkerian reduction.
Experiments on techniques to automatically recognise whether or not an extract of music is a variation of a given theme are reported, using a test corpus derived from ten of Mozart's sets of variations for piano. Methods which examine the notes of the 'surface' are compared with methods which make use of an automatically derived quasi-Schenkerian reduction of the theme and the extract in question. The maximum average F-measure achieved was 0.87. Unexpectedly, this was for a method of matching based on the surface alone, and in general the results for matches based on the surface were marginally better than those based on reduction, though the small number of possible test queries means that this result cannot be regarded as conclusive. Other inferences on which factors seem to be important in recognising variations are discussed. Possibilities for improved recognition of matching using reduction are outlined
The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short sequence generation,
symbolic music generation remains a challenging problem since the structure of
compositions are usually complicated. In this study, we attempt to solve the
melody generation problem constrained by the given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicit architectural encoding of musical structure via
comparing two sequential generative models: LSTM (a type of RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study of applying
WaveNet to symbolic music generation, as well as the first systematic
comparison between temporal-CNN and RNN for music generation. We conduct a
survey for evaluation in our generations and implemented Variable Markov Oracle
in music pattern discovery. Experimental results show that to encode structure
more explicitly using a stack of dilated convolution layers improved the
performance significantly, and a global encoding of underlying chord
progression into the generation procedure gains even more.Comment: 8 pages, 13 figure
Memory Based Online Learning of Deep Representations from Video Streams
We present a novel online unsupervised method for face identity learning from
video streams. The method exploits deep face descriptors together with a memory
based learning mechanism that takes advantage of the temporal coherence of
visual data. Specifically, we introduce a discriminative feature matching
solution based on Reverse Nearest Neighbour and a feature forgetting strategy
that detect redundant features and discard them appropriately while time
progresses. It is shown that the proposed learning procedure is asymptotically
stable and can be effectively used in relevant applications like multiple face
identification and tracking from unconstrained video streams. Experimental
results show that the proposed method achieves comparable results in the task
of multiple face tracking and better performance in face identification with
offline approaches exploiting future information. Code will be publicly
available.Comment: arXiv admin note: text overlap with arXiv:1708.0361
A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design
Audio fingerprinting, also named as audio hashing, has been well-known as a
powerful technique to perform audio identification and synchronization. It
basically involves two major steps: fingerprint (voice pattern) design and
matching search. While the first step concerns the derivation of a robust and
compact audio signature, the second step usually requires knowledge about
database and quick-search algorithms. Though this technique offers a wide range
of real-world applications, to the best of the authors' knowledge, a
comprehensive survey of existing algorithms appeared more than eight years ago.
Thus, in this paper, we present a more up-to-date review and, for emphasizing
on the audio signal processing aspect, we focus our state-of-the-art survey on
the fingerprint design step for which various audio features and their
tractable statistical models are discussed.Comment: http://www.iaria.org/conferences2015/PATTERNS15.html ; Seventh
International Conferences on Pervasive Patterns and Applications (PATTERNS
2015), Mar 2015, Nice, Franc
- …