Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks
Motivic pattern classification from music audio recordings is a challenging task, and more so for a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNN) have proven to be very effective in image classification, and recent work in large-scale audio classification has shown that CNN architectures originally developed for image problems can be applied successfully to audio event recognition and classification with little or no modification. In this paper, CNN architectures are tested on a more nuanced problem: flamenco cantes intra-style classification using small motivic patterns. A new architecture is proposed that uses residual CNNs as feature extractors and a bidirectional LSTM layer to exploit the sequential nature of musical audio data. We present a full end-to-end pipeline for music audio classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as the input to the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings, as well as the effect of audio length and corpus size on overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained on acoustic representations of motivic patterns.
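The abstract does not specify which contour simplification method the pipeline uses. As an illustration only, one plausible sketch is a Ramer-Douglas-Peucker pass over a (time, pitch) contour, which keeps the perceptually salient turning points of a melodic motif while discarding small fluctuations; the function names and the tolerance parameter `eps` here are assumptions, not the paper's method:

```python
import math

def _perp_dist(pt, a, b):
    # Perpendicular distance from pt to the line through a and b.
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    seg = math.hypot(dx, dy)
    if seg == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / seg

def simplify_contour(points, eps):
    """Ramer-Douglas-Peucker simplification of a (time, pitch) contour.

    Points closer than `eps` to the chord joining the endpoints are
    dropped; the rest are kept recursively.
    """
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord joining the endpoints.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = simplify_contour(points[:idx + 1], eps)
    right = simplify_contour(points[idx:], eps)
    return left[:-1] + right  # avoid duplicating the split point
```

A straight pitch glide collapses to its two endpoints, while a contour with a genuine turning point keeps that point, which is the behaviour a motif-extraction front end would want before computing mel-spectrograms.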
Affective Computing
This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, in the second section (Chapters 8 to 11) we present research on the perception and generation of emotional expressions using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. In the last section of the book (Chapters 17 to 22) we present applications related to affective computing.
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
The computer synthesis of expressive three-dimensional facial character animation.
The present research is concerned with the design, development and implementation of three-dimensional computer-generated facial images capable of expression, gesture and speech.

A review of previous work in chapter one shows that, to date, the model of computer-generated faces has been one in which construction and animation were not separated and which therefore possessed only a limited expressive range. It is argued in chapter two that the physical description of the face cannot be seen as originating from a single generic mould. Chapter three therefore describes data acquisition techniques employed in the computer generation of free-form surfaces which are applicable to three-dimensional faces.

Expressions are the result of the distortion of the surface of the skin by the complex interactions of bone, muscle and skin. Chapter four demonstrates, with static images and short animation sequences on video, that a muscle model process algorithm can simulate the primary characteristics of the facial muscles.

Three-dimensional speech synchronization was the most complex problem to achieve effectively. Chapter five describes two successful approaches: the direct mapping of mouth shapes in two dimensions to the model in three dimensions, and geometric distortions of the mouth created by the contraction of specified muscle combinations.

Chapter six describes the implementation of software for this research and argues the case for a parametric approach. Chapter seven is concerned with the control of facial articulations and discusses a more biological approach to these. Finally, chapter eight draws conclusions from the present research and suggests further extensions.
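The abstract does not give the details of the muscle model process algorithm. A minimal sketch of the general idea, assuming a linear muscle that pulls nearby skin vertices toward its bone attachment with a cosine falloff (the 2-D geometry, the falloff shape and all names here are illustrative assumptions, not the thesis's formulation), could look like this:

```python
import math

def muscle_displace(vertex, head, tail, influence, contraction):
    """Displace one skin vertex under a linear muscle's pull.

    head: muscle attachment on bone (static end).
    tail: muscle insertion in the skin (moving end).
    Vertices within `influence` of the tail are pulled toward the
    head, scaled by `contraction` (0..1) and a cosine falloff so the
    deformation fades smoothly at the zone boundary.
    """
    vx, vy = vertex
    # Pull direction: from the vertex toward the muscle head.
    dx, dy = head[0] - vx, head[1] - vy
    # Distance from the tail governs the falloff.
    r = math.hypot(vx - tail[0], vy - tail[1])
    if r >= influence:
        return vertex  # outside the muscle's zone of influence
    falloff = math.cos(r / influence * math.pi / 2.0)
    return (vx + dx * contraction * falloff,
            vy + dy * contraction * falloff)
```

Applying such a function over every vertex of a face mesh, for each muscle in a specified combination, is one way the geometric distortions of the mouth described in chapter five could be produced from muscle contractions alone.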