The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short-sequence generation,
symbolic music generation remains a challenging problem since the structure of
compositions is usually complicated. In this study, we attempt to solve the
melody generation problem constrained by a given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicit architectural encoding of musical structure via
comparing two sequential generative models: LSTM (a type of RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study of applying
WaveNet to symbolic music generation, as well as the first systematic
comparison between a temporal CNN and an RNN for music generation. We conduct a
survey to evaluate our generations and apply the Variable Markov Oracle to
music pattern discovery. Experimental results show that encoding structure
more explicitly with a stack of dilated convolution layers significantly
improves performance, and that globally encoding the underlying chord
progression into the generation procedure yields further gains.

Comment: 8 pages, 13 figures
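The contrast the abstract draws between an LSTM and a WaveNet-style dilated temporal CNN comes down to how each covers long-range musical structure. A minimal sketch of how stacked dilated convolutions grow their receptive field (kernel size 2 and doubling dilations are assumptions in the spirit of the original WaveNet, not the paper's reported hyperparameters):

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in time steps) of a stack of 1-D dilated causal convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d  # each layer widens the field by (k-1)*dilation
    return field

# Assumed WaveNet-like schedule: dilations double each layer.
dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
span = receptive_field(2, dilations)     # 1024 time steps
```

With dilations doubling per layer, ten layers already span about a thousand time steps, which is why a dilated stack can encode longer-range structure than a plain convolutional stack of the same depth.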
A Dynamic Approach to Rhythm in Language: Toward a Temporal Phonology
It is proposed that the theory of dynamical systems offers appropriate tools
to model many phonological aspects of both speech production and perception. A
dynamic account of speech rhythm is shown to be useful for description of both
Japanese mora timing and English timing in a phrase repetition task. This
orientation contrasts fundamentally with the more familiar symbolic approach to
phonology, in which time is modeled only with sequentially arrayed symbols. It
is proposed that an adaptive oscillator offers a useful model for perceptual
entrainment (or `locking in') to the temporal patterns of speech production.
This helps to explain why speech is often perceived to be more regular than
experimental measurements seem to justify. Because dynamic models deal with
real time, they also help us understand how languages can differ in their
temporal detail---contributing to foreign accents, for example. The fact that
languages differ greatly in their temporal detail suggests that these effects
are not mere motor universals, but that dynamical models are intrinsic
components of the phonological characterization of language.

Comment: 31 pages; compressed, uuencoded Postscript
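The adaptive-oscillator idea of "locking in" to speech timing can be sketched as a period that error-corrects toward observed inter-onset intervals (the update rule and gain below are illustrative assumptions, not the specific model the paper develops):

```python
def entrain(onsets, initial_period, gain=0.5):
    """Adapt an oscillator's period toward the observed inter-onset intervals.

    onsets: times (s) of stimulus events; gain: fraction of the timing error
    corrected per event (hypothetical value).
    """
    period = initial_period
    for prev, cur in zip(onsets, onsets[1:]):
        interval = cur - prev
        period += gain * (interval - period)  # error-correcting adaptation
    return period

# A perfectly regular 0.5 s beat: the oscillator converges from a 0.4 s guess.
onsets = [i * 0.5 for i in range(10)]
adapted = entrain(onsets, 0.4)
```

Because each event only partially corrects the period, mildly irregular input still yields a near-regular internal period, which is one way such a model accounts for speech being perceived as more regular than measurements suggest.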
Modulation-frequency acts as a primary cue for auditory stream segregation
In our surrounding acoustic world, sounds are produced by different sources and interfere with each other before arriving at the ears. A key function of the auditory system is to provide consistent and robust descriptions of the coherent sound groupings and sequences (auditory objects), which likely correspond to the various sound sources in the environment. This function has been termed auditory stream segregation. In the current study, we tested the effects of a difference in amplitude-modulation frequency on the segregation of concurrent sound sequences in the auditory stream-segregation paradigm (van Noorden 1975). The aim of the study was to assess 1) whether a difference in amplitude modulation helps to separate concurrent sound sequences and 2) whether this cue interacts with previously studied static cues (carrier-frequency and location differences) in segregating concurrent streams of sound. We found that an amplitude-modulation difference is used as a primary cue for stream segregation and that it interacts with other primary cues such as frequency and location differences.
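Stimuli for such a paradigm can be sketched as tones sharing a carrier but differing in modulation frequency; all parameter values below are hypothetical, since the abstract does not give the study's actual carriers, modulation rates, or depths:

```python
import numpy as np

fs = 44100                       # sample rate, Hz (assumed)
dur = 0.1                        # tone duration, s (assumed)
t = np.arange(int(fs * dur)) / fs

def am_tone(carrier_hz, mod_hz, depth=1.0):
    """Sinusoidally amplitude-modulated tone."""
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)
    return envelope * np.sin(2 * np.pi * carrier_hz * t)

# Same carrier, different modulation frequencies: the cue under test.
a = am_tone(1000, 30)    # stream A: 30 Hz modulation
b = am_tone(1000, 150)   # stream B: 150 Hz modulation
```

Holding the carrier fixed while varying only the modulation rate is what lets the paradigm isolate modulation-frequency separation from the static carrier-frequency cue.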
Periodicity and frequency coding in human auditory cortex
Understanding the neural coding of pitch and frequency is fundamental to understanding speech comprehension, music perception, and the segregation of concurrent sound sources. Neuroimaging has made important contributions to defining the pattern of frequency sensitivity in humans. However, the precise way in which pitch sensitivity relates to these frequency-dependent regions remains unclear. Single-frequency tones cannot be used to address this question, as their pitch always equals their frequency. Here, temporal pitch (periodicity) and frequency coding were dissociated using stimuli that were bandpassed in different frequency regions (centre frequencies 800 and 4500 Hz) yet matched in their pitch characteristics. Cortical responses to both pitch-evoking stimuli typically occurred within a region that was also responsive to low frequencies; its location extended across both primary and nonprimary auditory cortex. An additional control experiment demonstrated that this pitch-related effect was not simply caused by the generation of combination tones. Our findings support recent neurophysiological evidence for a cortical representation of pitch at the lateral border of the primary auditory cortex, while revealing new evidence that additional auditory fields are also likely to play a role in pitch coding.
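The stimulus logic (same periodicity, different spectral region) can be sketched by bandpassing a harmonic complex around the two centre frequencies the abstract mentions; the fundamental, bandwidth, and FFT-domain filtering below are illustrative assumptions, not the study's actual stimulus construction:

```python
import numpy as np

fs = 44100                        # sample rate, Hz (assumed)
dur = 0.5                         # duration, s (assumed)
f0 = 100                          # hypothetical fundamental: both stimuli share this periodicity
t = np.arange(int(fs * dur)) / fs

# Harmonic complex containing many harmonics of f0.
complex_tone = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 60))

def bandpass(signal, center_hz, width_hz, fs):
    """Crude FFT-domain bandpass keeping components within width_hz of center_hz."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    mask = np.abs(freqs - center_hz) < width_hz / 2
    return np.fft.irfft(spec * mask, n=len(signal))

low = bandpass(complex_tone, 800, 400, fs)    # pitch = f0, low spectral region
high = bandpass(complex_tone, 4500, 400, fs)  # same pitch, high spectral region
```

Both outputs retain the 100 Hz periodicity (harmonics spaced f0 apart) while occupying disjoint spectral regions, which is the dissociation the imaging contrast relies on.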