Real-time decoding of question-and-answer speech dialogue using human cortical activity.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate
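The context-integration step in this abstract (updating answer priors from decoded question likelihoods) can be illustrated with a short sketch. This is not the authors' decoder: the function name `integrate_context`, the question and answer labels, and all numeric values are hypothetical, and the prior is assumed to be a mixture over questions weighted by the normalized question likelihoods.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {key: value / total for key, value in scores.items()}


def integrate_context(question_likelihoods, answer_given_question, answer_likelihoods):
    """Re-weight answer likelihoods with a question-dependent prior.

    prior(a)     = sum over q of P(q | neural data) * P(a | q)
    posterior(a) is proportional to likelihood(a) * prior(a)
    """
    q_post = normalize(question_likelihoods)
    prior = {
        a: sum(p_q * answer_given_question[q].get(a, 0.0) for q, p_q in q_post.items())
        for a in answer_likelihoods
    }
    return normalize({a: answer_likelihoods[a] * prior[a] for a in answer_likelihoods})


# Hypothetical toy values: two questions, each with two plausible answers.
question_likelihoods = {"how_is_room": 0.8, "which_instrument": 0.2}
answer_given_question = {
    "how_is_room": {"bright": 0.5, "dark": 0.5},
    "which_instrument": {"piano": 0.5, "violin": 0.5},
}
answer_likelihoods = {"bright": 0.30, "dark": 0.10, "piano": 0.35, "violin": 0.25}

posterior = integrate_context(question_likelihoods, answer_given_question, answer_likelihoods)
best_without_context = max(answer_likelihoods, key=answer_likelihoods.get)
best_with_context = max(posterior, key=posterior.get)
```

In this toy setup the answer likelihoods alone favour "piano", while the context-weighted posterior favours "bright", because the decoded question makes the instrument answers implausible.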
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
- Objective: What musical content is to be generated? Examples are: melody,
  polyphony, accompaniment or counterpoint. For what destination and for what
  use? To be performed by a human(s) (in the case of a musical score), or by a
  machine (in the case of an audio file).
- Representation: What are the concepts to be manipulated? Examples are:
  waveform, spectrogram, note, chord, meter and beat. What format is to be
  used? Examples are: MIDI, piano roll or text. How will the representation be
  encoded? Examples are: scalar, one-hot or many-hot.
- Architecture: What type(s) of deep neural network is (are) to be used?
  Examples are: feedforward network, recurrent network, autoencoder or generative
  adversarial networks.
- Challenge: What are the limitations and open challenges? Examples are:
  variability, interactivity and creativity.
- Strategy: How do we model and control the process of generation? Examples
  are: single-step feedforward, iterative feedforward, sampling or input
  manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose some tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
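The one-hot, many-hot and piano-roll encodings named under the Representation dimension can be sketched as follows. This is an illustrative sketch rather than code from the survey: the function names and the one-octave pitch range (MIDI pitches 60 to 71) are assumptions chosen to keep the vectors short.

```python
def one_hot(pitch, low=60, high=72):
    """Encode a single MIDI pitch as a one-hot vector over [low, high)."""
    if not low <= pitch < high:
        raise ValueError("pitch outside the encoded range")
    return [1 if p == pitch else 0 for p in range(low, high)]


def many_hot(pitches, low=60, high=72):
    """Encode simultaneous pitches (for example a chord) as a many-hot vector."""
    active = set(pitches)
    if any(not low <= p < high for p in active):
        raise ValueError("pitch outside the encoded range")
    return [1 if p in active else 0 for p in range(low, high)]


def piano_roll(time_steps, low=60, high=72):
    """Stack one many-hot vector per time step into a piano-roll matrix."""
    return [many_hot(step, low, high) for step in time_steps]


middle_c = one_hot(60)
c_major_triad = many_hot([60, 64, 67])
roll = piano_roll([[60], [60, 64], []])  # an empty step is a rest
```

A one-hot vector can represent only one note at a time, so it suits monophonic melodies; the many-hot vector allows several active pitches per time step, which is what polyphony requires.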
Parallels in the sequential organization of birdsong and human speech.
Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes
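The idea of measuring information as a function of the sequential distance between vocal elements can be sketched with a simple estimator. The code below is an assumption-laden illustration, not the authors' analysis: it uses a plain plug-in estimate of mutual information with no bias correction, and the function name and toy sequence are hypothetical.

```python
import math
from collections import Counter


def mutual_information_at_distance(sequence, distance):
    """Plug-in estimate (in bits) of the mutual information between
    elements that are `distance` positions apart in `sequence`."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    pairs = list(zip(sequence, sequence[distance:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum p(a, b) * log2( p(a, b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        (count / n) * math.log2(count * n / (left[a] * right[b]))
        for (a, b), count in joint.items()
    )


# Hypothetical toy sequence of vocal elements (strictly alternating).
toy_sequence = list("AB" * 40)
decay_curve = [mutual_information_at_distance(toy_sequence, d) for d in range(1, 6)]
```

Fitting an exponential and a power-law model to such a curve, computed on real sequences, is the kind of comparison the abstract describes; the fitting step is omitted here.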
Recognition of Harmonic Sounds in Polyphonic Audio using a Missing Feature Approach: Extended Report
A method based on local spectral features and missing feature techniques
is proposed for the recognition of harmonic sounds in mixture
signals. A mask estimation algorithm is proposed for identifying
spectral regions that contain reliable information for each sound
source and then bounded marginalization is employed to treat the
feature vector elements that are determined as unreliable. The proposed
method is tested on musical instrument sounds due to the
extensive availability of data, but it can be applied to other sounds
(e.g. animal sounds, environmental sounds) whenever these are harmonic.
In simulations, the proposed method clearly outperformed a
baseline method for mixture signals.
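Bounded marginalization, as used in missing-feature classification, can be sketched as below. The abstract does not specify the class model, so this sketch assumes one diagonal Gaussian per class and assumes that the true value of an unreliable feature lies between 0 and the observed mixture value. The function names and numeric values are hypothetical, and some formulations additionally normalize the integral by the width of the bound.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, std):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def log_likelihood_bounded(features, mask, means, stds, floor=1e-300):
    """Class log-likelihood with bounded marginalization.

    Reliable elements (mask True) contribute the usual density.
    Unreliable elements contribute the probability mass between 0 and
    the observed value, which acts as an upper bound on the true value.
    """
    total = 0.0
    for x, reliable, mean, std in zip(features, mask, means, stds):
        if reliable:
            p = gaussian_pdf(x, mean, std)
        else:
            p = gaussian_cdf(x, mean, std) - gaussian_cdf(0.0, mean, std)
        total += math.log(max(p, floor))  # floor avoids log(0)
    return total


# Hypothetical two-dimensional example: the second element is masked as unreliable.
observed = [1.0, 5.0]
mask = [True, False]
ll_class_a = log_likelihood_bounded(observed, mask, means=[1.0, 1.0], stds=[1.0, 1.0])
ll_class_b = log_likelihood_bounded(observed, mask, means=[1.0, 10.0], stds=[1.0, 1.0])
```

Class A is compatible with a true value anywhere below the observed 5.0, whereas class B expects a value near 10, which the bound rules out, so class A scores higher.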
Temporally-aware algorithms for the classification of anuran sounds
Several authors have shown that the sounds of anurans can be used as an indicator of
climate change. Hence, the recording, storage and further processing of a huge
number of anuran sounds, distributed over time and space, are required in order to
obtain this indicator. Furthermore, it is desirable to have algorithms and tools for
the automatic classification of the different classes of sounds. In this paper, six
classification methods are proposed, all based on the data-mining domain, which
strive to take advantage of the temporal character of the sounds. The definition and
comparison of these classification methods is undertaken using several approaches.
The main conclusions of this paper are that: (i) the sliding window method attained
the best results in the experiments presented, and even outperformed the hidden
Markov models usually employed in similar applications; (ii) noteworthy overall
classification performance has been obtained, which is an especially striking result
considering that the sounds analysed were affected by a highly noisy background;
(iii) the instance selection for the determination of the sounds in the training dataset
offers better results than cross-validation techniques; and (iv) the temporally-aware
classifiers have revealed that they can obtain better performance than their non-temporally-aware
counterparts.
Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain): excellence eSAPIENS number TIC 570
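One way a sliding window can make a frame-level classifier temporally aware is to extend each frame's feature vector with a summary of its neighbours. The sketch below is only an illustration of that general idea: the abstract does not describe the windowing details, so the function name, the use of per-feature means, and the clipping of the window at the edges are all assumptions.

```python
def sliding_window_features(frames, half_width=1):
    """Append to each frame the per-feature mean over a centred window.

    `frames` is a list of equal-length feature vectors, one per time frame.
    The window covers up to `half_width` frames on each side and is clipped
    at the start and end of the recording.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    windowed = []
    for i, frame in enumerate(frames):
        start = max(0, i - half_width)
        stop = min(len(frames), i + half_width + 1)
        neighbours = frames[start:stop]
        means = [sum(column) / len(neighbours) for column in zip(*neighbours)]
        windowed.append(list(frame) + means)
    return windowed


# Hypothetical one-feature recording with three frames.
toy_frames = [[0.0], [3.0], [6.0]]
toy_windowed = sliding_window_features(toy_frames, half_width=1)
```

The extended vectors can then be passed to any non-temporal classifier, which sees local temporal context without needing a sequence model.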
Recognition of 3D arm movements using neural networks
There are many different approaches to recognition of spatio-temporal patterns. Each has its own merits and disadvantages. In this paper we present a neural-network-based approach to spatio-temporal pattern recognition. The effectiveness of this method is evaluated by recognizing 3D arm movements involved in Taiwanese sign language (TSL).
Conference type: international. Conference dates: 1999-07-10 to 1999-07-16. Book type: print. Conference location: Washington, DC, US