Deep Learning: Our Miraculous Year 1990-1991
In 2020, we will celebrate that many of the basic ideas behind the deep
learning revolution were published three decades ago within fewer than 12
months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich.
Back then, few people were interested, but a quarter century later, neural
networks based on these ideas were on over 3 billion devices such as
smartphones, and used many billions of times per day, consuming a significant
fraction of the world's compute. Comment: 37 pages, 188 references, based on work of 4 Oct 201
Embedding-Based Speaker Adaptive Training of Deep Neural Networks
An embedding-based speaker adaptive training (SAT) approach is proposed and
investigated in this paper for deep neural network acoustic modeling. In this
approach, speaker embedding vectors, which are constant for a given
speaker, are mapped through a control network to layer-dependent element-wise
affine transformations to canonicalize the internal feature representations at
the output of hidden layers of a main network. The control network for
generating the speaker-dependent mappings is jointly estimated with the main
network for the overall speaker adaptive acoustic modeling. Experiments on
large vocabulary continuous speech recognition (LVCSR) tasks show that the
proposed SAT scheme can yield better performance than the widely used
speaker-aware training using i-vectors with speaker-adapted input features.
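The control-network idea above can be illustrated with a minimal pure-Python sketch. It assumes one linear head per hidden layer for the element-wise scale and one for the shift, ReLU hidden layers, and an affine transform applied after each activation. All names (linear, control_network, main_network_forward, W_scale, b_shift, ...) are hypothetical, and the joint training of the two networks is not shown.

```python
def linear(weights, bias, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def control_network(speaker_embedding, params):
    """Map one speaker embedding to a (scale, shift) pair per hidden layer.

    params is a hypothetical list with one dict of head weights per layer.
    """
    transforms = []
    for p in params:
        scale = linear(p["W_scale"], p["b_scale"], speaker_embedding)
        shift = linear(p["W_shift"], p["b_shift"], speaker_embedding)
        transforms.append((scale, shift))
    return transforms


def main_network_forward(x, layers, transforms):
    """Run the main network, canonicalizing each hidden-layer output."""
    h = x
    for (W, b), (scale, shift) in zip(layers, transforms):
        h = [max(0.0, v) for v in linear(W, b, h)]            # hidden layer + ReLU
        h = [a * v + c for a, v, c in zip(scale, h, shift)]  # speaker-dependent affine map
    return h


if __name__ == "__main__":
    embedding = [1.0]  # constant for a given speaker
    params = [{"W_scale": [[2.0], [3.0]], "b_scale": [0.0, 0.0],
               "W_shift": [[1.0], [-1.0]], "b_shift": [0.0, 0.0]}]
    layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
    print(main_network_forward([1.0, 2.0], layers, control_network(embedding, params)))
    # prints [3.0, 5.0]
```

Because the embedding is fixed per speaker, the (scale, shift) pairs can be computed once per speaker and reused for every frame of that speaker's audio.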
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding the dynamics of symbol systems is crucially important both for
understanding human social interactions and for developing a robot that can
smoothly communicate with human users over the long term. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensorimotor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER. Comment: submitted to Advanced Robotic
Investigation of sequence processing: A cognitive and computational neuroscience perspective
Serial order processing, or sequence processing, underlies many human activities
such as speech, language, skill learning, planning, and problem-solving.
Investigating the neural bases of sequence processing helps us understand serial
order in cognition and also helps in building intelligent devices. In this
article, we review various cognitive issues related to sequence processing, with
examples. Experimental results that give evidence for the involvement of various
brain areas are described. Finally, a theoretical approach based on statistical
models and the reinforcement learning paradigm is presented. These theoretical
ideas are useful for studying sequence learning in a principled way. This article
also suggests a two-way process diagram integrating experimentation (cognitive
neuroscience) and theory/computational modelling (computational neuroscience).
This integrated framework is useful not only for the present study of serial
order but also for understanding many cognitive processes.
Finding Competitive Network Architectures Within a Day Using UCT
The design of neural network architectures for a new data set is a laborious
task which requires human deep learning expertise. In order to make deep
learning available for a broader audience, automated methods for finding a
neural network architecture are vital. Recently proposed methods can already
achieve human-expert-level performance. However, these methods require months or
even years of GPU computing time, ignoring the hardware constraints faced by many
researchers and companies. We propose the use of Monte Carlo planning in
combination with two different UCT (upper confidence bound applied to trees)
variants to search for network architectures. We adapt the UCT algorithm to the
needs of network architecture search by proposing two ways of sharing
information between different branches of the search tree. In an empirical
study, we demonstrate that this method finds competitive networks for MNIST,
SVHN and CIFAR-10 in just a single GPU day. Extending the search time to five
GPU days, we outperform human-designed architectures and our competitors that
consider the same types of layers.
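For readers unfamiliar with UCT, the following is a minimal sketch of plain UCT (Monte Carlo tree search with UCB1 child selection) over a toy architecture space. It is not the paper's method: the two information-sharing variants are not implemented, the layer choices and DEPTH are hypothetical, and toy_reward is a cheap stand-in for training a network and measuring its validation accuracy.

```python
import math
import random

# Hypothetical toy search space: an architecture is DEPTH layer choices.
LAYER_CHOICES = ("conv3x3", "conv5x5", "maxpool")
DEPTH = 3


def toy_reward(arch):
    """Stand-in for 'train this network, return validation accuracy'."""
    return sum(1.0 if layer == "conv3x3" else 0.3 for layer in arch) / DEPTH


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix      # partial architecture (tuple of layers)
        self.parent = parent
        self.children = {}        # layer choice -> Node
        self.visits = 0
        self.total = 0.0          # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.prefix) == DEPTH

    def fully_expanded(self):
        return len(self.children) == len(LAYER_CHOICES)


def ucb1(child, parent_visits, c):
    """Mean reward plus an exploration bonus that shrinks as the child is visited."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def uct_search(iterations, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_arch, best_reward = None, -math.inf
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.is_terminal() and node.fully_expanded():
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits, c))
        # 2. Expansion: add one untried child.
        if not node.is_terminal():
            untried = [l for l in LAYER_CHOICES if l not in node.children]
            choice = rng.choice(untried)
            child = Node(node.prefix + (choice,), parent=node)
            node.children[choice] = child
            node = child
        # 3. Simulation: complete the architecture with random layers.
        arch = list(node.prefix)
        while len(arch) < DEPTH:
            arch.append(rng.choice(LAYER_CHOICES))
        reward = toy_reward(arch)
        if reward > best_reward:
            best_arch, best_reward = tuple(arch), reward
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_arch, best_reward


if __name__ == "__main__":
    print(uct_search(500))
```

The constant c trades off exploiting branches with a high mean reward against exploring rarely visited ones; in a real search each reward evaluation is a full training run, which is why the number of iterations, not the tree bookkeeping, dominates the cost.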
Methods and applications of automatic speech recognition
Abstract. This thesis is an examination of automatic speech recognition in the form of a narrative literature review. Both past and present methods of automatic speech recognition, as well as its applications, were examined.
The sources used in this thesis consist of a wide variety of technical conference papers and journal articles on methods of automatic speech recognition, a field that has advanced considerably over the years, as well as books and literature reviews compiling knowledge on both methods and applications.
Among methods of automatic speech recognition, the three that appeared most significant, and that were examined, are dynamic time warping, hidden Markov models, and deep neural networks. Of these, deep neural networks appeared to be the most advanced and the most widely used today.
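As an illustration of the first method named above, here is a minimal sketch of classic dynamic time warping used for nearest-template isolated-word recognition. It assumes 1-D feature sequences and an absolute-difference local cost for brevity; real systems compare a vector of acoustic features per frame, and the template labels in the example are hypothetical.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between sequences a and b.

    D[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b repeats a frame
                                 D[i][j - 1],      # b advances, a repeats a frame
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]


def recognize(features, templates):
    """Return the label of the stored template closest to the input under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


if __name__ == "__main__":
    templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
    print(recognize([1, 2, 2, 3], templates))  # prints up
```

The warping lets a slower utterance ([1, 2, 2, 3]) match its template ([1, 2, 3]) at zero cost, which is the tolerance to speaking-rate variation that made DTW useful for early template-based recognizers.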
Applications of automatic speech recognition were grouped by the communication they aim to improve: human-human or human-machine communication. From the first group, two popular applications were examined: speech-to-speech translation and speech summarization. From the second group, virtual assistants were examined as an application group of their own, the name covering general software agents that perform tasks in response to human speech.
The research presented in this thesis could serve as a basis for future research on automatic speech recognition. Suggested avenues include a quantitative analysis of either the performance of different methods or the privacy aspects of different applications, or a design science approach that documents the construction of an automatic speech recognition application using modern methods.
Tiivistelmä (Finnish abstract, translated). This thesis studied automatic speech recognition in the form of a narrative literature review. It examined both past and present best-known methods of automatic speech recognition, as well as its best-known applications in two different categories.
The prior research material used as sources in the thesis consisted of a wide selection of different types of material. Material on the methods of automatic speech recognition was found mainly in conference papers and scientific journal articles. Because the technologies behind these methods have developed over the years, the research material also spans many decades. Information on applications, in turn, was drawn mainly from various books and from other literature reviews in the field.
Of the methods, the three historically most popular approaches were studied: "dynamic time warping", "hidden Markov models", and "deep neural networks". Of these, the last, deep neural networks, appeared to be the most advanced and most popular method today.
The applications were studied in two categories. The first category contains applications that aim to improve communication and interaction between people. From this category, two popular applications were studied: "speech-to-speech translation", i.e., real-time translation of speech, and "speech summarization", i.e., producing summaries of speech. The second category contains applications that aim to improve communication and interaction between people and devices. From this category, perhaps the most popular type of automatic speech recognition application was studied: virtual assistants. Virtual assistants were examined as a general type of software whose main feature and purpose is to perform various actions in response to spoken commands given by a human.
Further research can also be carried out on the basis of the information presented in the thesis. One example would be a quantitative study of either the efficiency of different automatic speech recognition methods or various aspects of the information security of automatic speech recognition applications. Another possibility would be constructive research on the topic, for example building an automatic speech recognition application using modern methods.
- …