
    Deep Learning: Our Miraculous Year 1990-1991

    In 2020, we will celebrate that many of the basic ideas behind the deep learning revolution were published three decades ago within fewer than 12 months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich. Back then, few people were interested, but a quarter century later, neural networks based on these ideas were on over 3 billion devices such as smartphones, and used many billions of times per day, consuming a significant fraction of the world's compute. Comment: 37 pages, 188 references, based on work of 4 Oct 201

    Embedding-Based Speaker Adaptive Training of Deep Neural Networks

    This paper proposes and investigates an embedding-based speaker adaptive training (SAT) approach for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are constant for a given speaker, are mapped through a control network to layer-dependent element-wise affine transformations that canonicalize the internal feature representations at the output of the hidden layers of a main network. The control network that generates the speaker-dependent mappings is estimated jointly with the main network for the overall speaker adaptive acoustic modeling. Experiments on large vocabulary continuous speech recognition (LVCSR) tasks show that the proposed SAT scheme can yield superior performance over the widely used speaker-aware training using i-vectors with speaker-adapted input features.
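The mechanism described in this abstract, a speaker embedding mapped to an element-wise scale and shift on a hidden layer's output, can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear control map, the tanh activation, the layer sizes, and every name below (`ControlledLayer`, `w_scale`, `w_shift`) are assumptions made for illustration, and no training is shown.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialisation scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class ControlledLayer:
    """One hidden layer of a main network plus a hypothetical control map.

    The control map turns a fixed speaker embedding into a per-unit
    scale and shift, which are applied element-wise to the layer output.
    """

    def __init__(self, in_dim, hidden_dim, emb_dim, rng):
        # Main-network parameters.
        self.w = random_matrix(hidden_dim, in_dim, rng)
        self.b = [0.0] * hidden_dim
        # Control-network parameters (assumed here to be one linear map
        # per transform; the abstract does not specify the architecture).
        self.w_scale = random_matrix(hidden_dim, emb_dim, rng)
        self.w_shift = random_matrix(hidden_dim, emb_dim, rng)

    def forward(self, x, speaker_emb):
        hidden = [math.tanh(v) for v in linear(self.w, self.b, x)]
        zeros = [0.0] * len(hidden)
        # Scale is parameterised as 1 + delta, so a zero embedding
        # leaves the hidden activations unchanged.
        scale = [1.0 + s for s in linear(self.w_scale, zeros, speaker_emb)]
        shift = linear(self.w_shift, zeros, speaker_emb)
        # Element-wise affine transformation of the hidden output.
        return [g * h + c for g, h, c in zip(scale, hidden, shift)]
```

Calling `ControlledLayer(3, 4, 2, random.Random(0)).forward([0.5, -1.0, 2.0], [1.0, -1.0])` returns four adapted activations. In a joint-training setup, the main and control parameters would be updated together by the same loss, which this sketch leaves out.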

    Symbol Emergence in Robotics: A Survey

    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can communicate smoothly with human users over the long term both require an understanding of the dynamics of symbol systems. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER. Comment: submitted to Advanced Robotic

    Investigation of sequence processing: A cognitive and computational neuroscience perspective

    Serial order processing, or sequence processing, underlies many human activities such as speech, language, skill learning, planning, and problem-solving. Investigating the neural bases of sequence processing enables us to understand serial order in cognition and also helps in building intelligent devices. In this article, we review various cognitive issues related to sequence processing, with examples. We describe experimental results that give evidence for the involvement of various brain areas. Finally, we present a theoretical approach based on statistical models and the reinforcement learning paradigm. These theoretical ideas are useful for studying sequence learning in a principled way. This article also suggests a two-way process diagram integrating experimentation (cognitive neuroscience) and theory/computational modelling (computational neuroscience). This integrated framework is useful not only in the present study of serial order, but also for understanding many cognitive processes.

    Finding Competitive Network Architectures Within a Day Using UCT

    Designing a neural network architecture for a new data set is a laborious task that requires human deep learning expertise. To make deep learning available to a broader audience, automated methods for finding a neural network architecture are vital. Recently proposed methods can already achieve human-expert-level performance. However, these methods have run times of months or even years of GPU computing time, ignoring the hardware constraints faced by many researchers and companies. We propose the use of Monte Carlo planning in combination with two different UCT (upper confidence bound applied to trees) derivations to search for network architectures. We adapt the UCT algorithm to the needs of network architecture search by proposing two ways of sharing information between different branches of the search tree. In an empirical study, we demonstrate that this method finds competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day. Extending the search time to five GPU days, we outperform human-designed architectures and competing methods that consider the same types of layers.
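For readers unfamiliar with UCT, the selection step it builds on can be sketched as follows. This shows only the standard UCB1 rule applied at one node of a search tree; it does not reproduce the two UCT derivations or the information-sharing schemes proposed in the paper. The dictionary layout, the layer names, and the exploration constant are assumptions chosen for illustration.

```python
import math


def uct_select(children, exploration=math.sqrt(2)):
    """Pick the child to explore next using the standard UCB1 rule.

    `children` is a list of dicts with hypothetical keys:
      'visits'       - how often this child has been tried
      'total_reward' - sum of rewards (e.g. validation accuracy) observed
    """
    parent_visits = sum(child["visits"] for child in children)
    best_child, best_score = None, -math.inf
    for child in children:
        # Try every child once before comparing confidence bounds.
        if child["visits"] == 0:
            return child
        mean_reward = child["total_reward"] / child["visits"]
        bonus = exploration * math.sqrt(
            math.log(parent_visits) / child["visits"]
        )
        score = mean_reward + bonus
        if score > best_score:
            best_child, best_score = child, score
    return best_child


# Hypothetical candidate layers to append to a partial architecture.
candidates = [
    {"layer": "conv3x3", "visits": 10, "total_reward": 9.0},
    {"layer": "conv5x5", "visits": 10, "total_reward": 1.0},
    {"layer": "maxpool", "visits": 0, "total_reward": 0.0},
]
next_choice = uct_select(candidates)
```

In an architecture search, each child would stand for one way of extending the current partial network, and the reward would come from training and evaluating a completed network. The exploration bonus shrinks as a child is visited more often, so rarely tried extensions are still sampled occasionally.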

    Methods and applications of automatic speech recognition

    Abstract. This thesis examines automatic speech recognition in the form of a narrative literature review. It covers both past and present methods, as well as the applications of automatic speech recognition. The prior research used as sources consists of a wide variety of technical conference papers and journal articles on methods of automatic speech recognition, a field that has seen many advances over the years, together with compilations of knowledge on both methods and applications in the form of books and literature reviews. Three of the seemingly most significant methods were examined: dynamic time warping, hidden Markov models, and deep neural networks. The last of these, deep neural networks, seemed to be the most advanced and most widely used at present. Applications of automatic speech recognition were grouped by the kind of communication they aim to improve: human-human communication or human-machine communication. From the first group, two popular applications were examined: speech-to-speech translation and speech summarization. From the second group, virtual assistants were examined as an application group of their own, the term covering any general software agent that performs tasks in response to human speech. The research presented in this thesis can serve as a basis for future research on automatic speech recognition. Suggested avenues include a quantitative analysis of either the performance of different methods or the privacy aspects of different applications, or a design science approach that documents the construction of an automatic speech recognition application using modern methods.
    Tiivistelmä (Finnish abstract). This thesis examined automatic speech recognition in the form of a narrative literature review. It considered both past and present best-known methods of automatic speech recognition, as well as its best-known applications from two categories. The prior research material used as sources consisted of a wide selection of different types of material. Material on the methods of automatic speech recognition was found mainly in conference papers and scientific journal articles. Because the technologies behind the methods have developed over the years, research material spanning several decades was available. Information on applications was drawn mainly from books and from other literature reviews in the field. Of the methods, the three historically most popular were examined: dynamic time warping, hidden Markov models, and deep neural networks. The last of these, deep neural networks, appeared to be the most advanced and most popular method today. Applications were examined in two categories. The first category contains applications that aim to improve communication and interaction between people. From this category, two popular applications were examined: speech-to-speech translation, that is, real-time translation of speech, and speech summarization. The second category contains applications that aim to improve communication and interaction between people and devices. From this category, perhaps the most popular type of automatic speech recognition application, virtual assistants, was examined. Virtual assistants were treated as a general type of software whose main feature and purpose is to perform actions in response to spoken commands. The information presented in the thesis can also serve as a basis for further research. One example would be a quantitative study of either the performance of different automatic speech recognition methods or various aspects of the information security of automatic speech recognition applications. Another possibility would be constructive research on the topic, for example building an automatic speech recognition application using modern methods.