775 research outputs found

    Comparison of echo state network output layer classification methods on noisy data

    Full text link
    Echo state networks are a recently developed type of recurrent neural network where the internal layer is fixed with random weights, and only the output layer is trained on specific data. Echo state networks are increasingly being used to process spatiotemporal data in real-world settings, including speech recognition, event detection, and robot control. A strength of echo state networks is the simple method used to train the output layer - typically a collection of linear readout weights found using a least squares approach. Although straightforward to train and having a low computational cost to use, this method may not yield acceptable accuracy performance on noisy data. This study compares the performance of three echo state network output layer methods to perform classification on noisy data: using trained linear weights, using sparse trained linear weights, and using trained low-rank approximations of reservoir states. The methods are investigated experimentally on both synthetic and natural datasets. The experiments suggest that using regularized least squares to train linear output weights is superior on data with low noise, but using the low-rank approximations may significantly improve accuracy on datasets contaminated with higher noise levels.Comment: 8 pages. International Joint Conference on Neural Networks (IJCNN 2017

    Relative Positional Encoding for Transformers with Linear Complexity

    Full text link
    Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear-variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement to the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.Comment: ICML 2021 (long talk) camera-ready. 24 page

    An review of automatic drum transcription

    Get PDF
    In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription(ADT).This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods’ technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this fiel

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    Get PDF
    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech

    Script Effects as the Hidden Drive of the Mind, Cognition, and Culture

    Get PDF
    This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. ; Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects Investigates how the scripts we read over time shape our cognition, mind, and thought patterns Provides a new outlook on the four representative writing systems of the world Discusses the consequences of literacy for the functioning of the min

    Max Planck Institute for Psycholinguistics: Annual report 1996

    No full text

    Music Encoding Conference Proceedings 2021, 19–22 July, 2021 University of Alicante (Spain): Onsite & Online

    Get PDF
    Este documento incluye los artículos y pósters presentados en el Music Encoding Conference 2021 realizado en Alicante entre el 19 y el 22 de julio de 2022.Funded by project Multiscore, MCIN/AEI/10.13039/50110001103
    corecore