Comparison of echo state network output layer classification methods on noisy data
Echo state networks are a recently developed type of recurrent neural network
where the internal layer is fixed with random weights, and only the output
layer is trained on specific data. Echo state networks are increasingly being
used to process spatiotemporal data in real-world settings, including speech
recognition, event detection, and robot control. A strength of echo state
networks is the simple method used to train the output layer - typically a
collection of linear readout weights found using a least squares approach.
Although straightforward to train and having a low computational cost to use,
this method may not yield acceptable accuracy performance on noisy data.
This study compares the performance of three echo state network output layer
methods to perform classification on noisy data: using trained linear weights,
using sparse trained linear weights, and using trained low-rank approximations
of reservoir states. The methods are investigated experimentally on both
synthetic and natural datasets. The experiments suggest that using regularized
least squares to train linear output weights is superior on data with low
noise, but using the low-rank approximations may significantly improve accuracy
on datasets contaminated with higher noise levels.
Comment: 8 pages. International Joint Conference on Neural Networks (IJCNN 2017)
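The readout training the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: all sizes, the toy sign-of-a-sine classification task, and the regularization strength `lam` are assumptions chosen for the example, and only the ridge-regression readout (the "regularized least squares" baseline) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes -- illustrative, not taken from the paper.
n_inputs, n_reservoir, n_outputs = 1, 50, 2
washout = 20  # discard initial transient states

# Fixed random reservoir: input and recurrent weights are never trained.
W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius below 1

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape (T, n_inputs)."""
    x = np.zeros(n_reservoir)
    states = []
    for t in range(len(u)):
        x = np.tanh(W_in @ u[t] + W @ x)
        states.append(x.copy())
    return np.array(states)

# Noisy toy classification data: the label is the sign of a sine wave.
T = 200
u = np.sin(np.linspace(0, 8 * np.pi, T))[:, None] + 0.1 * rng.standard_normal((T, 1))
y = np.eye(n_outputs)[(u[:, 0] > 0).astype(int)]  # one-hot targets

X = run_reservoir(u)[washout:]
Y = y[washout:]

# Regularized least squares (ridge) readout: W_out = (X^T X + lam*I)^-1 X^T Y
lam = 1e-2
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_reservoir), X.T @ Y)

pred = np.argmax(X @ W_out, axis=1)
acc = np.mean(pred == np.argmax(Y, axis=1))
print(f"training accuracy: {acc:.2f}")
```

Only `W_out` is learned; this cheap closed-form solve is the low-cost training method the abstract contrasts with the sparse and low-rank alternatives.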
Relative Positional Encoding for Transformers with Linear Complexity
Recent advances in Transformer models allow for unprecedented sequence
lengths, due to linear space and time complexity. In the meantime, relative
positional encoding (RPE) was proposed as beneficial for classical Transformers
and consists in exploiting lags instead of absolute positions for inference.
Still, RPE is not available for the recent linear-variants of the Transformer,
because it requires the explicit computation of the attention matrix, which is
precisely what is avoided by such methods. In this paper, we bridge this gap
and present Stochastic Positional Encoding as a way to generate PE that can be
used as a replacement to the classical additive (sinusoidal) PE and provably
behaves like RPE. The main theoretical contribution is to make a connection
between positional encoding and cross-covariance structures of correlated
Gaussian processes. We illustrate the performance of our approach on the
Long-Range Arena benchmark and on music generation.
Comment: ICML 2021 (long talk) camera-ready. 24 pages
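For context, the "classical additive (sinusoidal) PE" that the proposed stochastic encoding replaces can be sketched as below. This is the standard Transformer formulation, not the paper's method; the sequence length and model dimension are arbitrary example values.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Classical additive positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    (d_model is assumed even here.)
    """
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

pe = sinusoidal_pe(128, 64)
# These vectors are simply added to the token embeddings before attention,
# encoding *absolute* positions -- which is what RPE replaces with lags.
print(pe.shape)
```

Because these encodings depend only on absolute position, relating two tokens by their lag requires the explicit attention matrix, which is exactly what linear-complexity Transformers avoid, hence the gap the paper addresses.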
A review of automatic drum transcription
In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. In particular, the detection and classification of drum sound events by computational methods is considered an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods’ technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.
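The NMF family of ADT methods mentioned above factorizes a magnitude spectrogram V into spectral templates W (one per drum) and temporal activations H, whose peaks indicate drum onsets. The sketch below illustrates the idea on a synthetic spectrogram; the template shapes, sizes, onset pattern, and threshold are all invented for the example, and only the simplest variant (fixed templates, Euclidean-distance multiplicative updates) is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a magnitude spectrogram (freq bins x time frames).
n_bins, n_frames, n_drums = 40, 100, 2

# Hypothetical spectral templates: 'kick' = low-frequency, 'snare' = broadband.
templates = np.zeros((n_bins, n_drums))
templates[:10, 0] = 1.0            # kick: energy in the lowest bins
templates[:, 1] = 0.3
templates[15:30, 1] = 1.0          # snare: broadband with a mid-band peak

# Ground-truth activations: impulses at known onset frames.
H_true = np.zeros((n_drums, n_frames))
H_true[0, ::16] = 1.0              # kick every 16 frames
H_true[1, 8::16] = 1.0             # snare on the off-beats
V = templates @ H_true + 0.01 * rng.random((n_bins, n_frames))

# NMF with fixed templates W: only the activations H are updated,
# using the multiplicative rule that minimizes ||V - W H||^2.
W = templates
H = rng.random((n_drums, n_frames))
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)

# Onsets = frames where an activation exceeds a simple per-drum threshold.
onsets = H > 0.5 * H.max(axis=1, keepdims=True)
print("kick onsets:", np.where(onsets[0])[0])
```

Real ADT systems refine this with adaptive templates, better divergences, and dedicated onset pickers, but the decomposition-then-threshold structure is the same.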
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages which lack resources for speech and language processing. We focus on approaches that allow using data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, this thesis also includes research on non-native and Code-Switching speech.
Script Effects as the Hidden Drive of the Mind, Cognition, and Culture
This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. 
Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects; investigates how the scripts we read over time shape our cognition, mind, and thought patterns; provides a new outlook on the four representative writing systems of the world; discusses the consequences of literacy for the functioning of the mind.
Music Encoding Conference Proceedings 2021, 19–22 July, 2021 University of Alicante (Spain): Onsite & Online
This document includes the papers and posters presented at the Music Encoding Conference 2021, held in Alicante from 19 to 22 July 2021. Funded by project Multiscore, MCIN/AEI/10.13039/50110001103