3,161 research outputs found

    Deep Learning for Audio Signal Processing

    Full text link
    Given the recent surge of developments in deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 PDF figures.
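
    As a concrete illustration of the log-mel front end mentioned above, here is a minimal sketch assuming the librosa library; the file name and all parameter values (sampling rate, FFT size, hop length, number of mel bands) are illustrative choices rather than settings prescribed by the review.

        # Minimal sketch: log-mel feature extraction (illustrative parameters).
        import numpy as np
        import librosa

        y, sr = librosa.load("example.wav", sr=16000)   # hypothetical input file
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                             hop_length=256, n_mels=64)
        log_mel = librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, n_frames)
        # log_mel can then be fed to a CNN or LSTM front end as discussed above.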

    Integer Sparse Distributed Memory and Modular Composite Representation

    Get PDF
    Challenging AI applications, such as cognitive architectures, natural language understanding, and visual object recognition, share some basic operations, including pattern recognition, sequence learning, clustering, and association of related data. Both the representations used and the structure of a system significantly influence which tasks and problems are most readily supported. A memory model and a representation that facilitate these basic tasks would greatly improve the performance of these challenging AI applications. Sparse Distributed Memory (SDM), based on large binary vectors, has several desirable properties (auto-associativity, content addressability, distributed storage, and robustness to noisy inputs) that would facilitate the implementation of challenging AI applications. Here I introduce two variations on the original SDM, the Extended SDM and the Integer SDM, that significantly improve these desirable properties, as well as a new form of reduced description representation named MCR. Extended SDM, which uses word vectors larger than address vectors, enhances hetero-associativity, improving the storage of sequences of vectors as well as of other data structures. A novel sequence learning mechanism is introduced, and several experiments demonstrate the capacity and sequence learning capability of this memory. Integer SDM uses modular integer vectors rather than binary vectors, improving the representation capabilities of the memory and its noise robustness. Several experiments show its capacity and noise robustness, and theoretical analyses of its capacity and fidelity are also presented. A reduced description represents a whole hierarchy using a single high-dimensional vector, which can recover individual items and be used directly in complex calculations and procedures, such as making analogies; furthermore, the hierarchy can be reconstructed from the single vector. Modular Composite Representation (MCR), a new reduced description model for the representations used in challenging AI applications, provides an attractive tradeoff between expressiveness and simplicity of operations. A theoretical analysis of its noise robustness, several experiments, and comparisons with similar models are presented. My implementations of these memories include an object-oriented version using a RAM cache, a version for distributed and multi-threaded execution, and a GPU version for fast vector processing.
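
    For readers unfamiliar with the underlying memory model, the following is a minimal sketch of a classical binary SDM write/read cycle, assuming numpy; the vector length, number of hard locations, and activation radius are illustrative, and the Extended SDM, Integer SDM, and MCR extensions are not reproduced here.

        # Minimal sketch of a binary Sparse Distributed Memory (illustrative sizes).
        import numpy as np

        rng = np.random.default_rng(0)
        N, M, RADIUS = 1000, 10000, 470          # vector length, hard locations, Hamming radius

        hard_addresses = rng.integers(0, 2, size=(M, N), dtype=np.int8)
        counters = np.zeros((M, N), dtype=np.int32)

        def _active(address):
            """Indices of hard locations within RADIUS (Hamming distance) of the address."""
            dist = np.count_nonzero(hard_addresses != address, axis=1)
            return np.flatnonzero(dist <= RADIUS)

        def write(address, word):
            # Add +1/-1 per bit of the word to every activated location's counters.
            counters[_active(address)] += np.where(word == 1, 1, -1).astype(np.int32)

        def read(address):
            # Sum counters over activated locations and threshold at zero.
            return (counters[_active(address)].sum(axis=0) > 0).astype(np.int8)

        pattern = rng.integers(0, 2, size=N, dtype=np.int8)
        write(pattern, pattern)                  # auto-associative store
        assert np.array_equal(read(pattern), pattern)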

    Encoding Sequential Information in Semantic Space Models: Comparing Holographic Reduced Representation and Random Permutation

    Get PDF
    Circular convolution and random permutation have each been proposed as neurally plausible binding operators capable of encoding sequential information in semantic memory. We perform several controlled comparisons of circular convolution and random permutation as means of encoding paired associates as well as sequential information. Random permutations outperformed convolution with respect to the number of paired associates that can be reliably stored in a single memory trace. Performance was equal on semantic tasks when using a small corpus, but random permutations were ultimately capable of achieving superior performance due to their higher scalability to large corpora. Finally, “noisy” permutations in which units are mapped to other units arbitrarily (no one-to-one mapping) perform nearly as well as true permutations. These findings increase the neurological plausibility of random permutations and highlight their utility in vector space models of semantics.
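
    For concreteness, here is a minimal sketch of the two binding operators under comparison, assuming numpy: circular convolution computed in the Fourier domain (as in Holographic Reduced Representations) and a fixed random permutation applied before superposition. The dimensionality is illustrative, and the paper's retrieval and corpus experiments are not reproduced.

        # Minimal sketch of the two binding operators compared in the paper.
        import numpy as np

        rng = np.random.default_rng(1)
        D = 1024                                     # illustrative dimensionality

        def circular_convolution(a, b):
            """Bind two vectors by circular convolution (computed via FFT)."""
            return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

        perm = rng.permutation(D)                    # one fixed random permutation

        def permute_bind(a, b):
            """Encode order by permuting one operand before superposition."""
            return a + b[perm]

        a = rng.normal(0, 1 / np.sqrt(D), D)
        b = rng.normal(0, 1 / np.sqrt(D), D)
        pair_conv = circular_convolution(a, b)       # HRR-style paired associate
        pair_perm = permute_bind(a, b)               # permutation-based encoding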

    AI of Brain and Cognitive Sciences: From the Perspective of First Principles

    Full text link
    In recent years, we have witnessed the great success of AI in various applications, including image classification, game playing, protein structure analysis, language translation, and content generation. Despite these powerful applications, there are still many tasks in our daily life that are rather simple for humans but pose great challenges to AI. These include image and language understanding, few-shot learning, abstract concepts, and low-energy-cost computing. Thus, learning from the brain remains a promising way to shed light on the development of next-generation AI. The brain is arguably the only known intelligent machine in the universe, the product of evolution for animals surviving in the natural environment. At the behavioral level, psychology and cognitive sciences have demonstrated that human and animal brains can execute very intelligent high-level cognitive functions. At the structural level, cognitive and computational neurosciences have unveiled that the brain has extremely complicated but elegant network forms to support its functions. Over the years, knowledge about the structure and functions of the brain has been accumulating, and this process has recently accelerated with the initiation of large brain projects worldwide. Here, we argue that the general principles of brain function are the most valuable things to inspire the development of AI. These general principles are the standard rules by which the brain extracts, represents, manipulates, and retrieves information, and we call them the first principles of the brain. This paper collects six such first principles: attractor networks, criticality, random networks, sparse coding, relational memory, and perceptual learning. For each topic, we review its biological background, fundamental properties, potential application to AI, and future development. Comment: 59 pages, 5 figures, review article.
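
    Of the six principles listed, the attractor network is perhaps the most directly expressible in code; the following is a minimal Hopfield-style sketch under standard textbook assumptions (Hebbian outer-product learning, sign-threshold updates). The sizes and update schedule are illustrative choices, not anything specified by the paper.

        # Minimal sketch of an attractor network (Hopfield-style), illustrating
        # one of the six principles; all sizes are illustrative.
        import numpy as np

        rng = np.random.default_rng(2)
        N, P = 200, 10                                   # neurons, stored patterns

        patterns = rng.choice([-1, 1], size=(P, N))
        W = (patterns.T @ patterns) / N                  # Hebbian outer-product learning
        np.fill_diagonal(W, 0)

        def recall(x, steps=20):
            """Iterate the network until it settles into a stored attractor."""
            for _ in range(steps):
                x = np.sign(W @ x)
                x[x == 0] = 1
            return x

        noisy = patterns[0].copy()
        flip = rng.choice(N, size=20, replace=False)     # corrupt 10% of the bits
        noisy[flip] *= -1
        recovered = recall(noisy)
        print(np.mean(recovered == patterns[0]))         # close to 1.0 for light noise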

    Learning probabilistic neural representations with randomly connected circuits

    Get PDF
    The brain represents and reasons probabilistically about complex stimuli and motor actions using a noisy, spike-based neural code. A key building block for such neural computations, as well as the basis for supervised and unsupervised learning, is the ability to estimate the surprise or likelihood of incoming high-dimensional neural activity patterns. Despite progress in statistical modeling of neural responses and deep learning, current approaches either do not scale to large neural populations or cannot be implemented using biologically realistic mechanisms. Inspired by the sparse and random connectivity of real neuronal circuits, we present a model for neural codes that accurately estimates the likelihood of individual spiking patterns and has a straightforward, scalable, efficient, learnable, and realistic neural implementation. This model’s performance on simultaneously recorded spiking activity of >100 neurons in the monkey visual and prefrontal cortices is comparable with or better than that of state-of-the-art models. Importantly, the model can be learned using a small number of samples and using a local learning rule that utilizes noise intrinsic to neural circuits. Slower, structural changes in random connectivity, consistent with rewiring and pruning processes, further improve the efficiency and sparseness of the resulting neural representations. Our results merge insights from neuroanatomy, machine learning, and theoretical neuroscience to suggest random sparse connectivity as a key design principle for neuronal computation.
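
    The following is a minimal sketch in the spirit of the model described above, not the authors' implementation: sparsely and randomly connected threshold units provide features for a log-linear score of a population spike pattern. The sizes, thresholds, and coefficients are illustrative assumptions, and the local learning rule and structural plasticity discussed in the abstract are omitted.

        # Minimal sketch: random sparse projections scoring a binary spike pattern.
        import numpy as np

        rng = np.random.default_rng(3)
        N_NEURONS, N_PROJ, IN_DEGREE = 100, 500, 5       # illustrative sizes

        # Each projection unit receives a few random inputs (sparse, random connectivity).
        A = np.zeros((N_PROJ, N_NEURONS))
        for i in range(N_PROJ):
            A[i, rng.choice(N_NEURONS, IN_DEGREE, replace=False)] = 1.0
        theta = rng.uniform(1, IN_DEGREE, size=N_PROJ)   # per-unit firing thresholds

        def features(x):
            """Binary outputs of the randomly connected threshold units."""
            return (A @ x >= theta).astype(float)

        lam = rng.normal(0, 0.1, size=N_PROJ)            # coefficients; learned from data in practice

        def log_score(x):
            """Unnormalized log-likelihood of one population spike pattern x (0/1 vector)."""
            return lam @ features(x)

        x = (rng.random(N_NEURONS) < 0.1).astype(float)  # a sparse spiking pattern
        print(log_score(x))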

    Integer Echo State Networks: Hyperdimensional Reservoir Computing

    Full text link
    We propose an approximation of Echo State Networks (ESNs) that can be efficiently implemented on digital hardware based on the mathematics of hyperdimensional computing. The reservoir of the proposed Integer Echo State Network (intESN) is a vector containing only n-bit integers (where n < 8 is normally sufficient for satisfactory performance). The recurrent matrix multiplication is replaced with an efficient cyclic shift operation. The intESN architecture is verified with typical tasks in reservoir computing: memorizing a sequence of inputs, classifying time series, and learning dynamic processes. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss. Comment: 10 pages, 10 figures, 1 table.
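
    A minimal sketch of the state update described above, assuming numpy: the recurrent matrix multiplication is replaced by a cyclic shift, and the state is clipped to a small integer range. The alphabet, codebook construction, clipping limit, and reservoir size are illustrative assumptions, and the trained linear readout is omitted.

        # Minimal sketch of the intESN state update (illustrative sizes and limits).
        import numpy as np

        rng = np.random.default_rng(4)
        D, KAPPA = 1000, 7                               # reservoir size, clip to [-7, 7]

        codebook = {s: rng.choice([-1, 1], size=D) for s in range(10)}  # hypothetical alphabet

        def encode(u):
            """Map a symbolic input to its random bipolar vector."""
            return codebook[u]

        def step(x, u):
            x = np.roll(x, 1) + encode(u)                # cyclic shift replaces W @ x
            return np.clip(x, -KAPPA, KAPPA)             # keep the n-bit integer range

        x = np.zeros(D, dtype=np.int32)                  # integer reservoir state
        for u in [3, 1, 4, 1, 5]:                        # feed a short input sequence
            x = step(x, u)
        # A linear readout trained on x would perform the memorization/classification tasks.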

    Pattern Recognition Using Associative Memories

    Get PDF
    The human brain is extremely effective at performing pattern recognition, even in the presence of noisy or distorted inputs. Artificial neural networks attempt to imitate the structure of the brain, often with a view to mimicking its success. The binary correlation matrix memory (CMM) is a particular type of neural network that is capable of learning and recalling associations extremely quickly, as well as displaying a high storage capacity and the ability to generalise from patterns already learned. CMMs have been used as a major component of larger architectures designed to solve a wide range of problems, such as rule chaining, character recognition, or more general pattern recognition. It is clear that the memory requirement of the CMMs will thus have a significant impact on the scalability of such architectures. A domain-specific language for binary CMMs is developed, alongside an implementation that uses an efficient storage mechanism which allows memory usage to scale linearly with the number of associations stored. An architecture for rule chaining is then examined in detail, showing that the problem of scalability is indeed resolved, before identifying and resolving a number of important limitations to its capabilities. Finally, an architecture for pattern recognition is investigated, and a memory-efficient method to incorporate general invariance into this architecture is presented; this is specifically tested with scale invariance, although the mechanism can be used with other types of invariance, such as skew or rotation.
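
    As an illustration of the basic store/recall cycle of a binary CMM, here is a minimal dense sketch assuming numpy; the thesis' contributions are precisely a memory-efficient storage mechanism and a domain-specific language, neither of which is reproduced here, and all sizes and the k-of-N output code are illustrative.

        # Minimal sketch of a binary correlation matrix memory (dense, for clarity).
        import numpy as np

        N_IN, N_OUT = 256, 256
        W = np.zeros((N_IN, N_OUT), dtype=bool)

        def train(x, y):
            """Superimpose the outer product of a binary input/output pair (logical OR)."""
            global W
            W |= np.outer(x, y).astype(bool)

        def recall(x, k):
            """Sum the rows selected by the input and keep the k strongest output units."""
            s = W[x.astype(bool)].sum(axis=0)
            y = np.zeros(N_OUT, dtype=np.int8)
            y[np.argsort(s)[-k:]] = 1
            return y

        rng = np.random.default_rng(5)
        def sparse_code(n, k):
            v = np.zeros(n, dtype=np.int8)
            v[rng.choice(n, k, replace=False)] = 1
            return v

        x, y = sparse_code(N_IN, 8), sparse_code(N_OUT, 8)
        train(x, y)
        assert np.array_equal(recall(x, 8), y)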

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have aimed at improving the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies for media creation, processing, editing, and scenario generation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.