4,036 research outputs found

    An Efficient Threshold-Driven Aggregate-Label Learning Algorithm for Multimodal Information Processing

    Get PDF
    The aggregate-label learning paradigm tackles the long-standing temporary credit assignment (TCA) problem in neuroscience and machine learning, enabling spiking neural networks to learn multimodal sensory clues with delayed feedback signals. However, the existing aggregate-label learning algorithms only work for single spiking neurons, and with low learning efficiency, which limit their real-world applicability. To address these limitations, we first propose an efficient threshold-driven plasticity algorithm for spiking neurons, namely ETDP. It enables spiking neurons to generate the desired number of spikes that match the magnitude of delayed feedback signals and to learn useful multimodal sensory clues embedded within spontaneous spiking activities. Furthermore, we extend the ETDP algorithm to support multi-layer spiking neural networks (SNNs), which significantly improves the applicability of aggregate-label learning algorithms. We also validate the multi-layer ETDP learning algorithm in a multimodal computation framework for audio-visual pattern recognition. Experimental results on both synthetic and realistic datasets show significant improvements in the learning efficiency and model capacity over the existing aggregate-label learning algorithms. It, therefore, provides many opportunities for solving real-world multimodal pattern recognition tasks with spiking neural networks

    Learning Multimodal Graph-to-Graph Translation for Molecular Optimization

    Full text link
    We view molecular optimization as a graph-to-graph translation problem. The goal is to learn to map from one molecular graph to another with better properties based on an available corpus of paired molecules. Since molecules can be optimized in different ways, there are multiple viable translations for each input graph. A key challenge is therefore to model diverse translation outputs. Our primary contributions include a junction tree encoder-decoder for learning diverse graph translations along with a novel adversarial training method for aligning distributions of molecules. Diverse output distributions in our model are explicitly realized by low-dimensional latent vectors that modulate the translation process. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines

    Learning Multimodal Structures in Computer Vision

    Get PDF
    A phenomenon or event can be received from various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relation between the modalities of multimodal phenomena, a single modality cannot fully describe the event of interest. Since several modalities report on the same event introduces new challenges comparing to the case of exploiting each modality separately. We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding which is a favorite methodology in signal processing, machine learning and statistics to represent data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities like natural images. We will consider situations where we are not only interested in support of the model to be sparse, but also to reflect a-priorily known knowledge about the application in hand. Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting, the modality-shared which is the information shared by various modalities, and modality-specific which is the information content of each modality individually. Plus, it automatically learns the weights for various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities, while at the same time leverage the modality-specific character of each modality and change their corresponding weights for different parts of the feature in recognition
    • …
    corecore