
    Mutual Information Scaling and Expressive Power of Sequence Models

    Sequence models assign probabilities to variable-length sequences such as natural language texts. The ability of sequence models to capture temporal dependence can be characterized by the temporal scaling of correlation and mutual information. In this paper, we study the mutual information of recurrent neural networks (RNNs), including long short-term memories, and of self-attention networks such as Transformers. Through a combination of theoretical study of linear RNNs and empirical study of nonlinear RNNs, we find that their mutual information decays exponentially in temporal distance. On the other hand, Transformers can capture long-range mutual information more efficiently, making them preferable in modeling sequences with slow power-law mutual information, such as natural languages and stock prices. We discuss the connection of these results with statistical mechanics. We also point out the non-uniformity problem in many natural language datasets. We hope this work provides a new perspective on the expressive power of sequence models and sheds new light on improving their architecture. Comment: 12 + 15 pages. Comments are welcome.
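The quantity discussed in this abstract, mutual information between two symbols separated by a temporal distance d, can be estimated from a single sequence by counting symbol pairs. The sketch below is a plain plug-in estimator written for illustration; the function name `mutual_information` is hypothetical, and this is not claimed to be the estimator used in the paper. Plug-in estimates are biased upward on short sequences, so any decay curve built from them should be read with care.

```python
from collections import Counter
from math import log2


def mutual_information(seq, d):
    """Plug-in estimate (in bits) of I(X_t; X_{t+d}) for a symbol sequence.

    Assumes d >= 1 and len(seq) > d. Probabilities are simple empirical
    frequencies of the pairs (seq[t], seq[t + d]).
    """
    pairs = list(zip(seq[:-d], seq[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)   # marginal counts of the first symbol
    right = Counter(b for _, b in pairs)  # marginal counts of the second symbol
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    text = "ab" * 50
    for d in (1, 2, 3):
        print(d, round(mutual_information(text, d), 4))
```

Plotting the estimate against d on log-linear and log-log axes is one way to tell the exponential decay from the power-law decay that the abstract contrasts.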

    Conditional WaveGAN

    Generative models have been used successfully for image synthesis in recent years, but little progress has been made for other modalities such as audio and text. Recent works focus on generating audio from a generative model in an unsupervised setting. We explore the possibility of using generative models conditioned on class labels. Concatenation-based conditioning and conditional scaling were explored in this work with various hyper-parameter tuning methods. In this paper we introduce Conditional WaveGANs (cWaveGAN). Find our implementation at https://github.com/acheketa/cwavegan. Comment: Preprint.

    Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

    We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline. Key to our approach is a time-depth separable convolution block which dramatically reduces the number of parameters in the model while keeping the receptive field large. We also give a stable and efficient beam search inference procedure which allows us to effectively integrate a language model. Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.

    GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models

    Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios a single graph is rarely sufficient to succinctly represent all interactions, and hence multi-layered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to solve classical problems, such as node classification, in the multi-layered case. In this paper, we consider the problem of semi-supervised learning with multi-layered graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning, and develop two novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer dependencies for building multi-layered graph embeddings. Using empirical studies on several benchmark datasets, we evaluate the proposed approaches and demonstrate significant performance improvements in comparison to state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available.

    Uncertainty in the Variational Information Bottleneck

    We present a simple case study demonstrating that the Variational Information Bottleneck (VIB) can improve a network's classification calibration as well as its ability to detect out-of-distribution data. Without explicitly being designed to do so, VIB gives two natural metrics for handling and quantifying uncertainty. Comment: 10 pages, 7 figures. Accepted to UAI 2018 - Uncertainty in Deep Learning Workshop.

    Quantum Computation: Particle and Wave Aspects of Algorithms

    The driving force in the pursuit of quantum computation is the exciting possibility that quantum algorithms can be more efficient than their classical analogues. Research on the subject has unraveled several aspects of how that can happen. Clever quantum algorithms have been discovered in recent years, although not systematically, and the field remains under active investigation. Richard Feynman was one of the pioneers who foresaw the power of quantum computers. In this issue dedicated to him, I give an introduction to how particle and wave aspects contribute to the power of quantum computers. Shor's and Grover's algorithms are analysed as examples. Comment: 6 pages. Prepared for the issue of Resonance in honour of Richard Feynman. (v2) Minor changes. Published version.
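As a small illustration of Grover's algorithm, which this abstract names as one of its examples, the amplitudes can be simulated classically with a real vector. The sketch below is not taken from the article; the function name `grover_probabilities` and its parameters are hypothetical. It applies the two standard steps of each Grover iteration: a sign flip on the marked item (the oracle) and a reflection of all amplitudes about their mean.

```python
import numpy as np


def grover_probabilities(n_items, marked, iterations):
    """Classically simulate Grover search over n_items basis states.

    Returns the measurement probability of each item after the given
    number of Grover iterations, starting from the uniform superposition.
    """
    state = np.full(n_items, 1.0 / np.sqrt(n_items))
    for _ in range(iterations):
        state[marked] *= -1.0              # oracle: flip the sign of the marked amplitude
        state = 2.0 * state.mean() - state  # diffusion: reflect every amplitude about the mean
    return state ** 2


if __name__ == "__main__":
    # With 4 items, a single iteration concentrates all probability on the marked item.
    print(grover_probabilities(4, marked=2, iterations=1))
```

For larger search spaces the number of iterations that maximizes the success probability grows roughly as the square root of the number of items, which is the source of the quadratic speed-up over classical search.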

    Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data

    Surrogate modeling and uncertainty quantification tasks for PDE systems are most often considered as supervised learning problems where input and output data pairs are used for training. The construction of such emulators is by definition a small data problem which poses challenges to deep learning approaches that have been developed to operate in the big data regime. Even in cases where such models have been shown to have good predictive capability in high dimensions, they fail to address constraints in the data implied by the PDE model. This paper provides a methodology that incorporates the governing equations of the physical model in the loss/likelihood functions. The resulting physics-constrained, deep learning models are trained without any labeled data (e.g. employing only input data) and provide comparable predictive responses with data-driven models while obeying the constraints of the problem at hand. This work employs a convolutional encoder-decoder neural network approach as well as a conditional flow-based generative model for the solution of PDEs, surrogate model construction, and uncertainty quantification tasks. The methodology is posed as a minimization problem of the reverse Kullback-Leibler (KL) divergence between the model predictive density and the reference conditional density, where the latter is defined as the Boltzmann-Gibbs distribution at a given inverse temperature with the underlying potential relating to the PDE system of interest. The generalization capability of these models to out-of-distribution input is considered. Quantification and interpretation of the predictive uncertainty is provided for a number of problems. Comment: 51 pages, 18 figures, submitted to Journal of Computational Physics.

    SOC computer simulations

    The following chapter provides an overview of the techniques used to understand Self-Organised Criticality (SOC) by performing computer simulations. Those are of particular significance in SOC, given that its very paradigm, the BTW (Bak-Tang-Wiesenfeld) sandpile, was introduced on the basis of a process that is conveniently implemented as a computer program. The chapter is divided into three sections. In the first section a number of key concepts are introduced, followed by four brief presentations of SOC models which are most commonly investigated or which have played an important part in the development of the field as a whole. The second section is concerned with the basics of scaling, with particular emphasis on its role in numerical models of SOC, introducing a number of basic tools for data analysis such as binning, moment analysis and error estimation. The third section is devoted to numerical methods and algorithms as applied to SOC models, addressing typical computational questions with the particular application of SOC in mind. The present chapter is rather technical, but hands-on at the same time, providing practical advice and even code snippets (in C) wherever possible. Comment: 57 pages, 5 figures, chapter 7 of e-book Self-Organized Criticality Systems, edited by M Aschwanden, Open Academic Press.
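To make concrete why the BTW sandpile is "conveniently implemented as a computer program", here is a minimal sketch of its relaxation rule on a two-dimensional grid with open boundaries. It is written in Python for brevity, whereas the chapter's own snippets are in C, and it is not taken from the chapter; the function name `relax` and the stack-based update order are illustrative assumptions. A site holding four or more grains topples, sending one grain to each of its four neighbours, and grains pushed over the edge are lost.

```python
import numpy as np


def relax(grid):
    """Topple unstable sites in place until every site holds fewer than 4 grains.

    Returns the avalanche size, counted here as the total number of topplings.
    """
    rows, cols = grid.shape
    topplings = 0
    # Stack of candidate sites; entries may be stale, so each is re-checked on pop.
    unstable = [(int(i), int(j)) for i, j in np.argwhere(grid >= 4)]
    while unstable:
        i, j = unstable.pop()
        if grid[i, j] < 4:
            continue
        grid[i, j] -= 4
        topplings += 1
        if grid[i, j] >= 4:
            unstable.append((i, j))
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols:  # grains leaving the grid are dropped
                grid[ni, nj] += 1
                if grid[ni, nj] >= 4:
                    unstable.append((ni, nj))
    return topplings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.zeros((16, 16), dtype=int)
    sizes = []
    for _ in range(5000):
        i, j = rng.integers(0, 16, size=2)
        grid[i, j] += 1           # slow drive: add one grain at a random site
        sizes.append(relax(grid))  # fast relaxation: record the avalanche size
    print(max(sizes))
```

The recorded avalanche sizes are the raw data to which tools such as binning and moment analysis, mentioned in the abstract, would then be applied.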

    Hierarchical Annotation of Images with Two-Alternative-Forced-Choice Metric Learning

    Many tasks such as retrieval and recommendation can benefit significantly from structuring the data, commonly in a hierarchical way. Achieving this through annotation of high-dimensional data such as images or natural text can be highly labor-intensive. We propose an approach for uncovering the hierarchical structure of data based on efficient discriminative testing rather than annotations of individual datapoints. Using two-alternative-forced-choice (2AFC) testing and deep metric learning, we embed the data in a semantic space where we are able to perform hierarchical clustering successfully. We actively select triplets for the 2AFC test such that the modeling process is highly efficient with respect to the number of tests presented to the annotator. We empirically demonstrate the feasibility of the method by confirming the shape bias on synthetic data and by extracting hierarchical structure on the Fashion-MNIST dataset at a finer granularity than the original labels. Comment: presented at 2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA.

    Depth Adaptive Deep Neural Network for Semantic Segmentation

    In this work, we present a depth-adaptive deep neural network that uses a depth map for semantic segmentation. Typical deep neural networks receive inputs at predetermined locations regardless of the distance from the camera. This fixed receptive field makes it difficult for neural networks to generalize the features of objects at various distances. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network which is able to adapt the receptive field not only for each layer but also for each neuron at each spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer, consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron adjusts the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron applies receptive fields of different sizes at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on a publicly available RGB-D dataset for semantic segmentation and on a novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or pre/post-processing. Comment: IEEE Transactions on Multimedia, 201