224 research outputs found

    On Sparsifying Encoder Outputs in Sequence-to-Sequence Models

    Get PDF
    Sequence-to-sequence models usually transfer all encoder outputs to the decoder for generation. In this work, by contrast, we hypothesize that these encoder outputs can be compressed to shorten the sequence delivered for decoding. We take Transformer as the testbed and introduce a layer of stochastic gates in-between the encoder and the decoder. The gates are regularized using the expected value of the sparsity-inducing L0penalty, resulting in completely masking-out a subset of encoder outputs. In other words, via joint training, the L0DROP layer forces Transformer to route information through a subset of its encoder states. We investigate the effects of this sparsification on two machine translation and two summarization tasks. Experiments show that, depending on the task, around 40-70% of source encodings can be pruned without significantly compromising quality. The decrease of the output length endows L0DROP with the potential of improving decoding efficiency, where it yields a speedup of up to 1.65x on document summarization tasks against the standard Transformer. We analyze the L0DROP behaviour and observe that it exhibits systematic preferences for pruning certain word types, e.g., function words and punctuation get pruned most. Inspired by these observations, we explore the feasibility of specifying rule-based patterns that mask out encoder outputs based on information such as part-of-speech tags, word frequency and word position

    Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

    Full text link
    Singing voice separation based on deep learning relies on the usage of time-frequency masking. In many cases the masking process is not a learnable function or is not encapsulated into the deep learning optimization. Consequently, most of the existing methods rely on a post processing step using the generalized Wiener filtering. This work proposes a method that learns and optimizes (during training) a source-dependent mask and does not need the aforementioned post processing step. We introduce a recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter. Obtained results show an increase of 0.49 dB for the signal to distortion ratio and 0.30 dB for the signal to interference ratio, compared to previous state-of-the-art approaches for monaural singing voice separation

    Predefined Sparseness in Recurrent Sequence Models

    Full text link
    Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters.Comment: the SIGNLL Conference on Computational Natural Language Learning (CoNLL, 2018

    Adaptive Feature Selection for End-to-End Speech Translation

    Get PDF
    Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to ASR. A ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0DROP (Zhang et al., 2020) as the backbone for AFS, and adapt it to sparsify speech features with respect to both temporal and feature dimensions. Results on LibriSpeech EnFr and MuST-C benchmarks show that AFS facilitates learning of ST by pruning out ~84% temporal features, yielding an average translation gain of ~1.3-1.6 BLEU and a decoding speedup of ~1.4x. In particular, AFS reduces the performance gap compared to the cascade baseline, and outperforms it on LibriSpeech En-Fr with a BLEU score of 18.56 (without data augmentation)

    Machine learning in Magnetic Resonance Imaging: Image reconstruction.

    Get PDF
    Magnetic Resonance Imaging (MRI) plays a vital role in diagnosis, management and monitoring of many diseases. However, it is an inherently slow imaging technique. Over the last 20 years, parallel imaging, temporal encoding and compressed sensing have enabled substantial speed-ups in the acquisition of MRI data, by accurately recovering missing lines of k-space data. However, clinical uptake of vastly accelerated acquisitions has been limited, in particular in compressed sensing, due to the time-consuming nature of the reconstructions and unnatural looking images. Following the success of machine learning in a wide range of imaging tasks, there has been a recent explosion in the use of machine learning in the field of MRI image reconstruction. A wide range of approaches have been proposed, which can be applied in k-space and/or image-space. Promising results have been demonstrated from a range of methods, enabling natural looking images and rapid computation. In this review article we summarize the current machine learning approaches used in MRI reconstruction, discuss their drawbacks, clinical applications, and current trends

    3D spatio-temporal analysis for compressive sensing in magnetic resonance imaging of the murine cardiac cycle

    Get PDF
    This thesis consists of two major contributions, each of which has been prepared in a conference paper. These papers will be submitted for publication in the SPIE 2013 Medical Imaging Conference and the ASEE 2013 Annual Conference. The first paper explores a three-dimensional compressive sensing (CS) technique for reducing measurement time in MR imaging of the murine (mouse) cardiac cycle. By randomly undersampling a single 2D slice of a mouse heart at regular time intervals as it expands and contracts through the stages of a heartbeat, a CS reconstruction algorithm can be made to exploit transform sparsity in time as well as space. For the purposes of measuring the left ventricular volume in the mouse heart, this 3D approach offers significant advantages against classical 2D spatial compressive sensing. The second paper describes the modification and testing of a set of laboratory exercises for developing an undergraduate level understanding of Simulink. An existing partial set of lab exercises for Simulink was obtained and improved considerably in pedagogical utility, and then the completed set of pilot exercises was taught as a part of a communications course at the Missouri University of Science and Technology in order to gauge student responses and learning experiences. In this paper, the content of the laboratory exercises with corresponding educational approaches are discussed, along with student feedback and future improvements. --Abstract, page iv

    Sparsely Aggregated Convolutional Networks

    Full text link
    We explore a key architectural aspect of deep convolutional neural networks: the pattern of internal skip connections used to aggregate outputs of earlier layers for consumption by deeper layers. Such aggregation is critical to facilitate training of very deep networks in an end-to-end manner. This is a primary reason for the widespread adoption of residual networks, which aggregate outputs via cumulative summation. While subsequent works investigate alternative aggregation operations (e.g. concatenation), we focus on an orthogonal question: which outputs to aggregate at a particular point in the network. We propose a new internal connection structure which aggregates only a sparse set of previous outputs at any given depth. Our experiments demonstrate this simple design change offers superior performance with fewer parameters and lower computational requirements. Moreover, we show that sparse aggregation allows networks to scale more robustly to 1000+ layers, thereby opening future avenues for training long-running visual processes.Comment: Accepted to ECCV 201
    • …
    corecore