On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Sequence-to-sequence models usually transfer all encoder outputs to the
decoder for generation. In this work, by contrast, we hypothesize that these
encoder outputs can be compressed to shorten the sequence delivered for
decoding. We take Transformer as the testbed and introduce a layer of
stochastic gates in-between the encoder and the decoder. The gates are
regularized using the expected value of the sparsity-inducing L0 penalty,
resulting in a subset of encoder outputs being completely masked out. In other
words, via joint training, the L0DROP layer forces Transformer to route
information through a subset of its encoder states. We investigate the effects
of this sparsification on two machine translation and two summarization tasks.
Experiments show that, depending on the task, around 40-70% of source encodings
can be pruned without significantly compromising quality. The decrease of the
output length endows L0DROP with the potential of improving decoding
efficiency, where it yields a speedup of up to 1.65x on document summarization
tasks against the standard Transformer. We analyze the L0DROP behaviour and
observe that it exhibits systematic preferences for pruning certain word types,
e.g., function words and punctuation get pruned most. Inspired by these
observations, we explore the feasibility of specifying rule-based patterns that
mask out encoder outputs based on information such as part-of-speech tags, word
frequency, and word position.
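The stochastic gating described above is commonly realized with a hard-concrete relaxation, whose expected L0 norm is differentiable. The sketch below illustrates that idea only; the parameter names, constants, and gate count are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# One learnable location parameter per encoder state (10 toy states here).
log_alpha = rng.normal(size=10)
beta, gamma, zeta = 2 / 3, -0.1, 1.1  # temperature and stretch limits

def sample_gates(log_alpha):
    """Sample stretched hard-concrete gates, clipped into [0, 1]."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=log_alpha.shape)
    s = 1 / (1 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha):
    """P(gate != 0) per gate; summing gives the expected L0 penalty."""
    return 1 / (1 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

gates = sample_gates(log_alpha)          # multiply encoder outputs by these
penalty = expected_l0(log_alpha).sum()   # add lambda * penalty to the loss
```

Gates that clip to exactly zero correspond to encoder states the decoder never sees, which is what permits shortening the delivered sequence.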
Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask
Singing voice separation based on deep learning relies on the usage of
time-frequency masking. In many cases the masking process is not a learnable
function or is not encapsulated into the deep learning optimization.
Consequently, most of the existing methods rely on a post processing step using
the generalized Wiener filtering. This work proposes a method that learns and
optimizes (during training) a source-dependent mask and does not need the
aforementioned post processing step. We introduce a recurrent inference
algorithm, a sparse transformation step to improve the mask generation process,
and a learned denoising filter. Obtained results show an increase of 0.49 dB
for the signal to distortion ratio and 0.30 dB for the signal to interference
ratio, compared to previous state-of-the-art approaches for monaural singing
voice separation.
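The core of a skip-filtering connection is that the time-frequency mask is applied to the input mixture inside the model graph rather than in a post-processing step. A minimal numpy sketch of that multiplication, with a random stand-in for the mask-predicting network (the recurrent inference and denoising stages are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy mixture magnitude spectrogram: (time frames, frequency bins).
mixture = np.abs(rng.normal(size=(5, 8)))

def skip_filter(mixture_mag, mask_logits):
    """Skip-filtering connection: the source estimate is the mixture
    multiplied elementwise by a learned soft time-frequency mask."""
    mask = 1 / (1 + np.exp(-mask_logits))  # soft mask in (0, 1)
    return mask * mixture_mag              # masking stays differentiable

estimate = skip_filter(mixture, rng.normal(size=mixture.shape))
```

Because the mask is part of the optimized function, the training loss can be placed directly on the masked output instead of on an intermediate target.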
Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.
Comment: The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2018.
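One way to realize predefined sparseness in word embeddings is to fix, before training, how many coordinates each word's vector may use, giving frequent words denser rows. The allocation rule below (linear in frequency rank) is an illustrative assumption, not necessarily the scheme used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

vocab, dim = 100, 16

def sparse_embedding(vocab, dim, density_for_rank):
    """Predefined-sparse embedding matrix: the word at frequency rank r keeps
    only its first ceil(density * dim) coordinates trainable; the rest are
    fixed at zero, so they cost no parameters."""
    E = np.zeros((vocab, dim))
    n_params = 0
    for r in range(vocab):
        k = max(1, int(np.ceil(density_for_rank(r) * dim)))
        E[r, :k] = rng.normal(size=k)  # stands in for trainable values
        n_params += k
    return E, n_params

# Frequent words (low rank) get dense rows, rare words nearly empty ones.
E, n_params = sparse_embedding(vocab, dim, lambda r: 1.0 - r / vocab)
```

The memory saving holds from the first training step, since the zero pattern is fixed upfront rather than induced by pruning a dense model.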
Adaptive Feature Selection for End-to-End Speech Translation
Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to ASR. An ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0DROP (Zhang et al., 2020) as the backbone for AFS and adapt it to sparsify speech features along both the temporal and feature dimensions. Results on LibriSpeech En-Fr and MuST-C benchmarks show that AFS facilitates learning of ST by pruning out ~84% of temporal features, yielding an average translation gain of ~1.3-1.6 BLEU and a decoding speedup of ~1.4x. In particular, AFS reduces the performance gap compared to the cascade baseline, and outperforms it on LibriSpeech En-Fr with a BLEU score of 18.56 (without data augmentation).
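The temporal side of adaptive feature selection amounts to dropping encoder frames whose gates fall to zero, so the ST encoder receives a shorter sequence. A toy sketch with random stand-ins for the frozen ASR encoder output and the learned per-frame gates (the threshold and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a frozen ASR encoder output: 50 frames x 32 features.
asr_out = rng.normal(size=(50, 32))
gate = rng.uniform(size=50)  # stands in for learned per-frame gate values

def select_frames(features, gates, threshold=0.8):
    """Temporal feature selection: keep only frames whose gate survives the
    threshold; the downstream encoder sees the shorter sequence."""
    keep = gates > threshold
    return features[keep], int(keep.sum())

st_input, n_kept = select_frames(asr_out, gate)
```

Shortening the sequence this way is also where the reported decoding speedup comes from: attention cost in the ST encoder scales with the number of surviving frames.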
Machine learning in Magnetic Resonance Imaging: Image reconstruction.
Magnetic Resonance Imaging (MRI) plays a vital role in the diagnosis, management and monitoring of many diseases. However, it is an inherently slow imaging technique. Over the last 20 years, parallel imaging, temporal encoding and compressed sensing have enabled substantial speed-ups in the acquisition of MRI data by accurately recovering missing lines of k-space data. However, clinical uptake of vastly accelerated acquisitions has been limited, in particular for compressed sensing, due to the time-consuming nature of the reconstructions and unnatural-looking images. Following the success of machine learning in a wide range of imaging tasks, there has been a recent explosion in the use of machine learning in the field of MRI image reconstruction. A wide range of approaches have been proposed, which can be applied in k-space and/or image space. Promising results have been demonstrated by a range of methods, enabling natural-looking images and rapid computation. In this review article we summarize the current machine learning approaches used in MRI reconstruction, discuss their drawbacks, clinical applications, and current trends.
3D spatio-temporal analysis for compressive sensing in magnetic resonance imaging of the murine cardiac cycle
This thesis consists of two major contributions, each of which has been prepared in a conference paper. These papers will be submitted for publication in the SPIE 2013 Medical Imaging Conference and the ASEE 2013 Annual Conference.
The first paper explores a three-dimensional compressive sensing (CS) technique for reducing measurement time in MR imaging of the murine (mouse) cardiac cycle. By randomly undersampling a single 2D slice of a mouse heart at regular time intervals as it expands and contracts through the stages of a heartbeat, a CS reconstruction algorithm can be made to exploit transform sparsity in time as well as space. For the purposes of measuring the left ventricular volume in the mouse heart, this 3D approach offers significant advantages against classical 2D spatial compressive sensing.
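The acquisition side of the scheme above can be sketched as random undersampling of each frame's k-space with a different mask, so the aggregate sampling across the cardiac cycle covers k-space more evenly. All sizes and the acceleration factor below are toy assumptions; the reconstruction algorithm itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy dynamic series: one 2D slice over the cardiac cycle, (t, y, x).
frames, ny, nx = 8, 16, 16
images = rng.normal(size=(frames, ny, nx))

accel = 4  # keep 1/4 of the phase-encode lines per frame
masks = np.stack([
    np.isin(np.arange(ny), rng.choice(ny, ny // accel, replace=False))
    for _ in range(frames)
])  # (frames, ny): a different random line mask per time frame

kspace = np.fft.fft2(images)               # fully sampled k-space per frame
undersampled = kspace * masks[:, :, None]  # zero out unmeasured lines
```

A 3D spatio-temporal reconstruction can then exploit sparsity along the time axis (the heart changes smoothly between frames), which a frame-by-frame 2D reconstruction cannot.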
The second paper describes the modification and testing of a set of laboratory exercises for developing an undergraduate level understanding of Simulink. An existing partial set of lab exercises for Simulink was obtained and improved considerably in pedagogical utility, and then the completed set of pilot exercises was taught as a part of a communications course at the Missouri University of Science and Technology in order to gauge student responses and learning experiences. In this paper, the content of the laboratory exercises with corresponding educational approaches are discussed, along with student feedback and future improvements. --Abstract, page iv
Sparsely Aggregated Convolutional Networks
We explore a key architectural aspect of deep convolutional neural networks:
the pattern of internal skip connections used to aggregate outputs of earlier
layers for consumption by deeper layers. Such aggregation is critical to
facilitate training of very deep networks in an end-to-end manner. This is a
primary reason for the widespread adoption of residual networks, which
aggregate outputs via cumulative summation. While subsequent works investigate
alternative aggregation operations (e.g. concatenation), we focus on an
orthogonal question: which outputs to aggregate at a particular point in the
network. We propose a new internal connection structure which aggregates only a
sparse set of previous outputs at any given depth. Our experiments demonstrate
this simple design change offers superior performance with fewer parameters and
lower computational requirements. Moreover, we show that sparse aggregation
allows networks to scale more robustly to 1000+ layers, thereby opening future
avenues for training long-running visual processes.
Comment: Accepted to ECCV 201
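One concrete instance of such a sparse aggregation pattern is to connect each layer only to predecessors at power-of-two offsets, so the number of incoming skip connections grows logarithmically with depth rather than linearly as in dense aggregation. The function below is an illustrative sketch of that pattern, not the paper's exact connectivity rule:

```python
def sparse_aggregation_sources(depth):
    """Indices of earlier layers whose outputs are aggregated at `depth`
    under a power-of-two offset pattern (depth-1, depth-2, depth-4, ...),
    as opposed to aggregating all previous outputs."""
    sources, offset = [], 1
    while offset <= depth:
        sources.append(depth - offset)
        offset *= 2
    return sources
```

For example, layer 8 would aggregate layers 7, 6, 4 and 0; at 1000+ layers each layer still aggregates only about ten predecessors, which is what lets the parameter count and compute stay manageable at extreme depths.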