7,392 research outputs found
ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing
With the aim of developing a fast yet accurate algorithm for compressive
sensing (CS) reconstruction of natural images, we combine in this paper the
merits of two existing categories of CS methods: the structure insights of
traditional optimization-based methods and the speed of recent network-based
ones. Specifically, we propose a novel structured deep network, dubbed
ISTA-Net, which is inspired by the Iterative Shrinkage-Thresholding Algorithm
(ISTA) for optimizing a general norm CS reconstruction model. To cast
ISTA into deep network form, we develop an effective strategy to solve the
proximal mapping associated with the sparsity-inducing regularizer using
nonlinear transforms. All the parameters in ISTA-Net (\eg nonlinear transforms,
shrinkage thresholds, step sizes, etc.) are learned end-to-end, rather than
being hand-crafted. Moreover, considering that the residuals of natural images
are more compressible, an enhanced version of ISTA-Net in the residual domain,
dubbed {ISTA-Net}, is derived to further improve CS reconstruction.
Extensive CS experiments demonstrate that the proposed ISTA-Nets outperform
existing state-of-the-art optimization-based and network-based CS methods by
large margins, while maintaining fast computational speed. Our source codes are
available: \textsl{http://jianzhang.tech/projects/ISTA-Net}.Comment: 10 pages, 6 figures, 4 Tables. To appear in CVPR 201
English Conversational Telephone Speech Recognition by Humans and Machines
One of the most difficult speech recognition tasks is accurate recognition of
human to human communication. Advances in deep learning over the last few years
have produced major speech recognition improvements on the representative
Switchboard conversational corpus. Word error rates that just a few years ago
were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now
believed to be within striking range of human performance. This then raises two
issues - what IS human performance, and how far down can we still drive speech
recognition error rates? A recent paper by Microsoft suggests that we have
already achieved human performance. In trying to verify this statement, we
performed an independent set of human performance measurements on two
conversational tasks and found that human performance may be considerably
better than what was earlier reported, giving the community a significantly
harder goal to achieve. We also report on our own efforts in this area,
presenting a set of acoustic and language modeling techniques that lowered the
word error rate of our own English conversational telephone LVCSR system to the
level of 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000
evaluation, which - at least at the writing of this paper - is a new
performance milestone (albeit not at what we measure to be human performance!).
On the acoustic side, we use a score fusion of three models: one LSTM with
multiple feature inputs, a second LSTM trained with speaker-adversarial
multi-task learning and a third residual net (ResNet) with 25 convolutional
layers and time-dilated convolutions. On the language modeling side, we use
word and character LSTMs and convolutional WaveNet-style language models
- …