Analog-to-digital conversion revolutionized by deep learning
As the bridge between the analog world and digital computers,
analog-to-digital converters are generally used in modern information systems
such as radar, surveillance, and communications. For the configuration of
analog-to-digital converters in future high-frequency broadband systems, we
introduce a revolutionary architecture that adopts deep learning technology to
overcome tradeoffs between bandwidth, sampling rate, and accuracy. A photonic
front-end provides broadband capability for direct sampling and speed
multiplication. Trained deep neural networks learn the patterns of system
defects, maintaining high accuracy of quantized data in a succinct and adaptive
manner. Based on numerical and experimental demonstrations, we show that the
proposed architecture outperforms state-of-the-art analog-to-digital
converters, confirming the potential of our approach in future
analog-to-digital converter design and performance enhancement of future
information systems.
Deep segmentation networks predict survival of non-small cell lung cancer
Non-small-cell lung cancer (NSCLC) represents approximately 80-85% of lung
cancer diagnoses and is the leading cause of cancer-related death worldwide.
Recent studies indicate that image-based radiomics features from positron
emission tomography-computed tomography (PET/CT) images have predictive power
on NSCLC outcomes. In current practice, easily calculated functional features such as
the maximum and the mean of standard uptake value (SUV) and total lesion
glycolysis (TLG) are most commonly used for NSCLC prognostication, but their
prognostic value remains controversial. Meanwhile, convolutional neural
networks (CNN) are rapidly emerging as a new premise for cancer image analysis,
with significantly enhanced predictive power compared to other hand-crafted
radiomics features. Here we show that a CNN trained to perform the tumor
segmentation task, with no information other than physician contours, identifies
a rich set of survival-related image features with remarkable prognostic value.
In a retrospective study on 96 NSCLC patients before stereotactic-body
radiotherapy (SBRT), we found that the CNN segmentation algorithm (U-Net),
trained for tumor segmentation in PET/CT images, contained features strongly
correlated with 2- and 5-year overall and disease-specific survival.
The U-Net algorithm saw no clinical information (e.g. survival, age, smoking
history) other than the images and the corresponding tumor contours provided by
physicians. Furthermore, through visualization of the U-Net, we
also found convincing evidence that the regions of progression appear to match
with the regions where the U-Net features identified patterns that predicted
higher likelihood of death. We anticipate our findings will be a starting point
for more sophisticated, non-intrusive, patient-specific cancer prognosis
determination.
Log-DenseNet: How to Sparsify a DenseNet
Skip connections are increasingly utilized by deep neural networks to improve
accuracy and cost-efficiency. In particular, the recent DenseNet is efficient
in computation and parameters, and achieves state-of-the-art predictions by
directly connecting each feature layer to all previous ones. However,
DenseNet's extreme connectivity pattern may hinder its scalability to high
depths, and in applications like fully convolutional networks, full DenseNet
connections are prohibitively expensive. This work first experimentally shows
that one key advantage of skip connections is to have short distances among
feature layers during backpropagation. Specifically, using a fixed number of
skip connections, the connection patterns with shorter backpropagation distance
among layers have more accurate predictions. Following this insight, we propose
a connection template, Log-DenseNet, which, in comparison to DenseNet, only
slightly increases the backpropagation distances among layers from 1 to
(1 + log L), but uses only O(L log L) total connections instead of O(L^2).
Hence, Log-DenseNets are easier than DenseNets to implement and to scale. We
demonstrate the effectiveness of our design principle by showing better
performance than DenseNets on tabula rasa semantic segmentation, and
competitive results on visual recognition.
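The log-spaced connection pattern can be sketched concretely. The indexing below (layer i receives input from layers i - 2^k) follows the abstract's description of only slightly increased backpropagation distances with O(L log L) total connections; the function name and exact boundary handling are illustrative choices, not the paper's code:

```python
def log_dense_inputs(i):
    """Return indices of earlier layers feeding layer i under a log-spaced
    connection pattern: layer i sees layers i - 2^k for k = 0, 1, 2, ...
    Each layer has only O(log i) inputs, so total connections are O(L log L)."""
    inputs, k = [], 0
    while i - 2 ** k >= 0:
        inputs.append(i - 2 ** k)
        k += 1
    return inputs

# Layer 10 connects to a handful of predecessors instead of all of them.
print(log_dense_inputs(10))  # [9, 8, 6, 2]
```

Any layer can still reach any earlier layer through a chain of such skips, which keeps backpropagation distances logarithmic rather than linear.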
Learning to Generate Images with Perceptual Similarity Metrics
Deep networks are increasingly being applied to problems involving image
synthesis, e.g., generating images from textual descriptions and reconstructing
an input image from a compact representation. Supervised training of
image-synthesis networks typically uses a pixel-wise loss (PL) to indicate the
mismatch between a generated image and its corresponding target image. We
propose instead to use a loss function that is better calibrated to human
perceptual judgments of image quality: the multiscale structural-similarity
score (MS-SSIM). Because MS-SSIM is differentiable, it is easily incorporated
into gradient-descent learning. We compare the consequences of using MS-SSIM
versus PL loss on training deterministic and stochastic autoencoders. For three
different architectures, we collected human judgments of the quality of image
reconstructions. Observers reliably prefer images synthesized by
MS-SSIM-optimized models over those synthesized by PL-optimized models, for two
distinct PL measures (ℓ1 and ℓ2 distances). We also explore the
effect of training objective on image encoding and analyze conditions under
which perceptually-optimized representations yield better performance on image
classification. Finally, we demonstrate the superiority of
perceptually-optimized networks for super-resolution imaging. Just as computer
vision has advanced through the use of convolutional architectures that mimic
the structure of the mammalian visual system, we argue that significant
additional advances can be made in modeling images through the use of training
objectives that are well aligned to characteristics of human perception.
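As a rough illustration of why structural similarity works as a training loss, here is a minimal, single-scale, global SSIM in NumPy. The full MS-SSIM used in the paper applies local Gaussian windows at multiple scales; this simplified sketch only shows the differentiable form of the score (constants C1, C2 follow the common SSIM defaults):

```python
import numpy as np

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified global SSIM between two images with values in [0, 1].
    Combines a luminance term (means) with a contrast/structure term
    (variances and covariance); every operation is differentiable, so
    1 - ssim(x, y) can serve directly as a gradient-descent loss."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
assert abs(ssim(img, img) - 1.0) < 1e-9       # identical images score 1
assert ssim(img, rng.random((32, 32))) < 0.5  # unrelated images score low
```

In a training loop one would minimize 1 - ssim(generated, target) in place of the pixel-wise loss.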
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Convolutional Neural Networks (CNN) have been regarded as a powerful class of
models for image recognition problems. Nevertheless, it is not trivial when
utilizing a CNN for learning spatio-temporal video representation. A few
studies have shown that performing 3D convolutions is a rewarding approach to
capture both spatial and temporal dimensions in videos. However, the
development of a very deep 3D CNN from scratch results in expensive
computational cost and memory demand. A valid question is why not recycle
off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple
variants of bottleneck building blocks in a residual learning framework by
simulating 3x3x3 convolutions with 1x3x3 convolutional filters on the spatial
domain (equivalent to a 2D CNN) plus 3x1x1 convolutions to construct temporal
connections on adjacent feature maps in
time. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net
(P3D ResNet), that exploits all the variants of blocks but composes each in
different placement of ResNet, following the philosophy that enhancing
structural diversity with going deep could improve the power of neural
networks. Our P3D ResNet achieves clear improvements on Sports-1M video
classification dataset against 3D CNN and frame-based 2D CNN by 5.3% and 1.8%,
respectively. We further examine the generalization performance of video
representation produced by our pre-trained P3D ResNet on five different
benchmarks and three different tasks, demonstrating superior performances over
several state-of-the-art techniques.
Comment: ICCV 2017; the code and model of our P3D ResNet are publicly
available at: https://github.com/ZhaofanQiu/pseudo-3d-residual-network
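The saving from this pseudo-3D factorization is easy to quantify: replacing one 3x3x3 kernel with a 1x3x3 spatial kernel plus a 3x1x1 temporal kernel cuts the weight count from 27 c_in c_out to 12 c_in c_out. A minimal sketch of the arithmetic (channel count chosen for illustration):

```python
def conv3d_params(kt, kh, kw, c_in, c_out):
    """Number of weights in a 3D convolution with a kt x kh x kw kernel
    (biases ignored)."""
    return kt * kh * kw * c_in * c_out

c = 256
full = conv3d_params(3, 3, 3, c, c)                    # one full 3x3x3 conv
p3d = conv3d_params(1, 3, 3, c, c) + conv3d_params(3, 1, 1, c, c)

# The factorized pair uses 12/27 (about 44%) of the full kernel's weights.
assert p3d / full == 12 / 27
```

The same factorization also reuses pretrained 2D filters for the 1x3x3 part, which is the "recycle off-the-shelf 2D networks" idea in the abstract.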
RoomNet: End-to-End Room Layout Estimation
This paper focuses on the task of room layout estimation from a monocular RGB
image. Prior works break the problem into two sub-tasks: semantic segmentation
of floor, walls, ceiling to produce layout hypotheses, followed by an iterative
optimization step to rank these hypotheses. In contrast, we adopt a more direct
formulation of this problem as one of estimating an ordered set of room layout
keypoints. The room layout and the corresponding segmentation is completely
specified given the locations of these ordered keypoints. We predict the
locations of the room layout keypoints using RoomNet, an end-to-end trainable
encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN,
we achieve state-of-the-art performance along with 200x to 600x speedup
compared to the most recent work. Additionally, we present optional extensions
to the RoomNet architecture such as including recurrent computations and memory
units to refine the keypoint locations under the same parametric capacity.
Comment: accepted at ICCV 2017
SHADE: Information Based Regularization for Deep Learning
Regularization is a central challenge in training deep neural networks. In this
paper, we propose a new information-theory-based regularization scheme named
SHADE for SHAnnon DEcay. The originality of the approach is to define a prior
based on conditional entropy, which explicitly decouples the learning of
invariant representations in the regularizer and the learning of correlations
between inputs and labels in the data fitting term. Our second contribution is
to derive a stochastic version of the regularizer compatible with deep
learning, resulting in a tractable training scheme. We empirically validate the
efficiency of our approach in improving classification performance, compared to
common regularization schemes, on several standard architectures.
Why are deep nets reversible: A simple theory, with implications for training
Generative models for deep learning are promising both for improving
understanding of the model and for yielding training methods that require fewer
labeled samples.
Recent works use generative model approaches to produce the deep net's input
given the value of a hidden layer several levels above. However, there is no
accompanying "proof of correctness" for the generative model, showing that the
feedforward deep net is the correct inference method for recovering the hidden
layer given the input. Furthermore, these models are complicated.
The current paper takes a more theoretical tack. It presents a very simple
generative model for RELU deep nets, with the following characteristics: (i)
The generative model is just the reverse of the feedforward net: if the forward
transformation at a layer is A, then the reverse transformation is A^T.
(This can be seen as an explanation of the old weight tying idea for denoising
autoencoders.) (ii) Its correctness can be proven under a clean theoretical
assumption: the edge weights in real-life deep nets behave like random numbers.
Under this assumption (which is experimentally tested on real-life nets like
AlexNet), it is formally proved that the feedforward net is a correct inference
method for recovering the hidden layer.
The generative model suggests a simple modification for training: use the
generative model to produce synthetic data with labels and include it in the
training set. Experiments support this theory of random-like deep nets and
show that the synthetic data helps training.
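The reversibility claim can be illustrated numerically: with Gaussian random weights and a sparse nonnegative input, applying the transposed (tied) weights to the hidden layer approximately recovers the input. The sketch below is a minimal NumPy demonstration under those assumptions; the sizes, sparsity level, and 2/m scaling (which compensates for the expected gain of W^T ReLU(W x)) are illustrative choices, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 2000
W = rng.standard_normal((m, n))            # random-like weights, as assumed

x = np.zeros(n)                            # sparse nonnegative input
x[rng.choice(n, 20, replace=False)] = rng.random(20)

relu = lambda z: np.maximum(z, 0.0)
h = relu(W @ x)                            # forward pass through one layer
x_hat = relu(W.T @ h * (2.0 / m))          # reverse pass with tied weights W^T

# The reconstruction correlates strongly with the original sparse input.
corr = np.corrcoef(x, x_hat)[0, 1]
assert corr > 0.7
```

This is the "weight tying" picture from the abstract: the reverse map is just the transpose of the forward map, and random-like weights are what make it work.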
Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection
This paper presents a novel 3D object detection framework that processes
LiDAR data directly on its native representation: range images. Benefiting from
the compactness of range images, 2D convolutions can efficiently process dense
LiDAR data of a scene. To overcome scale sensitivity in this perspective view,
a novel range-conditioned dilation (RCD) layer is proposed to dynamically
adjust a continuous dilation rate as a function of the measured range.
Furthermore, localized soft range gating combined with a 3D box-refinement
stage improves robustness in occluded areas, and produces overall more accurate
bounding box predictions. On the public large-scale Waymo Open Dataset, our
method sets a new baseline for range-based 3D detection, outperforming
multiview and voxel-based methods over all ranges with unparalleled performance
at long-range detection.
Comment: CoRL 2020
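As a hypothetical illustration of range-conditioned dilation (the abstract does not give the paper's exact parameterization): one natural choice shrinks the dilation rate with range so that the kernel spans a roughly constant physical extent, since a fixed-size object covers fewer range-image pixels the farther away it is. All constants below are illustrative assumptions:

```python
def range_conditioned_dilation(r, object_size=4.0, angular_res=0.003):
    """Hypothetical dilation schedule: choose a dilation rate so a 3-tap
    kernel spans ~object_size metres at range r (metres).
    An object of width s at range r covers about s / (r * angular_res)
    pixels in a range image with angular_res radians per pixel."""
    pixels = object_size / (r * angular_res)
    return max(1.0, pixels / 3.0)  # 3-tap kernel -> dilation = span / 3

# Nearby objects span many pixels (large dilation); distant ones few (small).
assert range_conditioned_dilation(5.0) > range_conditioned_dilation(50.0)
```

The RCD layer in the paper makes such a rate continuous and learned per location, but the scale-invariance motivation is the same.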
Learning Representations and Generative Models for 3D Point Clouds
Three-dimensional geometric data offer an excellent domain for studying
representation learning and generative modeling. In this paper, we look at
geometric data represented as point clouds. We introduce a deep AutoEncoder
(AE) network with state-of-the-art reconstruction quality and generalization
ability. The learned representations outperform existing methods on 3D
recognition tasks and enable shape editing via simple algebraic manipulations,
such as semantic part editing, shape analogies and shape interpolation, as well
as shape completion. We perform a thorough study of different generative models
including GANs operating on the raw point clouds, significantly improved GANs
trained in the fixed latent space of our AEs, and Gaussian Mixture Models
(GMMs). To quantitatively evaluate generative models we introduce measures of
sample fidelity and diversity based on matchings between sets of point clouds.
Interestingly, our evaluation of generalization, fidelity and diversity reveals
that GMMs trained in the latent space of our AEs yield the best results
overall.
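Matching-based comparisons between point sets, like the fidelity and diversity measures mentioned above, typically build on nearest-neighbour distances such as the Chamfer distance, which is also the standard reconstruction loss for point-cloud autoencoders. A minimal NumPy sketch of that construction (the paper's specific measures are aggregate statistics over such matchings, not this exact function):

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (n, 3) and b (m, 3):
    mean squared distance from each point to its nearest neighbour in the
    other set, summed over both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
cloud = rng.random((128, 3))
assert chamfer(cloud, cloud) == 0.0        # a set matches itself exactly
assert chamfer(cloud, cloud + 0.5) > 0.0   # a shifted copy does not
```

Because it only uses nearest neighbours, the distance is invariant to point ordering, which is essential for raw point-cloud data.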