Learning-based Single-step Quantitative Susceptibility Mapping Reconstruction Without Brain Extraction
Quantitative susceptibility mapping (QSM) estimates the underlying tissue
magnetic susceptibility from MRI gradient-echo phase signal and typically
requires several processing steps. These steps involve phase unwrapping, brain
volume extraction, background phase removal and solving an ill-posed inverse
problem. The resulting susceptibility map is known to suffer from inaccuracy
near the edges of the brain tissues, in part due to imperfect brain extraction,
edge erosion of the brain tissue and the lack of phase measurement outside the
brain. This inaccuracy has thus hindered the application of QSM for measuring
the susceptibility of tissues near the brain edges, e.g., quantifying cortical
layers and generating superficial venography. To address these challenges, we
propose a learning-based QSM reconstruction method that directly estimates the
magnetic susceptibility from total phase images without the need for brain
extraction and background phase removal, referred to as autoQSM. The neural
network has a modified U-net structure and is trained using QSM maps computed
by a two-step QSM method. Data from 209 healthy subjects, aged 11 to 82
years, were used for patch-wise network training. The network was validated
on data dissimilar to the training data, e.g. in vivo mouse brain data and
brains with lesions, which suggests that the network has generalized and
learned the underlying mathematical relationship between magnetic field
perturbation and magnetic susceptibility. AutoQSM was able to recover magnetic
susceptibility of anatomical structures near the edges of the brain including
the veins covering the cortical surface, spinal cord and nerve tracts near the
mouse brain boundaries. Its high-quality maps, elimination of brain volume
extraction, and fast reconstruction demonstrate its potential for future
applications.
Comment: 26 pages
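The ill-posed inverse problem mentioned above inverts the k-space dipole forward model that links tissue susceptibility to the measured field perturbation. Below is a minimal numpy sketch of that forward model; the normalized k-space grid and the D(k) = 1/3 - kz^2/|k|^2 convention are standard in the QSM literature but are assumptions here, not details taken from this abstract.

```python
import numpy as np

def dipole_kernel(shape):
    # k-space dipole kernel D(k) = 1/3 - kz^2 / |k|^2 (B0 along z);
    # it vanishes on a conical surface, which is what makes the
    # field-to-susceptibility inversion ill-posed
    kx, ky, kz = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    with np.errstate(invalid="ignore"):
        D = 1.0 / 3.0 - kz**2 / k2
    D[0, 0, 0] = 0.0  # define the undefined DC term as zero
    return D

def susceptibility_to_field(chi):
    # forward model: field perturbation = IFFT( D(k) * FFT(chi) )
    return np.real(np.fft.ifftn(dipole_kernel(chi.shape) * np.fft.fftn(chi)))
```

A network like autoQSM effectively learns to invert this mapping (together with background-field handling) directly from total phase, rather than solving the inversion explicitly.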
Image Matters: Scalable Detection of Offensive and Non-Compliant Content / Logo in Product Images
In e-commerce, product content, especially product images, has a significant
influence on a customer's journey from product discovery to evaluation and,
finally, the purchase decision. Since many e-commerce retailers sell items from
other third-party marketplace sellers besides their own, the content published
by both internal and external content creators needs to be monitored and
enriched, wherever possible. Despite guidelines and warnings, product listings
that contain offensive and non-compliant images continue to enter catalogs.
Offensive and non-compliant content can include a wide range of objects, logos,
and banners conveying violent, sexually explicit, racist, or promotional
messages. Such images can severely damage the customer experience, lead to
legal issues, and erode the company brand. In this paper, we present a computer
vision driven offensive and non-compliant image detection system for extremely
large image datasets. This paper delves into the unique challenges of applying
deep learning to real-world product image data from the retail world. We
demonstrate how we resolve a number of technical challenges, such as lack of
training data, severe class imbalance, and fine-grained class definitions,
using a number of practical yet unique technical strategies. Our system
combines state-of-the-art image classification and object detection techniques
with budgeted crowdsourcing to develop a solution customized for a massive,
diverse, and constantly evolving product catalog.
Comment: 10 pages
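The abstract does not spell out how the severe class imbalance was handled; a common remedy, shown here purely as an illustrative sketch, is to weight each class's loss by inverse frequency:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    # weight rare classes (e.g. offensive images) more heavily so the
    # classifier is not dominated by the overwhelming compliant majority
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))

# toy catalog: 3 compliant images (class 0), 1 offensive image (class 1)
w = inverse_frequency_weights(np.array([0, 0, 0, 1]), n_classes=2)
```

These weights would then scale the per-class loss terms during classifier training; the paper's actual strategies may differ.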
signSGD with Majority Vote is Communication Efficient And Fault Tolerant
Training neural networks on large datasets can be accelerated by distributing
the workload over a network of machines. As datasets grow ever larger, networks
of hundreds or thousands of machines become economically viable. The time cost
of communicating gradients limits the effectiveness of using such large machine
counts, as may the increased chance of network faults. We explore a
particularly simple algorithm for robust, communication-efficient
learning---signSGD. Workers transmit only the sign of their gradient vector to
a server, and the overall update is decided by a majority vote. This algorithm
uses less communication per iteration than full-precision distributed SGD.
Under natural conditions verified by experiment, we prove that signSGD
converges in the large-batch and mini-batch settings, establishing
convergence for a parameter regime of Adam as a byproduct. Aggregating sign
gradients by majority vote means that no individual worker has too much power.
We prove that unlike SGD, majority vote is robust when up to 50% of workers
behave adversarially. The class of adversaries we consider includes as special
cases those that invert or randomise their gradient estimate. On the practical
side, we built our distributed training system in PyTorch. Benchmarking against
the state-of-the-art collective communications library (NCCL), our
framework, with the parameter server housed entirely on one machine, led to a
25% reduction in time for training ResNet-50 on ImageNet when using 15 AWS
p3.2xlarge machines.
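The per-iteration protocol can be sketched in a few lines of numpy; the worker count, noise level, and adversary model below are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def majority_vote(sign_grads):
    # server sums the sign vectors and broadcasts the sign of the sum,
    # so no single worker can dominate the update
    return np.sign(np.sum(sign_grads, axis=0))

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5, -0.1])

# 7 workers with noisy gradient estimates; each transmits only signs
noisy = [true_grad + rng.normal(0.0, 0.01, size=4) for _ in range(7)]
signs = [np.sign(g) for g in noisy]

# 3 of 7 workers behave adversarially by inverting their sign vectors;
# the elementwise majority vote still recovers the true sign direction
signs[4:] = [-s for s in signs[4:]]
update = majority_vote(np.stack(signs))
```

The server then applies `update` scaled by the learning rate, so each coordinate moves by a fixed step in the direction the worker majority agrees on.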
On Deep Set Learning and the Choice of Aggregations
Recently, it has been shown that many functions on sets can be represented by
sum decompositions. These decompositions easily lend themselves to neural
approximations, extending the applicability of neural nets to set-valued
inputs---Deep Set learning. This work investigates a core component of Deep Set
architecture: aggregation functions. We suggest and examine alternatives to
commonly used aggregation functions, including learnable recurrent aggregation
functions. Empirically, we show that Deep Set networks are highly sensitive
to the choice of aggregation function: beyond improved performance, we find
that learnable aggregations lower hyper-parameter sensitivity and generalize
better to out-of-distribution input sizes.
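A sum-decomposition network and the role of the aggregation can be sketched as follows; the random weights, ReLU encoder, and particular aggregation functions are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def deep_set(X, W_enc, W_dec, agg=np.sum):
    # encode each set element with shared weights, aggregate with a
    # permutation-invariant function, then decode the pooled vector;
    # swapping `agg` (sum, mean, max, ...) is the design choice studied
    encoded = np.maximum(0.0, X @ W_enc)   # per-element ReLU encoder
    return agg(encoded, axis=0) @ W_dec

rng = np.random.default_rng(0)
W_enc, W_dec = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))
X = rng.normal(size=(5, 3))               # a set of 5 elements in R^3

# permutation invariance: reordering the set leaves the output unchanged
out_fwd = deep_set(X, W_enc, W_dec)
out_rev = deep_set(X[::-1], W_enc, W_dec)
```

Because only the aggregation sees the whole set, its choice (and, per the abstract, whether it is learnable) controls how the network scales to unseen set sizes.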
Notes on Deep Learning for NLP
My notes on Deep Learning for NLP.
Comment: work in progress
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
We study the roots of algorithmic progress in deep policy gradient algorithms
through a case study on two popular algorithms: Proximal Policy Optimization
(PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate
the consequences of "code-level optimizations:" algorithm augmentations found
only in implementations or described as auxiliary details to the core
algorithm. Seemingly of secondary importance, such optimizations turn out to
have a major impact on agent behavior. Our results show that they (a) are
responsible for most of PPO's gain in cumulative reward over TRPO, and (b)
fundamentally change how RL methods function. These insights show the
difficulty and importance of attributing performance gains in deep
reinforcement learning. Code for reproducing our results is available at
https://github.com/MadryLab/implementation-matters.
Comment: ICLR 2020 version.
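One concrete example of a code-level optimization in this setting is value function clipping; the exact form below is a common implementation pattern sketched for illustration, not code taken from the paper:

```python
import numpy as np

def clipped_value_loss(v_pred, v_old, returns, eps=0.2):
    # keep the new value prediction within eps of the old estimate and
    # take the elementwise max of the clipped and unclipped squared errors;
    # this detail lives in implementations, not in the core PPO derivation
    v_clipped = v_old + np.clip(v_pred - v_old, -eps, eps)
    return np.mean(np.maximum((v_pred - returns) ** 2,
                              (v_clipped - returns) ** 2))
```

Ablating details like this one is exactly how the paper attributes PPO's reported gains to implementation choices rather than to the core algorithm.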
Deep Learning: Generalization Requires Deep Compositional Feature Space Design
Generalization error characterizes the discriminability and representation
power of a deep model. In this work, we claim that feature space design using
deep compositional functions plays a significant role in generalization, along
with explicit and implicit regularization. We support our claims with several
image classification experiments. We show that the information loss due to
convolution and max pooling can be mitigated by the compositional design,
improving generalization performance. We also show that learning rate decay
acts as an implicit regularizer in deep model training.
Comment: figure added, with minor typo corrections
Automatic Liver Lesion Segmentation Using A Deep Convolutional Neural Network Method
Liver lesion segmentation is an important step for liver cancer diagnosis,
treatment planning and treatment evaluation. LiTS (Liver Tumor Segmentation
Challenge) provides a common testbed for comparing different automatic liver
lesion segmentation methods. We participate in this challenge by developing a
deep convolutional neural network (DCNN) method. The particular DCNN model
works in 2.5D in that it takes a stack of adjacent slices as input and produces
the segmentation map corresponding to the center slice. The model has 32 layers
in total and makes use of both long-range concatenation connections of U-Net
[1] and short-range residual connections from ResNet [2]. The model was trained
on the 130 LiTS training scans and achieved an average Dice score of 0.67 when
evaluated on the 70 test CT scans, which ranked first in the LiTS challenge at
the time of the ISBI 2017 conference.
Comment: Submission for ISBI'2017 LiTS Challenge ISIC201
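The 2.5D input scheme can be sketched directly: adjacent slices enter as input channels while only the center slice's mask is predicted. The neighbourhood of two slices per side below is an illustrative assumption; the abstract does not state the exact stack depth.

```python
import numpy as np

def make_25d_input(volume, center, n_adjacent=2):
    # stack the center slice with n_adjacent neighbours on each side as
    # input channels; the network's target is the center slice's mask only
    return volume[center - n_adjacent : center + n_adjacent + 1]

ct = np.zeros((10, 64, 64))        # toy CT volume: 10 slices of 64x64
x = make_25d_input(ct, center=5)   # 5-channel 2.5D input for slice 5
```

This gives the 2D network some through-plane context at a fraction of the memory cost of a full 3D model.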
Experiments on Parallel Training of Deep Neural Network using Model Averaging
In this work we apply model averaging to parallel training of deep neural
networks (DNNs). Data is
partitioned and distributed to different nodes for local model updates, and
model averaging across nodes is done every few minibatches. We use multiple
GPUs for data parallelization, and Message Passing Interface (MPI) for
communication between nodes, which allows us to perform model averaging
frequently without losing much time on communication. We investigate the
effectiveness of Natural Gradient Stochastic Gradient Descent (NG-SGD) and
Restricted Boltzmann Machine (RBM) pretraining for parallel training in
model-averaging framework, and explore the best setups in terms of different
learning rate schedules, averaging frequencies, and minibatch sizes. It is shown
that NG-SGD and RBM pretraining benefit parameter-averaging-based model
training. On the 300-hour Switchboard dataset, a 9.3x speedup is achieved using
16 GPUs and a 17x speedup using 32 GPUs, with limited decoding accuracy loss.
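The local-update-then-average loop can be sketched as follows; the toy quadratic objectives and synchronization period are illustrative, not the paper's DNN setup:

```python
import numpy as np

def local_steps(w, grad_fn, lr, steps):
    # each node runs several minibatch SGD updates on its own data shard
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

def model_average(weights):
    # periodic synchronization: replace every node's model with the average
    return np.mean(weights, axis=0)

# two nodes, each with a toy quadratic loss minimized at its own target
targets = [np.array([1.0]), np.array([3.0])]
locals_ = [local_steps(np.zeros(1), lambda w, t=t: w - t, lr=0.5, steps=50)
           for t in targets]
w_avg = model_average(locals_)     # averaged model sits between the optima
```

Averaging frequently (every few minibatches, as in the abstract) keeps the local models from drifting this far apart, which is why communication cost matters.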
STAIR Actions: A Video Dataset of Everyday Home Actions
A new large-scale video dataset for human action recognition, called STAIR
Actions, is introduced. STAIR Actions contains 100 categories of action labels
representing fine-grained everyday home actions so that it can be applied to
research in various home tasks such as nursing, caring, and security. In STAIR
Actions, each video has a single action label. Moreover, for each action
category, there are around 1,000 videos that were obtained from YouTube or
produced by crowd workers. The duration of each video is mostly five to
six seconds. The total number of videos is 102,462. We explain how we
constructed STAIR Actions and show the characteristics of STAIR Actions
compared to existing datasets for human action recognition. Experiments with
three major models for action recognition show that STAIR Actions can train
large models and achieve good performance. STAIR Actions can be downloaded from
http://actions.stair.center