Yes, we GAN: Applying Adversarial Techniques for Autonomous Driving
Generative Adversarial Networks (GAN) have gained great popularity since their introduction in 2014. Research on GAN is growing rapidly, and there are many variants of the original GAN focusing on various aspects of deep learning. GAN are perceived as the most impactful direction of machine learning in the last decade. This paper focuses on applications of GAN in autonomous driving, including topics such as advanced data augmentation, loss function learning, and semi-supervised learning. We formalize and review key applications of adversarial techniques and discuss challenges and open problems
to be addressed.

Comment: Accepted for publication in Electronic Imaging, Autonomous Vehicles and Machines 2019. arXiv admin note: text overlap with arXiv:1606.05908 by another author.
Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs
In practice, one often has far too little text data to train a deep neural network. This "Big Data Wall" is a challenge for minority language communities on the Internet, and for the organizations, laboratories, and companies that compete with GAFAM (Google, Amazon, Facebook, Apple, Microsoft). While most research effort in text data augmentation aims at the long-term goal of finding end-to-end learning solutions, which amounts to "using neural networks to feed neural networks", this engineering work focuses on practical, robust, scalable, and easy-to-implement data augmentation pre-processing techniques, similar to those that have succeeded in computer vision. Several text augmentation techniques were experimented with. Some existing ones, such as noise injection or the use of regular expressions, were tested for comparison purposes. Others are modified or improved techniques, such as lexical replacement. Finally, more innovative ones, such as paraphrase generation via back-translation or the transformation of syntactic trees, are based on robust, scalable, and easy-to-use NLP Cloud APIs. With an amplification factor of only 5, all of the text augmentation techniques studied increased accuracy by between 4.3% and 21.6%, with significant statistical fluctuations, on a standardized text polarity prediction task. Several standard deep neural network architectures were tested: the multilayer perceptron (MLP), the long short-term memory recurrent network (LSTM), and the bidirectional LSTM (biLSTM). The classical XGBoost algorithm was also tested, with improvements of up to 2.5%.

Comment: 33 pages, 25 figures
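The noise-injection technique mentioned above can be sketched in a few lines. The probabilities and the specific noise operations (word dropout, adjacent swaps) are illustrative assumptions, not the paper's actual implementation:

```python
import random

def noise_inject(sentence, p_drop=0.1, p_swap=0.1, rng=None):
    """Noise injection: randomly drop words and swap adjacent words.
    The default probabilities are illustrative, not from the paper."""
    rng = rng or random.Random(0)
    words = sentence.split()
    kept = [w for w in words if rng.random() > p_drop] or words  # word dropout
    out = kept[:]
    for i in range(len(out) - 1):             # occasional adjacent swaps
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    return " ".join(out)

def augment(corpus, factor=5, seed=42):
    """Return the corpus plus (factor - 1) noisy variants of each
    sentence, i.e. the amplification factor of 5 used in the paper."""
    rng = random.Random(seed)
    augmented = list(corpus)
    for _ in range(factor - 1):
        augmented.extend(noise_inject(s, rng=rng) for s in corpus)
    return augmented
```

With factor=5, a 1,000-sentence corpus becomes 5,000 training examples before being fed to the MLP, LSTM, or biLSTM classifier.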
A Comprehensive Survey of Grammar Error Correction
Grammar error correction (GEC) is an important application of natural language processing techniques. The past decade has witnessed significant progress in GEC, owing to the increasing popularity of machine learning and deep learning; in the late 2010s, near human-level GEC systems became available. However, no prior work has offered a complete recapitulation of this progress. We present the first survey of GEC, a comprehensive retrospective of the literature in this area. We first introduce five public datasets, the data annotation schema, two important shared tasks, and four standard evaluation metrics. More importantly, we discuss four kinds of basic approaches (statistical machine translation based, neural machine translation based, classification based, and language model based), six commonly applied performance-boosting techniques for GEC systems, and two data augmentation methods. Since GEC is typically viewed as a sister task of machine translation, many GEC systems are based on neural machine translation (NMT) approaches, applying the neural sequence-to-sequence model. Similarly, some performance-boosting techniques are adapted from machine translation and are successfully combined with GEC systems to enhance final performance. Furthermore, we analyze basic approaches, performance-boosting techniques, and integrated GEC systems on the basis of their respective experimental results, to draw out clearer patterns and conclusions. Finally, we discuss five prospective directions for future GEC research.
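One common data augmentation idea in GEC is to inject synthetic errors into clean sentences, producing (noisy, clean) parallel pairs analogous to a translation corpus. A sketch under that assumption; the confusion sets and the `corrupt` helper are hypothetical, chosen only to illustrate the idea:

```python
import random

# Hypothetical confusion sets for common English errors; the survey does
# not prescribe these exact pairs -- they only illustrate the idea.
CONFUSIONS = {"their": "there", "there": "their", "its": "it's",
              "then": "than", "affect": "effect"}

def corrupt(sentence, p=0.5, rng=None):
    """Inject synthetic grammar errors into a clean sentence, yielding a
    (noisy, clean) pair that can train a correction model the same way
    parallel corpora train a machine translation model."""
    rng = rng or random.Random(0)
    noisy = [CONFUSIONS[w] if w in CONFUSIONS and rng.random() < p else w
             for w in sentence.split()]
    return " ".join(noisy), sentence
```

Each clean sentence can be corrupted several times with different seeds to multiply the training data.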
Smooth Neighbors on Teacher Graphs for Semi-supervised Learning
The recently proposed self-ensembling methods have achieved promising results
in deep semi-supervised learning, which penalize inconsistent predictions of
unlabeled data under different perturbations. However, they only consider
adding perturbations to each single data point, while ignoring the connections
between data samples. In this paper, we propose a novel method, called Smooth
Neighbors on Teacher Graphs (SNTG). In SNTG, a graph is constructed based on
the predictions of the teacher model, i.e., the implicit self-ensemble of
models. Then the graph serves as a similarity measure with respect to which the
representations of "similar" neighboring points are learned to be smooth on the
low-dimensional manifold. We achieve state-of-the-art results on semi-supervised learning benchmarks: the error rates are 9.89% on CIFAR-10 with 4,000 labels and 3.99% on SVHN with 500 labels. The improvements are particularly significant when labels are scarce: on non-augmented MNIST with only 20 labels, the error rate is reduced from the previous 4.81% to 1.36%. Our method also shows robustness to noisy labels.

Comment: Accepted as a Spotlight in Computer Vision and Pattern Recognition 201
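A minimal sketch of the neighbor-smoothing idea: the abstract does not give the exact loss, so a contrastive-style penalty over teacher-defined neighbors is assumed here for illustration:

```python
import numpy as np

def sntg_loss(features, teacher_preds, margin=1.0):
    """Assumed contrastive form of the neighbor loss: points the teacher
    assigns to the same class are graph neighbors, and their features
    are pulled together; non-neighbors are pushed at least `margin`
    apart. `features` are low-dimensional representations, and
    `teacher_preds` are the teacher model's class probabilities."""
    labels = teacher_preds.argmax(axis=1)
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if labels[i] == labels[j]:        # neighbors: be smooth
                loss += d ** 2
            else:                             # non-neighbors: separate
                loss += max(0.0, margin - d) ** 2
            pairs += 1
    return loss / max(pairs, 1)
```

In training, this term would be added to the usual supervised loss and the consistency penalty on perturbed predictions.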
Controllable Top-down Feature Transformer
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control. To this end, we develop a top-down feature transformer (TFT) with controllable parameters that is able to account for the hidden-layer transformation while maintaining overall consistency across layers. The learned generators capture underlying feature transformation processes that are independent of particular training images. Our proposed TFT framework brings insight to, and helps the understanding of, an important problem: the CNN's internal feature representation and transformation under top-down processes. In the case of spatial transformations, we demonstrate the significant advantage of TFT over existing data-driven approaches in building data-independent transformations. We also show that it can be adopted in other applications, such as data augmentation and image style transfer.
Self-training with Noisy Student improves ImageNet classification
We present Noisy Student Training, a semi-supervised learning approach that
works well even when labeled data is abundant. Noisy Student Training achieves
88.4% top-1 accuracy on ImageNet, which is 2.0% better than the
state-of-the-art model that requires 3.5B weakly labeled Instagram images. On
robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to
83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces
ImageNet-P mean flip rate from 27.8 to 12.2.
Noisy Student Training extends the idea of self-training and distillation
with the use of equal-or-larger student models and noise added to the student
during learning. On ImageNet, we first train an EfficientNet model on labeled
images and use it as a teacher to generate pseudo labels for 300M unlabeled
images. We then train a larger EfficientNet as a student model on the
combination of labeled and pseudo labeled images. We iterate this process by
putting back the student as the teacher. During the learning of the student, we
inject noise such as dropout, stochastic depth, and data augmentation via
RandAugment to the student so that the student generalizes better than the
teacher. Models are available at
https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
Code is available at https://github.com/google-research/noisystudent.

Comment: CVPR 202
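The iterative teacher/student loop described above can be sketched with a toy stand-in model. Nearest-centroid classification and Gaussian input jitter replace EfficientNet and dropout/RandAugment purely for illustration; only the loop structure mirrors the paper:

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Minimal 'model': one centroid per class (EfficientNet stand-in)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    """Assign each row of X to its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def noisy_student(X_lab, y_lab, X_unlab, n_classes=2, rounds=3,
                  noise=0.1, seed=0):
    """Toy Noisy Student loop: train a teacher on labeled data,
    pseudo-label the unlabeled pool, train a student on the union with
    input noise (jitter stands in for dropout/stochastic depth/
    RandAugment), then put the student back as the teacher."""
    rng = np.random.default_rng(seed)
    teacher = fit_centroids(X_lab, y_lab, n_classes)
    for _ in range(rounds):
        pseudo = predict(teacher, X_unlab)
        X_all = np.vstack([X_lab,
                           X_unlab + rng.normal(0, noise, X_unlab.shape)])
        y_all = np.concatenate([y_lab, pseudo])
        teacher = fit_centroids(X_all, y_all, n_classes)  # student -> teacher
    return teacher
```

The key asymmetry from the paper is preserved: the teacher labels clean inputs, while the student learns from noised inputs, forcing it to generalize beyond the teacher.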
Diversity in Machine Learning
Machine learning methods achieve good performance and are widely applied in real-world applications. They can learn a model adaptively, fitting the special requirements of different tasks. Generally, a good machine learning system consists of plentiful training data, a good model training process, and accurate inference. Many factors affect the performance of the machine learning process; among them, diversity is an important one. Diversity helps each stage contribute to an overall good result: diversity in the training data ensures that the data provide more discriminative information for the model; diversity in the learned model (in the parameters of each model, or among different base models) lets each parameter or model capture unique or complementary information; and diversity in inference provides multiple choices, each corresponding to a specific plausible local optimum. Although diversity plays an important role in the machine learning process, there has been no systematic analysis of diversification in machine learning systems. In this paper, we systematically summarize methods for data diversification, model diversification, and inference diversification in the machine learning process. In addition, we survey typical applications where diversity techniques have improved machine learning performance, including remote sensing imaging tasks, machine translation, camera relocalization, image segmentation, object detection, topic modeling, and others. Finally, we discuss some challenges facing diversity techniques in machine learning and point out directions for future work.

Comment: Accepted by IEEE Access
Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GAN) have attracted much research attention recently, leading to impressive results in natural image generation. To date, however, little success has been observed in using GAN-generated images to improve classification tasks. Here we explore, in the context of car license plate recognition, whether it is possible to generate synthetic training data using GAN to improve recognition accuracy. With a carefully designed pipeline, we show that the answer is affirmative. First, a large-scale image set is generated using the generator of the GAN, without manual annotation. Then, these images are fed to a deep convolutional neural network (DCNN) followed by a bidirectional recurrent neural network (BRNN) with long short-term memory (LSTM), which performs feature learning and sequence labelling. Finally, the pre-trained model is fine-tuned on real images. Our experimental results on several datasets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline when moderate-sized real data are available. We show that the proposed framework achieves competitive recognition accuracy on challenging test datasets. We also leverage depthwise separable convolutions to construct a lightweight convolutional RNN that is about half the size and 2x faster on CPU. Combining this framework and the proposed pipeline, we make progress towards accurate recognition on mobile and embedded devices.
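The size reduction from depthwise separable convolutions follows from simple parameter arithmetic. A quick count, with bias terms ignored (the 128-channel layer is an illustrative example, not a layer from the paper's network):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """A k x k depthwise filter per input channel, followed by a 1x1
    pointwise convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

# For a 3x3 layer mapping 128 -> 128 channels:
standard = conv_params(128, 128, 3)                  # 147456 weights
separable = depthwise_separable_params(128, 128, 3)  # 17536 weights
```

The roughly 8x per-layer reduction is what lets the lightweight convolutional RNN come in at about half the total size while also running faster on CPU.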
Billion-scale semi-supervised learning for image classification
This paper presents a study of semi-supervised learning with large
convolutional networks. We propose a pipeline, based on a teacher/student
paradigm, that leverages a large collection of unlabelled images (up to 1
billion). Our main goal is to improve the performance for a given target
architecture, like ResNet-50 or ResNeXt. We provide an extensive analysis of
the success factors of our approach, which leads us to formulate some
recommendations to produce high-accuracy models for image classification with
semi-supervised learning. As a result, our approach brings important gains to
standard architectures for image, video and fine-grained classification. For
instance, by leveraging one billion unlabelled images, our learned vanilla
ResNet-50 achieves 81.2% top-1 accuracy on the ImageNet benchmark.
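A common selection step in such teacher/student pipelines is to keep, for each class, the unlabeled images the teacher scores highest; the sketch below assumes that rule, and the exact selection used in the paper may differ:

```python
import numpy as np

def select_topk_per_class(scores, k):
    """For each class, keep the indices of the k unlabeled examples the
    teacher scores highest for that class. An example may be selected
    for several classes. `scores` is (n_examples, n_classes); returns
    {class: list of indices}."""
    n_classes = scores.shape[1]
    return {c: np.argsort(-scores[:, c])[:k].tolist()
            for c in range(n_classes)}
```

Ranking per class, rather than thresholding globally, keeps the pseudo-labeled set class-balanced even when the teacher is much more confident on some classes than others.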
Generative Adversarial Network in Medical Imaging: A Review
Generative adversarial networks have gained a lot of attention in the
computer vision community due to their capability of data generation without
explicitly modelling the probability density function. The adversarial loss
brought by the discriminator provides a clever way of incorporating unlabeled
samples into training and imposing higher order consistency. This has proven to
be useful in many cases, such as domain adaptation, data augmentation, and
image-to-image translation. These properties have attracted researchers in the
medical imaging community, and we have seen rapid adoption in many traditional
and novel applications, such as image reconstruction, segmentation, detection,
classification, and cross-modality synthesis. Based on our observations, this
trend will continue and we therefore conducted a review of recent advances in
medical imaging using the adversarial training scheme with the hope of
benefiting researchers interested in this technique.

Comment: 24 pages; v4; added missing references from before Jan 1st, 2019; accepted to MedI
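The adversarial loss referred to above is, in the standard GAN formulation, a binary cross-entropy on the discriminator's outputs; a minimal sketch, using the common non-saturating form of the generator loss:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN objective: the discriminator maximizes the
    log-likelihood of telling real from generated samples, while the
    generator (non-saturating form) maximizes log D on its own samples.
    `d_real` and `d_fake` are the discriminator's output probabilities;
    `eps` guards the logarithms."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```

It is this discriminator signal that lets unlabeled samples contribute to training, since real-versus-fake supervision requires no manual labels.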