On the Generalization Effects of Linear Transformations in Data Augmentation
Data augmentation is a powerful technique to improve performance in
applications such as image and text classification tasks. Yet, there is little
rigorous understanding of why and how various augmentations work. In this work,
we consider a family of linear transformations and study their effects on the
ridge estimator in an over-parametrized linear regression setting. First, we
show that transformations which preserve the labels of the data can improve
estimation by enlarging the span of the training data. Second, we show that
transformations which mix data can improve estimation by inducing a
regularization effect. Finally, we validate our theoretical insights on MNIST.
Based on the insights, we propose an augmentation scheme that searches over the
space of transformations by how uncertain the model is about the transformed
data. We validate our proposed scheme on image and text datasets. For example,
our method outperforms RandAugment by 1.24% on CIFAR-100 using
Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA
Adversarial AutoAugment on CIFAR datasets.
Comment: International Conference on Machine Learning (ICML) 2020. Added
experimental results on ImageNe
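The first claim in the abstract above (label-preserving transformations enlarge the span of the training data and so improve the ridge estimate) can be illustrated with a small numerical sketch. This is not the authors' code: the sizes `n` and `p`, the penalty `lam`, the noiseless labels, and the way label-preserving points are built (adding perturbations orthogonal to the true `beta`, which requires oracle knowledge of `beta`) are all assumptions made purely for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator: argmin_b ||X b - y||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p, lam = 20, 100, 1e-3            # over-parametrized: fewer samples than features
beta = rng.normal(size=p)            # ground-truth parameter (assumed, noiseless model)
X = rng.normal(size=(n, p))
y = X @ beta

# Illustrative label-preserving augmentation: perturb each sample by a vector
# orthogonal to beta, so the label of the perturbed sample is unchanged.
V = rng.normal(size=(n, p))
V -= np.outer(V @ beta, beta) / (beta @ beta)
X_aug = np.vstack([X, X + V])
y_aug = np.concatenate([y, y])

err_plain = np.linalg.norm(ridge(X, y, lam) - beta)
err_aug = np.linalg.norm(ridge(X_aug, y_aug, lam) - beta)
print(f"rank before: {np.linalg.matrix_rank(X)}, after: {np.linalg.matrix_rank(X_aug)}")
print(f"estimation error before: {err_plain:.4f}, after: {err_aug:.4f}")
```

In this toy setting the augmented design matrix spans more directions of the parameter space, so the near-minimum-norm ridge solution recovers a larger part of `beta`. The mixing transformations and the uncertainty-based search described in the abstract are not covered by this sketch.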
Learning Compositional Visual Concepts with Mutual Consistency
Compositionality of semantic concepts in image synthesis and analysis is
appealing as it can help in decomposing known and generatively recomposing
unknown data. For instance, we may learn concepts of changing illumination,
geometry or albedo of a scene, and try to recombine them to generate physically
meaningful, but unseen data for training and testing. In practice, however, we
often do not have samples from the joint concept space available: We may have
data on illumination change in one data set and on geometric change in another
one without complete overlap. We pose the following question: How can we learn
two or more concepts jointly from different data sets with mutual consistency
where we do not have samples from the full joint space? We present a novel
answer in this paper based on cyclic consistency over multiple concepts,
represented individually by generative adversarial networks (GANs). Our method,
ConceptGAN, can be understood as a drop-in for data augmentation to improve
resilience for real-world applications. Qualitative and quantitative
evaluations demonstrate its efficacy in generating semantically meaningful
images, as well as in one-shot face verification as an example application.
Comment: 10 pages, 8 figures, 4 tables, CVPR 201
PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking
Estimating the relative pose of a new object without prior knowledge is a
hard problem, while it is an ability very much needed in robotics and Augmented
Reality. We present a method for tracking the 6D motion of objects in RGB video
sequences when neither the training images nor the 3D geometry of the objects
are available. In contrast to previous works, our method can therefore consider
unknown objects in an open world instantly, without requiring any prior
information or a specific training phase. We consider two architectures, one
based on two frames, and the other relying on a Transformer Encoder, which can
exploit an arbitrary number of past frames. We train our architectures using
only synthetic renderings with domain randomization. Our results on challenging
datasets are on par with previous works that require much more information
(training images of the target objects, 3D models, and/or depth data). Our
source code is available at https://github.com/nv-nguyen/pizza
Comment: 3DV Ora
Deep-LK for Efficient Adaptive Object Tracking
In this paper we present a new approach for efficient regression-based object
tracking which we refer to as Deep-LK. Our approach is closely related to the
Generic Object Tracking Using Regression Networks (GOTURN) framework of Held et
al. We make the following contributions. First, we demonstrate that there is a
theoretical relationship between siamese regression networks like GOTURN and
the classical Inverse-Compositional Lucas & Kanade (IC-LK) algorithm. Further,
we demonstrate that, unlike GOTURN, IC-LK adapts its regressor to the appearance
of the currently tracked frame. We argue that GOTURN's poor performance on
unseen objects and/or viewpoints can be attributed to this missing property.
Second, we propose a novel framework for object tracking - which we refer to as
Deep-LK - that is inspired by the IC-LK framework. Finally, we show impressive
results demonstrating that Deep-LK substantially outperforms GOTURN.
Additionally, we demonstrate tracking performance comparable to current
state-of-the-art deep trackers while being an order of magnitude more
computationally efficient (i.e. 100 FPS)
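To make the remark about IC-LK adapting its regressor to the tracked appearance more concrete, here is a minimal sketch of the classical inverse-compositional Lucas & Kanade update for a translation-only warp on a 1-D signal. It is not Deep-LK and not the authors' code: the Gaussian `template`, the shift `true_shift`, the sampling grid, and the function names are all hypothetical choices for illustration. The point to notice is that `regressor` is computed once from the template's own gradients, so it changes whenever the template changes.

```python
import numpy as np

def template(x, sigma=2.0):
    """Smooth 1-D appearance profile standing in for the tracked template (toy choice)."""
    return np.exp(-x**2 / (2.0 * sigma**2))

def shifted_image(x, shift=1.5):
    """Input signal: the template displaced by an unknown shift (assumed 1.5 here)."""
    return template(x - shift)

def ic_lk_translation(image_fn, x, n_iters=20):
    """Estimate p such that image_fn(x + p) matches template(x)."""
    T = template(x)
    grad_T = np.gradient(T, x)          # steepest-descent signal, from the template only
    hessian = np.sum(grad_T**2)         # 1x1 Gauss-Newton Hessian
    regressor = grad_T / hessian        # linear map: error signal -> parameter update
    p = 0.0
    for _ in range(n_iters):
        error = image_fn(x + p) - T     # warped input minus template
        delta_p = regressor @ error     # apply the precomputed linear regressor
        p -= delta_p                    # compose with the inverse incremental warp
    return p

true_shift = 1.5
x = np.linspace(-10.0, 10.0, 201)
p_hat = ic_lk_translation(shifted_image, x)
print(f"estimated shift: {p_hat:.6f} (true shift: {true_shift})")
```

A fixed, learned regressor in the style of a siamese regression network would replace `regressor` with weights that do not depend on the current template, which is the contrast the abstract draws.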
Time-Efficient Hybrid Approach for Facial Expression Recognition
Facial expression recognition is an emerging research area for improving human-computer interaction. This research plays a significant role in social communication, commercial enterprise, law enforcement, and other areas of computer interaction. In this paper, we propose a time-efficient hybrid design for facial expression recognition that combines image pre-processing steps with different Convolutional Neural Network (CNN) structures, providing better accuracy and greatly improved training time. We predict seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise, and neutral. The model performs well on challenging cases where the expressed emotion could be one of several with quite similar facial characteristics, such as anger, disgust, and sadness. The model was tested across multiple databases and different facial orientations and, to the best of our knowledge, achieved an accuracy of about 89.58% on the KDEF dataset, 100% on the JAFFE dataset, and 71.975% on the combined (KDEF + JAFFE + SFEW) dataset across these scenarios. Performance was evaluated with cross-validation to avoid bias towards a specific set of images from a database
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
The growing interest in language-conditioned robot manipulation aims to
develop robots capable of understanding and executing complex tasks, with the
objective of enabling robots to interpret language commands and manipulate
objects accordingly. While language-conditioned approaches demonstrate
impressive capabilities for addressing tasks in familiar environments, they
encounter limitations in adapting to unfamiliar environment settings. In this
study, we propose a general-purpose, language-conditioned approach that
combines base skill priors and imitation learning under unstructured data to
enhance the algorithm's generalization in adapting to unfamiliar environments.
We assess our model's performance in both simulated and real-world environments
using a zero-shot setting. In the simulated environment, the proposed approach
surpasses previously reported scores for the CALVIN benchmark, especially in the
challenging Zero-Shot Multi-Environment setting. The average completed task
length, indicating the average number of tasks the agent can continuously
complete, improves by more than 2.5 times compared to the state-of-the-art method
HULC. In addition, we conduct a zero-shot evaluation of our policy in a
real-world setting, following training exclusively in simulated environments
without additional specific adaptations. In this evaluation, we set up ten
tasks, and our approach achieved an average 30% improvement over the
current state-of-the-art approach, demonstrating a high generalization
capability in both simulated environments and the real world. For further
details, including access to our code and videos, please refer to our
supplementary materials
Visualizing and Understanding Convolutional Networks
Large Convolutional Network models have recently demonstrated impressive
classification performance on the ImageNet benchmark. However, there is no clear
understanding of why they perform so well, or how they might be improved. In
this paper we address both issues. We introduce a novel visualization technique
that gives insight into the function of intermediate feature layers and the
operation of the classifier. We also perform an ablation study to discover the
performance contribution from different model layers. This enables us to find
model architectures that outperform Krizhevsky et al. on the ImageNet
classification benchmark. We show our ImageNet model generalizes well to other
datasets: when the softmax classifier is retrained, it convincingly beats the
current state-of-the-art results on Caltech-101 and Caltech-256 datasets
Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning
We present a novel approach to address the challenge of generalization in
offline reinforcement learning (RL), where the agent learns from a fixed
dataset without any additional interaction with the environment. Specifically,
we aim to improve the agent's ability to generalize to out-of-distribution
goals. To achieve this, we propose to learn a dynamics model and check if it is
equivariant with respect to a fixed type of transformation, namely translations
in the state space. We then use an entropy regularizer to increase the
equivariant set and augment the dataset with the resulting transformed samples.
Finally, we learn a new policy offline based on the augmented dataset, with an
off-the-shelf offline RL algorithm. Our experimental results demonstrate that
our approach can greatly improve the test performance of the policy on the
considered environments
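The equivariance check and the augmentation step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the paper learns a dynamics model and uses an entropy regularizer, whereas this sketch uses a hand-written toy `dynamics` function (next state = state + action) and a single fixed `shift`. All function names here are hypothetical.

```python
import numpy as np

def dynamics(states, actions):
    """Toy point-mass dynamics (assumed): next state = state + action."""
    return states + actions

def damped_dynamics(states, actions):
    """Toy counterexample: damping towards the origin breaks translation equivariance."""
    return 0.5 * states + actions

def is_translation_equivariant(f, states, actions, shift, tol=1e-8):
    """Check f(s + shift, a) == f(s, a) + shift on the given samples."""
    return bool(np.allclose(f(states + shift, actions),
                            f(states, actions) + shift, atol=tol))

def augment(states, actions, next_states, shift):
    """Append translated copies of each transition (s, a, s') to the dataset."""
    return (np.vstack([states, states + shift]),
            np.vstack([actions, actions]),
            np.vstack([next_states, next_states + shift]))

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 2))
actions = rng.normal(size=(5, 2))
next_states = dynamics(states, actions)
shift = np.array([3.0, -1.0])

if is_translation_equivariant(dynamics, states, actions, shift):
    aug_states, aug_actions, aug_next = augment(states, actions, next_states, shift)
    print("augmented dataset size:", aug_states.shape[0])
```

The augmented transitions remain consistent with the dynamics only because the check passed. For `damped_dynamics` the check fails and no translated samples would be added.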