Robustness of 3D Deep Learning in an Adversarial Setting
Understanding the spatial arrangement and nature of real-world objects is of
paramount importance to many complex engineering tasks, including autonomous
navigation. Deep learning has revolutionized state-of-the-art performance for
tasks in 3D environments; however, relatively little is known about the
robustness of these approaches in an adversarial setting. The lack of
comprehensive analysis makes it difficult to justify deployment of 3D deep
learning models in real-world, safety-critical applications. In this work, we
develop an algorithm for analysis of pointwise robustness of neural networks
that operate on 3D data. We show that current approaches presented for
understanding the resilience of state-of-the-art models vastly overestimate
their robustness. We then use our algorithm to evaluate an array of
state-of-the-art models in order to demonstrate their vulnerability to
occlusion attacks. We show that, in the worst case, these networks can be
reduced to 0% classification accuracy after the occlusion of at most 6.5% of
the occupied input space.
Comment: 10 pages, 8 figures, 1 table
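The occlusion attack described above can be illustrated with a toy greedy sketch: repeatedly drop the input point whose removal most reduces the classifier's confidence in the true class, until the occlusion budget is spent. The function names and the stand-in classifier below are illustrative assumptions, not the paper's actual algorithm, which operates on real 3D deep networks.

```python
def greedy_occlusion_attack(points, score_fn, true_label, budget=0.065):
    """Greedily occlude (drop) the point whose removal most hurts the
    true-class score, until the budget (fraction of the occupied input)
    is spent or no single removal lowers the score further."""
    pts = list(points)
    for _ in range(int(budget * len(pts))):
        best_i, best_score = None, score_fn(pts, true_label)
        for i in range(len(pts)):
            s = score_fn(pts[:i] + pts[i + 1:], true_label)
            if s < best_score:
                best_i, best_score = i, s
        if best_i is None:
            break  # no single removal helps; stop early
        del pts[best_i]
    return pts

# stand-in classifier: confidence for class 1 grows with the mean x-coordinate
def toy_score(pts, label):
    m = sum(p[0] for p in pts) / len(pts)
    return m if label == 1 else 1.0 - m
```

With the toy classifier, the attack finds and removes the few points that carry the class evidence, which mirrors how a small occluded fraction can collapse accuracy.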
Iterative Multiple Bounding-Box Refinements for Visual Tracking
Single-object visual tracking aims at locating a target in each video frame by predicting the bounding box of the object. Recent approaches have adopted iterative procedures to gradually refine the bounding box and locate the target in the image. In such approaches, the deep model takes as input the image patch corresponding to the currently estimated target bounding box, and outputs the probability associated with each of the possible bounding box refinements, generally defined as a discrete set of linear transformations of the bounding box center and size. At each iteration, only one transformation is applied, and supervised training of the model may introduce an inherent ambiguity by prioritizing some transformations over others. This paper proposes a novel formulation of the problem of selecting the bounding box refinement. It introduces the concept of non-conflicting transformations and allows applying multiple refinements to the target bounding box at each iteration without introducing ambiguities during learning of the model parameters. Empirical results demonstrate that the proposed approach improves on iterative single refinement in terms of accuracy and precision of the tracking results.
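The idea of applying several non-conflicting refinements in one iteration can be sketched as follows. Here, a hypothetical refinement "conflicts" with another when both edit the same box parameter group (e.g. "left" vs "right"); the discrete set of transformations, the parameter grouping, and the threshold are illustrative assumptions, not the paper's exact definitions.

```python
# each refinement edits one parameter group of the box (cx, cy, or w/h);
# two refinements conflict when they edit the same group
REFINEMENTS = {
    "left":   ("cx", lambda b: {**b, "cx": b["cx"] - 2}),
    "right":  ("cx", lambda b: {**b, "cx": b["cx"] + 2}),
    "up":     ("cy", lambda b: {**b, "cy": b["cy"] - 2}),
    "down":   ("cy", lambda b: {**b, "cy": b["cy"] + 2}),
    "grow":   ("wh", lambda b: {**b, "w": b["w"] + 2, "h": b["h"] + 2}),
    "shrink": ("wh", lambda b: {**b, "w": b["w"] - 2, "h": b["h"] - 2}),
}

def apply_refinements(box, probs, thresh=0.5):
    """Apply every above-threshold refinement whose parameter group has
    not already been refined this iteration, highest probability first."""
    out, used = dict(box), set()
    for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        group, fn = REFINEMENTS[name]
        if p >= thresh and group not in used:
            out = fn(out)
            used.add(group)
    return out
```

One call can thus shift the box horizontally, vertically, and rescale it in a single iteration, instead of spending three iterations on one transformation each.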
IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection
Change detection (CD) aims to detect change regions within an image pair
captured at different times, playing a significant role for diverse real-world
applications. Nevertheless, most existing works focus on designing advanced
network architectures to map the feature difference to the final change map
while ignoring the influence of the quality of the feature difference. In this
paper, we study the CD from a new perspective, i.e., how to optimize the
feature difference to highlight changes and suppress unchanged regions, and
propose a novel module denoted as iterative difference-enhanced transformers
(IDET). IDET contains three transformers: two transformers for extracting the
long-range information of the two images and one transformer for enhancing the
feature difference. In contrast to the previous transformers, the third
transformer takes the outputs of the first two transformers to guide the
enhancement of the feature difference iteratively. To achieve more effective
refinement, we further propose the multi-scale IDET-based change detection that
uses multi-scale representations of the images for multiple feature difference
refinements and proposes a coarse-to-fine fusion strategy to combine all
refinements. Our final CD method outperforms seven state-of-the-art methods on
six large-scale datasets under diverse application scenarios, which
demonstrates the importance of feature difference enhancements and the
effectiveness of IDET.
Comment: conference
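The core idea of iteratively enhancing the feature difference under the guidance of the two image features can be shown with a small numerical sketch. IDET itself implements this with transformers; the scalar gating rule below is an illustrative assumption that only mimics the behavior of amplifying changed entries and suppressing unchanged ones.

```python
def enhance_difference(f1, f2, fd, gate=0.5):
    """One enhancement step: the two image features gate the current
    difference -- entries where f1 and f2 agree are suppressed, entries
    where they disagree are amplified."""
    out = []
    for a, b, d in zip(f1, f2, fd):
        agreement = 1.0 - min(abs(a - b), 1.0)  # 1.0 if identical
        out.append(d * (1.0 + gate * (1.0 - 2.0 * agreement)))
    return out

def iterative_difference(f1, f2, steps=4):
    """Iteratively refine the feature difference, as IDET does with its
    third transformer guided by the first two."""
    fd = [a - b for a, b in zip(f1, f2)]  # initial feature difference
    for _ in range(steps):
        fd = enhance_difference(f1, f2, fd)
    return fd
```

After a few iterations the changed entry grows while the unchanged entry stays suppressed, which is the effect the paper attributes to difference enhancement.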
Deep learning techniques for visual object tracking
Visual object tracking plays a crucial role in various vision systems, including biometric analysis, medical imaging, smart traffic systems, and video surveillance. Despite notable advancements in visual object tracking over the past few decades, many tracking algorithms still face challenges due to factors like illumination changes, deformation, and scale variations.
This thesis is divided into three parts. The first part introduces the visual object tracking problem and discusses the traditional approaches that have been used to study it. We then propose a novel method called Tracking by Iterative Multi-Refinements, which addresses the issue of locating the target by redefining the search for the ideal bounding box. This method utilizes an iterative process to forecast a sequence of bounding box adjustments, enabling the tracking algorithm to handle multiple non-conflicting transformations simultaneously. As a result, it achieves faster tracking and can handle a higher number of composite transformations.
In the second part of this thesis we explore the application of reinforcement learning (RL) to visual tracking, presenting a general RL framework applicable to problems that require a sequence of decisions. We discuss various families of popular RL approaches, including value-based methods, policy gradient approaches, and actor-critic methods. Furthermore, we delve into the application of RL to visual tracking, where an RL agent predicts the target's location or selects hyperparameters, correlation filters, or target appearance models. A comprehensive comparison of these approaches is provided, along with a taxonomy of state-of-the-art methods.
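As a minimal illustration of the value-based family applied to a sequential decision problem such as tracking, the sketch below pairs epsilon-greedy action selection with one tabular Q-learning update. The action set of box moves and all names are illustrative assumptions; practical RL trackers use deep function approximators rather than a table.

```python
import random

ACTIONS = ["left", "right", "up", "down", "stop"]  # box moves as the action set

def epsilon_greedy(q, state, eps=0.1):
    """Value-based action selection: explore with probability eps,
    otherwise take the action with the highest estimated value."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target reward + gamma * max_a' Q(s_next, a')."""
    best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
```

Policy-gradient and actor-critic methods differ in what they learn (a policy, or a policy plus a value baseline), but share this loop of acting, observing a reward, and updating.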
The third part presents a novel method that addresses the need for online tuning of offline-trained tracking models. Typically, offline-trained models, whether trained through supervised learning or reinforcement learning, require additional tuning during online tracking to achieve optimal performance. The duration of this tuning process depends on the number of layers that must be trained for the new target. This thesis proposes an approach that expedites the training of convolutional neural networks (CNNs) while preserving their high performance levels.
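The observation that tuning time depends on how many layers are trained can be sketched as a selective update that freezes early layers. This is a generic illustration of the idea, under the assumption of a layer-list representation; it is not the thesis's actual acceleration method.

```python
def online_update(layers, grads, lr=0.1, train_last=1):
    """Gradient step that touches only the last `train_last` layers;
    earlier layers stay frozen, so each online tuning step updates far
    fewer parameters than full fine-tuning."""
    frozen = len(layers) - train_last
    for i in range(frozen, len(layers)):
        layers[i] = [w - lr * g for w, g in zip(layers[i], grads[i])]
    return layers
```

Only the final layer moves in the example below; the frozen layers keep their offline-trained weights, which is what makes per-frame adaptation cheap.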
In summary, this thesis extensively explores the area of visual object tracking and its related domains, covering traditional approaches, novel methodologies like Tracking by Iterative Multi-Refinements, the application of reinforcement learning, and a pioneering method for accelerating CNN training. By addressing the challenges faced by existing tracking algorithms, this research aims to advance the field of visual object tracking and contribute to the development of more robust and efficient tracking systems.
Transformer-based Multi-Instance Learning for Weakly Supervised Object Detection
Weakly Supervised Object Detection (WSOD) enables the training of object
detection models using only image-level annotations. State-of-the-art WSOD
detectors commonly rely on multi-instance learning (MIL) as the backbone of
their detectors and assume that the bounding box proposals of an image are
independent of each other. However, since such approaches only utilize the
highest score proposal and discard the potentially useful information from
other proposals, their independent MIL backbone often limits models to salient
parts of an object or causes them to detect only one object per class. To solve
the above problems, we propose a novel backbone for WSOD based on our tailored
Vision Transformer named Weakly Supervised Transformer Detection Network
(WSTDN). Our algorithm is the first to demonstrate that self-attention
modules that consider inter-instance relationships are effective backbones for
WSOD. In addition, we introduce a novel bounding box mining method (BBM),
integrated with a memory transfer refinement (MTR) procedure, that exploits
instance dependencies to facilitate instance refinement. Experimental results on
PASCAL VOC2007 and VOC2012 benchmarks demonstrate the effectiveness of our
proposed WSTDN and modified instance refinement modules.
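The MIL backbone that such WSOD detectors build on can be sketched as a WSDDN-style two-stream head: one stream scores classes per proposal, the other scores proposals per class, and their product is summed into an image-level score trainable from image-level labels alone. This is a generic sketch of that classic head, not WSTDN's transformer backbone; all names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def mil_image_scores(cls_logits, det_logits):
    """Two-stream MIL head: the classification stream is softmaxed over
    classes for each proposal, the detection stream over proposals for
    each class; the image-level score for a class is the sum of the
    element-wise product over all proposals."""
    n, c = len(cls_logits), len(cls_logits[0])
    cls = [softmax(row) for row in cls_logits]  # per proposal, over classes
    det_cols = [softmax([det_logits[i][j] for i in range(n)])
                for j in range(c)]              # per class, over proposals
    return [sum(cls[i][j] * det_cols[j][i] for i in range(n))
            for j in range(c)]
```

Because the detection stream's softmax concentrates mass on the highest-scoring proposal, this head tends to rely on one salient proposal per class, which is exactly the limitation the abstract attributes to independent MIL backbones.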