IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters
In this work, we study robust deep learning against abnormal training data
from the perspective of example weighting built in empirical loss functions,
i.e., gradient magnitude with respect to logits, an angle that has not been
thoroughly studied so far. Consequently, we have two key findings: (1) Mean
Absolute Error (MAE) Does Not Treat Examples Equally. We present new
observations and insightful analysis about MAE, which is theoretically proved
to be noise-robust. First, we reveal its underfitting problem in practice.
Second, we show that MAE's noise-robustness stems from emphasising uncertain
examples rather than from treating training samples equally, as claimed in prior
work. (2) The Variance of Gradient Magnitude Matters. We propose an effective
and simple solution to enhance MAE's fitting ability while preserving its
noise-robustness. Without changing MAE's overall weighting scheme, i.e., what
examples get higher weights, we simply change its weighting variance
non-linearly so that the impact ratio between two examples is adjusted. Our
solution is termed Improved MAE (IMAE). We prove IMAE's effectiveness using
extensive experiments: image classification under clean labels, synthetic label
noise, and real-world unknown noise. We conclude IMAE is superior to CCE, the
most popular loss for training DNNs.
Comment: Updated version. Code:
\url{https://github.com/XinshaoAmosWang/Improving-Mean-Absolute-Error-against-CCE}.
Please feel free to contact the authors for discussions or implementation problems.
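The abstract's contrast between CCE, MAE, and IMAE can be sketched as per-example weighting functions over the softmax probability of the labelled class. The specific forms below (CCE weight 1 - p, MAE weight p(1 - p), IMAE weight exp(T * p(1 - p)) with a temperature T) are an assumption paraphrased from the paper's framing, not its released code:

```python
import numpy as np

def example_weights(p_true, T=8.0):
    """Per-example weights, read as gradient magnitude w.r.t. logits.

    p_true: softmax probability assigned to the labelled class.
    Assumed (illustrative) forms:
      - CCE weights an example by (1 - p), so confidently-mislabelled
        (noisy) examples dominate training;
      - MAE weights by p * (1 - p), emphasising uncertain examples
        (p near 0.5) over both well-fitted and abnormal ones;
      - IMAE rescales MAE's weight non-linearly with a temperature T;
        the map is monotone in the MAE weight, so which examples get
        higher weights is unchanged, only the weighting variance is.
    """
    w_cce = 1.0 - p_true
    w_mae = p_true * (1.0 - p_true)
    w_imae = np.exp(T * w_mae)  # monotone in w_mae: same ordering
    return w_cce, w_mae, w_imae

# Compare an uncertain example (p = 0.5) with a likely-noisy one (p = 0.05):
# CCE gives the noisy example the larger weight; MAE and IMAE do not.
w_cce, w_mae, w_imae = example_weights(np.array([0.5, 0.05]))
```

Under these assumed forms, the noise-robustness claim is visible directly: the (1 - p) weight is largest exactly on confidently-mislabelled examples, while p(1 - p) peaks at p = 0.5.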
Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID
Systems for person re-identification (ReID) can achieve a high accuracy when
trained on large fully-labeled image datasets. However, the domain shift
typically associated with diverse operational capture conditions (e.g., camera
viewpoints and lighting) may translate to a significant decline in performance.
This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID
- a relevant scenario that is less explored in the literature. In this
scenario, the ReID model must adapt to a complex target domain defined by a
network of diverse video cameras based on tracklet information. State-of-the-art
methods cluster unlabeled target data, yet domain shifts across target cameras
(sub-domains) can lead to poor initialization of clustering methods that
propagates noise across epochs, preventing the ReID model from accurately
associating samples of the same identity. In this paper, a UDA method is introduced
for video person ReID that leverages knowledge on video tracklets, and on the
distribution of frames captured over target cameras to improve the performance
of CNN backbones trained using pseudo-labels. Our method relies on an
adversarial approach, where a camera-discriminator network is introduced to
extract discriminant camera-independent representations, facilitating the
subsequent clustering. In addition, a weighted contrastive loss is proposed to
leverage the confidence of clusters, and mitigate the risk of incorrect
identity associations. Experimental results obtained on three challenging
video-based person ReID datasets - PRID2011, iLIDS-VID, and MARS - indicate
that our proposed method can outperform related state-of-the-art methods. Our
code is available at: \url{https://github.com/dmekhazni/CAWCL-ReID}.
Comment: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
202
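The weighted contrastive loss described above can be illustrated as an InfoNCE-style term against cluster centroids, scaled per sample by the confidence of its assigned cluster. Everything here (the function name, the confidence measure, the centroid formulation) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def weighted_contrastive_loss(feats, pseudo_labels, centroids, conf, tau=0.1):
    """Cluster-confidence-weighted contrastive loss (illustrative sketch).

    feats:         (N, D) L2-normalised embeddings
    pseudo_labels: (N,) cluster index per sample, from clustering
    centroids:     (K, D) L2-normalised cluster centroids
    conf:          (K,) confidence in [0, 1] per cluster (e.g. derived
                   from cluster compactness); this measure is an
                   assumption, not the paper's exact criterion.

    Each sample's softmax cross-entropy term over the centroids is
    scaled by the confidence of its assigned cluster, so unreliable
    pseudo-labels contribute less and the risk of incorrect identity
    associations is mitigated.
    """
    sims = feats @ centroids.T / tau                    # (N, K) similarities
    sims -= sims.max(axis=1, keepdims=True)             # numerical stability
    log_p = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    nll = -log_p[np.arange(len(feats)), pseudo_labels]  # per-sample loss
    w = conf[pseudo_labels]                             # per-sample weight
    return (w * nll).sum() / w.sum()
```

Down-weighting a low-confidence cluster shrinks the contribution of every sample assigned to it, which is the intended effect when early clustering is noisy.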
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute grouping, part-based models,
\emph{etc}. Fifthly, we show some applications that take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.
Comment: Check our project page for the high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
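The survey's point that pedestrian attribute recognition is typically cast as multi-label learning can be made concrete: one sigmoid output per attribute, trained jointly with per-attribute binary cross-entropy. The attribute names and shapes below are illustrative only, not tied to any dataset the survey covers:

```python
import numpy as np

def multilabel_bce(logits, targets):
    """Per-attribute binary cross-entropy for pedestrian attributes.

    Pedestrian attribute recognition as multi-label classification:
    each column is one attribute (e.g. "male", "backpack", "long hair")
    predicted by its own sigmoid, and all attributes are trained jointly
    so the backbone is shared across them (the multi-task view).

    logits:  (N, A) raw scores, one column per attribute
    targets: (N, A) binary labels
    """
    # Numerically stable BCE-with-logits:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    loss = (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()

# One image, three hypothetical attributes: present, absent, present.
logits = np.array([[2.0, -1.5, 0.5]])
targets = np.array([[1.0, 0.0, 1.0]])
```

Because each attribute has its own binary output, the formulation handles the highly imbalanced, co-occurring labels typical of surveillance attributes without forcing a single-class decision.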
Self-paced Weight Consolidation for Continual Learning
Continual learning algorithms which keep the parameters of new tasks close to
those of previous tasks are popular for preventing catastrophic forgetting in
sequential task learning settings. However, 1) the performance of the new
continual learner degrades without distinguishing the contributions of
previously learned tasks; 2) the computational cost increases greatly with
the number of tasks, since most existing algorithms need to regularize all
previous tasks when learning new tasks. To address the above challenges, we
propose a self-paced Weight Consolidation (spWC) framework to attain robust
continual learning via evaluating the discriminative contributions of previous
tasks. To be specific, we develop a self-paced regularization to reflect the
priorities of past tasks via measuring difficulty based on key performance
indicator (i.e., accuracy). When encountering a new task, all previous tasks
are sorted from "difficult" to "easy" based on the priorities. Then the
parameters of the new continual learner are learned by selectively
maintaining the knowledge of the more difficult past tasks, which overcomes
catastrophic forgetting at a lower computational cost. We adopt an
alternative convex search to iteratively update the model parameters and
priority weights in the bi-convex formulation. The proposed spWC framework is
plug-and-play, which is applicable to most continual learning algorithms (e.g.,
EWC, MAS and RCIL) in different directions (e.g., classification and
segmentation). Experimental results on several public benchmark datasets
demonstrate that our proposed framework can effectively improve performance
when compared with other popular continual learning algorithms.
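The selective, difficulty-ranked regularization described above can be sketched as an EWC-style quadratic penalty applied only to the hardest past tasks. The priority form (1 - accuracy) and the hard top-k selection below are stand-in assumptions for the paper's self-paced regularizer and its alternating bi-convex optimization:

```python
import numpy as np

def spwc_regulariser(theta, prev_params, prev_fisher, accuracies,
                     lam=1.0, keep=0.5):
    """Self-paced weight consolidation penalty (illustrative sketch).

    Past tasks are ranked by difficulty using a key performance
    indicator (accuracy: lower accuracy = more difficult), and only the
    most difficult fraction `keep` is regularised with an EWC-style
    quadratic penalty, each term scaled by its difficulty priority.

    theta:       (D,) current parameters
    prev_params: list of (D,) parameter snapshots, one per past task
    prev_fisher: list of (D,) diagonal importance weights per past task
    accuracies:  list of per-task accuracies in [0, 1]
    """
    diff = 1.0 - np.asarray(accuracies)   # difficulty priority per task
    order = np.argsort(-diff)             # sort "difficult" -> "easy"
    k = max(1, int(np.ceil(keep * len(order))))
    penalty = 0.0
    for t in order[:k]:                   # regularize hardest tasks only
        penalty += diff[t] * np.sum(
            prev_fisher[t] * (theta - prev_params[t]) ** 2)
    return 0.5 * lam * penalty
```

With keep < 1, the easy tasks are skipped entirely, which is where the claimed computational saving over regularizing all previous tasks comes from.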
Visual object category discovery in images and videos
The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori.
I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and learn models from them for category grouping, segmentation, and summarization.
To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary that is driven by those regions.
I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that we can discover visual concepts more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.