Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation
Cross-resolution face recognition has become a challenging problem for modern
deep face recognition systems. It aims at matching a low-resolution probe image
with high-resolution gallery images registered in a database. Existing methods
mainly leverage prior information from high-resolution images by either
reconstructing facial details with super-resolution techniques or learning a
unified feature space. To address this challenge, this paper proposes a new
approach that forces the network to focus on the discriminative information
stored in the low-frequency components of a low-resolution image. A
cross-resolution knowledge distillation paradigm is first employed as the
learning framework. Then, an identity-preserving network, WaveResNet, and a
wavelet similarity loss are designed to capture low-frequency details and boost
performance. Finally, an image degradation model is conceived to simulate more
realistic low-resolution training data. Extensive experimental
results show that the proposed method consistently outperforms the baseline
model and other state-of-the-art methods across a variety of image resolutions.
Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling
Training high-quality instance segmentation models requires an abundance of
labeled images with instance masks and classifications, which is often
expensive to procure. Active learning addresses this challenge by striving for
optimum performance with minimal labeling cost by selecting the most
informative and representative images for labeling. Despite its potential,
active learning has been less explored in instance segmentation compared to
other tasks like image classification, which require less labeling. In this
study, we propose a post-hoc active learning algorithm that integrates
uncertainty-based sampling with diversity-based sampling. Our proposed
algorithm is not only simple and easy to implement, but it also delivers
superior performance on various datasets. Its practical application is
demonstrated on a real-world overhead imagery dataset, where it increases the
labeling efficiency fivefold.
Comment: UNCV ICCV 202
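A minimal sketch of how an uncertainty step followed by a diversity step could be combined is shown here. The abstract does not specify the uncertainty measure, the diversity criterion, or the pool size, so the per-image `uncertainty` scores, the greedy farthest-point (k-center style) selection, and the `pool_size` parameter are all assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

def two_step_select(uncertainty, features, budget, pool_size):
    """Hypothetical two-step selection: uncertainty pre-filter, then diversity."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    features = np.asarray(features, dtype=float)
    if budget <= 0 or len(uncertainty) == 0:
        return []
    # Step 1 (uncertainty): keep the `pool_size` most uncertain images.
    pool_size = min(pool_size, len(uncertainty))
    pool = np.argsort(-uncertainty, kind="stable")[:pool_size]
    pool_feats = features[pool]
    # Step 2 (diversity): greedy farthest-point selection inside the pool,
    # seeded with the single most uncertain image.
    chosen = [0]
    dist = np.linalg.norm(pool_feats - pool_feats[0], axis=1)
    while len(chosen) < min(budget, pool_size):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        # Distance of every pool item to its nearest already-chosen item.
        dist = np.minimum(dist, np.linalg.norm(pool_feats - pool_feats[nxt], axis=1))
    return pool[chosen].tolist()

# Toy usage: image 3 is far away in feature space but has low uncertainty,
# so it never enters the candidate pool.
picked = two_step_select(
    uncertainty=[0.9, 0.8, 0.7, 0.1],
    features=[[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [9.0, 9.0]],
    budget=2,
    pool_size=3,
)
```

Filtering by uncertainty first keeps the pairwise-distance step cheap, since it only runs over the small candidate pool.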
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Large language models such as GPT-3 have demonstrated an impressive
capability to adapt to new tasks without requiring task-specific training data.
This capability has been particularly effective in settings such as narrative
question answering, where the diversity of tasks is immense, but the available
supervision data is small. In this work, we investigate if such language models
can extend their zero-shot reasoning abilities to long multimodal narratives in
multimedia content such as drama, movies, and animation, where the story plays
an essential role. We propose Long Story Short, a framework for narrative video
QA that first summarizes the narrative of the video to a short plot and then
searches parts of the video relevant to the question. We also propose to
enhance visual matching with CLIPCheck. Our model outperforms state-of-the-art
supervised models by a large margin, highlighting the potential of zero-shot QA
for long videos.
Comment: Published in BMVC 202
CoNAN: Conditional Neural Aggregation Network For Unconstrained Face Feature Fusion
Face recognition from image sets acquired under unregulated and uncontrolled
settings, such as at large distances, low resolutions, varying viewpoints,
illumination, pose, and atmospheric conditions, is challenging. Face feature
aggregation, which involves aggregating a set of N feature representations
present in a template into a single global representation, plays a pivotal role
in such recognition systems. Existing works in traditional face feature
aggregation either utilize metadata or high-dimensional intermediate feature
representations to estimate feature quality for aggregation. However,
generating high-quality metadata or style information is not feasible for
extremely low-resolution faces captured in long-range and high altitude
settings. To overcome these limitations, we propose a feature distribution
conditioning approach called CoNAN for template aggregation. Specifically, our
method aims to learn a context vector conditioned over the distribution
information of the incoming feature set, which is utilized to weigh the
features based on their estimated informativeness. The proposed method produces
state-of-the-art results on long-range unconstrained face recognition datasets
such as BTS and DroneSURF, validating the advantages of such an aggregation
strategy.
Comment: Paper accepted at IJCB 202
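The idea of weighting template features by a context vector derived from the feature set's own distribution can be sketched roughly as follows. This is not the CoNAN architecture: the choice of mean and standard deviation as the distribution summary, the linear projection `proj` (which would be learned in practice), and the softmax weighting are all assumptions for illustration.

```python
import numpy as np

def aggregate_template(features, proj):
    """Hypothetical distribution-conditioned aggregation of N features.

    features: (N, D) array of per-image face features in one template.
    proj:     (D, 2*D) matrix mapping set statistics to a context vector
              (a stand-in for learned parameters).
    """
    features = np.asarray(features, dtype=float)
    # Summarize the distribution of the incoming feature set.
    stats = np.concatenate([features.mean(axis=0), features.std(axis=0)])
    context = np.asarray(proj, dtype=float) @ stats   # (D,)
    # Score each feature against the context, then softmax over the set.
    scores = features @ context                       # (N,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum gives a single global representation.
    return weights @ features, weights

# With an all-zero projection every feature gets equal weight, so the
# result falls back to plain average pooling.
agg, w = aggregate_template([[1.0, 0.0], [0.0, 1.0]], np.zeros((2, 4)))
```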
CCFace: Classification Consistency for Low-Resolution Face Recognition
In recent years, deep face recognition methods have demonstrated impressive
results on in-the-wild datasets. However, these methods have shown a
significant decline in performance when applied to real-world low-resolution
benchmarks like TinyFace or SCFace. To address this challenge, we propose a
novel classification consistency knowledge distillation approach that transfers
the learned classifier from a high-resolution model to a low-resolution
network. This approach helps in finding discriminative representations for
low-resolution instances. To further improve the performance, we designed a
knowledge distillation loss using the adaptive angular penalty inspired by the
success of the popular angular margin loss function. The adaptive penalty
reduces overfitting on low-resolution samples and alleviates the convergence
issue of the model integrated with data augmentation. Additionally, we utilize
an asymmetric cross-resolution learning approach based on the state-of-the-art
semi-supervised representation learning paradigm to improve discriminability on
low-resolution instances and prevent them from forming a cluster. Our proposed
method outperforms state-of-the-art approaches on low-resolution benchmarks,
with a three percent improvement on TinyFace while maintaining performance on
high-resolution benchmarks.
Comment: 2023 IEEE International Joint Conference on Biometrics (IJCB
AnoDODE: Anomaly Detection with Diffusion ODE
Anomaly detection is the process of identifying atypical data samples that
significantly deviate from the majority of the dataset. In the realm of
clinical screening and diagnosis, detecting abnormalities in medical images
holds great importance. Typically, clinical practice provides access to a vast
collection of normal images, while abnormal images are relatively scarce. We
hypothesize that abnormal images and their associated features tend to manifest
in low-density regions of the data distribution. Following this assumption, we
turn to diffusion ODEs for unsupervised anomaly detection, given their
tractability and superior performance in density estimation tasks. More
precisely, we propose a new anomaly detection method based on diffusion ODEs by
estimating the density of features extracted from multi-scale medical images.
Our anomaly scoring mechanism depends on computing the negative log-likelihood
of features extracted from medical images at different scales, quantified in
bits per dimension. Furthermore, we propose a reconstruction-based anomaly
localization suitable for our method. Our proposed method not only identifies
anomalies but also provides interpretability at both the image and pixel
levels. Through experiments on the BraTS2021 medical dataset, our proposed
method outperforms existing methods. These results confirm the effectiveness
and robustness of our method.
Comment: 11 pages, 5 figure
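As a small illustration of the scoring unit mentioned above, a negative log-likelihood measured in nats converts to bits per dimension by dividing by the number of dimensions times ln 2. The sketch below shows that conversion; averaging the per-scale values into one image-level score is an assumption, since the abstract does not say how the scales are combined, and the function names are hypothetical.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

def anomaly_score(nll_per_scale, dims_per_scale):
    """Hypothetical image-level score: mean bits/dim over feature scales."""
    bpds = [bits_per_dim(n, d) for n, d in zip(nll_per_scale, dims_per_scale)]
    return sum(bpds) / len(bpds)

# Toy usage: higher bits/dim means lower estimated density,
# i.e. a more anomalous sample.
score = anomaly_score(
    nll_per_scale=[8 * math.log(2), 4 * 3 * math.log(2)],
    dims_per_scale=[8, 4],
)
```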
SimSwap: An Efficient Framework For High Fidelity Face Swapping
We propose an efficient framework, called Simple Swap (SimSwap), aiming for
generalized and high fidelity face swapping. In contrast to previous approaches
that either lack the ability to generalize to arbitrary identity or fail to
preserve attributes like facial expression and gaze direction, our framework is
capable of transferring the identity of an arbitrary source face into an
arbitrary target face while preserving the attributes of the target face. We
overcome the above defects in the following two ways. First, we present the ID
Injection Module (IIM) which transfers the identity information of the source
face into the target face at feature level. By using this module, we extend the
architecture of an identity-specific face swapping algorithm to a framework for
arbitrary face swapping. Second, we propose the Weak Feature Matching Loss
which efficiently helps our framework to preserve the facial attributes in an
implicit way. Extensive experiments on wild faces demonstrate that our SimSwap
is able to achieve competitive identity performance while preserving attributes
better than previous state-of-the-art methods. The code is available on
GitHub: https://github.com/neuralchen/SimSwap.
Comment: Accepted by ACMMM 202
Pattern Anomaly Detection based on Sequence-to-Sequence Regularity Learning
Anomaly detection in traffic surveillance videos is a challenging task due to the ambiguity of anomaly definition and the complexity of scenes. In this paper, we propose to detect anomalous trajectories for vehicle behavior analysis by learning regularities in data. First, we train a sequence-to-sequence model under the autoencoder architecture and propose a new reconstruction error function for model optimization and anomaly evaluation. As such, the model is forced to learn the regular trajectory patterns in an unsupervised manner. Then, at the inference stage, we use the learned model to encode the test trajectory sample into a compact representation and generate a new trajectory sequence in the learned regular pattern. An anomaly score is computed based on the deviation of the generated trajectory from the test sample. Finally, we identify the anomalous trajectories with an adaptive threshold. We evaluate the proposed method on two real-world traffic datasets, and the experiments show favorable results against state-of-the-art algorithms. This paper's research on sequence-to-sequence regularity learning can provide theoretical and practical support for pattern anomaly detection.
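The scoring and thresholding steps described above might look roughly like the sketch below. The abstract names neither the reconstruction error function nor the threshold rule, so the mean point-wise Euclidean deviation and the mean-plus-k-standard-deviations threshold used here are assumptions, and the sequence-to-sequence model that would produce `generated_traj` is omitted.

```python
import numpy as np

def trajectory_score(test_traj, generated_traj):
    """Hypothetical anomaly score: mean point-wise Euclidean deviation."""
    a = np.asarray(test_traj, dtype=float)        # (T, 2) observed points
    b = np.asarray(generated_traj, dtype=float)   # (T, 2) regenerated points
    return float(np.linalg.norm(a - b, axis=1).mean())

def adaptive_threshold(reference_scores, k=3.0):
    """Hypothetical threshold adapted to the scores seen on regular data."""
    s = np.asarray(reference_scores, dtype=float)
    return float(s.mean() + k * s.std())

# Toy usage: scores from regular trajectories set the threshold, and a
# test trajectory is flagged when its score exceeds it.
regular_scores = [0.2, 0.3, 0.25, 0.25]
threshold = adaptive_threshold(regular_scores)
score = trajectory_score([[0, 0], [1, 1]], [[0, 0], [4, 5]])
is_anomalous = score > threshold
```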