Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling
Training high-quality instance segmentation models requires an abundance of
labeled images with instance masks and classifications, which is often
expensive to procure. Active learning addresses this challenge by selecting
the most informative and representative images for labeling, aiming for
optimal performance at minimal labeling cost. Despite its potential,
active learning has been less explored in instance segmentation compared to
other tasks like image classification, which require less labeling. In this
study, we propose a post-hoc active learning algorithm that integrates
uncertainty-based sampling with diversity-based sampling. Our proposed
algorithm is not only simple and easy to implement, but it also delivers
superior performance on various datasets. Its practical application is
demonstrated on a real-world overhead imagery dataset, where it increases the
labeling efficiency fivefold.
Comment: UNCV ICCV 202
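The two-step selection described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function name, the greedy farthest-point step used for diversity, and all parameters are assumptions.

```python
import numpy as np

def select_for_labeling(uncertainty, features, n_prefilter, n_select):
    """Two-step active learning: uncertainty pre-filter, then diversity pick."""
    # Step 1: uncertainty sampling -- keep the n_prefilter most uncertain images.
    pool = np.argsort(-uncertainty)[:n_prefilter]
    feats = features[pool]
    # Step 2: diversity sampling -- greedy farthest-point selection among them.
    chosen = [int(pool[0])]                       # seed with the most uncertain
    dist = np.linalg.norm(feats - feats[0], axis=1)
    while len(chosen) < n_select:
        nxt = int(np.argmax(dist))                # farthest from everything chosen
        chosen.append(int(pool[nxt]))
        dist = np.minimum(dist, np.linalg.norm(feats - feats[nxt], axis=1))
    return chosen
```

The pre-filter keeps the labeling budget focused on uncertain images, while the farthest-point step avoids spending it on near-duplicates.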
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Large language models such as GPT-3 have demonstrated an impressive
capability to adapt to new tasks without requiring task-specific training data.
This capability has been particularly effective in settings such as narrative
question answering, where the diversity of tasks is immense, but the available
supervision data is small. In this work, we investigate if such language models
can extend their zero-shot reasoning abilities to long multimodal narratives in
multimedia content such as drama, movies, and animation, where the story plays
an essential role. We propose Long Story Short, a framework for narrative video
QA that first summarizes the narrative of the video to a short plot and then
searches parts of the video relevant to the question. We also propose to
enhance visual matching with CLIPCheck. Our model outperforms state-of-the-art
supervised models by a large margin, highlighting the potential of zero-shot QA
for long videos.
Comment: Published in BMVC 202
That's BAD: Blind Anomaly Detection by Implicit Local Feature Clustering
Recent studies on visual anomaly detection (AD) of industrial
objects/textures have achieved quite good performance. They consider an
unsupervised setting, specifically the one-class setting, in which we assume
the availability of a set of normal (i.e., anomaly-free) images for
training. In this paper, we consider a more challenging scenario of
unsupervised AD, in which we detect anomalies in a given set of images that
might contain both normal and anomalous samples. The setting does not assume
the availability of known normal data and thus is completely free from human
annotation, which differs from the standard AD considered in recent studies.
For clarity, we call the setting blind anomaly detection (BAD). We show that
BAD can be converted into a local outlier detection problem and propose a novel
method named PatchCluster that can accurately detect image- and pixel-level
anomalies. Experimental results show that PatchCluster achieves promising
performance without any knowledge of normal data, even comparable to that of
SOTA methods applied in the one-class setting, which require it.
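Treating blind anomaly detection as local outlier detection can be illustrated with a k-nearest-neighbour scoring sketch over patch features. This is a generic kNN outlier score under assumed names, not PatchCluster's actual clustering procedure:

```python
import numpy as np

def knn_outlier_scores(patch_feats, k=5):
    """Score each patch by its mean distance to its k nearest neighbours
    in the unlabeled, possibly contaminated feature pool."""
    # Pairwise Euclidean distances within the pool.
    d = np.linalg.norm(patch_feats[:, None, :] - patch_feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a patch is not its own neighbour
    knn = np.sort(d, axis=1)[:, :k]      # k smallest distances per patch
    return knn.mean(axis=1)              # high score = locally isolated = anomalous
```

Because normal patches dominate the mixed set and cluster tightly in feature space, anomalous patches stand out as local outliers even without any labeled normal data.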
CoNAN: Conditional Neural Aggregation Network For Unconstrained Face Feature Fusion
Face recognition from image sets acquired under unregulated and uncontrolled
settings, such as at large distances, low resolutions, varying viewpoints,
illumination, pose, and atmospheric conditions, is challenging. Face feature
aggregation, which involves aggregating a set of N feature representations
present in a template into a single global representation, plays a pivotal role
in such recognition systems. Existing works in traditional face feature
aggregation either utilize metadata or high-dimensional intermediate feature
representations to estimate feature quality for aggregation. However,
generating high-quality metadata or style information is not feasible for
extremely low-resolution faces captured in long-range and high altitude
settings. To overcome these limitations, we propose a feature distribution
conditioning approach called CoNAN for template aggregation. Specifically, our
method aims to learn a context vector conditioned over the distribution
information of the incoming feature set, which is utilized to weigh the
features based on their estimated informativeness. The proposed method produces
state-of-the-art results on long-range unconstrained face recognition datasets
such as BTS and DroneSURF, validating the advantages of such an aggregation
strategy.
Comment: Paper accepted at IJCB 202
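The conditioning idea, weighting each feature by an informativeness score derived from the set's own distribution, can be sketched as follows. This is a hypothetical minimal version in which `w_ctx` and `w_score` stand in for learned weight matrices; it is not the actual CoNAN architecture.

```python
import numpy as np

def aggregate_template(feats, w_ctx, w_score):
    """Condition a context vector on the feature set's statistics, then use it
    to weight and pool per-image features (N, D) into one template descriptor."""
    # Distribution information of the incoming set: per-dimension mean and std.
    stats = np.concatenate([feats.mean(axis=0), feats.std(axis=0)])
    context = np.tanh(w_ctx @ stats)               # context vector for this set
    logits = feats @ (w_score @ context)           # informativeness per feature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                       # softmax over the set
    return (weights[:, None] * feats).sum(axis=0)  # weighted average
```

The output is a convex combination of the input features, so low-quality (low-weight) members of the template contribute little to the final descriptor.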
SimSwap: An Efficient Framework For High Fidelity Face Swapping
We propose an efficient framework, called Simple Swap (SimSwap), aiming for
generalized and high fidelity face swapping. In contrast to previous approaches
that either lack the ability to generalize to arbitrary identity or fail to
preserve attributes like facial expression and gaze direction, our framework is
capable of transferring the identity of an arbitrary source face into an
arbitrary target face while preserving the attributes of the target face. We
overcome the above defects in the following two ways. First, we present the ID
Injection Module (IIM) which transfers the identity information of the source
face into the target face at feature level. By using this module, we extend the
architecture of an identity-specific face swapping algorithm to a framework for
arbitrary face swapping. Second, we propose the Weak Feature Matching Loss
which efficiently helps our framework to preserve the facial attributes in an
implicit way. Extensive experiments on wild faces demonstrate that our SimSwap
is able to achieve competitive identity performance while preserving attributes
better than previous state-of-the-art methods. The code is already available on
GitHub: https://github.com/neuralchen/SimSwap.
Comment: Accepted by ACMMM 202
CCFace: Classification Consistency for Low-Resolution Face Recognition
In recent years, deep face recognition methods have demonstrated impressive
results on in-the-wild datasets. However, these methods have shown a
significant decline in performance when applied to real-world low-resolution
benchmarks like TinyFace or SCFace. To address this challenge, we propose a
novel classification consistency knowledge distillation approach that transfers
the learned classifier from a high-resolution model to a low-resolution
network. This approach helps in finding discriminative representations for
low-resolution instances. To further improve the performance, we designed a
knowledge distillation loss using the adaptive angular penalty inspired by the
success of the popular angular margin loss function. The adaptive penalty
reduces overfitting on low-resolution samples and alleviates the convergence
issue of the model integrated with data augmentation. Additionally, we utilize
an asymmetric cross-resolution learning approach based on the state-of-the-art
semi-supervised representation learning paradigm to improve discriminability on
low-resolution instances and prevent them from forming a cluster. Our proposed
method outperforms state-of-the-art approaches on low-resolution benchmarks,
with a three percent improvement on TinyFace while maintaining performance on
high-resolution benchmarks.
Comment: 2023 IEEE International Joint Conference on Biometrics (IJCB)
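The classifier-transfer idea follows the spirit of standard knowledge distillation. The sketch below is a generic classification-consistency loss (teacher distribution vs. student log-probabilities), not CCFace's adaptive angular-penalty formulation:

```python
import numpy as np

def classification_consistency_loss(student_logits, teacher_logits):
    """Cross-entropy between the high-resolution teacher's class distribution
    and the low-resolution student's predictions, averaged over the batch."""
    def log_softmax(x):
        x = x - x.max(axis=1, keepdims=True)       # numerical stability
        return x - np.log(np.exp(x).sum(axis=1, keepdims=True))
    t = np.exp(log_softmax(teacher_logits))        # teacher probabilities
    return float(-(t * log_softmax(student_logits)).sum(axis=1).mean())
```

Minimizing this pushes the low-resolution network toward the decision boundaries the classifier learned at high resolution, which is the consistency the paper exploits.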
SATR: Zero-Shot Semantic Segmentation of 3D Shapes
We explore the task of zero-shot semantic segmentation of 3D shapes by using
large-scale off-the-shelf 2D image recognition models. Surprisingly, we find
that modern zero-shot 2D object detectors are better suited for this task than
contemporary text/image similarity predictors or even zero-shot 2D segmentation
networks. Our key finding is that it is possible to extract accurate 3D
segmentation maps from multi-view bounding box predictions by using the
topological properties of the underlying surface. For this, we develop the
Segmentation Assignment with Topological Reweighting (SATR) algorithm and
evaluate it on two challenging benchmarks: FAUST and ShapeNetPart. On these
datasets, SATR achieves state-of-the-art performance and outperforms prior work
by at least 22% on average in terms of mIoU. Our source code and data will be
publicly released. Project webpage: https://samir55.github.io/SATR/
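Aggregating multi-view bounding-box predictions into a per-face 3D segmentation can be sketched as plain confidence voting; SATR additionally reweights votes using the surface's topological properties, which this sketch omits. The function and its input format are illustrative assumptions.

```python
import numpy as np

def vote_face_labels(n_faces, n_classes, detections):
    """Each detection is (face_ids, class_id, score): the mesh faces whose
    projections fall inside a predicted 2D box in some view, with the box's
    class and confidence. face_ids are assumed unique within one detection
    (fancy-index += accumulates once per index)."""
    votes = np.zeros((n_faces, n_classes))
    for face_ids, cls, score in detections:
        votes[face_ids, cls] += score     # accumulate confidence per face/class
    return votes.argmax(axis=1)           # per-face label after all views
```

Boxes are coarse in any single view, but faces covered by consistent predictions across many views accumulate the most confidence for the correct part label.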
Brainomaly: Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-weighted Brain MR Images
Harnessing the power of deep neural networks in the medical imaging domain is
challenging due to the difficulties in acquiring large annotated datasets,
especially for rare diseases, which involve high costs, time, and effort for
annotation. Unsupervised disease detection methods, such as anomaly detection,
can significantly reduce human effort in these scenarios. While anomaly
detection typically focuses on learning from images of healthy subjects only,
real-world situations often present unannotated datasets with a mixture of
healthy and diseased subjects. Recent studies have demonstrated that utilizing
such unannotated images can improve unsupervised disease and anomaly detection.
However, these methods do not utilize knowledge specific to registered
neuroimages, resulting in a subpar performance in neurologic disease detection.
To address this limitation, we propose Brainomaly, a GAN-based image-to-image
translation method specifically designed for neurologic disease detection.
Brainomaly not only offers tailored image-to-image translation suitable for
neuroimages but also leverages unannotated mixed images to achieve superior
neurologic disease detection. Additionally, we address the issue of model
selection for inference without annotated samples by proposing a pseudo-AUC
metric, further enhancing Brainomaly's detection performance. Extensive
experiments and ablation studies demonstrate that Brainomaly outperforms
existing state-of-the-art unsupervised disease and anomaly detection methods by
significant margins in Alzheimer's disease detection using a publicly available
dataset and headache detection using an institutional dataset. The code is
available from https://github.com/mahfuzmohammad/Brainomaly.
Comment: Accepted in WACV 202
Prior based Sampling for Adaptive LiDAR
We propose SampleDepth, a Convolutional Neural Network (CNN) that is suited
for an adaptive LiDAR. Typically, a LiDAR sampling strategy is pre-defined,
constant, and independent of the observed scene. Instead of letting a LiDAR
sample the scene in this agnostic fashion, SampleDepth adaptively determines
where it is best to sample the current frame. To do that, SampleDepth uses depth
samples from previous time steps to predict a sampling mask for the current
frame. Crucially, SampleDepth is trained to optimize the performance of a depth
completion downstream task. SampleDepth is evaluated on two different depth
completion networks and two LiDAR datasets, KITTI Depth Completion and the
newly introduced synthetic dataset, SHIFT. We show that SampleDepth is
effective and suitable for different depth completion downstream tasks.
SelFormaly: Towards Task-Agnostic Unified Anomaly Detection
The core idea of visual anomaly detection is to learn the normality from
normal images, but previous works have been developed specifically for certain
tasks, leading to fragmentation among various tasks: defect detection, semantic
anomaly detection, multi-class anomaly detection, and anomaly clustering. This
one-task-one-model approach is resource-intensive and incurs high maintenance
costs as the number of tasks increases. This paper presents SelFormaly, a
universal and powerful anomaly detection framework. We emphasize the necessity
of our off-the-shelf approach by pointing out the fluctuating, suboptimal
performance of previous online encoder-based methods. In addition,
we question the effectiveness of using ConvNets as previously employed in the
literature and confirm that self-supervised ViTs are suitable for unified
anomaly detection. We introduce back-patch masking and discover the new role of
top k-ratio feature matching to achieve unified and powerful anomaly detection.
Back-patch masking eliminates irrelevant regions that possibly hinder
target-centric detection with representations of the scene layout. The top
k-ratio feature matching unifies various anomaly levels and tasks. Finally,
SelFormaly achieves state-of-the-art results across various datasets for all
the aforementioned tasks.
Comment: 11 pages, 7 figures
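Top k-ratio feature matching can be illustrated with a memory-bank scoring sketch: instead of scoring an image by only its single worst-matched patch, the largest `ratio` fraction of nearest-neighbour distances is averaged. The function name and the plain Euclidean nearest-neighbour setup are assumptions, not SelFormaly's exact formulation.

```python
import numpy as np

def top_k_ratio_score(test_feats, memory_feats, ratio=0.05):
    """Image-level anomaly score from patch features (N, D) against a memory
    bank of normal patch features (M, D)."""
    # Nearest-neighbour distance from each test patch to the memory bank.
    d = np.linalg.norm(test_feats[:, None, :] - memory_feats[None, :, :], axis=-1)
    nn = d.min(axis=1)
    k = max(1, int(round(ratio * len(nn))))
    return float(np.sort(nn)[-k:].mean())   # average the worst-matched fraction
```

Averaging a ratio of the worst matches, rather than taking a fixed top-1 or top-k count, adapts to both small defects and semantically anomalous whole images, which is what lets one scoring rule cover different anomaly levels and tasks.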