Improved Region Proposal Network for Enhanced Few-Shot Object Detection
Despite significant success of deep learning in object detection tasks, the
standard training of deep neural networks requires access to a substantial
quantity of annotated images across all classes. Data annotation is an arduous
and time-consuming endeavor, particularly when dealing with infrequent objects.
Few-shot object detection (FSOD) methods have emerged as a solution to the
limitations of classic object detection approaches based on deep learning. FSOD
methods demonstrate remarkable performance by achieving robust object detection
using a significantly smaller amount of training data. A challenge for FSOD is
that instances of novel classes, which do not belong to the fixed set of
training classes, may appear in the background, and the base model can pick
them up as potential objects. These objects behave similarly to label noise because
they are classified as one of the training dataset classes, leading to FSOD
performance degradation. We develop a semi-supervised algorithm to detect and
then utilize these unlabeled novel objects as positive samples during the FSOD
training stage to improve FSOD performance. Specifically, we develop a
hierarchical ternary classification region proposal network (HTRPN) to localize
the potential unlabeled novel objects and assign them new objectness labels to
distinguish these objects from the base training dataset classes. Our improved
hierarchical sampling strategy for the region proposal network (RPN) also
boosts the perception ability of the object detection model for large objects.
We test our approach on the COCO and PASCAL VOC benchmarks that are commonly
used in the FSOD literature. Our experimental results indicate that our method
is effective and outperforms existing state-of-the-art (SOTA) FSOD methods.
Our implementation is provided as a supplement to support reproducibility of
the results.
Comment: arXiv admin note: substantial text overlap with arXiv:2303.1042
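The ternary objectness idea above can be sketched as a label-assignment rule: proposals matching an annotated base-class box stay base objects, while high-objectness proposals with no base-class match are kept as potential novel objects rather than background. This is a minimal illustration; the thresholds and the confidence heuristic are assumptions, not the exact HTRPN rule.

```python
# Hedged sketch of ternary objectness labeling for RPN proposals.
# The IoU and objectness thresholds below are illustrative assumptions.

BACKGROUND, NOVEL_OBJECT, BASE_OBJECT = 0, 1, 2

def assign_objectness_label(max_iou_with_base_gt, objectness_score,
                            iou_thresh=0.5, novel_score_thresh=0.7):
    """Assign a ternary label to one region proposal.

    A proposal overlapping an annotated base-class box is a base object.
    A confident proposal with no base-class overlap is treated as a
    potential unlabeled novel object instead of being forced to background,
    so it no longer acts as label noise during few-shot fine-tuning.
    """
    if max_iou_with_base_gt >= iou_thresh:
        return BASE_OBJECT
    if objectness_score >= novel_score_thresh:
        return NOVEL_OBJECT  # kept as a positive sample
    return BACKGROUND
```

In a full pipeline this rule would run per proposal after IoU matching, feeding a three-way objectness head instead of the standard binary one.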
Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models
Most existing cross-modal generative methods based on diffusion models use
guidance to provide control over the latent space to enable conditional
generation across different modalities. Such methods focus on providing
guidance through separately-trained models, each for one modality. As a result,
these methods suffer from cross-modal information loss and are limited to
unidirectional conditional generation. Inspired by how humans synchronously
acquire multi-modal information and learn the correlation between modalities,
we explore a multi-modal diffusion model training and sampling scheme that uses
channel-wise image conditioning to learn cross-modality correlation during the
training phase to better mimic the learning process in the brain. Our empirical
results demonstrate that our approach can achieve data generation conditioned
on all correlated modalities.
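The channel-wise conditioning scheme can be illustrated by how inputs are laid out: correlated modalities are stacked along the channel axis so a single diffusion model denoises them jointly. The helper below only shows this tensor layout, using nested lists in place of real image tensors; the names and shapes are illustrative assumptions.

```python
# Hedged sketch of channel-wise image conditioning for a multi-modal
# diffusion model: modalities share one input by channel concatenation,
# so the network can learn cross-modality correlation during training.

def stack_modalities(*modalities):
    """Concatenate per-modality channel lists into one multi-channel input.

    Each modality is a list of channels; each channel is an HxW grid.
    Denoising all channels jointly lets noise on one modality's channels
    be predicted from the other's, enabling bidirectional conditioning.
    """
    stacked = []
    for channels in modalities:
        stacked.extend(channels)
    return stacked

rgb = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]]  # 3 channels, 1x2 images
depth = [[[0.9, 0.8]]]                            # 1 channel, same size
joint = stack_modalities(rgb, depth)              # 4 channels total
```

At sampling time, fixing one modality's channels and denoising only the others would yield conditional generation in either direction.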
Cognitively Inspired Learning of Incremental Drifting Concepts
Humans continually expand their learned knowledge to new domains and learn
new concepts without any interference with past learned experiences. In
contrast, machine learning models perform poorly in a continual learning
setting, where input data distribution changes over time. Inspired by the
nervous system learning mechanisms, we develop a computational model that
enables a deep neural network to learn new concepts and expand its learned
knowledge to new domains incrementally in a continual learning setting. We rely
on the Parallel Distributed Processing theory to encode abstract concepts in an
embedding space in terms of a multimodal distribution. This embedding space is
modeled by internal data representations in a hidden network layer. We also
leverage the Complementary Learning Systems theory to equip the model with a
memory mechanism to overcome catastrophic forgetting through implementing
pseudo-rehearsal. Our model can generate pseudo-data points for experience
replay and accumulate new experiences into its past learned knowledge without
causing cross-task interference.
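The pseudo-rehearsal mechanism can be sketched as sampling replay points from the multimodal embedding distribution learned for past tasks. The per-task Gaussian statistics below are a stand-in assumption for the learned internal representations, not the paper's architecture.

```python
# Hedged sketch of generative experience replay via pseudo-rehearsal:
# sample pseudo-data for past tasks from stored embedding statistics and
# mix them with new-task data to approximate joint training.
import random

def generate_pseudo_data(task_stats, n_per_task):
    """Sample pseudo-data points for each past task.

    task_stats maps task_id -> (mean, std) of that task's (illustrative
    1-D) embedding mode; replaying these samples alongside the current
    task's data mitigates catastrophic forgetting.
    """
    replay = []
    for task_id, (mean, std) in task_stats.items():
        for _ in range(n_per_task):
            replay.append((task_id, random.gauss(mean, std)))
    return replay

past_tasks = {0: (0.0, 1.0), 1: (5.0, 0.5)}
replay_batch = generate_pseudo_data(past_tasks, n_per_task=4)
```

In the full model, a decoder over the embedding space would map such samples back to pseudo-inputs for concurrent training with the new task.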
Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization
Learning new tasks accumulatively without forgetting remains a critical
challenge in continual learning. Generative experience replay addresses this
challenge by synthesizing pseudo-data points for past learned tasks and later
replaying them for concurrent training along with the new tasks' data.
Generative replay is the best strategy for continual learning under a strict
class-incremental setting when certain constraints need to be met: (i) constant
model size, (ii) no pre-training dataset, and (iii) no memory buffer for
storing past tasks' data. Inspired by the biological nervous system mechanisms,
we introduce a time-aware regularization method to dynamically fine-tune the
three training objective terms used for generative replay: supervised learning,
latent regularization, and data reconstruction. Experimental results on major
benchmarks indicate that our method pushes the limit of brain-inspired
continual learners under such strict settings, improves memory retention, and
increases the average performance over continually arriving tasks.
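The time-aware regularization idea can be sketched as a schedule over the three objective terms named above. The exponential schedule and the choice to up-weight the generative terms as tasks accumulate are illustrative assumptions; the paper's exact schedule may differ.

```python
# Hedged sketch of time-aware weighting for the three generative-replay
# objectives: supervised learning, latent regularization, reconstruction.
import math

def time_aware_weights(task_index, decay=0.2):
    """Return (supervised, latent, reconstruction) weights at task t.

    As more tasks arrive, the generative terms are up-weighted so replay
    quality is preserved for a growing set of past tasks (an assumed
    schedule, shown only to make the mechanism concrete).
    """
    gen = 1.0 - math.exp(-decay * task_index)  # grows from 0 toward 1
    return (1.0, 1.0 + gen, 1.0 + gen)

def total_loss(sup, lat, rec, task_index):
    """Combine the three loss terms with time-dependent weights."""
    w_sup, w_lat, w_rec = time_aware_weights(task_index)
    return w_sup * sup + w_lat * lat + w_rec * rec
```

The key design point is that the trade-off between fitting the current task and regularizing the generator is not fixed but drifts with the task index.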