E-CLIP: Towards Label-efficient Event-based Open-world Understanding by CLIP
Contrastive Language-Image Pre-training (CLIP) has recently shown promising
open-world and few-shot performance on 2D image-based recognition tasks.
However, the transferability of CLIP to novel event camera data
remains under-explored. In particular, due to the modality gap with
image-text data and the lack of large-scale datasets, achieving this goal is
non-trivial and thus requires significant research innovation. In this paper,
we propose E-CLIP, a novel and effective framework that unleashes the potential
of CLIP for event-based recognition to compensate for the lack of large-scale
event-based datasets. Our work addresses two crucial challenges: 1) how to
generalize CLIP's visual encoder to event data while fully leveraging events'
unique properties, e.g., sparsity and high temporal resolution; 2) how to
effectively align the multi-modal embeddings, i.e., image, text, and events. To
this end, we first introduce a novel event encoder that subtly models the
temporal information from events and meanwhile generates event prompts to
promote the modality bridging. We then design a text encoder that generates
content prompts and utilizes hybrid text prompts to enhance E-CLIP's
generalization ability across diverse datasets. With the proposed event
encoder, text encoder, and original image encoder, a novel Hierarchical Triple
Contrastive Alignment (HTCA) module is introduced to jointly optimize the
correlation and enable efficient knowledge transfer among the three modalities.
We conduct extensive experiments on two recognition benchmarks, and the results
demonstrate that our E-CLIP outperforms existing methods by a large margin of
+3.94% and +4.62% on the N-Caltech dataset in the fine-tuning and few-shot
settings, respectively. Moreover, our E-CLIP can be flexibly extended to the
event retrieval task using either text or image queries, showing plausible
performance.
Comment: Journal version with supplementary material
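The abstract above describes aligning event, image, and text embeddings contrastively. The following is a minimal sketch of a generic pairwise contrastive objective over three modalities, not the paper's HTCA module, whose details the abstract does not give. The function names, the temperature value of 0.07, and the equal weighting of the three pairs are all assumptions made for illustration. Embeddings are assumed to be non-zero vectors, with row i of each batch describing the same sample.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    a[i] and b[i] are treated as a positive pair; every other
    combination in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    n = len(a)
    # logits[i][j] is the cosine similarity of a[i] and b[j] over temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is index i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


def triple_alignment_loss(event, image, text, temperature=0.07):
    """Hypothetical three-way objective: the sum of the pairwise losses."""
    return (
        info_nce(event, image, temperature)
        + info_nce(event, text, temperature)
        + info_nce(image, text, temperature)
    )
```

With matched batches the loss is close to zero, and it grows when one modality's rows are permuted so that the pairs no longer correspond, which is the behaviour a contrastive alignment objective is meant to have.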
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up a
possible solution accordingly.
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model that predicts a target from insufficient labeled samples is feasible and
effective, particularly for sensor-generated datasets. Inspired by learning algorithms for insufficient
datasets, such as metric-based methods, prototype networks, and meta-learning, we propose
a transfer model learning method for insufficient datasets. First, two basic models for transfer learning are
introduced, followed by a classification system and calculation criteria. Second, a dataset
of plantar pressure for comfort shoe design is acquired and preprocessed through a foot scan system;
using a pre-trained convolutional neural network (CNN) employing AlexNet and CNN-based
transfer modeling, the classification accuracy of the plantar pressure images exceeds 93.5%. Finally,
the proposed method is compared with the current classifiers VGG, ResNet, AlexNet, and a pre-trained
CNN. Our work is also compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition
methods on the public test databases SUN, CUB, AWA1, AWA2, and aPY, using indices of precision (tr, ts, H)
and time (training and evaluation). The proposed method for the plantar pressure classification task shows high
performance on most indices compared with the other methods. The transfer learning-based method can be
applied to other insufficient datasets in sensor imaging fields.