5,612 research outputs found
Zero-Shot Object Detection by Hybrid Region Embedding
Object detection is considered as one of the most challenging problems in
computer vision, since it requires correct prediction of both classes and
locations of objects in images. In this study, we define a more difficult
scenario, namely zero-shot object detection (ZSD) where no visual training data
is available for some of the target object classes. We present a novel approach
to tackle this ZSD problem, where a convex combination of embeddings are used
in conjunction with a detection framework. For evaluation of ZSD methods, we
propose a simple dataset constructed from Fashion-MNIST images and also a
custom zero-shot split for the Pascal VOC detection challenge. The experimental
results suggest that our method yields promising results for ZSD
Context-Aware Zero-Shot Recognition
We present a novel problem setting in zero-shot learning, zero-shot object
recognition and detection in the context. Contrary to the traditional zero-shot
learning methods, which simply infers unseen categories by transferring
knowledge from the objects belonging to semantically similar seen categories,
we aim to understand the identity of the novel objects in an image surrounded
by the known objects using the inter-object relation prior. Specifically, we
leverage the visual context and the geometric relationships between all pairs
of objects in a single image, and capture the information useful to infer
unseen categories. We integrate our context-aware zero-shot learning framework
into the traditional zero-shot learning techniques seamlessly using a
Conditional Random Field (CRF). The proposed algorithm is evaluated on both
zero-shot region classification and zero-shot detection tasks. The results on
Visual Genome (VG) dataset show that our model significantly boosts performance
with the additional visual context compared to traditional methods
Simple Open-Vocabulary Object Detection with Vision Transformers
Combining simple architectures with large-scale pre-training has led to
massive improvements in image classification. For object detection,
pre-training and scaling approaches are less well established, especially in
the long-tailed and open-vocabulary setting, where training data is relatively
scarce. In this paper, we propose a strong recipe for transferring image-text
models to open-vocabulary object detection. We use a standard Vision
Transformer architecture with minimal modifications, contrastive image-text
pre-training, and end-to-end detection fine-tuning. Our analysis of the scaling
properties of this setup shows that increasing image-level pre-training and
model size yield consistent improvements on the downstream detection task. We
provide the adaptation strategies and regularizations needed to attain very
strong performance on zero-shot text-conditioned and one-shot image-conditioned
object detection. Code and models are available on GitHub.Comment: ECCV 2022 camera-ready versio
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model to predict a target by using insufficient labeled samples is feasible and
effective, particularly, for a sensor-generated data-set. This paper has been inspired with insufficient data-set
learning algorithms, such as metric-based, prototype networks and meta-learning, and therefore we propose
an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are
introduced. A classification system and calculation criteria are then subsequently introduced. Secondly, a dataset
of plantar pressure for comfort shoe design is acquired and preprocessed through foot scan system; and by
using a pre-trained convolution neural network employing AlexNet and convolution neural network (CNN)-
based transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally,
the proposed method has been compared to the current classifiers VGG, ResNet, AlexNet and pre-trained
CNN. Also, our work is compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition
methods on the public test databases: SUN, CUB, AWA1, AWA2, and aPY with indices of precision (tr, ts, H)
and time (training and evaluation). The proposed method for the plantar pressure classification task shows high
performance in most indices when comparing with other methods. The transfer learning-based method can be
applied to other insufficient data-sets of sensor imaging fields
- …