Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
Despite great progress in object detection, most existing methods work only
on a limited set of object categories, due to the tremendous human effort
needed for bounding-box annotations of training data. To alleviate the problem,
recent open vocabulary and zero-shot detection methods attempt to detect novel
object categories beyond those seen during training. They achieve this goal by
training on a pre-defined set of base categories to induce generalization to novel
objects. However, their potential is still constrained by the small set of base
categories available for training. To enlarge the set of base classes, we
propose a method to automatically generate pseudo bounding-box annotations of
diverse objects from large-scale image-caption pairs. Our method leverages the
localization ability of pre-trained vision-language models to generate pseudo
bounding-box labels and then directly uses them for training object detectors.
Experimental results show that our method outperforms the state-of-the-art open
vocabulary detector by 8% AP on COCO novel categories, by 6.3% AP on PASCAL
VOC, by 2.3% AP on Objects365 and by 2.8% AP on LVIS. Code is available at
https://github.com/salesforce/PB-OVD
Comment: ECCV 2022
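The core step described above, turning a vision-language model's word-level localization into box supervision, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `act_map` stands in for the per-object-word activation a pre-trained vision-language model would produce (e.g. a Grad-CAM-style map), and the fixed threshold is an assumption made here for the example.

```python
import numpy as np

def pseudo_box_from_activation(act_map, threshold=0.5):
    """Tightest box around pixels whose activation exceeds a fraction
    of the peak activation. Returns (x_min, y_min, x_max, y_max),
    or None when nothing activates."""
    if act_map.max() <= 0:
        return None
    # Keep pixels at or above `threshold` of the maximum activation.
    ys, xs = np.nonzero(act_map >= threshold * act_map.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy activation map with a single hot region spanning rows 2-3, cols 3-5.
act = np.zeros((8, 8))
act[2:4, 3:6] = 1.0
print(pseudo_box_from_activation(act))  # (3, 2, 5, 3)
```

In the paper's setting, such boxes would then be paired with the object word from the caption to form pseudo-labels for detector training.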
A Temperature-Gradient-Based Condition Estimation Method for IGBT Modules
This paper presents a temperature-gradient-based method for evaluating device condition, taking Insulated Gate Bipolar Transistor (IGBT) modules as an example. First, the theoretical basis of the method is presented, and example calculations indicate that increased thermal resistance and power loss in IGBT modules raise the temperature gradient. Then an electrical-thermal-mechanical finite element method (FEM) model of IGBT modules, which accounts for the temperature dependence of material properties, is used to estimate the temperature gradient distribution under both healthy and fatigued conditions. It is found that the temperature gradient varies with power loss. Furthermore, both experimental and simulation investigations of the temperature gradient under different conditions were conducted; they show that the temperature gradient not only tracks changes in power loss but also offers better sensitivity than the temperature distribution. In addition, the temperature gradient can reveal the location of defects and distinguish degrees of failure. Finally, the influence of solder fatigue, voids, and delamination on the temperature gradient distribution is discussed.
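As a rough numerical illustration of the quantity the abstract tracks (not the paper's FEM model), the spatial temperature-gradient magnitude over a chip surface can be computed with finite differences. The temperature profiles below are invented for the example: a fatigue-like defect is modelled as a local hotspot, which steepens the gradient at its own location.

```python
import numpy as np

def gradient_magnitude(T, dx=1.0, dy=1.0):
    """|grad T| of a 2-D temperature field sampled on a regular grid."""
    dT_dy, dT_dx = np.gradient(T, dy, dx)
    return np.hypot(dT_dx, dT_dy)

y, x = np.mgrid[0:50, 0:50]
# Healthy module: smooth radial profile around the chip centre.
base = 80.0 - 0.1 * np.hypot(x - 25, y - 25)
# Fatigue-like defect: a localized hotspot at (35, 35).
hotspot = 15.0 * np.exp(-((x - 35) ** 2 + (y - 35) ** 2) / 20.0)

g_healthy = gradient_magnitude(base)
g_faulty = gradient_magnitude(base + hotspot)

# The defect both raises the peak gradient and marks its own location,
# mirroring the abstract's claims about sensitivity and localization.
print(g_healthy.max(), g_faulty.max())
print(np.unravel_index(g_faulty.argmax(), g_faulty.shape))
```

The steepened gradient near the hotspot is what makes the gradient a more sensitive condition indicator than the raw temperature distribution in this toy setting.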
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
Existing instance segmentation models learn task-specific information using
manual mask annotations from base (training) categories. These mask annotations
require tremendous human effort, limiting the scalability to annotate novel
(new) categories. To alleviate this problem, Open-Vocabulary (OV) methods
leverage large-scale image-caption pairs and vision-language models to learn
novel categories. In summary, an OV method learns task-specific information
using strong supervision from base annotations and novel category information
using weak supervision from image-caption pairs. This difference between
strong and weak supervision leads to overfitting on base categories, resulting
in poor generalization towards novel categories. In this work, we overcome this
issue by learning both base and novel categories from pseudo-mask annotations
generated by the vision-language model in a weakly supervised manner using our
proposed Mask-free OVIS pipeline. Our method automatically generates
pseudo-mask annotations by leveraging the localization ability of a pre-trained
vision-language model for objects present in image-caption pairs. The generated
pseudo-mask annotations are then used to supervise an instance segmentation
model, freeing the entire pipeline from any labour-expensive instance-level
annotations and overfitting. Our extensive experiments show that our method
trained with just pseudo-masks significantly improves the mAP scores on the
MS-COCO dataset and OpenImages dataset compared to the recent state-of-the-art
methods trained with manual masks. Code and models are provided at
https://vibashan.github.io/ovis-web/
Comment: Accepted to CVPR 2023
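The mask-generation idea above can be sketched analogously to the box case: threshold a vision-language model's activation map and keep the connected region around the activation peak. This is an illustrative heuristic, not the exact Mask-free OVIS procedure; the activation map and threshold are assumptions for the example.

```python
from collections import deque

import numpy as np

def pseudo_mask_from_activation(act_map, threshold=0.5):
    """Binary pseudo-mask: threshold the activation map, then keep only
    the 4-connected region containing the activation peak."""
    hot = act_map >= threshold * act_map.max()
    seed = np.unravel_index(act_map.argmax(), act_map.shape)
    mask = np.zeros_like(hot)
    mask[seed] = True
    queue = deque([seed])
    while queue:  # breadth-first flood fill from the peak
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < hot.shape[0] and 0 <= nj < hot.shape[1]
                    and hot[ni, nj] and not mask[ni, nj]):
                mask[ni, nj] = True
                queue.append((ni, nj))
    return mask

# Toy map: two hot blobs; only the one containing the peak survives.
act = np.zeros((8, 8))
act[1:3, 1:3] = 0.9   # secondary blob (dropped)
act[5:8, 5:8] = 1.0   # peak blob (kept, 3x3 = 9 pixels)
m = pseudo_mask_from_activation(act)
print(int(m.sum()))  # 9
```

Masks produced this way would then supervise an instance segmentation model in place of manual annotations, as the abstract describes.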
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Text-VQA aims at answering questions that require understanding the textual
cues in an image. Despite the great progress of existing Text-VQA methods,
their performance suffers from insufficient human-labeled question-answer (QA)
pairs. However, we observe that, in general, the scene text is not fully
exploited in the existing datasets -- only a small portion of the text in each
image participates in the annotated QA activities. This results in a huge waste
of useful information. To address this deficiency, we develop a new method to
generate high-quality and diverse QA pairs by explicitly utilizing the existing
rich text available in the scene context of each image. Specifically, we
propose, TAG, a text-aware visual question-answer generation architecture that
learns to produce meaningful, and accurate QA samples using a multimodal
transformer. The architecture exploits underexplored scene text information and
enhances scene understanding of Text-VQA models by combining the generated QA
pairs with the initial training data. Extensive experimental results on two
well-known Text-VQA benchmarks (TextVQA and ST-VQA) demonstrate that our
proposed TAG effectively enlarges the training data that helps improve the
Text-VQA performance without extra labeling effort. Moreover, our model
outperforms state-of-the-art approaches that are pre-trained with extra
large-scale data. Code is available at
https://github.com/HenryJunW/TAG
Comment: BMVC 2022
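The idea of turning unused scene text into extra QA supervision can be mimicked with a template baseline, though TAG itself generates QA pairs with a multimodal transformer. Everything below (token format, `region` field, question template) is invented for the illustration.

```python
def generate_text_qa(ocr_tokens, annotated_answers):
    """Create extra QA pairs from scene-text tokens that no
    human-written answer in the dataset already uses."""
    unused = [t for t in ocr_tokens if t["text"] not in annotated_answers]
    return [
        {"question": f"what is the text written on the {t['region']}?",
         "answer": t["text"]}
        for t in unused
    ]

# Two OCR tokens; "stop" is already covered by a human-annotated answer.
tokens = [
    {"text": "stop", "region": "sign"},
    {"text": "coca-cola", "region": "bottle"},
]
extra = generate_text_qa(tokens, annotated_answers={"stop"})
print(extra)  # one new QA pair, answering "coca-cola"
```

The generated pairs would be mixed into the original training data, which is the augmentation step the abstract credits for the performance gains.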