TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks
In recent years, the focus on anomaly detection and localization in
industrial inspection tasks has intensified. While existing studies have
demonstrated impressive outcomes, they often rely heavily on extensive training
datasets or robust features extracted from pre-trained models trained on
diverse datasets like ImageNet. In this work, we propose a novel framework
leveraging the visual-linguistic CLIP model to train a backbone model
tailored to the manufacturing domain. Our approach jointly considers
visual and text-aligned embedding spaces for normal and abnormal conditions.
The resulting pre-trained backbone markedly enhances performance in industrial
downstream tasks, particularly in anomaly detection and localization. Notably,
this improvement is substantiated through experiments conducted on multiple
datasets such as MVTecAD, BTAD, and KSDD2. Furthermore, using our pre-trained
backbone weights allows previous works to achieve superior performance in
few-shot scenarios with less training data. The proposed anomaly backbone
provides a foundation model for more precise anomaly detection and
localization.
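To make the text-aligned training idea concrete, the following is a minimal
PyTorch sketch of one plausible objective: backbone features are projected
into CLIP's text embedding space and pulled toward "normal" or "abnormal"
state prompts. The prompt wording, the linear projection head, and the
temperature are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextAlignedBackbone(nn.Module):
    """Backbone trunk plus a projection into CLIP's text embedding space."""

    def __init__(self, backbone: nn.Module, feat_dim: int, clip_dim: int = 512):
        super().__init__()
        self.backbone = backbone                   # e.g. a pooled ResNet trunk
        self.proj = nn.Linear(feat_dim, clip_dim)  # map into the text space

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)              # (B, feat_dim)
        return F.normalize(self.proj(feats), dim=-1)

def text_align_loss(img_emb, text_emb, labels, tau: float = 0.07):
    """img_emb: (B, D) image embeddings; text_emb: (2, D) CLIP embeddings of
    a 'normal' and an 'abnormal' prompt; labels: (B,) 0 = normal,
    1 = abnormal. Classifies each image against the two state prompts."""
    logits = img_emb @ text_emb.t() / tau          # (B, 2) similarity scores
    return F.cross_entropy(logits, labels)
```

Under an objective of this kind, the trunk inherits a feature space organised
around the normal/abnormal distinction, which downstream anomaly detectors
can then exploit.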
A Joint Learning Approach to Face Detection in Wavelet Compressed Domain
Face detection has been an important and active research topic in computer vision and image processing. In recent years, learning-based face detection algorithms have prevailed, with successful applications. In this paper, we propose a new face detection algorithm that works directly in the wavelet compressed domain. To simplify the processes of image decompression and feature extraction, we modify the AdaBoost learning algorithm to select a set of complementary joint-coefficient classifiers and integrate them to achieve optimal face detection. Since face detection in the wavelet compressed domain is restricted by the limited discrimination power of the designated feature space, the proposed learning mechanism is developed to achieve the best discrimination from that restricted feature space. The major contributions of the proposed AdaBoost face detection learning algorithm include feature space warping, joint feature representation, ID3-like plane quantization, and weak probabilistic classifiers, which dramatically increase the discrimination power of the face classifier. Experimental results on the CBCL benchmark and the MIT + CMU real image dataset show that the proposed algorithm can detect faces in the wavelet compressed domain accurately and efficiently.
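Because the pipeline centres on a modified AdaBoost, a generic
discrete-AdaBoost sketch may help fix ideas. The paper-specific ingredients
(feature space warping, joint-coefficient selection, ID3-like plane
quantization, probabilistic weak classifiers) are abstracted into a pool of
candidate weak learners; this shows only how complementary weak classifiers
are selected, weighted, and combined.

```python
import numpy as np

def adaboost(X, y, weak_learners, rounds: int = 10):
    """X: (N, d) wavelet-coefficient features; y: (N,) labels in {-1, +1};
    weak_learners: pool of candidate predictors h(X) -> (N,) in {-1, +1}.
    Returns a strong classifier combining the selected weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # sample weights, initially uniform
    ensemble = []                         # (alpha, predictor) pairs
    for _ in range(rounds):
        # pick the candidate with the lowest weighted error under w
        best = min(weak_learners, key=lambda h: np.sum(w * (h(X) != y)))
        err = max(np.sum(w * (best(X) != y)), 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)  # vote weight of this learner
        w *= np.exp(-alpha * y * best(X))        # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, best))
    return lambda X_: np.sign(sum(a * h(X_) for a, h in ensemble))
```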
KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning
Kinship verification is an emerging task in computer vision with multiple
potential applications. However, there is no sufficiently large kinship
dataset to train a representative and robust model, which limits the
achievable performance. Moreover, face verification is known to exhibit
bias, which previous kinship verification works have not addressed and
which sometimes even leads to serious issues. We therefore first combine
existing kinship datasets and label each identity with the correct race in
order to take race information into consideration, providing a larger and
more complete dataset called the KinRace dataset. Secondly, we propose a
multi-task learning model structure with an attention module to enhance
accuracy, which surpasses state-of-the-art performance. Lastly, our
fairness-aware contrastive loss function with adversarial learning greatly
mitigates racial bias. We introduce a debiasing term into the traditional
contrastive loss and apply gradient reversal in the race classification
task, an innovative combination of two fairness methods to alleviate bias.
Exhaustive experimental evaluation demonstrates the effectiveness and
superior performance of the proposed KFC in both standard deviation (a
fairness measure) and accuracy simultaneously. Comment: Accepted by BMVC 2023.
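The abstract names two concrete mechanisms, a gradient reversal layer
feeding a race classifier and a contrastive loss with a debiasing term; the
PyTorch sketch below shows one plausible shape for each. The
distance-equalising debias term, the margin, and the weight beta are
illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients in the
    backward pass, so the encoder learns to remove race information while
    the race classifier tries to predict it."""

    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def fair_contrastive_loss(z1, z2, same_kin, race, margin=0.5, beta=0.1):
    """z1, z2: (B, D) embeddings of a face pair; same_kin: (B,) float,
    1.0 if the pair is kin; race: (B,) group ids. The per-group
    distance-equalisation term is a hypothetical stand-in for the paper's
    debias term."""
    d = F.pairwise_distance(z1, z2)
    base = same_kin * d.pow(2) + (1 - same_kin) * F.relu(margin - d).pow(2)
    # hypothetical debias term: pull each group's mean pair distance
    # toward the global mean pair distance
    group_means = torch.stack([d[race == g].mean() for g in race.unique()])
    debias = (group_means - d.mean()).pow(2).mean()
    return base.mean() + beta * debias

# adversarial branch: race_logits = race_head(GradReverse.apply(feats, 1.0))
```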
MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition
Although significant progress has been made in face recognition, demographic
bias still exists in face recognition systems. For instance, it often
happens that the recognition performance for a certain demographic group is
lower than that for others. In this paper, we propose the MixFairFace
framework to improve the fairness of face recognition models. First of all,
we argue that the commonly used attribute-based fairness metric is not
appropriate for face recognition: a face recognition system can only be
considered fair when every person achieves similar performance. Hence, we
propose a new evaluation protocol to evaluate the fairness of different
approaches. Different from previous approaches that require sensitive
attribute labels, such as race and gender, to reduce demographic bias, we
aim at addressing the identity bias in face representation, i.e., the
performance inconsistency between different identities, without the need
for sensitive attribute labels. To this end, we propose the MixFair Adapter
to determine and reduce the identity bias of training samples. Our
extensive experiments demonstrate that our MixFairFace approach achieves
state-of-the-art fairness performance on all benchmark
datasets. Comment: Accepted in AAAI-23; Code: https://github.com/fuenwang/MixFairFace
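As a purely hypothetical illustration of the feature-mixing idea suggested
by the adapter's name, one could blend two identities' embeddings and read
the similarity gap as a proxy for which identity the encoder represents
more strongly; the actual MixFair Adapter architecture and how its bias
estimate feeds back into training are not reproduced here.

```python
import torch
import torch.nn.functional as F

def identity_bias_proxy(f_a: torch.Tensor, f_b: torch.Tensor, alpha=0.5):
    """f_a, f_b: (D,) embeddings of two identities. Returns a signed
    scalar; positive means the mixed feature sits closer to identity A,
    hinting that A dominates the representation. Ideally ~0 for an
    encoder with no identity bias."""
    mixed = F.normalize(alpha * f_a + (1.0 - alpha) * f_b, dim=-1)
    sim_a = F.cosine_similarity(mixed, f_a, dim=-1)
    sim_b = F.cosine_similarity(mixed, f_b, dim=-1)
    return sim_a - sim_b
```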
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection
The goal of spatio-temporal action detection is to determine when and where
each person's action occurs in a video and to classify the corresponding
action category. Most existing methods adopt fully-supervised learning,
which requires a large amount of training data, making zero-shot learning
very difficult to achieve. In this paper, we propose to utilize a
pre-trained visual-language model to extract representative image and text
features, and to model the relationship between these features through
different interaction modules to obtain an interaction feature. In
addition, we use this feature to prompt each label to obtain more
appropriate text features. Finally, we calculate the similarity between the
interaction feature and the text feature of each label to determine the
action category. Our experiments on the J-HMDB and UCF101-24 datasets
demonstrate that the proposed interaction module and prompting make the
visual-language features better aligned, achieving excellent accuracy for
zero-shot spatio-temporal action detection. The code will be released upon
acceptance. Comment: The first zero-shot spatio-temporal action detection
work.
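The final step the abstract describes, scoring the interaction feature
against one prompted text feature per label, reduces to a cosine-similarity
classification; a minimal sketch follows, with the interaction modules and
prompting abstracted away and the shapes and temperature assumed for
illustration.

```python
import torch
import torch.nn.functional as F

def classify_action(interaction_feat, label_text_feats, tau: float = 0.01):
    """interaction_feat: (D,) fused person/context feature for one actor;
    label_text_feats: (C, D) one prompted text embedding per action label.
    Returns the predicted class index and the per-class probabilities."""
    q = F.normalize(interaction_feat, dim=-1)
    t = F.normalize(label_text_feats, dim=-1)
    probs = F.softmax(q @ t.t() / tau, dim=-1)   # (C,) cosine-based scores
    return probs.argmax().item(), probs
```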
Extremely Low-light Image Enhancement with Scene Text Restoration
Deep learning-based methods have made impressive progress in enhancing
extremely low-light images: the quality of the reconstructed images has
generally improved. However, we found that most of these methods could not
sufficiently recover image details, for instance, the text in the scene. In
this paper, a novel image enhancement framework is proposed to precisely
restore scene text as well as the overall image quality under extremely
low-light conditions. Specifically, we employ a self-regularised attention
map, an edge map, and a novel text detection loss. In addition, we show
that leveraging synthetic low-light images benefits image enhancement on
genuine ones in terms of text detection. Quantitative and qualitative
experimental results show that the proposed model outperforms
state-of-the-art methods in image restoration, text detection, and text
spotting on the See In the Dark and ICDAR15 datasets.
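One plausible way to combine the three ingredients the abstract lists (an
attention-guided reconstruction, an edge map, and a text detection loss) is
a weighted sum of per-term losses, sketched below in PyTorch; the
individual loss forms, the weights, and the frozen text detector interface
are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """Grayscale Sobel gradient magnitude; a simple stand-in edge extractor."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def enhancement_loss(enhanced, target, edge_map, text_detector,
                     w_edge: float = 0.1, w_text: float = 0.05):
    """enhanced/target: (B, 3, H, W) images; edge_map: (B, 1, H, W) target
    edges; text_detector: frozen network mapping images to text-region
    score maps. Gradients flow through the detector into `enhanced`."""
    l_rec = F.l1_loss(enhanced, target)                  # overall fidelity
    l_edge = F.l1_loss(sobel_edges(enhanced), edge_map)  # structural detail
    with torch.no_grad():
        det_target = text_detector(target)   # detector scores on clean image
    l_text = F.mse_loss(text_detector(enhanced), det_target)
    return l_rec + w_edge * l_edge + w_text * l_text
```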
14-3-3epsilon contributes to tumour suppression in laryngeal carcinoma by affecting apoptosis and invasion
Background: 14-3-3epsilon regulates a wide range of biological processes,
including cell cycle control, proliferation, and apoptosis, and plays a
significant role in neurogenesis and the formation of malignant tumours.
However, the exact function and regulatory mechanism of 14-3-3epsilon in
carcinogenesis have not been elucidated.

Methods: The expression of 14-3-3epsilon was assessed by RT-PCR and western
blotting. The invasiveness and viability of Hep-2 cells were determined by
the transwell migration assay and MTT assay, respectively. Cell cycle and
apoptosis of Hep-2 cells were detected by flow cytometry.

Results: The mRNA and protein expression of 14-3-3epsilon in larynx
squamous cell carcinoma (LSCC) tissues were significantly lower than those
in clear surgical margin tissues. Statistical analysis showed that the
14-3-3epsilon protein level in metastatic lymph nodes was lower than that
in paired tumour tissues. In addition, the protein level of 14-3-3epsilon
in stage III or IV tumours was significantly lower than that in stage I or
II tumours. Compared with control Hep-2 cells, the percentages of viable
cells in the 14-3-3epsilon-GFP and negative control GFP groups were 36.68 ±
14.09% and 71.68 ± 12.10%, respectively. The proportions of S phase were
22.47 ± 3.36%, 28.17 ± 3.97% and 46.15 ± 6.82%, and the apoptotic sub-G1
populations were 1.23 ± 1.02%, 2.92 ± 1.59% and 13.72 ± 3.89% in the
control, negative control GFP and 14-3-3epsilon-GFP groups, respectively.
The percentages of apoptotic cells were 0.84 ± 0.25%, 1.08 ± 0.24% and 2.93
± 0.13% in the control, negative control GFP and 14-3-3epsilon-GFP groups,
respectively. The numbers of cells that penetrated the filter membrane in
the control, negative control GFP and 14-3-3epsilon-GFP groups were 20.65 ±
1.94, 17.63 ± 1.04 and 9.1 ± 0.24, respectively, indicating significant
differences among the groups.

Conclusions: Decreased expression of 14-3-3epsilon in LSCC tissues
contributes to the initiation and progression of LSCC. 14-3-3epsilon can
promote apoptosis and inhibit the invasiveness of LSCC.