
    TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks

    In recent years, the focus on anomaly detection and localization in industrial inspection tasks has intensified. While existing studies have demonstrated impressive results, they often rely heavily on extensive training data or on features extracted from models pre-trained on diverse datasets such as ImageNet. In this work, we propose a novel framework that leverages the visual-linguistic CLIP model to train a backbone tailored to the manufacturing domain. Our approach jointly considers visual and text-aligned embedding spaces for normal and abnormal conditions. The resulting pre-trained backbone markedly enhances performance in industrial downstream tasks, particularly anomaly detection and localization, as substantiated by experiments on multiple datasets including MVTecAD, BTAD, and KSDD2. Furthermore, using our pre-trained backbone weights allows previous works to achieve superior performance in few-shot scenarios with less training data. The proposed anomaly backbone provides a foundation model for more precise anomaly detection and localization.
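
    To make the "text-aligned embedding" idea concrete, here is a minimal, hypothetical sketch of zero-shot anomaly scoring with the public OpenAI CLIP package: an image embedding is compared against text prompts standing in for the normal and abnormal conditions. The prompts and the file name are illustrative assumptions; this is not the TAB training procedure itself, which trains a domain-specific backbone.

    # Hypothetical sketch: CLIP-based normal/abnormal scoring, not TAB itself.
    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    prompts = ["a photo of a flawless industrial part",   # proxy for "normal"
               "a photo of a damaged industrial part"]    # proxy for "abnormal"
    text_tokens = clip.tokenize(prompts).to(device)
    image = preprocess(Image.open("part.png")).unsqueeze(0).to(device)  # hypothetical file

    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(text_tokens)
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        probs = (image_feat @ text_feat.T).softmax(dim=-1)  # similarities -> probabilities

    print(f"anomaly score: {probs[0, 1].item():.3f}")  # mass on the "abnormal" prompt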

    A Joint Learning Approach to Face Detection in Wavelet Compressed Domain

    Face detection has been an important and active research topic in computer vision and image processing. In recent years, learning-based face detection algorithms have prevailed, with successful applications. In this paper, we propose a new face detection algorithm that works directly in the wavelet compressed domain. To simplify image decompression and feature extraction, we modify the AdaBoost learning algorithm to select a set of complementary joint-coefficient classifiers and integrate them to achieve optimal face detection. Since face detection in the wavelet compressed domain is restricted by the limited discrimination power of the designated feature space, the proposed learning mechanism is developed to achieve the best discrimination from that restricted feature space. The major contributions of the proposed AdaBoost face detection learning algorithm include feature space warping, joint feature representation, ID3-like plane quantization, and a weak probabilistic classifier, which dramatically increase the discrimination power of the face classifier. Experimental results on the CBCL benchmark and the MIT + CMU real image dataset show that the proposed algorithm can detect faces in the wavelet compressed domain accurately and efficiently.
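
    As a point of reference for the boosting step described above, the following minimal sketch runs plain AdaBoost from scikit-learn over synthetic features standing in for wavelet coefficients. It illustrates only the generic idea of combining weak classifiers into a stronger face/non-face decision; the paper's joint-coefficient classifiers, feature space warping, and plane quantization are not reproduced here.

    # Plain AdaBoost over synthetic "wavelet coefficient" features (toy data).
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))               # 64 coefficients per window
    y = (X[:, :8].sum(axis=1) > 0).astype(int)    # toy face / non-face labels

    clf = AdaBoostClassifier(n_estimators=50)     # decision stumps by default
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))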

    KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning

    Kinship verification is an emerging task in computer vision with multiple potential applications. However, no kinship dataset is large enough to train a representative and robust model, which limits the achievable performance. Moreover, face verification is known to exhibit bias, which previous kinship verification works have not addressed and which sometimes even leads to serious issues. We therefore first combine existing kinship datasets and label each identity with the correct race, in order to take race information into account and provide a larger, more complete dataset, called the KinRace dataset. Secondly, we propose a multi-task learning model structure with an attention module to enhance accuracy, which surpasses state-of-the-art performance. Lastly, our fairness-aware contrastive loss function with adversarial learning greatly mitigates racial bias. We introduce a debias term into the traditional contrastive loss and implement gradient reversal in the race classification task, an innovative combination of two fairness methods to alleviate bias. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed KFC in both standard deviation and accuracy at the same time. Comment: Accepted by BMVC 2023.
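
    The "gradient reversal" mentioned above is commonly implemented as a gradient reversal layer (GRL): features pass through unchanged in the forward pass, while gradients from the attribute classifier are negated on the way back, pushing the encoder to discard that attribute. A minimal PyTorch sketch follows; the layer sizes, the number of race categories, and the lambda weight are illustrative assumptions, not the paper's exact configuration.

    # Minimal gradient reversal layer (GRL) sketch in PyTorch.
    import torch
    from torch import nn

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lamb):
            ctx.lamb = lamb
            return x.view_as(x)                       # identity in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lamb * grad_output, None      # negate (and scale) the gradient

    def grad_reverse(x, lamb=1.0):
        return GradReverse.apply(x, lamb)

    # Usage: an adversarial race head attached to the face encoder's features.
    encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
    race_head = nn.Linear(256, 4)                     # hypothetical: 4 race categories
    feats = encoder(torch.randn(8, 512))
    race_logits = race_head(grad_reverse(feats))      # encoder receives reversed gradients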

    MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition

    Although significant progress has been made in face recognition, demographic bias still exists in face recognition systems. For instance, the face recognition performance for a certain demographic group is often lower than that for others. In this paper, we propose the MixFairFace framework to improve fairness in face recognition models. First of all, we argue that the commonly used attribute-based fairness metric is not appropriate for face recognition: a face recognition system can only be considered fair when every person achieves similar performance. Hence, we propose a new evaluation protocol to fairly evaluate the fairness performance of different approaches. Different from previous approaches that require sensitive attribute labels such as race and gender to reduce demographic bias, we aim to address the identity bias in face representation, i.e., the performance inconsistency between different identities, without the need for sensitive attribute labels. To this end, we propose the MixFair Adapter to determine and reduce the identity bias of training samples. Our extensive experiments demonstrate that our MixFairFace approach achieves state-of-the-art fairness performance on all benchmark datasets. Comment: Accepted in AAAI-23; Code: https://github.com/fuenwang/MixFairFace
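
    In the spirit of "fair means every person has close performance", an identity-level check can be sketched as follows: compute a per-identity score (here, the mean cosine similarity over an identity's genuine pairs) and report its spread across identities, where a large standard deviation indicates identity bias. This is an illustrative stand-in on synthetic data, not MixFairFace's exact evaluation protocol.

    # Illustrative per-identity spread check with synthetic embeddings.
    import numpy as np

    def per_identity_spread(embeddings):
        """embeddings: dict mapping identity -> (n_images, dim) unit-norm array."""
        scores = []
        for embs in embeddings.values():
            sims = embs @ embs.T                     # pairwise cosine similarities
            iu = np.triu_indices(len(embs), k=1)     # genuine pairs only
            scores.append(sims[iu].mean())
        scores = np.asarray(scores)
        return scores.mean(), scores.std()           # high std = identity bias

    rng = np.random.default_rng(0)
    fake = {f"id{i}": rng.normal(size=(5, 128)) for i in range(10)}
    fake = {k: v / np.linalg.norm(v, axis=1, keepdims=True) for k, v in fake.items()}
    print(per_identity_spread(fake))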

    Physically based adaptive preconditioning for early vision


    Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

    The goal of spatio-temporal action detection is to determine when and where each person's action occurs in a video and to classify the corresponding action category. Most existing methods adopt fully-supervised learning, which requires a large amount of training data and makes zero-shot learning very difficult to achieve. In this paper, we propose to utilize a pre-trained visual-language model to extract representative image and text features, and to model the relationship between these features through different interaction modules to obtain an interaction feature. In addition, we use this feature to prompt each label to obtain more appropriate text features. Finally, we calculate the similarity between the interaction feature and the text feature of each label to determine the action category. Our experiments on the J-HMDB and UCF101-24 datasets demonstrate that the proposed interaction modules and prompting make the visual-language features better aligned, achieving excellent accuracy for zero-shot spatio-temporal action detection. The code will be released upon acceptance. Comment: the first zero-shot spatio-temporal action detection work
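
    The final classification step described above reduces to a similarity lookup, sketched below with random tensors: one fused interaction feature is compared against a prompted text feature per label, and the argmax gives the zero-shot prediction. The feature extraction and interaction modules themselves are omitted, and the dimensions are assumptions.

    # Toy sketch of the similarity-based label assignment.
    import torch
    import torch.nn.functional as F

    num_labels, dim = 24, 512                        # e.g., UCF101-24 has 24 classes
    interaction_feat = torch.randn(dim)              # output of the interaction modules
    label_text_feats = torch.randn(num_labels, dim)  # prompted text feature per label

    sims = F.cosine_similarity(interaction_feat.unsqueeze(0), label_text_feats, dim=-1)
    pred = sims.argmax().item()
    print("predicted action id:", pred, "score:", sims[pred].item())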

    Extremely Low-light Image Enhancement with Scene Text Restoration

    Deep learning-based methods have made impressive progress in enhancing extremely low-light images: the quality of the reconstructed images has generally improved. However, we found that most of these methods do not sufficiently recover image details, for instance, the text in the scene. In this paper, a novel image enhancement framework is proposed to precisely restore scene text, as well as the overall quality of the image, under extremely low-light conditions. Specifically, we employ a self-regularised attention map, an edge map, and a novel text detection loss. In addition, leveraging synthetic low-light images proves beneficial for image enhancement on genuine ones in terms of text detection. Quantitative and qualitative experimental results show that the proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting on the See In the Dark and ICDAR15 datasets.
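
    A combined objective of the kind described might look like the hedged sketch below: a pixel reconstruction term plus weighted edge and text-detection terms. The weights and the exact form of each term are assumptions for illustration, not the paper's loss.

    # Hypothetical composite loss: reconstruction + edge + text detection.
    import torch.nn.functional as F

    def total_loss(pred, target, pred_edges, target_edges, text_det_loss,
                   w_edge=0.1, w_text=0.05):
        l_rec = F.l1_loss(pred, target)                # overall image quality
        l_edge = F.l1_loss(pred_edges, target_edges)   # preserve structure
        return l_rec + w_edge * l_edge + w_text * text_det_loss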

    14-3-3epsilon contributes to tumour suppression in laryngeal carcinoma by affecting apoptosis and invasion

    Background: 14-3-3epsilon regulates a wide range of biological processes, including cell cycle control, proliferation, and apoptosis, and plays a significant role in neurogenesis and the formation of malignant tumours. However, the exact function and regulatory mechanism of 14-3-3epsilon in carcinogenesis have not been elucidated.

    Methods: The expression of 14-3-3epsilon was assessed by RT-PCR and western blotting. The invasiveness and viability of Hep-2 cells were determined by the transwell migration assay and MTT assay, respectively. Cell cycle and apoptosis of Hep-2 cells were detected by flow cytometry.

    Results: The mRNA and protein expression of 14-3-3epsilon in larynx squamous cell carcinoma (LSCC) tissues were significantly lower than those in clear surgical margin tissues. Statistical analysis showed that the 14-3-3epsilon protein level in metastatic lymph nodes was lower than that in paired tumour tissues. In addition, the protein level of 14-3-3epsilon in stage III or IV tumours was significantly lower than that in stage I or II tumours. Compared with control Hep-2 cells, the percentages of viable cells in the 14-3-3epsilon-GFP and negative control GFP groups were 36.68 ± 14.09% and 71.68 ± 12.10%, respectively. The proportions of S phase were 22.47 ± 3.36%, 28.17 ± 3.97% and 46.15 ± 6.82%, and the apoptotic sub-G1 populations were 1.23 ± 1.02%, 2.92 ± 1.59% and 13.72 ± 3.89% in the control, negative control GFP and 14-3-3epsilon-GFP groups, respectively. The percentages of apoptotic cells were 0.84 ± 0.25%, 1.08 ± 0.24% and 2.93 ± 0.13% in the control, negative control GFP and 14-3-3epsilon-GFP groups, respectively. The numbers of cells that penetrated the filter membrane in the control, negative control GFP and 14-3-3epsilon-GFP groups were 20.65 ± 1.94, 17.63 ± 1.04 and 9.1 ± 0.24, respectively, indicating significant differences among the groups.

    Conclusions: Decreased expression of 14-3-3epsilon in LSCC tissues contributes to the initiation and progression of LSCC. 14-3-3epsilon can promote apoptosis and inhibit the invasiveness of LSCC.