55 research outputs found

    A Unified Multi-Task Learning Framework of Real-Time Drone Supervision for Crowd Counting

    In this paper, a novel Unified Multi-Task Learning Framework of Real-Time Drone Supervision for Crowd Counting (MFCC) is proposed, which uses an image fusion network to fuse visible and thermal infrared images and a crowd counting network to estimate the density map. The purpose of the framework is to fuse the two modalities captured by drones in real time, exploiting their complementary information to accurately count dense crowds and then automatically guide the flight of the drone to supervise them. To this end, we propose the unified multi-task learning framework for crowd counting for the first time and redesign the unified training loss functions to align the image fusion network and the crowd counting network. We also design the Assisted Learning Module (ALM) to fuse the density map feature into the image fusion encoder so that it learns counting features. To improve accuracy, we propose the Extensive Context Extraction Module (ECEM), which is based on a dense connection architecture to encode multi-receptive-field contextual information, and apply the Multi-domain Attention Block (MAB) to focus on head regions in the drone view. Finally, we apply the predicted map to automatically guide the drones to supervise the dense crowd. Experimental results on the DroneRGBT dataset show that, compared with existing methods, ours achieves comparable results on objective evaluations with an easier training process.
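    A minimal sketch (PyTorch) of the kind of unified training objective the abstract describes, jointly optimizing a fusion network and a counting network. All module names, loss terms, and weights below are illustrative assumptions; the paper's exact losses and the ALM/ECEM/MAB internals are not given above.

```python
# Sketch of a unified multi-task objective aligning an image-fusion
# network with a crowd-counting network (hypothetical loss terms).
import torch
import torch.nn.functional as F

def unified_loss(fusion_net, counting_net, rgb, tir, gt_density,
                 w_fusion=1.0, w_count=1.0):
    """Combine a fusion-reconstruction term with a density-map term."""
    fused = fusion_net(rgb, tir)          # fused image, (B, 3, H, W)
    pred_density = counting_net(fused)    # density map, (B, 1, H, W)

    # Fusion term: keep the fused image close to both source modalities
    # (a common surrogate; the actual MFCC fusion loss may differ).
    loss_fusion = F.l1_loss(fused, rgb) + F.l1_loss(fused, tir)

    # Counting term: pixel-wise MSE against the ground-truth density map.
    loss_count = F.mse_loss(pred_density, gt_density)

    return w_fusion * loss_fusion + w_count * loss_count
```

    Weighting the two terms in one scalar objective is what lets the fusion and counting branches be trained jointly rather than in separate stages.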

    Face Recognition Under Varying Illumination

    This study is the result of a successful joint venture with my adviser, Prof. Dr. Muhittin Gökmen. I am thankful to him for his continuous assistance in preparing this project. Special thanks to the assistants of the Computer Vision Laboratory for their steady support and help on many topics related to the project.

    Adversarial Purification of Information Masking

    Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, existing generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal purification results. We emphasize the potential threat of residual adversarial perturbations to target models, quantitatively establishing a relationship between perturbation scale and attack capability. Notably, the residual perturbations on the purified image primarily stem from the same-position patch and similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), which aims to extensively eliminate adversarial perturbations. Given an adversarial sample, we first mask part of the patch information and then reconstruct the patches to resist the adversarial perturbations within them; all patches are reconstructed in parallel to obtain a cohesive image. Then, to protect the purified samples against potential perturbations in similar regions, we simulate this risk by randomly mixing the purified samples with the input samples before feeding them into the feature extraction network. Finally, we establish a combined constraint of pixel loss and perceptual loss to augment the model's reconstruction capability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights are available at https://github.com/NoWindButRain/IMPure.
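    A minimal sketch of the two input-side operations described above: masking a random subset of image patches before reconstruction, and randomly mixing a purified sample with its input to simulate residual perturbations from similar regions. Patch size, mask ratio, and mixing range are illustrative assumptions; IMPure's reconstruction network itself is not shown.

```python
# Sketch of patch masking and purified/input mixing (illustrative values).
import torch

def mask_patches(x, patch=16, mask_ratio=0.5):
    """Zero out a random subset of non-overlapping patches of x (B, C, H, W).
    Assumes H and W are divisible by `patch`."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x * mask

def random_mix(purified, original, max_alpha=0.3):
    """Blend each purified sample with its input by a random factor,
    simulating residual perturbations from similar regions."""
    alpha = torch.rand(purified.size(0), 1, 1, 1,
                       device=purified.device) * max_alpha
    return (1 - alpha) * purified + alpha * original
```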

    CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images

    Histopathology image segmentation is the gold standard for diagnosing cancer and can indicate cancer prognosis. However, histopathology image segmentation requires high-quality masks, so many studies now use image-level labels to achieve pixel-level segmentation and reduce the need for fine-grained annotation. To solve this problem, we propose an attention-based cross-view feature consistency end-to-end pseudo-mask generation framework named CVFC. Specifically, CVFC is a three-branch joint framework composed of two ResNet38 branches and one ResNet50 branch. Each independent branch integrates multi-scale feature maps to generate a class activation map (CAM), and the size of the CAM is adjusted in each branch through down-sampling and expansion. The middle branch projects the feature matrix into query and key feature spaces and generates a feature-space perception matrix through a connection layer and inner product to adjust and refine the CAM of each branch. Finally, the parameters of CVFC are optimized in co-training mode through a feature consistency loss and a feature cross loss. In extensive experiments, an IoU of 0.7122 and an fwIoU of 0.7018 are obtained on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM. Comment: Submitted to BIBM202
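    A minimal sketch of the attention step described above: features are projected into query and key spaces, and their inner product forms a perception matrix used to refine a class activation map. The single-convolution projections and channel sizes are illustrative assumptions, not CVFC's exact architecture.

```python
# Sketch of query/key affinity used to refine a CAM (illustrative shapes).
import torch
import torch.nn as nn

class CAMRefiner(nn.Module):
    def __init__(self, in_ch, attn_ch=64):
        super().__init__()
        self.to_q = nn.Conv2d(in_ch, attn_ch, 1)   # query projection
        self.to_k = nn.Conv2d(in_ch, attn_ch, 1)   # key projection

    def forward(self, feat, cam):
        # feat: (B, C, H, W); cam: (B, K, H, W) class activation maps
        b, _, h, w = feat.shape
        q = self.to_q(feat).flatten(2).transpose(1, 2)   # (B, HW, A)
        k = self.to_k(feat).flatten(2)                   # (B, A, HW)
        affinity = torch.softmax(q @ k, dim=-1)          # (B, HW, HW)
        cam_flat = cam.flatten(2).transpose(1, 2)        # (B, HW, K)
        refined = affinity @ cam_flat                    # propagate activations
        return refined.transpose(1, 2).reshape(b, -1, h, w)
```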

    MGTUNet: A new UNet for colon nuclei instance segmentation and quantification

    Colorectal cancer (CRC) is among the top three malignant tumor types in terms of morbidity and mortality. Histopathological images are the gold standard for diagnosing colon cancer. Nuclei instance segmentation and classification, together with nuclear component regression, can aid in analyzing the tumor microenvironment in colon tissue. Traditional methods still cannot handle both types of tasks end-to-end at the same time, and they suffer from poor prediction accuracy and high application costs. This paper proposes a new UNet model for handling nuclei, called MGTUNet, which uses Mish activation, Group Normalization, and transposed convolution layers to improve the segmentation model, and the Ranger optimizer to tune the SmoothL1 loss. Secondly, it uses different channels to segment and classify different types of nuclei, completing the nuclei instance segmentation and classification task and the nuclear component regression task simultaneously. Finally, we conducted extensive comparison experiments with eight segmentation models. Comparing three evaluation metrics and the parameter sizes of the models, MGTUNet obtained 0.6254 on PQ, 0.6359 on mPQ, and 0.8695 on R2. The experiments thus demonstrate that MGTUNet is a state-of-the-art method for quantifying histopathological images of colon cancer. Comment: Published in BIBM2022 (regular paper), https://doi.org/10.1109/BIBM55620.2022.999566
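    A minimal sketch of a decoder block combining the three ingredients named above: Mish activation, Group Normalization, and a transposed convolution for upsampling. Channel counts and group size are illustrative; MGTUNet's actual configuration (and its Ranger/SmoothL1 training setup) is not reproduced here.

```python
# Sketch of a Mish + GroupNorm + transposed-convolution decoder block.
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch, out_ch, groups=8):
        super().__init__()
        # Transposed convolution doubles the spatial resolution.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Mish(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Mish(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.up(x))
```

    Separate output heads over different channels would then handle instance segmentation, classification, and component regression from the shared decoder features.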

    Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data

    Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we propose a generalizable framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) that analyzes cancer multi-omics data to identify and characterize cancer subtypes. The AMUCL framework includes an unsupervised multi-head attention mechanism that deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on the multi-head attention mechanism is proposed to learn multi-omics data features, cluster them, and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and the sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies-Bouldin score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, DMACL obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies-Bouldin score of 0.46, yielding the most reliable subtype clustering results for each cancer type. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.
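    A minimal sketch of a decoupled contrastive loss of the general kind DMACL builds on, where the positive pair is excluded from the denominator so the positive and negative terms are decoupled. The temperature and two-view setup are illustrative assumptions; the paper's exact formulation over feature space and sample space may differ.

```python
# Sketch of a decoupled contrastive loss over two views (illustrative).
import torch
import torch.nn.functional as F

def decoupled_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature            # (N, N) cosine similarities
    pos = sim.diag()                           # matching-sample positives
    # Mask the diagonal so positives are excluded from the denominator.
    eye = torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    neg = torch.logsumexp(sim.masked_fill(eye, float('-inf')), dim=1)
    return (neg - pos).mean()
```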

    High speed and robust illumination invariant face recognition techniques

    The variation in illumination is still a challenging issue in automatic face recognition, although considerable progress has been achieved in controlled environments. It has been proven that differences between various lighting conditions can be greater than differences between individuals. Although numerous approaches have been proposed in recent years, their performance is still not satisfactory. In this thesis, several high-speed approaches for face recognition under varying illumination are developed.

    Firstly, a novel illumination normalization approach with low computational complexity is proposed based on the Walsh-Hadamard transform (WHT). By discarding an appropriate number of low-frequency coefficients in the block-wise WHT domain, the effects caused by illumination variations are removed. The proposed method is validated on the Yale B and Extended Yale B databases. In addition, both analytical proof and experimental results demonstrate that Principal Component Analysis (PCA) and Null-space-based Linear Discriminant Analysis (NLDA) can be implemented directly in the WHT domain, without the inverse WHT, to further reduce the computational burden.

    Secondly, a novel illumination normalization approach is proposed in which illumination is estimated in two steps. First, low-frequency Discrete Cosine Transform (DCT) coefficients obtained in local areas of the logarithm domain are used to estimate illumination coarsely, rather than estimating it globally. After that, a refining estimation step using a mean operator estimates the illumination at every point more precisely. Experimental results demonstrate that the method is superior to other existing methods. A simplified version of the method is also proposed; both theoretical analysis and experimental results demonstrate its validity and high computational efficiency. The performance of the proposed methods under different parameter values is also discussed.

    In addition to these illumination normalization methods, an illumination-invariant facial feature, the local relation map (LRM), is explored according to local properties of human faces. A face model under varying illumination is investigated that includes an additive noise term besides the common multiplicative illumination term; high-frequency DCT coefficients are zeroed to remove the noise. Experimental results validate the proposed face model and the assumption on noise.

    Contrary to the common assumption, the illumination and the reflectance cannot be well approximated by only low- and high-frequency components, respectively. An adaptive illumination normalization approach is therefore proposed based on a data-driven soft-thresholding denoising technique. The proposed method models each DCT coefficient except the DC component with a Generalized Gaussian Distribution (GGD). More of the reflectance information in the low-frequency band is preserved while illumination variations in the high-frequency band are removed, and the key parameters are determined adaptively without any prior information.

    Finally, a novel robust face descriptor named Local Line Derivative Pattern (LLDP) is presented to deal with not only illumination variations but also expression and aging variations. High-order derivative images in two directions are obtained by convolving the original images with Sobel masks. In the LLDP, an improved binary coding function and three standards for arranging the weights are proposed, and a novel distance measure incorporating both pixel-level and global-level information is introduced. Promising experimental results are obtained on various face recognition databases.
    DOCTOR OF PHILOSOPHY (EEE)
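    A minimal sketch of the first technique in the abstract: block-wise WHT normalization that discards low-frequency coefficients to suppress illumination variation. Block size and the number of discarded coefficients are illustrative, and the natural-ordered Hadamard matrix is used for simplicity (sequency ordering changes which coefficients count as low-frequency).

```python
# Sketch of block-wise WHT illumination normalization (illustrative).
import numpy as np
from scipy.linalg import hadamard

def wht_normalize(img, block=8, n_discard=3):
    """img: 2D grayscale array with both sides divisible by `block`."""
    H = hadamard(block) / np.sqrt(block)     # orthonormal Hadamard matrix
    out = np.empty_like(img, dtype=float)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            patch = img[i:i+block, j:j+block].astype(float)
            coeff = H @ patch @ H.T          # 2D WHT of the block
            # Zero the top-left coefficients, where illumination
            # (including the DC term) concentrates its energy.
            coeff[:n_discard, :n_discard] = 0.0
            out[i:i+block, j:j+block] = H.T @ coeff @ H   # inverse WHT
    return out
```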