A Unified Multi-Task Learning Framework of Real-Time Drone Supervision for Crowd Counting
In this paper, a novel Unified Multi-Task Learning Framework of Real-Time
Drone Supervision for Crowd Counting (MFCC) is proposed, which uses an image
fusion network to fuse visible and thermal infrared images and a crowd
counting network to estimate the density map. The purpose of our framework is
to fuse the two modalities captured by drones in real time, exploiting their
complementary information to accurately count dense crowds, and then to
automatically guide the flight of the drone to supervise them. To this end, we
propose the unified multi-task learning framework for crowd counting for the
first time and redesign the unified training loss functions to align the image
fusion network and the crowd counting network. We also design the Assisted
Learning Module (ALM), which feeds the density map feature back into the image
fusion encoder so that it learns counting features. To improve accuracy, we
propose the Extensive Context Extraction Module (ECEM), which builds on a
dense connection architecture to encode multi-receptive-field contextual
information, and apply the Multi-domain Attention Block (MAB) to focus on head
regions in the drone view. Finally, we use the predicted map to automatically
guide the drone to supervise the dense crowd. Experimental results on the
DroneRGBT dataset show that, compared with existing methods, ours achieves
comparable results on objective evaluations with an easier training process.
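As a concrete illustration of the two-modality design described above, here is a minimal PyTorch sketch that fuses a visible and a thermal stream and regresses a single-channel density map whose integral is the crowd count. The module names, channel widths, and layers are illustrative assumptions, not the authors' actual MFCC architecture.

```python
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    """Encodes the visible and thermal streams and fuses them channel-wise."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.tir_branch = nn.Sequential(
            nn.Conv2d(1, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)  # 1x1 fusion layer

    def forward(self, rgb, tir):
        f = torch.cat([self.rgb_branch(rgb), self.tir_branch(tir)], dim=1)
        return self.fuse(f)

class DensityHead(nn.Module):
    """Regresses a single-channel crowd density map from fused features."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1))

    def forward(self, feat):
        return self.head(feat)

rgb = torch.randn(1, 3, 256, 256)   # visible image
tir = torch.randn(1, 1, 256, 256)   # thermal infrared image
density = DensityHead()(FusionEncoder()(rgb, tir))
print(density.sum())                # estimated count = integral of density map
```

In the full framework this prediction would also drive the drone guidance described above.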
Face Recognition Under Varying Illumination
This study is the result of a successful joint venture with my adviser, Prof. Dr. Muhittin Gökmen. I am thankful to him for his continuous assistance in preparing this project. Special thanks to the assistants of the Computer Vision Laboratory for their steady support and help in many topics related to the project.
Adversarial Purification of Information Masking
Adversarial attacks meticulously generate minuscule, imperceptible
perturbations to images to deceive neural networks. To counteract these,
adversarial purification methods seek to transform adversarial input samples
into clean output images and thereby defend against adversarial attacks.
Nonetheless, extant generative models fail to effectively eliminate
adversarial perturbations, yielding less-than-ideal purification results. We
emphasize the potential threat that residual adversarial perturbations pose to
target models, quantitatively establishing a relationship between perturbation
scale and attack capability. Notably, the residual perturbations on the
purified image stem primarily from the same-position patch and similar patches
of the adversarial sample. We propose a novel adversarial purification
approach named Information Mask Purification (IMPure), which aims to
extensively eliminate adversarial perturbations. Given an adversarial sample,
we first mask part of the patch information and then reconstruct the patches
to resist the adversarial perturbations they carry. We reconstruct all patches
in parallel to obtain a cohesive image. Then, to protect the purified samples
against potential similar-region perturbations, we simulate this risk by
randomly mixing the purified samples with the input samples before feeding
them into the feature extraction network. Finally, we establish a combined
constraint of pixel loss and perceptual loss to augment the model's
reconstruction adaptability. Extensive experiments on the ImageNet dataset
with three classifier models demonstrate that our approach achieves
state-of-the-art results against nine adversarial attack methods.
Implementation code and pre-trained weights are available at
https://github.com/NoWindButRain/IMPure.
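Two of the steps the abstract describes, patch-information masking and random mixing of the purified image with the raw input, can be sketched briefly in PyTorch. The patch size, masking ratio, and mixing range below are assumptions, and the reconstruction network itself is stubbed out.

```python
import torch

def mask_patches(img, patch=16, ratio=0.5):
    """Zero out a random subset of non-overlapping patches (information masking)."""
    b, c, h, w = img.shape
    keep = (torch.rand(b, 1, h // patch, w // patch) > ratio).float()
    keep = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return img * keep, keep

def mix_with_input(purified, adversarial, alpha_max=0.2):
    """Randomly blend a little of the raw input back in, simulating the
    residual-perturbation risk the abstract discusses (alpha_max is assumed)."""
    alpha = torch.rand(purified.size(0), 1, 1, 1) * alpha_max
    return (1 - alpha) * purified + alpha * adversarial

x_adv = torch.rand(2, 3, 224, 224)           # adversarial inputs in [0, 1]
x_masked, keep = mask_patches(x_adv)         # step 1: mask patch information
x_purified = x_masked                        # step 2 (stub): a network would
                                             # reconstruct the masked patches
x_train = mix_with_input(x_purified, x_adv)  # step 3: mix before the classifier
```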
CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images
Histopathology image segmentation is the gold standard for diagnosing cancer
and can indicate cancer prognosis. However, it requires high-quality masks, so
many studies now use image-level labels to achieve pixel-level segmentation
and reduce the need for fine-grained annotation. To solve this problem, we
propose CVFC, an attention-based end-to-end pseudo-mask generation framework
built on cross-view feature consistency. Specifically, CVFC is a three-branch
joint framework composed of two ResNet38 networks and one ResNet50: each
independent branch integrates multi-scale feature maps to generate a class
activation map (CAM); within each branch, down-sampling and expansion adjust
the size of the CAM; the middle branch projects the feature matrix into query
and key feature spaces and generates a feature-space perception matrix through
a connection layer and inner product to adjust and refine the CAM of each
branch; finally, a feature consistency loss and a feature cross loss optimize
the parameters of CVFC in co-training mode. In extensive experiments, an IoU
of 0.7122 and an fwIoU of 0.7018 are obtained on the WSSS4LUAD dataset,
outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM.
Comment: Submitted to BIBM202
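The query/key refinement step can be illustrated with a short PyTorch sketch: features are projected into query and key spaces, their inner product gives an affinity ("perception") matrix, and that matrix propagates and refines a CAM. The projection widths and the softmax normalization are assumptions rather than CVFC's exact design.

```python
import torch
import torch.nn as nn

class CAMRefiner(nn.Module):
    def __init__(self, in_ch=256, proj_ch=64):
        super().__init__()
        self.query = nn.Conv2d(in_ch, proj_ch, 1)  # 1x1 projection to query space
        self.key = nn.Conv2d(in_ch, proj_ch, 1)    # 1x1 projection to key space

    def forward(self, feat, cam):
        b, _, h, w = feat.shape
        q = self.query(feat).flatten(2).transpose(1, 2)  # (b, hw, proj_ch)
        k = self.key(feat).flatten(2)                    # (b, proj_ch, hw)
        affinity = torch.softmax(q @ k, dim=-1)          # (b, hw, hw) perception matrix
        cam_flat = cam.flatten(2).transpose(1, 2)        # (b, hw, classes)
        refined = affinity @ cam_flat                    # propagate activations
        return refined.transpose(1, 2).reshape(b, -1, h, w)

feat = torch.randn(1, 256, 28, 28)   # branch feature map
cam = torch.rand(1, 2, 28, 28)       # two-class activation map
refined_cam = CAMRefiner()(feat, cam)
```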
MGTUNet: A new UNet for colon nuclei instance segmentation and quantification
Colorectal cancer (CRC) is among the top three malignant tumor types in terms
of morbidity and mortality. Histopathological images are the gold standard for
diagnosing colon cancer. Nuclei instance segmentation and classification,
together with nuclear component regression, can aid analysis of the tumor
microenvironment in colon tissue. Traditional methods still cannot handle both
kinds of task end-to-end at the same time, and they suffer from poor
prediction accuracy and high application costs. This paper proposes a new UNet
model for handling nuclei, called MGTUNet, which uses Mish, group
normalization, and transposed convolution layers to improve the segmentation
model, and a Ranger optimizer to adjust the SmoothL1 loss values. Secondly, it
uses different channels to segment and classify different types of nuclei,
completing the nuclei instance segmentation and classification task and the
nuclei component regression task simultaneously. Finally, we ran extensive
comparison experiments against eight segmentation models. Comparing three
evaluation metrics and the parameter sizes of the models, MGTUNet obtained
0.6254 on PQ, 0.6359 on mPQ, and 0.8695 on R2. The experiments thus
demonstrate that MGTUNet is a state-of-the-art method for quantifying
histopathological images of colon cancer.
Comment: Published in BIBM2022 (regular paper), https://doi.org/10.1109/BIBM55620.2022.999566
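The block-level changes the abstract names (Mish activations, group normalization, and transposed-convolution upsampling) can be sketched in PyTorch as follows; the channel counts and group size are assumptions, not MGTUNet's exact configuration.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> GroupNorm -> Mish: the normalization/activation combo cited."""
    def __init__(self, in_ch, out_ch, groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.Mish(inplace=True))

    def forward(self, x):
        return self.block(x)

class UpBlock(nn.Module):
    """Transposed convolution for learned 2x upsampling in the decoder."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = ConvBlock(out_ch * 2, out_ch)  # after concat with skip

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))

x = torch.randn(1, 128, 32, 32)     # decoder input
skip = torch.randn(1, 64, 64, 64)   # encoder skip connection
out = UpBlock(128, 64)(x, skip)     # -> (1, 64, 64, 64)
```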
Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data
Due to the high heterogeneity and clinical characteristics of cancer, there
are significant differences in multi-omics data and clinical features among
subtypes of different cancers. The identification and discovery of cancer
subtypes are therefore crucial for the diagnosis, treatment, and prognosis of
cancer. In this study, we propose a generalizable framework based on
attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze
cancer multi-omics data for the identification and characterization of cancer
subtypes. The AMUCL framework includes an unsupervised multi-head attention
mechanism that deeply extracts multi-omics data features. Importantly, a
decoupled contrastive learning model (DMACL) based on a multi-head attention
mechanism is proposed to learn multi-omics data features, cluster them, and
identify new cancer subtypes. This unsupervised contrastive learning method
clusters subtypes by calculating the similarity between samples in the feature
space and the sample space of multi-omics data. Compared to 11 other deep
learning models, the DMACL model achieved a C-index of 0.002, a Silhouette
score of 0.801, and a Davies-Bouldin score of 0.38 on a single-cell
multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained
a C-index of 0.016, a Silhouette score of 0.688, and a Davies-Bouldin score of
0.46, yielding the most reliable cancer subtype clustering results for each
type of cancer. Finally, we used the DMACL model in the AMUCL framework to
reveal six cancer subtypes of AML. By analyzing GO functional enrichment,
subtype-specific biological functions, and GSEA of AML, we further enhanced
the interpretability of cancer subtype analysis based on the generalizable
AMUCL framework.
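For intuition, here is a hedged PyTorch sketch of the overall recipe: a multi-head attention encoder embeds each sample, and a decoupled-style contrastive loss (the positive pair is excluded from the denominator) pulls two augmented views of the same sample together. The input dimensionality, the feature-dropout augmentation, and the loss details are assumptions, not the authors' exact DMACL model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionEncoder(nn.Module):
    def __init__(self, in_dim=2000, embed=128, heads=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed)
        self.attn = nn.MultiheadAttention(embed, heads, batch_first=True)

    def forward(self, x):
        h = self.proj(x).unsqueeze(1)          # treat each sample as one token
        h, _ = self.attn(h, h, h)              # multi-head self-attention
        return F.normalize(h.squeeze(1), dim=-1)

def decoupled_contrastive_loss(z1, z2, tau=0.1):
    """Positive pairs are (z1[i], z2[i]); negatives are all other samples.
    Unlike InfoNCE, the positive term is left out of the denominator."""
    sim = z1 @ z2.t() / tau                    # (n, n) cosine similarities
    pos = sim.diag()
    neg = sim.masked_fill(torch.eye(len(z1), dtype=torch.bool), float("-inf"))
    return (torch.logsumexp(neg, dim=1) - pos).mean()

x = torch.randn(32, 2000)                      # 32 samples of omics features
enc = AttentionEncoder()
z1, z2 = enc(F.dropout(x, 0.2)), enc(F.dropout(x, 0.2))  # two augmented views
loss = decoupled_contrastive_loss(z1, z2)
```

Clusters (and hence candidate subtypes) would then be read off the learned embeddings, e.g. with k-means over the normalized features.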
High speed and robust illumination invariant face recognition techniques
Variation in illumination remains a challenging issue in automatic face recognition, although considerable progress has been achieved in controlled environments. It has been shown that the differences between images of one face under varying lighting conditions can be greater than the differences between individuals. Although numerous approaches have been proposed to address the issue in recent years, their performance is still not satisfactory. In this thesis, several high-speed approaches for face recognition under varying illumination are developed.

Firstly, a novel illumination normalization approach with low computational complexity is proposed based on the Walsh-Hadamard transform (WHT). By discarding an appropriate number of low-frequency coefficients in the block-wise WHT domain, the effects caused by illumination variations are removed. The proposed method is validated on the Yale B and Extended Yale B databases. In addition, both analytical proof and experimental results demonstrate that Principal Component Analysis (PCA) and Null-space-based Linear Discriminant Analysis (NLDA) can be implemented directly in the WHT domain, without the inverse WHT, to reduce the computational burden further.

Secondly, a novel illumination normalization approach is proposed in which illumination is estimated in two steps. First, low-frequency Discrete Cosine Transform (DCT) coefficients in the logarithm domain, obtained in local areas, are used to estimate illumination coarsely rather than globally; a refining estimation step with a mean operator is then applied to estimate the illumination at every point more precisely. Experimental results demonstrate that the method is superior to existing methods. A simplified version is also proposed; both theoretical analysis and experimental results demonstrate its validity and high computational efficiency, and the behavior of the proposed methods under different parameter values is discussed.

In addition to these illumination normalization methods, an illumination-invariant facial feature, the local relation map (LRM), is explored according to local properties of human faces. A face model under varying illumination is investigated that includes an additive noise term besides the common multiplicative illumination term; high-frequency DCT coefficients are zeroed to remove the noise. Experimental results validate the proposed face model and the assumption on noise.
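A minimal NumPy/SciPy sketch of the first method follows: a block-wise Walsh-Hadamard transform with the low-sequency coefficients zeroed to suppress slowly varying illumination. The block size and the number of discarded coefficients are assumptions; the thesis determines the appropriate number empirically.

```python
import numpy as np
from scipy.linalg import hadamard

def wht_normalize(img, block=8, discard=3):
    """img: 2-D grayscale array whose sides are divisible by `block`."""
    H = hadamard(block) / np.sqrt(block)      # orthonormal WHT matrix
    # Reorder rows by sequency (number of sign changes) so that low
    # indices correspond to low "frequency".
    seq = [(np.diff(np.sign(row)) != 0).sum() for row in H]
    Hs = H[np.argsort(seq)]
    u, v = np.meshgrid(np.arange(block), np.arange(block), indexing="ij")
    low = (u + v) < discard                   # low-sequency coefficient region
    out = np.empty(img.shape, dtype=float)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            coeffs = Hs @ img[i:i+block, j:j+block].astype(float) @ Hs.T
            coeffs[low] = 0.0                 # discard illumination component
            out[i:i+block, j:j+block] = Hs.T @ coeffs @ Hs  # inverse WHT
    return out
```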
Different from the common assumption, the illumination and the reflectance cannot be well approximated by only the low- and high-frequency components, respectively. This thesis therefore proposes an adaptive illumination normalization approach based on a data-driven soft-thresholding denoising technique. The method models each DCT coefficient except the DC component as a generalized Gaussian distribution (GGD); more of the reflectance information in the low-frequency band is preserved while illumination variations in the high-frequency band are removed, and the key parameters are determined adaptively without any prior information.

Finally, a novel robust face descriptor named Local Line Derivative Pattern (LLDP) is presented to deal with not only illumination variations but also expression and aging variations. High-order derivative images in two directions are obtained by convolving the original images with Sobel masks. In the LLDP, an improved binary coding function and three standards for arranging the weights are proposed, and a novel distance measuring both pixel-level and global-level information is introduced. Promising experimental results are obtained on various face recognition databases.
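For flavor, here is an illustrative sketch of the descriptor family LLDP belongs to: a Sobel derivative image is binary-encoded by comparing each pixel's eight neighbors against the center, LBP-style. The coding function and weight arrangement shown are the standard LBP ones, not the improved scheme the thesis proposes.

```python
import numpy as np
from scipy.ndimage import sobel

def derivative_pattern(img):
    """8-bit LBP-style code over a first-order Sobel derivative image."""
    d = sobel(img.astype(float), axis=1)      # x-direction derivative image
    h, w = d.shape
    code = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = d[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = d[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbor >= center).astype(np.uint8) << bit
    return code
```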