U-ASD Net: supervised crowd counting based on semantic segmentation and adaptive scenario discovery
Crowd counting is one of the most significant and challenging problems in the computer vision and deep learning communities, with applications in a wide range of tasks. Although the problem is well studied, handling perspective distortions and scale variations remains an open challenge, and how well they are resolved strongly affects the quality of the predicted crowd density map. In this study, a hybrid and modified deep neural network (U-ASD Net), based on U-Net and adaptive scenario discovery (ASD), is proposed to achieve precise and effective crowd counting. The U part is produced by replacing the nearest-neighbor upsampling in the decoder of U-Net with max-unpooling. This modification improves crowd counting performance by capturing more spatial information: the max-unpooling layers upsample the feature maps using the max locations recorded during the downsampling process. The ASD part is constructed with three light pathways. Two of them are learned to reflect different crowd densities and to capture the appropriate geometric configuration using receptive fields of different sizes. The third is an adaptation path, which implicitly discovers and models complex scenarios to recalibrate the pathway-wise responses adaptively. ASD adds no extra branches, avoiding additional complexity, and the resulting model is end-to-end trainable. This integration yields an effective model for counting crowds in both dense and sparse scenes, and it predicts high-quality density maps with a high structural similarity index (SSIM) and a high peak signal-to-noise ratio (PSNR). Comprehensive experiments on four popular crowd counting datasets demonstrate the proposed method's promising performance compared to other state-of-the-art approaches. Furthermore, a new manually annotated dataset, called Haramain, containing three different scenes with different densities, is introduced and used to evaluate U-ASD Net.
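To make the max-unpooling substitution concrete, the following PyTorch snippet is a minimal sketch of the idea described in the abstract, not the authors' implementation; the tensor and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of max-unpooling (assumed shapes, not the U-ASD Net code): the max
# locations recorded during pooling are reused to place values back at their
# original spatial positions, instead of nearest-neighbor upsampling.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)       # a feature map from some conv block
pooled, indices = pool(x)            # downsample, remembering max locations
restored = unpool(pooled, indices)   # upsample using those same locations

print(pooled.shape, restored.shape)  # (1, 64, 16, 16) -> (1, 64, 32, 32)
```

The three-pathway ASD idea can be sketched in the same spirit. The module below is a hedged reading of the abstract only: the layer shapes, the 3x3/7x7 receptive fields, and the softmax gating are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of adaptive scenario discovery: two pathways with different
# receptive fields model different crowd densities, while an adaptation
# path produces pathway-wise weights that recalibrate their responses.
class ASDSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.dense_path = nn.Conv2d(channels, channels, 3, padding=1)   # small RF
        self.sparse_path = nn.Conv2d(channels, channels, 7, padding=3)  # large RF
        # adaptation path: global context -> one weight per pathway
        self.adapt = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2, 1),
        )

    def forward(self, x):
        w = F.softmax(self.adapt(x), dim=1)   # (N, 2, 1, 1) pathway weights
        d = F.relu(self.dense_path(x))
        s = F.relu(self.sparse_path(x))
        return w[:, 0:1] * d + w[:, 1:2] * s  # recalibrated fusion

y = ASDSketch()(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```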
Multi-attention-based approach for deepfake face and expression swap detection and localization
Advancements in facial manipulation technology have produced highly realistic and nearly indistinguishable face and expression swap videos, raising concerns about the security risks associated with deepfakes. In multimedia forensics, detecting and precisely localizing image forgery have become essential tasks. Current deepfake detectors perform well on high-quality faces within specific datasets but often struggle to maintain that performance when evaluated across different datasets. To this end, we propose an attention-based multi-task approach that improves the feature maps used for the classification and localization tasks. The encoder and the attention-based decoder of our network generate localized maps that highlight regions carrying information about the type of manipulation. These localized features are shared with the classification network, improving its performance. Instead of using encoded spatial features, attention-based localized features from the decoder's first layer are combined with frequency-domain features to create a discriminative representation for deepfake detection. Through extensive experiments on face and expression swap datasets, we demonstrate that our method achieves competitive performance compared to state-of-the-art deepfake detection approaches in both in-dataset and cross-dataset scenarios. Code is available at https://github.com/saimawaseem/Multi-Attention-Based-Approach-for-Deepfake-Face-and-Expression-Swap-Detection-and-Localization
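The fusion of decoder features with frequency-domain features can be sketched as follows. This is a minimal, assumption-laden PyTorch illustration of that fusion step, not the released code at the GitHub link above; the channel counts, the log-amplitude spectrum as the frequency cue, and the classifier head are all assumptions.

```python
import torch
import torch.nn as nn

# Sketch: fuse localized features from the decoder's first layer with
# frequency-domain features of the input face for real/fake classification.
class FusionHead(nn.Module):
    def __init__(self, dec_channels=64, img_channels=3, hidden=128):
        super().__init__()
        self.freq_conv = nn.Conv2d(img_channels, dec_channels, 3, padding=1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2 * dec_channels, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # real vs. fake logits
        )

    def forward(self, img, dec_feat):
        # log-amplitude spectrum of the input as a frequency-domain cue
        spec = torch.log1p(torch.abs(torch.fft.fft2(img)))
        freq = self.freq_conv(spec)
        # match the spatial size of the decoder feature map before fusion
        freq = nn.functional.interpolate(freq, size=dec_feat.shape[-2:])
        return self.classifier(torch.cat([dec_feat, freq], dim=1))

head = FusionHead()
logits = head(torch.randn(1, 3, 256, 256), torch.randn(1, 64, 64, 64))
print(logits.shape)  # torch.Size([1, 2])
```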