80 research outputs found

    Pornographic Image Recognition via Weighted Multiple Instance Learning

    In the era of the Internet, recognizing pornographic images is of great significance for protecting children's physical and mental health. However, this task is very challenging because the key pornographic contents (e.g., breast and private parts) in an image often lie in local regions of small size. In this paper, we model each image as a bag of regions and follow a multiple instance learning (MIL) approach to train a generic region-based recognition model. Specifically, we take into account each region's degree of pornography and make three main contributions. First, we show that, based on very few annotations of the key pornographic contents in a training image, we can generate a bag of properly sized regions, among which the potential positive regions usually contain useful context that aids recognition. Second, we present a simple quantitative measure of a region's degree of pornography, which can be used to weight the importance of different regions in a positive image. Third, we formulate the recognition task as a weighted MIL problem under the convolutional neural network framework, with a bag probability function introduced to combine the importance of different regions. Experiments on our newly collected large-scale dataset demonstrate the effectiveness of the proposed method, which achieves a 97.52% true positive rate at a 1% false positive rate when tested on 100K pornographic images and 100K normal images. (Comment: 9 pages, 3 figures)
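The abstract does not give the exact form of the bag probability function, but the weighted-MIL idea it describes — combining per-region scores according to each region's degree of pornography — can be illustrated with a simple weighted pooling (a hypothetical sketch, not the paper's actual function):

```python
import numpy as np

def bag_probability(region_scores, region_weights):
    """Combine per-region pornography scores into a single bag-level
    probability, weighting each region by its estimated degree of
    pornography. Weighted-average pooling is used here as an
    illustrative stand-in for the paper's bag probability function.
    """
    scores = np.asarray(region_scores, dtype=float)
    weights = np.asarray(region_weights, dtype=float)
    return float(np.sum(weights * scores) / np.sum(weights))

# A bag whose most heavily weighted region scores high is flagged,
# even when most regions of the image look innocuous.
p = bag_probability([0.1, 0.2, 0.9], [0.5, 0.5, 3.0])
```

With the weighting above, the single high-scoring region dominates the bag-level decision, which is the behaviour the weighted-MIL formulation is after.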

    Deep Architectures for Content Moderation and Movie Content Rating

    Rating a video based on its content is an important step in classifying videos into age categories. Movie content rating and TV show rating are the two most common rating systems, established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is tedious work that becomes increasingly difficult with the ever-growing amount of online video content. As such, a desirable solution is to use computer-vision-based video content analysis techniques to automate the evaluation process. In this paper, related works on action recognition, multi-modal learning, movie genre classification, and sensitive content detection are summarized in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning

    Analysis of Deep-Fake Technology Impacting Digital World Credibility: A Comprehensive Literature Review

    The deepfake technique is a recent method that uses artificial intelligence to make fake videos with realistic facial expressions and coordinated lip movement. This technology is frequently employed in a variety of contexts with various goals. Deepfake technology is being used to generate extremely realistic fake videos that can be widely distributed to promote false information or fake news about any celebrity or leader, even though the footage was never created by them. Because of the widespread use of social media, these fraudulent videos can garner billions of views in under an hour and have a significant impact on our culture. According to the findings, deepfakes are a threat to our celebrities, democracy, religious views, and commerce, but they can be managed through rules and regulations, strong company policy, and general internet-user awareness and education. We need to devise a process for examining such videos and distinguishing between genuine and fraudulent footage

    Weakly supervised human skin segmentation using guidance attention mechanisms

    Human skin segmentation is a crucial task in computer vision and biometric systems, yet it poses several challenges such as variability in skin colour, pose, and illumination. This paper presents a robust data-driven skin segmentation method for a single image that addresses these challenges through the integration of contextual information and efficient network design. In addition to robustness and accuracy, integration into real-time systems requires a careful balance between computational power, speed, and performance. The proposed method incorporates two attention modules, Body Attention and Skin Attention, that utilize contextual information to improve segmentation results. These modules draw attention to the desired areas, focusing on the body boundaries and skin pixels, respectively. Additionally, an efficient network architecture is employed in the encoder to minimize computational cost while retaining high performance. To handle the issue of noisy labels in skin datasets, the proposed method uses a weakly supervised training strategy that relies on the Skin Attention module. The results of this study demonstrate that the proposed method is comparable to, or outperforms, state-of-the-art methods on benchmark datasets. This work is part of the visuAAL project on Privacy-Aware and Acceptable Video-Based Technologies and Services for Active and Assisted Living (https://www.visuaal-itn.eu/). The project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 861091
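The abstract does not detail the internals of the Body and Skin Attention modules, but the general attention-gating idea they embody — a learned per-pixel map that rescales features toward the region of interest — can be sketched as follows (a hypothetical simplification using a 1x1 projection):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(features, w, b):
    """Minimal spatial-attention gate in the spirit of the paper's
    Body/Skin Attention modules (illustrative only): a 1x1 projection
    of the feature map yields a per-pixel attention map in (0, 1),
    which rescales the features toward the attended pixels.
    features: (H, W, C) array; w: (C,) projection weights; b: bias.
    """
    attn = sigmoid(features @ w + b)          # (H, W) attention map
    return features * attn[..., None], attn   # gated features, map
```

In the paper's weakly supervised setting, such an attention map is also what lets the Skin Attention module down-weight noisily labelled pixels during training.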

    Time-Sensitive Adaptive Model for Adult Image Classification

    Images play an important role in modern internet communication, but not all images shared by users are appropriate, and it is necessary to check and reject the inappropriate ones. Deep neural networks perform this task well, but it may not be necessary to use maximum power for all images: many easier-to-identify images can be classified at a lower cost than running the full model. Also, the load on the system varies over time, so an algorithm that can produce the best possible results under different budgets is very useful. For this purpose, a deep convolutional neural network has been designed that can generate several outputs from its various layers. Each output can be considered a classifier with its own cost and accuracy. A selector is then used to choose and combine the results of these outputs to produce the best possible result within the specified time budget. The selector uses a reinforcement learning model, which, despite a time-consuming learning phase, is fast at execution time. Our experiments on a challenging social-media image dataset show that the proposed model can reduce processing time by 32% while sacrificing only 1.4% of accuracy compared to the VGG-f network. Using metrics such as F1-score and AUC (the area under the curve in the accuracy-versus-time-budget chart), the superiority of the proposed model over the base model at different time budgets is also shown
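The multi-output-plus-selector scheme can be sketched as an early-exit rule over the network's intermediate classifiers. Note that the confidence-threshold selector below is a hypothetical stand-in for the paper's reinforcement-learning selector, shown only to make the budgeted-inference idea concrete:

```python
import numpy as np

def budgeted_classify(exit_probs, exit_costs, budget, threshold=0.9):
    """Illustrative early-exit inference: evaluate the network's
    auxiliary outputs in order of increasing cost, and stop as soon
    as one output is confident enough or the next exit would exceed
    the time budget. Returns (predicted class, cost spent).
    exit_probs: list of class-probability vectors, one per exit.
    exit_costs: incremental cost of computing each exit.
    """
    spent = 0.0
    decision = None
    for probs, cost in zip(exit_probs, exit_costs):
        if spent + cost > budget:
            break                    # next exit is unaffordable
        spent += cost
        decision = int(np.argmax(probs))
        if np.max(probs) >= threshold:
            break                    # confident enough: exit early
    return decision, spent
```

Easy images terminate at a cheap early exit, while ambiguous ones fall through to deeper, more expensive classifiers — exactly the cost/accuracy trade-off the abstract describes.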

    Impact of Deepfake Technology on Digital World Authenticity: A Review

    Deepfake technology is an emerging technology that creates fake videos by using artificial intelligence (AI) with realistic facial expressions and lip-sync effects. Deepfake technology is widely used in different scenarios with different objectives. It is used to make highly realistic fake videos that can spread misinformation or fake news about any celebrity or political leader, even though the footage was not created by them. Due to the high impact of social media, these fake videos can reach millions of views within an hour and create a negative impact on our society. Criminals can also use this technology to threaten society by making such deepfake (AI) videos. The results suggest that deepfakes are a threat to our celebrities, political system, religious beliefs, and business; they can be controlled by rules and regulations, strict corporate policy, and awareness, education, and training for common internet users. We need to develop technology that can examine such videos and differentiate between real and fake footage. Government agencies also need to create policies to regulate such technology so that the use of this AI technology can be monitored and controlled

    Evaluating the Performance of Vision Transformer Architecture for Deepfake Image Classification

    Deepfake classification has seen some impressive results lately; through experimentation with various deep learning methodologies, researchers have designed several state-of-the-art techniques. This study applies an existing technology, Transformers, which has been a de facto standard in Natural Language Processing (NLP), to a computer vision task. Transformers use a mechanism called self-attention, which differs from CNNs and LSTMs. The study treats images as sequences of 16x16 patch "words" (Dosovitskiy et al., 2021) in order to train a deep neural network with self-attention blocks to detect deepfakes. It creates position embeddings of the image patches, which are passed to the Transformer blocks to classify the manipulated images from the CELEB-DF-v2 dataset. Furthermore, the difference between the mean accuracy of this model and of an existing state-of-the-art detection technique that uses a residual CNN is tested for statistical significance. The two models are compared mainly on accuracy and loss. The Vision Transformer based model achieved state-of-the-art performance with 97.07% accuracy, compared to 91.78% for the ResNet-18 model
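The 16x16-words preprocessing the abstract refers to can be sketched as plain patch extraction (the learned linear projection and position embeddings of a real ViT are omitted here; this shows only how an image becomes a sequence of patch tokens):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an image into non-overlapping 16x16 patch "words" as in
    the formulation of Dosovitskiy et al. (2021). Returns one
    flattened vector per patch; in a full ViT these vectors are then
    linearly projected and summed with learned position embeddings
    before entering the Transformer blocks.
    image: (H, W, C) array with H and W divisible by `patch`.
    """
    h, w, c = image.shape
    return (image.reshape(h // patch, patch, w // patch, patch, c)
                 .transpose(0, 2, 1, 3, 4)       # group patch rows/cols
                 .reshape(-1, patch * patch * c))

# A 224x224 RGB image yields 14*14 = 196 tokens of length 16*16*3 = 768.
tokens = patchify(np.zeros((224, 224, 3)))
```

Self-attention then operates over these 196 tokens, letting every patch attend to every other — the mechanism that distinguishes the ViT from the ResNet-18 baseline it is compared against.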

    Deepfakes Generated by Generative Adversarial Networks

    Deep learning is a type of artificial intelligence (AI) that mimics the workings of the human brain in processing data for tasks such as speech recognition, visual object recognition, object detection, language translation, and decision making. A generative adversarial network (GAN) is a special deep learning architecture, introduced by Goodfellow et al. (2014), that is typically built from convolutional neural networks (CNNs). Given a training set, a GAN can generate new data with the same statistics as the training set, and this is often what we refer to as deepfakes. A CNN takes an input image, assigns learnable weights and biases to various aspects of the object, and is able to differentiate one from another. A GAN pairs two such networks, called the discriminator and the generator, which are trained against each other: the discriminator learns to tell real samples from generated ones (deepfakes), while the generator learns to fool it. Deepfakes are a machine learning technique in which a person in an existing image or video is replaced with someone else’s likeness. Deepfakes have become a problem in society because they allow anyone’s image to be co-opted and call into question our ability to trust what we see. In this project we develop a GAN to generate deepfakes. Next, we develop a survey to determine whether participants can identify authentic versus deepfake images. The survey employed a questionnaire asking participants about their perception of AI technology based on their overall familiarity with AI, deepfake generation, and the reliability and trustworthiness of AI, as well as testing whether subjects can distinguish real from deepfake images. Results show demographic differences in perceptions of AI and that humans are good at distinguishing real images from deepfakes
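The generator/discriminator interplay described above is driven by the adversarial objectives of Goodfellow et al. (2014). A minimal sketch of those losses, using the standard non-saturating generator loss, computed from the discriminator's output probabilities:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """GAN objectives from Goodfellow et al. (2014), computed from the
    discriminator's probabilities on real samples (d_real) and on
    generated samples (d_fake). The discriminator maximises the
    log-likelihood of telling the two apart; the generator minimises
    the probability of its fakes being caught (non-saturating form).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))  # non-saturating generator loss
    return d_loss, g_loss
```

Training alternates between the two: when the discriminator is easily fooled (d_fake near 0.5), its loss is high and it improves; when it catches every fake (d_fake near 0), the generator's loss is high and it improves — the equilibrium produces increasingly convincing deepfakes.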