    Pornographic Image Recognition via Weighted Multiple Instance Learning

    In the Internet era, recognizing pornographic images is of great significance for protecting children's physical and mental health. However, this task is very challenging because the key pornographic content (e.g., breasts and private parts) in an image often lies in small local regions. In this paper, we model each image as a bag of regions and follow a multiple instance learning (MIL) approach to train a generic region-based recognition model. Specifically, we take into account each region's degree of pornography and make three main contributions. First, we show that from very few annotations of the key pornographic content in a training image, we can generate a bag of properly sized regions, among which the potential positive regions usually contain useful context that aids recognition. Second, we present a simple quantitative measure of a region's degree of pornography, which can be used to weight the importance of different regions in a positive image. Third, we formulate the recognition task as a weighted MIL problem within a convolutional neural network framework, introducing a bag probability function that combines the importance of different regions. Experiments on our newly collected large-scale dataset, tested on 100K pornographic images and 100K normal images, demonstrate the effectiveness of the proposed method, which achieves a 97.52% true positive rate at a 1% false positive rate. Comment: 9 pages, 3 figures
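    The abstract does not give the form of the bag probability function. As a minimal sketch, assuming a weighted noisy-OR (one common MIL pooling choice, not necessarily the authors'), each region's CNN probability p_i is scaled by a degree-of-pornography weight w_i in [0, 1] before pooling, so that P(bag) = 1 - prod_i(1 - w_i * p_i). The function names are hypothetical, and NumPy stands in for the training framework in which the same expression would be written so gradients reach p_i.

```python
import numpy as np

def weighted_bag_probability(region_probs, region_weights):
    """Combine per-region probabilities into one image (bag) probability.

    Hypothetical weighted noisy-OR: region i "fires" with probability w_i * p_i,
    and the bag is positive if at least one region fires. Weights are assumed to
    lie in [0, 1], e.g. a normalized degree of pornography per region.
    """
    p = np.clip(np.asarray(region_probs, dtype=float), 0.0, 1.0)
    w = np.clip(np.asarray(region_weights, dtype=float), 0.0, 1.0)
    # 1 minus the probability that no region fires.
    return 1.0 - np.prod(1.0 - w * p)

def bag_log_loss(region_probs, region_weights, bag_label, eps=1e-7):
    """Binary cross-entropy on the bag probability; bag_label is 0 or 1."""
    P = np.clip(weighted_bag_probability(region_probs, region_weights), eps, 1 - eps)
    return -(bag_label * np.log(P) + (1 - bag_label) * np.log(1 - P))

# Three regions: one confident, highly weighted region dominates the bag score.
print(weighted_bag_probability([0.9, 0.2, 0.1], [1.0, 0.5, 0.5]))  # approximately 0.9145
```

    A noisy-OR keeps the bag positive when any single highly weighted region is positive, which matches the observation that the decisive content often occupies a small part of the image.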

    Fast and Effective Bag-of-Visual-Word Model to Pornographic Images Recognition Using the FREAK Descriptor

    Recently, the Bag of Visual Words (BoVW) model has gained enormous popularity among researchers for object recognition. Recognizing pornographic images with low computational complexity, adequate accuracy, and modest memory consumption is a major challenge in time-constrained applications such as Internet pornography filtering. Most existing BoVW-based studies use the popular SIFT and SURF algorithms to describe and match the keypoints detected in an image. The main drawback of these methods is their high computational complexity, caused by constructing high-dimensional feature vectors. This research proposes a BoVW-based model that adopts FREAK, a very fast and simple binary descriptor, to speed up pornographic image recognition. In addition, keypoints are detected only within the region of interest (ROI) of each image, which improves recognition speed by eliminating many noisy keypoints located in the image background. Finally, to find the most representative visual vocabulary, vocabularies ranging in size from 150 to 500 words are generated for BoVW. Compared with similar works, the experimental results show that the proposed model achieves a remarkable improvement in terms of computational complexity.
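    The encoding step above (binary descriptors, ROI-restricted keypoints, a fixed-size vocabulary) can be sketched in NumPy. This assumes descriptors are already extracted as packed 64-byte (512-bit) FREAK-style vectors and that a codebook of binary visual words already exists; the abstract does not say how the vocabulary is built, and the ROI box format (x0, y0, x1, y1) and all function names are hypothetical. Binary descriptors are compared with Hamming distance (XOR plus bit count), which is part of why they are cheaper to match than floating-point SIFT/SURF vectors.

```python
import numpy as np

def hamming_distances(descs, codebook):
    """Pairwise Hamming distances between packed binary descriptors.

    descs: (N, B) uint8, codebook: (K, B) uint8; B = 64 bytes for 512 bits.
    """
    xor = np.bitwise_xor(descs[:, None, :], codebook[None, :, :])  # (N, K, B)
    return np.unpackbits(xor, axis=2).sum(axis=2)                  # (N, K) differing bits

def bovw_histogram(descs, codebook):
    """L1-normalized histogram of nearest visual words (hard assignment)."""
    k = codebook.shape[0]
    if len(descs) == 0:
        return np.zeros(k)
    words = hamming_distances(descs, codebook).argmin(axis=1)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

def keypoints_in_roi(points, roi):
    """Boolean mask of (x, y) keypoints inside a hypothetical ROI box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    pts = np.asarray(points, dtype=float)
    return (pts[:, 0] >= x0) & (pts[:, 0] < x1) & (pts[:, 1] >= y0) & (pts[:, 1] < y1)

# Usage with random stand-in data and a 150-word vocabulary.
rng = np.random.default_rng(0)
codebook = rng.integers(0, 256, size=(150, 64), dtype=np.uint8)
points = rng.uniform(0, 640, size=(300, 2))
descs = rng.integers(0, 256, size=(300, 64), dtype=np.uint8)
mask = keypoints_in_roi(points, (100, 100, 400, 400))
feature = bovw_histogram(descs[mask], codebook)  # input to any downstream classifier
```

    Filtering keypoints by the ROI before encoding shrinks the N dimension of the distance matrix, which is where the background-elimination speedup described above would show up.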

    Survey On Nudity Detection: Opportunities And Challenges Based On ‘Awrah Concept In Islamic Shari’a

    Nudity or nakedness, known as ‘awrah in Islam, is the part of the human body that in principle should not be seen by other people, except those qualified to be her or his mahram, or in an emergency or urgent need. Nudity detection techniques have long received a lot of attention from researchers worldwide because of their importance, particularly to the global Muslim community. In this paper, the techniques are divided into four classes: methods based on body structure, image retrieval, skin-region features, and bag-of-visual-words (BoVW). All of these techniques rely on the areas of skin on the body, as well as the sexual organs, that must be visible to determine whether an image is nude. The concept of nakedness in Islamic Shari'a, however, has different rules for men and women: the limit of male ‘awrah is between the navel and the knees, while the limit of female ‘awrah is the entire body except the face and hands, which should be covered with the hijab. In general, existing techniques can be used to detect nakedness as defined by Islamic Shari'a. The selection of these techniques depends on the areas of skin on the body, or the sexual organs, that indicate whether an image falls into the nude category. In Islamic Shari'a, however, men and women are subject to different ‘awrah rules, such as the limits of ‘awrah, the clothing requirements for covering ‘awrah, and the kinds of shapes and shades of hijab in various countries (for women only). These problems present opportunities and challenges for researchers to propose an ‘awrah detection technique in accordance with Islamic Shari'a.
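    The survey lists skin-region features as one family of techniques. As a minimal sketch, assuming a fixed explicit RGB skin rule for uniform daylight (the rule commonly attributed to Peer, Kovač, and Solina) rather than any specific surveyed method, the fraction of skin-colored pixels can be computed as below; `flag_for_review` and its 0.4 threshold are hypothetical.

```python
import numpy as np

def skin_mask(rgb):
    """Boolean per-pixel skin mask for an (H, W, 3) RGB image.

    Uses a fixed explicit RGB rule (an assumption; many systems use YCbCr/HSV
    thresholds or learned skin-color models instead).
    """
    img = np.asarray(rgb, dtype=np.int32)  # widen uint8 so differences cannot wrap
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    spread = img.max(axis=-1) - img.min(axis=-1)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def skin_ratio(rgb):
    """Fraction of pixels classified as skin."""
    return float(skin_mask(rgb).mean())

def flag_for_review(rgb, threshold=0.4):
    """Hypothetical first-stage filter: flag images with a large skin fraction."""
    return skin_ratio(rgb) >= threshold
```

    A global skin ratio says nothing about which body region is exposed, so on its own it cannot separate the male and female ‘awrah limits described above; that region-level distinction is the gap the survey identifies.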

    It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos

    We introduce SexTok, a multi-modal dataset composed of TikTok videos labeled as sexually suggestive (from the annotator's point of view), sex-educational content, or neither. Such a dataset is necessary to address the challenge of distinguishing between sexually suggestive content and virtual sex education videos on TikTok. Children's exposure to sexually suggestive videos has been shown to have adverse effects on their development. Meanwhile, virtual sex education, especially on subjects that are more relevant to the LGBTQIA+ community, is very valuable. The platform's current system removes or penalizes some videos of both types, even though they serve different purposes. The dataset contains video URLs, and the videos' audio has also been transcribed. To validate its importance, we explore two transformer-based models for classifying the videos. Our preliminary results suggest that distinguishing between these types of videos is learnable but challenging. These experiments suggest that the dataset is meaningful and invites further study on the subject. Comment: Accepted to ACL Findings 2023. 10 pages, 3 figures, 5 tables. Please refer to https://github.com/enfageorge/SexTok for the dataset and related details

    Análise de vídeo sensível (Sensitive Video Analysis)

    Advisors: Anderson de Rezende Rocha, Siome Klein Goldenstein. Thesis (PhD), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Sensitive video can be defined as any motion picture that may pose threats to its audience. Typical representatives include, but are not limited to, pornography, violence, child abuse, and cruelty to animals. Nowadays, with the ever more pervasive role of digital data in our lives, sensitive-content analysis represents a major concern to law enforcers, companies, tutors, and parents, due to the potential harm of such content to minors, students, workers, and others. However, employing human mediators to constantly analyze huge troves of sensitive data often leads to stress and trauma, which justifies the search for computer-aided analysis. In this work, we tackle this problem in two ways. In the first, we aim to decide whether or not a video stream presents sensitive content, which we refer to as sensitive-video classification. In the second, we aim to find the exact moments, at frame level, at which a stream starts and stops displaying sensitive content, which we refer to as sensitive-content localization. For both cases, we design and develop effective and efficient methods with a low memory footprint, suitable for deployment on mobile devices. In this vein, we provide four major contributions. The first is a novel Bag-of-Visual-Words-based pipeline for efficient time-aware sensitive-video classification. The second is a novel high-level multimodal fusion pipeline for sensitive-content localization. The third, in turn, is a novel spatiotemporal video interest point detector and video content descriptor. Finally, the fourth contribution is a frame-level annotated, 140-hour pornographic video dataset, the first in the literature suitable for pornography localization. An important aspect of the first three contributions is their generality, in the sense that they can be employed, without modifying their steps, to detect diverse types of sensitive content, such as those mentioned above. For validation, we chose pornography and violence, two of the most common types of inappropriate material, as target representatives of sensitive content. We therefore performed classification and localization experiments and report results for both types of content. The proposed solutions achieve an accuracy of 93% in pornography classification and correctly localize 91% of the pornographic content within a video stream. The results for violence are also compelling: with the proposed approaches, we reached second place in an international violent-scene detection competition. Putting both in perspective, we learned that pornography detection is easier than violence detection, which opens several opportunities for further investigation by the research community. The main reason for this difference relates to the distinct levels of subjectivity inherent to each concept: while pornography is usually more explicit, violence presents a broader spectrum of possible manifestations.
    Doutorado (PhD); Ciência da Computação (Computer Science); Doutor em Ciência da Computação (Doctor of Computer Science); 1572763, 1197473; CAPE
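    The abstract does not describe the fusion rule or how per-frame decisions become start and end moments. As a minimal sketch, assuming two modalities (say, visual and audio) that each emit a per-frame probability, late fusion by weighted average, and a fixed threshold, the code below fuses the scores and turns the resulting per-frame decisions into (start, end) frame ranges; the weights, the 0.5 threshold, and `min_len` are hypothetical.

```python
import numpy as np

def fuse_scores(modality_scores, weights):
    """Late (score-level) fusion: weighted average of per-frame scores.

    modality_scores: list of M arrays of shape (T,) with values in [0, 1].
    weights: one non-negative weight per modality (hypothetical values).
    """
    s = np.stack([np.asarray(m, dtype=float) for m in modality_scores])  # (M, T)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * s).sum(axis=0) / w.sum()

def segments(frame_flags, min_len=1):
    """Turn per-frame booleans into (start, end) frame ranges, end inclusive.

    Runs shorter than min_len frames are dropped as likely noise.
    """
    out, start = [], None
    for i, flag in enumerate(frame_flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                out.append((start, i - 1))
            start = None
    if start is not None and len(frame_flags) - start >= min_len:
        out.append((start, len(frame_flags) - 1))
    return out

visual = [0.1, 0.8, 0.9, 0.7, 0.2, 0.9]
audio = [0.2, 0.6, 0.8, 0.9, 0.1, 0.3]
fused = fuse_scores([visual, audio], weights=[0.7, 0.3])
print(segments(fused >= 0.5))  # [(1, 3), (5, 5)]
```

    Dividing a frame index by the video's frame rate converts these ranges to timestamps. Raising `min_len` suppresses isolated single-frame detections such as the one at frame 5.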

    Multimedia

    The ubiquitous and effortless digital data capture and processing capabilities offered by most devices today have led to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentlessly changing requirements of both users and systems. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together research studies and surveys from different subfields that highlight these important aspects. The main topics covered in this book include multimedia management in peer-to-peer structures and wireless networks, security characteristics of multimedia, bridging the semantic gap for multimedia content, and novel multimedia applications.

    Distinguishing Medical Web Pages from Pornographic Ones: An Efficient Pornography Websites Filtering Method

    In this paper, we apply a simple decision tree data-mining algorithm to find association rules about pornographic and medical web pages. Based on these association rules, we propose a systematic method for filtering pornographic websites with the following major advantages: 1) It checks only the text of web pages without scanning pictures, avoiding the low efficiency of analyzing photographs; moreover, it lowers the error rate and improves filtering accuracy at the same time. 2) While filtering pornographic web pages accurately, it effectively reduces misjudgments in which medical web pages are identified as pornographic. 3) A re-learning mechanism improves the filtering method incrementally: revision information learned from misjudged web pages is fed back into the method to improve its effectiveness. The experimental results show that every efficacy assessment index reached a satisfactory value, so we conclude that the proposed method performs outstandingly and effectively.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Identification of Unknown Landscape Types Using CNN Transfer Learning

    Unknown image type identification is the problem of identifying images of unknown types, given a set of already provided images whose types are considered known, where the known and unknown sets represent different content types. Solving this problem has many security applications, such as suspicious object detection during baggage scanning at airport customs, border protection via remote sensing, cancer detection, and weather and disaster monitoring. In this thesis, we focus on identifying unknown landscape images. This application is highly relevant in the context of a smart nation, where it can be applied to major national security tasks such as monitoring borders or detecting unknown and potentially dangerous landscapes in critical locations. We propose effective semi-supervised novelty detection approaches for the unknown image type identification problem using Convolutional Neural Network (CNN) Transfer Learning. Recently, CNN Transfer Learning has been very successful in various visual recognition tasks, especially when large training datasets are not available. Our main idea is to take CNNs pre-trained on large datasets such as ImageNet [10] and use them to train new models specifically suited to the landscape image dataset. Features extracted from these domain-specific CNNs are then used with standard semi-supervised novelty detection algorithms, namely Gaussian Mixture Models, Isolation Forest, One-class Support Vector Machines (SVM), and Bayesian Gaussian Mixture Models, to identify the unknown landscape images. We provide two fine-tuning approaches: supervised and unsupervised. The supervised fine-tuning approach simply uses the class categories (landscape classes, e.g., airport, stadium) of the known image dataset. The unsupervised fine-tuning approach, on the other hand, learns the class categories from the known images using an unsupervised clustering-based algorithm.
We conducted extensive experiments that demonstrate the effectiveness of our approaches. Our best AUROC and average precision scores for the identification problem are 0.96 and 0.94, respectively. In particular, we show statistically that both fine-tuning methods significantly improve identification performance relative to the non-fine-tuned CNN, and that the unsupervised and supervised fine-tuning approaches are comparable.
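    Given features from a fine-tuned CNN, the semi-supervised novelty detection step can be sketched with scikit-learn, which provides `GaussianMixture`, `BayesianGaussianMixture`, `IsolationForest`, and `OneClassSVM`. This is a minimal sketch under stated assumptions: each detector is fit on known-type features only, its `score_samples` output (higher means more typical) is negated into a novelty score, and random Gaussian vectors stand in for real CNN embeddings. Hyperparameters such as `n_components=2` and `nu=0.1` are placeholders, not the thesis's settings, and the synthetic scores say nothing about the reported 0.96 AUROC.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture
from sklearn.svm import OneClassSVM

def novelty_scores(known_feats, test_feats, random_state=0):
    """Fit each detector on known-type features; return novelty scores for test_feats.

    All four estimators expose score_samples, where higher means "more like the
    training data"; negating it gives higher = more likely an unknown type.
    """
    detectors = {
        "gmm": GaussianMixture(n_components=2, random_state=random_state),
        "bgmm": BayesianGaussianMixture(n_components=2, random_state=random_state),
        "iforest": IsolationForest(random_state=random_state),
        "ocsvm": OneClassSVM(gamma="scale", nu=0.1),
    }
    return {name: -det.fit(known_feats).score_samples(test_feats)
            for name, det in detectors.items()}

def evaluate(known_feats, test_feats, is_unknown):
    """AUROC and average precision per detector (is_unknown: 1 = unknown type)."""
    return {name: (roc_auc_score(is_unknown, s), average_precision_score(is_unknown, s))
            for name, s in novelty_scores(known_feats, test_feats).items()}

# Synthetic stand-ins for CNN features of known and unknown landscape types.
rng = np.random.default_rng(0)
known_train = rng.normal(0.0, 1.0, size=(400, 16))
test_feats = np.vstack([rng.normal(0.0, 1.0, size=(100, 16)),   # held-out known
                        rng.normal(4.0, 1.0, size=(100, 16))])  # unknown type
is_unknown = np.r_[np.zeros(100), np.ones(100)]
for name, (auroc, ap) in evaluate(known_train, test_feats, is_unknown).items():
    print(f"{name}: AUROC={auroc:.3f}  AP={ap:.3f}")
```

    Negating `score_samples` gives all four detectors a common "higher means more novel" convention, so the same AUROC and average-precision code can compare them directly.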