8 research outputs found

    Tracing images back to their social network of origin: A CNN-based approach

    Get PDF
    Recovering information about the history of digital content, such as an image or a video, can be strategic for an investigation from its early stages. Storage devices, smartphones and PCs belonging to a suspect are usually confiscated as soon as a warrant is issued. Any multimedia content found is analyzed in depth in order to trace back its provenance and, if possible, its original source. This is particularly important when dealing with social networks, where most user-generated photos and videos are uploaded and shared daily. Being able to discern whether an image was downloaded from a social network or captured directly by a digital camera can be crucial for subsequent investigations. In this paper, we propose a novel method based on convolutional neural networks (CNNs) to determine image provenance: whether an image originates from a social network, a messaging application or directly from a photo camera. Because it considers only the visual content, the method works irrespective of any manipulation of metadata performed by an attacker. We have tested the proposed technique on three publicly available datasets of images downloaded from seven popular social networks, obtaining state-of-the-art results.
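
    As a rough illustration of how such a pixel-only provenance classifier can be structured (a minimal sketch, not the authors' architecture: the layer sizes, input resolution, and class list below are assumptions), consider a small CNN in PyTorch:

```python
# Minimal sketch of a CNN that classifies an image's provenance
# (social network / messaging app / camera) from pixels alone.
# Architecture, input size, and label set are illustrative, not the paper's.
import torch
import torch.nn as nn

CLASSES = ["facebook", "twitter", "whatsapp", "camera"]  # hypothetical label set

class ProvenanceCNN(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)   # (B, 128) pooled feature vector
        return self.classifier(h)         # raw logits, one per origin class

model = ProvenanceCNN()
logits = model(torch.randn(1, 3, 64, 64))    # one fake 64x64 RGB patch
print(CLASSES[logits.argmax(dim=1).item()])  # predicted origin
```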

    Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

    Full text link
    Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) to learn effective object-level visual features. In this paper, we propose a novel and efficient deep framework that boosts multi-label classification by distilling knowledge from a weakly-supervised detection task, without bounding box annotations. Specifically, given only image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module, in which the WSD model guides the classification model through its class-level predictions for the whole image and its object-level visual features for object RoIs. The WSD model is the teacher and the classification model is the student. After this cross-task knowledge distillation, the performance of the classification model is significantly improved while its efficiency is maintained, since the WSD model can be safely discarded at test time. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework outperforms state-of-the-art methods in both accuracy and efficiency. Comment: accepted by ACM Multimedia 2018; 9 pages, 4 figures, 5 tables.
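
    The class-level part of this cross-task distillation can be sketched as a loss that mixes ground-truth supervision with the teacher's soft predictions. The weight `alpha` and the soft-target form below are assumptions, and the paper's additional RoI feature-level guidance is omitted:

```python
# Sketch of a class-level knowledge-distillation objective: the student
# multi-label classifier fits the ground-truth labels (BCE) while also
# matching the frozen WSD teacher's image-level class predictions.
# `alpha` and the soft-target form are assumptions, not the paper's exact loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5):
    # Supervised multi-label loss against ground-truth image-level labels.
    hard = F.binary_cross_entropy_with_logits(student_logits, targets)
    # Match the teacher's per-class probabilities (teacher is frozen).
    soft = F.binary_cross_entropy_with_logits(
        student_logits, torch.sigmoid(teacher_logits).detach())
    return (1 - alpha) * hard + alpha * soft

student_logits = torch.randn(4, 80)    # batch of 4, 80 classes (e.g. MS-COCO)
teacher_logits = torch.randn(4, 80)    # WSD teacher's class-level predictions
targets = torch.randint(0, 2, (4, 80)).float()
print(distillation_loss(student_logits, teacher_logits, targets))
```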

    Analysis of Automatic Annotations of Real Video Surveillance Images

    Get PDF
    We present the results of an analysis of automatic annotations of real video surveillance sequences. Annotations were generated for the frames of surveillance sequences of a parking lot on a university campus. The purpose of the analysis is to evaluate the quality of the descriptions and to examine the correspondence between the semantic content of the images and the corresponding annotations. For the tests, a fixed camera was placed in the campus parking lot and video sequences of about 20 minutes were recorded; each frame was then annotated individually, and a text repository of all the annotations was formed. We observed that the properties of video can be exploited to evaluate the performance of the annotator, and we present the crossing of a pedestrian as an example for this analysis.
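
    One way to exploit the temporal properties of video for evaluating an annotator, in the spirit of the pedestrian-crossing example, is to track the similarity of consecutive frame annotations: in a mostly static parking-lot scene, sudden dips flag either annotator noise or a real event. The word-set Jaccard measure below is an illustrative assumption, not the authors' metric:

```python
# Sketch: in a mostly static scene, consecutive frames should receive
# similar captions, so dips in caption similarity flag either annotator
# noise or a real event (e.g. a pedestrian crossing). The word-set
# Jaccard similarity and the example captions are illustrative only.

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

annotations = [
    "cars parked in a parking lot",
    "cars parked in a parking lot",
    "a person walking between parked cars",   # pedestrian enters the scene
    "cars parked in a parking lot",
]
for i in range(1, len(annotations)):
    sim = jaccard(annotations[i - 1], annotations[i])
    flag = "  <-- change" if sim < 0.5 else ""
    print(f"frames {i-1}->{i}: similarity {sim:.2f}{flag}")
```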

    VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling

    Full text link
    Joint visual-semantic embeddings (VSE) have become a research hotspot for the task of image annotation, which suffers from the semantic gap, i.e., the gap between images' visual features (low-level) and labels' semantic features (high-level). The issue becomes even more challenging when visual features cannot be retrieved from the images, that is, when images are denoted only by numerical IDs, as in some real datasets. Existing VSE methods typically use uniform sampling to draw negative examples that violate the ranking order against positive examples, which requires a time-consuming search over the whole label space. In this paper, we propose a fast adaptive negative sampler that works well even when no image pixels are available. Our sampling strategy is to choose the negative examples that are most likely to violate the ranking order, according to the latent factors of the images. In this way, our approach scales linearly to large datasets. Experiments demonstrate that our approach converges 5.02x faster than state-of-the-art approaches on OpenImages, 2.5x on IAPR-TC12 and 2.06x on NUS-WIDE, while also achieving better ranking accuracy across datasets. Comment: published in The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).
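
    The core idea of the adaptive sampler can be sketched as follows: rather than uniformly searching the whole label space for a violating negative, score a small candidate pool with the current latent factors and keep the negative the model ranks highest. The pool size and dot-product scorer below are illustrative assumptions, not the paper's exact sampler:

```python
# Sketch of adaptive negative sampling by latent factors: draw a small
# candidate pool, score it with the current model, and return the
# highest-scoring negative (the most likely ranking violator), instead
# of uniformly searching the whole label space.
import numpy as np

rng = np.random.default_rng(0)
n_labels, dim = 10_000, 32
label_factors = rng.normal(size=(n_labels, dim))   # latent factors per label

def sample_negative(image_factor, positive_labels, pool_size=64):
    pool = rng.integers(0, n_labels, size=pool_size)
    pool = pool[~np.isin(pool, positive_labels)]   # drop accidental positives
    scores = label_factors[pool] @ image_factor    # model's current scores
    return pool[np.argmax(scores)]                 # hardest negative in pool

image_factor = rng.normal(size=dim)
print(sample_negative(image_factor, positive_labels=np.array([3, 17])))
```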

    Image Tagging using Modified Association Rule based on Semantic Neighbors

    Get PDF
    With the rapid development of the internet, mobile devices, and social image-sharing websites, a large number of images are generated daily. This huge repository of images poses challenges for image retrieval systems. On image-sharing social websites such as Flickr, users can assign keywords/tags to images to describe their content. These tags play an important role in image retrieval. However, user-assigned tags are highly personalized, which creates many challenges for retrieval; it is therefore necessary to suggest appropriate tags for images. Existing nearest-neighbor-based methods for tag recommendation ignore the relationships between tags. In this paper, we propose a method for tag recommendation based on semantic neighbors and a modified association rule. Given an image, the method identifies its semantic neighbors using a random forest, based on the weight assigned to each category. The tags associated with the semantic neighbors are used as candidate tags. The candidate set is expanded by mining tags with modified association rules, where each semantic neighbor is treated as a transaction. In the modified association rules, the probability of each tag is calculated from its TF-IDF and confidence values. Experiments were conducted on the Flickr, NUS-WIDE, and Corel-5k datasets. The proposed method performs better than existing tag recommendation methods.
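
    A minimal sketch of the tag-scoring step, assuming each neighbor's tag set is a transaction as the abstract describes; the exact way confidence and TF-IDF are combined below is an assumption, not the paper's formula:

```python
# Sketch: score candidate tags from semantic neighbors by combining
# association-rule confidence with TF-IDF. Treating each neighbor's tag
# set as a transaction follows the abstract; the combination rule and
# the example data are illustrative assumptions.
import math
from collections import Counter

# Each semantic neighbor's tag list is one "transaction".
transactions = [
    ["beach", "sea", "sunset"],
    ["beach", "sea", "sand"],
    ["beach", "sunset", "sky"],
]

def confidence(antecedent: str, consequent: str) -> float:
    # conf(A -> B) = support(A and B) / support(A)
    has_a = [t for t in transactions if antecedent in t]
    return sum(consequent in t for t in has_a) / len(has_a) if has_a else 0.0

def tfidf(tag: str) -> float:
    tf = sum(t.count(tag) for t in transactions)
    df = sum(tag in t for t in transactions)
    return tf * math.log(len(transactions) / df) if df else 0.0

seed = "beach"  # a candidate tag already associated with the query image
candidates = Counter(tag for t in transactions for tag in t if tag != seed)
scores = {tag: confidence(seed, tag) * (1 + tfidf(tag)) for tag in candidates}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # ranked suggestions
```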

    AN OBJECT-BASED MULTIMEDIA FORENSIC ANALYSIS TOOL

    Get PDF
    With the enormous increase in the use and volume of photographs and videos, multimedia-based digital evidence now plays an increasingly fundamental role in criminal investigations. However, it is becoming time-consuming and costly for investigators to analyse this content manually. Within the research community, work on multimedia content has tended to focus on highly specialised scenarios such as tattoo identification, number plate recognition, and child exploitation. The ability to search multimedia data by keywords (an approach that already exists within forensic tools for character-based evidence) could provide a simple and effective way of identifying relevant imagery. This thesis proposes and demonstrates the value of a multi-algorithmic approach that fuses several annotation systems to achieve the best image annotation performance. The results show that, among existing systems, the highest average recall was achieved by Imagga at 53%, while the proposed multi-algorithmic system achieved 77% across the selected datasets. Subsequently, a novel Object-based Multimedia Forensic Analysis Tool (OM-FAT) architecture is proposed. OM-FAT automates the identification and extraction of annotation-based evidence from multimedia content. Besides making multimedia data searchable, the OM-FAT system enables investigators to perform various forensic analyses (search using annotations, metadata, object matching, text similarity and geo-tracking) to understand the relationships between artefacts, thus reducing both the time taken to perform an investigation and the investigator's cognitive load. It enables investigators to ask higher-level, more abstract questions of the data, and then to find answers to the essential questions in an investigation: what, who, why, how, when, and where. The research includes a detailed illustration of the architectural requirements, the engines, and the complete design of the system workflow, which represents a full case management system. To highlight the ease of use and demonstrate the system's ability to correlate multimedia, a prototype was developed. The prototype integrates the functionalities of the OM-FAT tool and demonstrates how the system would help digital investigators find pieces of evidence among a large number of images, from the acquisition stage to the reporting stage, with less effort and in less time. Funded by The Higher Committee for Education Development in Iraq (HCED).
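
    The multi-algorithmic fusion idea can be sketched as a simple voting scheme over the tags returned by several annotation engines; the engine outputs and the minimum-vote rule below are placeholders, not the thesis's exact fusion method:

```python
# Sketch of multi-algorithmic fusion: merge the tags returned by several
# independent annotation engines and keep those proposed by at least
# `min_votes` of them, trading precision for the higher recall the thesis
# targets. Engine names and outputs are fabricated placeholders.
from collections import Counter

engine_outputs = {
    "engine_a": {"car", "road", "person"},   # e.g. a commercial tagger
    "engine_b": {"car", "person", "tree"},
    "engine_c": {"car", "road", "building"},
}

def fuse_tags(outputs: dict[str, set[str]], min_votes: int = 2) -> set[str]:
    votes = Counter(tag for tags in outputs.values() for tag in tags)
    return {tag for tag, n in votes.items() if n >= min_votes}

print(fuse_tags(engine_outputs))  # tags agreed on by at least two engines
```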