Dynamic Deep Multi-modal Fusion for Image Privacy Prediction
With millions of images shared online on social networking sites, effective
methods for image privacy prediction are urgently needed. In this paper, we
propose an approach for fusing object, scene context, and image tag modalities
derived from convolutional neural networks for accurately predicting
the privacy of images shared online. Specifically, our approach identifies the
set of most competent modalities on the fly, according to each new target image
whose privacy has to be predicted. The approach considers three stages to
predict the privacy of a target image, wherein we first identify the
neighborhood images that are visually similar and/or have similar sensitive
content as the target image. Then, we estimate the competence of the modalities
based on the neighborhood images. Finally, we fuse the decisions of the most
competent modalities and predict the privacy label for the target image.
Experimental results show that our approach predicts the sensitive (or private)
content more accurately than the models trained on individual modalities
(object, scene, and tags) and prior privacy prediction works. Also, our
approach outperforms strong baselines that train meta-classifiers to obtain an
optimal combination of modalities.
Comment: Accepted by The Web Conference (WWW) 201
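The three-stage procedure above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the neighborhood size k, the number of fused modalities top_m, and the use of Euclidean distance with accuracy-based competence are all assumptions.

```python
import numpy as np

def predict_privacy(target, neighbors, labels, modality_models, k=5, top_m=2):
    """Sketch of dynamic multi-modal fusion for privacy prediction.

    target          : feature vector of the image to classify
    neighbors       : (n, d) array of feature vectors for labeled images
    labels          : (n,) array of binary privacy labels (0=public, 1=private)
    modality_models : dict name -> callable returning P(private) for a vector
    """
    # Stage 1: find the k labeled images most similar to the target.
    dists = np.linalg.norm(neighbors - target, axis=1)
    knn = np.argsort(dists)[:k]

    # Stage 2: estimate each modality's competence as its accuracy
    # on the neighborhood images.
    competence = {}
    for name, model in modality_models.items():
        preds = np.array([model(neighbors[i]) >= 0.5 for i in knn])
        competence[name] = np.mean(preds == labels[knn])

    # Stage 3: fuse the decisions of the top_m most competent modalities.
    best = sorted(competence, key=competence.get, reverse=True)[:top_m]
    score = np.mean([modality_models[m](target) for m in best])
    return int(score >= 0.5)
```

In this sketch each modality would wrap a classifier trained on one feature type (object, scene, or tags); competence is re-estimated per target image, which is what makes the fusion dynamic.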
Identifying Private Content for Online Image Sharing
I present the outline of my dissertation work, Identifying Private Content for Online Image Sharing. Particularly, in my dissertation, I explore learning models to predict appropriate binary privacy settings (i.e., private, public) for images before they are shared online. Specifically, I investigate textual features (user-annotated tags and automatically derived tags) and visual semantic features that are transferred from various layers of a deep Convolutional Neural Network (CNN). Experimental results show that the learning models based on the proposed features outperform strong baseline models for this task on a Flickr dataset of thousands of images.
Identifying private content for online image sharing
Doctor of Philosophy, Department of Computer Science. Cornelia Caragea, Doina Caragea.
Images today are increasingly shared online on social networking sites such as Facebook, Flickr, Foursquare, and Instagram. Image sharing occurs not only within a group of friends but also, more and more, outside a user's social circles for purposes of social discovery. Although current social networking sites allow users to change their privacy preferences, this is often a cumbersome task for the vast majority of users on the Web, who face difficulties in assigning and managing privacy settings. When these privacy settings are used inappropriately, online image sharing can potentially lead to unwanted disclosures and privacy violations. Thus, automatically predicting images' privacy to warn users about private or sensitive content before these images are uploaded to social networking sites has become a necessity in our current interconnected world.
In this dissertation, we first explore learning models to automatically predict an image's privacy as private or public using carefully identified image-specific features. We study deep visual semantic features that are derived from various layers of Convolutional Neural Networks (CNNs) as well as textual features such as user tags and deep tags generated from deep CNNs. Particularly, we extract deep (visual and tag) features from four pre-trained CNN architectures for object recognition, i.e., AlexNet, GoogLeNet, VGG-16, and ResNet, and compare their performance for image privacy prediction. Results of our experiments on a Flickr dataset of over thirty thousand images show that the learning models trained on features extracted from ResNet outperform the state-of-the-art models for image privacy prediction. We further investigate the combination of user tags and deep tags derived from CNN architectures using two settings: (1) SVM on the bag-of-tags features; and (2) text-based CNN. We compare these models with the models trained on ResNet visual features obtained for privacy prediction.
Further, we present a privacy-aware approach to automatic image tagging, which aims at improving the quality of user annotations while also preserving the images' original privacy sharing patterns. Experimental results show that, although the user-input tags contain noise, our privacy-aware approach is able to predict accurate tags that improve the performance of a downstream application on image privacy prediction, and it outperforms an existing privacy-oblivious approach to image tagging. A crowd-sourcing study of the predicted tags confirms the quality of our privacy-aware recommended tags. Finally, we propose an approach for fusing object, scene context, and image tag modalities
derived from convolutional neural networks for accurately predicting the privacy of images shared online. Specifically, our approach identifies the set of most competent modalities on the fly, according to each new target image whose privacy has to be predicted. Experimental results show that our approach predicts the sensitive (or private) content more
accurately than the models trained on individual modalities (object, scene, and tags) and prior privacy prediction works. Additionally, our approach outperforms the state-of-the-art baselines that also yield combinations of modalities.
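The first of the two tag settings above, an SVM over bag-of-tags features, can be sketched roughly as follows. This assumes scikit-learn; the toy tag lists and labels are invented for illustration and are not from the dissertation's Flickr data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical tag lists: "private" for images of people / home life,
# "public" for generic scenery (invented examples).
tag_lists = [
    "family kids birthday home",
    "selfie friends party indoor",
    "mountain landscape sunset sky",
    "architecture building city street",
]
labels = ["private", "private", "public", "public"]

# Bag-of-tags: each image becomes a count vector over the tag vocabulary.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tag_lists)

# Linear SVM on the bag-of-tags vectors.
clf = LinearSVC()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["kids home party"]))[0])  # → private
```

The text-based CNN setting would instead feed tag embeddings through convolutional filters, but the bag-of-tags SVM above is the simpler of the two baselines to reproduce.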
Object Recognition Using Scale-Invariant Chordiogram
This thesis describes an approach to object recognition using the chordiogram shape-based descriptor. Global shape representations are highly susceptible to clutter generated by the background or other irrelevant objects in real-world images. To overcome this problem, we aim to extract a precise object shape using superpixel segmentation, perceptual grouping, and connected components. The chordiogram shape descriptor is based on the geometric relationships of chords generated from pairs of boundary points of an object. The chordiogram captures holistic properties of the shape and has also proven suitable for object detection and digit recognition. Additionally, it is translation invariant and robust to shape deformations. Despite these excellent properties, the chordiogram is not scale-invariant. To this end, we propose scale-invariant chordiogram descriptors and aim to achieve similar performance before and after applying scale invariance. Our experiments show that we achieve similar performance with and without scale invariance for silhouettes and real-world object images. We also show experiments at different scales to confirm that we obtain scale invariance for the chordiogram.
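A minimal sketch of how chord statistics can be made scale-invariant, assuming normalization by the mean chord length (the thesis may use a different normalization and a richer set of chord features):

```python
import numpy as np

def chordiogram(boundary, n_len_bins=4, n_ang_bins=8):
    """Illustrative scale-invariant chordiogram (not the thesis code).

    boundary : (n, 2) array of ordered boundary points of one object.
    Each pair of boundary points defines a chord; we histogram chord
    length and orientation. Dividing all lengths by the mean chord
    length removes the dependence on object scale.
    """
    # All unordered pairs of boundary points define the chords.
    i, j = np.triu_indices(len(boundary), k=1)
    chords = boundary[j] - boundary[i]
    lengths = np.hypot(chords[:, 0], chords[:, 1])
    angles = np.arctan2(chords[:, 1], chords[:, 0])  # in (-pi, pi]

    # Scale normalization: relative chord lengths are scale-invariant.
    lengths = lengths / lengths.mean()

    # 2-D histogram over (normalized length, orientation) bins.
    hist, _, _ = np.histogram2d(
        lengths, angles,
        bins=(n_len_bins, n_ang_bins),
        range=((0.0, lengths.max()), (-np.pi, np.pi)),
    )
    return hist / hist.sum()  # normalized descriptor
```

Because every chord length is divided by the mean, scaling the boundary by any factor leaves the descriptor unchanged, which is the property the thesis evaluates at different scales.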
Image Privacy Prediction Using Deep Features
Online image sharing on social media sites such as Facebook, Flickr, and Instagram can lead to unwanted disclosure and privacy violations when privacy settings are used inappropriately. With the exponential increase in the number of images shared online, the development of effective and efficient prediction methods for image privacy settings is highly needed. In this study, we explore deep visual features and deep image tags for image privacy prediction. The results of our experiments show that models trained on deep visual features outperform those trained on SIFT and GIST. The results also show that deep image tags combined with user tags perform best among all tested features.
Dynamically Identifying Deep Multimodal Features for Image Privacy Prediction
With millions of images shared online, privacy concerns are on the rise. In this paper, we propose an approach to image privacy prediction that dynamically identifies powerful features corresponding to objects, scene context, and image tags derived from Convolutional Neural Networks for each test image. Specifically, our approach identifies the set of most "competent" features on the fly, according to each test image whose privacy has to be predicted. Experimental results on thousands of Flickr images show that our approach predicts the sensitive (or private) content more accurately than the models trained on each individual feature set (object, scene, and tags alone) or their combination.