Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions
Image retargeting aims to alter the size of the image with attention to the
contents. One of the main obstacles to training deep learning models for image
retargeting is the need for a vast labeled dataset. Labeled datasets are
unavailable for training deep learning models in the image retargeting tasks.
As a result, we present a new supervised approach for training deep learning
models. We use the original images as ground truth and create inputs for the
model by resizing and cropping the original images. A second challenge is
generating different image sizes in inference time. However, regular
convolutional neural networks cannot generate images of different sizes than
the input image. To address this issue, we introduced a new method for
supervised learning. In our approach, a mask is generated to show the desired
size and location of the object. Then the mask and the input image are fed to
the network. Comparing image retargeting methods and our proposed method
demonstrates the model's ability to produce high-quality retargeted images.
Afterward, we compute the image quality assessment score for each output image
based on different techniques and illustrate the effectiveness of our approach.
Comment: 18 pages, 5 figures
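The self-supervised pairing described in this abstract (original image as ground truth, a cropped version plus a mask as input) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the center crop, the zero canvas, and the mask convention (1 where the crop sits) are hypothetical and are not specified in the abstract, and the resizing step is omitted. Plain nested lists stand in for image arrays to keep the sketch dependency-free.

```python
def center_crop(image, crop_w):
    """Crop a row-major image (list of rows) to crop_w columns around its center.

    Returns the cropped rows and the left offset of the crop.
    """
    width = len(image[0])
    if not 0 < crop_w <= width:
        raise ValueError("crop_w must be between 1 and the image width")
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image], left


def make_training_pair(original, crop_w):
    """Build (network_input, mask, target) from one unlabeled image.

    Assumption (not from the abstract): the input is the center crop pasted
    onto a zero canvas of the original size, and the mask is 1 wherever the
    crop was pasted. The untouched original serves as the ground truth.
    """
    height, width = len(original), len(original[0])
    cropped, left = center_crop(original, crop_w)
    canvas = [[0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(crop_w):
            canvas[y][left + x] = cropped[y][x]
            mask[y][left + x] = 1
    return canvas, mask, original


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    net_input, mask, target = make_training_pair(image, crop_w=2)
    print(net_input)
    print(mask)
```

In a real pipeline the input and mask would be stacked as channels and fed to the network, with a loss computed against the target.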
Saliency-aware Stereoscopic Video Retargeting
Stereo video retargeting aims to resize an image to a desired aspect ratio.
The quality of retargeted videos can be significantly impacted by the stereo
video's spatial, temporal, and disparity coherence, all of which can be impacted
by the retargeting process. Due to the lack of a publicly accessible annotated
dataset, there is little research on deep learning-based methods for stereo
video retargeting. This paper proposes an unsupervised deep learning-based
stereo video retargeting network. Our model first detects the salient objects
and shifts and warps all objects such that it minimizes the distortion of the
salient parts of the stereo frames. We use 1D convolution for shifting the
salient objects and design a stereo video Transformer to assist the retargeting
process. To train the network, we use the parallax attention mechanism to fuse
the left and right views and feed the retargeted frames to a reconstruction
module that reverses the retargeted frames to the input frames. Therefore, the
network is trained in an unsupervised manner. Extensive qualitative and
quantitative experiments and ablation studies on KITTI stereo 2012 and 2015
datasets demonstrate the efficiency of the proposed method over the existing
state-of-the-art methods. The code is available at
https://github.com/z65451/SVR/.
Comment: 8 pages excluding references. CVPRW conference
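To illustrate how a 1D convolution can shift content along a row, as the abstract mentions, here is a minimal sketch. It is not the authors' network: the one-hot kernel, the zero padding, and the function names `conv1d_same` and `shift_kernel` are assumptions made for illustration. In a learned model the kernel weights would be predicted rather than fixed.

```python
def conv1d_same(row, kernel):
    """Zero-padded 1D cross-correlation with an odd-length kernel.

    The output has the same length as the input row.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(row):
                acc += weight * row[j]
        out.append(acc)
    return out


def shift_kernel(shift, radius):
    """One-hot kernel that moves content `shift` pixels to the right.

    A negative shift moves content to the left. `radius` bounds the
    largest shift the kernel can express.
    """
    if abs(shift) > radius:
        raise ValueError("shift exceeds kernel radius")
    kernel = [0] * (2 * radius + 1)
    kernel[radius - shift] = 1
    return kernel


if __name__ == "__main__":
    row = [1, 2, 3, 4, 5]
    print(conv1d_same(row, shift_kernel(1, radius=2)))
    print(conv1d_same(row, shift_kernel(-2, radius=2)))
```

Because the shift is expressed as a convolution, it stays differentiable with respect to the kernel weights, which is one plausible reason to prefer it over integer indexing.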
Preserving Trustworthiness and Confidentiality for Online Multimedia
Technology advancements in areas of mobile computing, social networks, and cloud computing have rapidly changed the way we communicate and interact. The wide adoption of media-oriented mobile devices such as smartphones and tablets enables people to capture information in various media formats, and offers them a rich platform for media consumption. The proliferation of online services and social networks makes it possible to store personal multimedia collections online and share them with family and friends anytime, anywhere. Considering the increasing impact of digital multimedia and the trend of cloud computing, this dissertation explores the problem of how to evaluate trustworthiness and preserve confidentiality of online multimedia data.
The dissertation consists of two parts. The first part examines the problem of evaluating trustworthiness of multimedia data distributed online. Given the digital nature of multimedia data, editing and tampering of the multimedia content become very easy. Therefore, it is important to analyze and reveal the processing history of a multimedia document in order to evaluate its trustworthiness. We propose a new forensic technique called "Forensic Hash", which draws synergy between two related research areas of image hashing and non-reference multimedia forensics. A forensic hash is a compact signature capturing important information from the original multimedia document to assist forensic analysis and reveal processing history of a multimedia document under question. Our proposed technique is shown to have the advantage of being compact and offering efficient and accurate analysis to forensic questions that cannot be easily answered by conventional forensic techniques. The answers that we obtain from the forensic hash provide valuable information on the trustworthiness of online multimedia data.
The second part of this dissertation addresses the confidentiality issue of multimedia data stored with online services. The emerging cloud computing paradigm makes it attractive to store private multimedia data online for easy access and sharing. However, the potential of cloud services cannot be fully reached unless the issue of how to preserve confidentiality of sensitive data stored in the cloud is addressed. In this dissertation, we explore techniques that enable confidentiality-preserving search of encrypted multimedia, which can play a critical role in secure online multimedia services. Techniques from image processing, information retrieval, and cryptography are jointly and strategically applied to allow efficient rank-ordered search over encrypted multimedia databases and at the same time preserve data confidentiality against malicious intruders and service providers. We demonstrate high efficiency and accuracy of the proposed techniques and provide a quantitative comparative study with conventional techniques based on heavyweight cryptographic primitives.
Perceptually Guided Photo Retargeting
We propose perceptually guided photo retargeting, which shrinks a photo by simulating a human's process of sequentially perceiving visually/semantically important regions in a photo. In particular, we first project the local features (graphlets in this paper) onto a semantic space, wherein visual cues such as global spatial layout and rough geometric context are exploited. Thereafter, a sparsity-constrained learning algorithm is derived to select semantically representative graphlets of a photo, and the selecting process can be interpreted by a path which simulates how a human actively perceives semantics in a photo. Furthermore, we learn the prior distribution of such active graphlet paths (AGPs) from training photos that are marked as esthetically pleasing by multiple users. The learned priors enforce the corresponding AGP of a retargeted photo to be maximally similar to those from the training photos. On top of the retargeting model, we further design an online learning scheme to incrementally update the model with new photos that are esthetically pleasing. The online update module makes the algorithm less dependent on the number and contents of the initial training data. Experimental results show that: 1) the proposed AGP is over 90% consistent with human gaze shifting path, as verified by the eye-tracking data, and 2) the retargeting algorithm outperforms its competitors significantly, as AGP is more indicative of photo esthetics than conventional saliency maps
Spatiotemporal Saliency Detection: State of Art
Saliency detection has become a very prominent research subject in recent years, and many techniques have been proposed for it. This paper explains a number of saliency detection techniques published from 2000 to 2015, covering almost every technique. All the methods are explained briefly, including their advantages and disadvantages. The techniques are compared with the help of a table that lists author names, paper name, year, techniques, algorithms, and challenges. A comparison between levels of acceptance rates and accuracy levels is also made.
Analysis of Disparity Maps for Detecting Saliency in Stereoscopic Video
We present a system for automatically detecting salient image regions in stereoscopic videos. This report extends our previous system and provides additional details about its implementation. Our proposed algorithm considers information based on three dimensions: salient colors in individual frames, salient information derived from camera and object motion, and depth saliency. These three components are dynamically combined into one final saliency map based on the reliability of the individual saliency detectors. Such a combination allows using more efficient algorithms even if the quality of one detector degrades. For example, we use a computationally efficient stereo correspondence algorithm that might cause noisy disparity maps for certain scenarios. In this case, however, a more reliable saliency detection algorithm such as the image saliency is preferred. To evaluate the quality of the saliency detection, we created modified versions of stereoscopic videos with the non-salient regions blurred. Having users rate the quality of these videos, the results show that most users do not detect the blurred regions and that the automatic saliency detection is very reliable
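The reliability-based combination of the color, motion, and depth saliency maps described above could look roughly like the sketch below. The abstract does not give the fusion rule, so the normalized weighted sum and the name `fuse_saliency` are assumptions; how each detector's reliability is estimated is also left out.

```python
def fuse_saliency(maps, reliabilities):
    """Fuse equally sized saliency maps with reliability-proportional weights.

    maps: list of row-major 2D lists with values in [0, 1].
    reliabilities: one non-negative score per map; a score of 0 removes
    that detector from the result (assumed behavior, not from the report).
    """
    if len(maps) != len(reliabilities) or not maps:
        raise ValueError("need exactly one reliability per map")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one detector must be reliable")
    weights = [r / total for r in reliabilities]
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[y][x] for w, m in zip(weights, maps)) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    color = [[1.0, 0.0]]
    motion = [[0.0, 0.0]]
    depth = [[0.0, 1.0]]
    # A noisy disparity map gets reliability 0, so depth is ignored here.
    print(fuse_saliency([color, motion, depth], [0.5, 0.5, 0.0]))
```

Setting the depth reliability to zero mirrors the case in the abstract where a noisy disparity map makes the image saliency the preferred cue.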
FUZZY KERNEL REGRESSION FOR REGISTRATION AND OTHER IMAGE WARPING APPLICATIONS
In this dissertation, a new approach for non-rigid medical image registration
is presented. It relies on a probabilistic framework based on the novel concept
of Fuzzy Kernel Regression. The theoretical framework, after a formal
introduction, is applied to develop several complete registration systems: two
of them are interactive and one is fully automatic. They all use the
composition of local deformations to achieve the final alignment. The automatic
one is based on the maximization of mutual information to produce local affine
alignments, which are merged into the global transformation. The mutual
information maximization procedure uses the gradient descent method. Due to the
huge amount of data associated with medical images, a multi-resolution topology
is adopted, reducing processing time. The injected distance-based interpolation
scheme facilitates optimization of the similarity measure by attenuating the
presence of local maxima in the functional. System blocks are implemented on
GPGPUs, allowing efficient parallel computation of large 3D datasets using SIMT
execution. Due to the flexibility of mutual information, it can be applied to
multi-modality image scans (MRI, CT, PET, etc.).
Both quantitative and qualitative experiments show promising results and great
potential for future extension.
Finally, the framework's flexibility is shown by means of its successful
application to the image retargeting problem; methods and results are
presented.
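As a rough illustration of the similarity measure named above, the sketch below computes mutual information between two images from their joint intensity histogram. It is a generic textbook formulation, not the dissertation's implementation: the flattened-list input, the lack of intensity binning, and the name `mutual_information` are assumptions, and the gradient descent optimization and GPU implementation are not shown.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information in bits between two equal-length intensity sequences.

    a, b: flattened images with discrete (already binned) intensities.
    Uses I(A;B) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram
    count_a = Counter(a)         # marginal histogram of a
    count_b = Counter(b)         # marginal histogram of b
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) rewritten in terms of raw counts
        ratio = count * n / (count_a[x] * count_b[y])
        mi += p_xy * math.log2(ratio)
    return mi


if __name__ == "__main__":
    fixed = [0, 0, 1, 1]
    print(mutual_information(fixed, [0, 0, 1, 1]))  # perfectly aligned
    print(mutual_information(fixed, [0, 1, 0, 1]))  # statistically independent
```

A registration loop would evaluate this score for candidate transformations of the moving image and adjust the transformation parameters to increase it.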