Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions
Image retargeting aims to alter the size of an image while preserving its
important content. One of the main obstacles to training deep learning models
for image retargeting is the need for a large labeled dataset, and no such
dataset is available for this task. We therefore present a new supervised
approach for training deep learning models: we use the original images as
ground truth and create the model's inputs by resizing and cropping them. A
second challenge is generating images of different sizes at inference time,
since regular convolutional neural networks cannot produce outputs whose size
differs from that of the input. To address this issue, we introduce a new
supervised learning method in which a mask indicating the desired size and
location of the object is generated and fed to the network together with the
input image. Comparisons between existing image retargeting methods and our
proposed method demonstrate the model's ability to produce high-quality
retargeted images. Finally, we compute image quality assessment scores for
each output image using several techniques, further illustrating the
effectiveness of our approach.
Comment: 18 pages, 5 figures
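The abstract's self-supervised data construction (original image as ground truth, a resized-and-cropped version as input, plus a mask marking the desired size and location) might be sketched as follows. This is a minimal illustration under stated assumptions; all function and variable names are illustrative and not the authors' code, and nearest-neighbour resizing stands in for whatever resampling the paper actually uses:

```python
import numpy as np

def make_training_pair(original, target_h, target_w, rng=None):
    """Build an (input, mask, ground_truth) triple from one image.

    The original image serves as ground truth. The input is a
    nearest-neighbour-resized copy pasted at a random location into a
    blank canvas of the original's size, and a binary mask records the
    desired size and position of the content (hypothetical recipe,
    inferred from the abstract, not the authors' exact pipeline).
    """
    rng = np.random.default_rng(rng)
    h, w = original.shape[:2]
    # Nearest-neighbour resize of the original down to the target size.
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    resized = original[rows][:, cols]
    # Random placement so input and ground truth share the same shape.
    top = int(rng.integers(0, h - target_h + 1))
    left = int(rng.integers(0, w - target_w + 1))
    inp = np.zeros_like(original)
    inp[top:top + target_h, left:left + target_w] = resized
    # The mask tells the network where the content should appear.
    mask = np.zeros(original.shape[:2], dtype=np.uint8)
    mask[top:top + target_h, left:left + target_w] = 1
    return inp, mask, original
```

Because the pair is derived from the image itself, no external labels are needed, which is the point the abstract makes about the missing labeled datasets.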
An Abstraction Model for Semantic Segmentation Algorithms
Semantic segmentation is the process of classifying each pixel in an image.
Because of its advantages, semantic segmentation is used in many tasks, such
as cancer detection, robot-assisted surgery, satellite image analysis, and
self-driving car control. Accuracy and efficiency are the two crucial goals
in this process, and several state-of-the-art neural networks pursue them:
each method employs different techniques to increase efficiency and accuracy
and to reduce costs. The diversity of the approaches implemented for semantic
segmentation makes it difficult for researchers to obtain a comprehensive
view of the field. To offer such a view, this paper presents an abstraction
model for the task of semantic segmentation. The proposed framework consists
of four general blocks that cover the majority of methods proposed for
semantic segmentation. We also compare different approaches and consider the
importance of each part in the overall performance of a method.
Comment: 6 pages, 2 figures
Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement
We present a novel spatiotemporal saliency detection method to estimate salient regions in videos based on the gradient flow field and energy optimization. The proposed gradient flow field incorporates two distinctive features: 1) intra-frame boundary information and 2) inter-frame motion information, which together indicate the salient regions. Based on the effective utilization of both intra-frame and inter-frame information in the gradient flow field, our algorithm is robust enough to estimate the object and background in complex scenes with various motion patterns and appearances. Then, we introduce local as well as global contrast saliency measures using the foreground and background information estimated from the gradient flow field. These enhanced contrast saliency cues uniformly highlight an entire object. We further propose a new energy function to encourage the spatiotemporal consistency of the output saliency maps, which is seldom explored in previous video saliency methods. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.
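The core idea of fusing intra-frame boundary cues with inter-frame motion cues can be illustrated with a toy per-pixel field. This is only a sketch: simple frame differencing stands in for optical flow, and the multiplicative combination is an assumption, not the paper's actual gradient flow field:

```python
import numpy as np

def gradient_flow_field(frame, prev_frame):
    """Toy saliency cue combining intra-frame boundary gradients with
    inter-frame motion (frame differencing approximates motion here;
    the paper's field and optimization are more elaborate)."""
    # Intra-frame cue: spatial gradient magnitude (boundary information).
    gy, gx = np.gradient(frame.astype(float))
    boundary = np.hypot(gx, gy)
    # Inter-frame cue: absolute temporal difference (motion information).
    motion = np.abs(frame.astype(float) - prev_frame.astype(float))
    # Combine the two cues and normalise to [0, 1].
    field = boundary * motion
    return field / field.max() if field.max() > 0 else field
```

A moving object produces both strong boundaries and strong temporal change, so the combined field peaks on it while static, low-contrast background stays near zero, which is the intuition the abstract describes.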
Shapes and Context: In-the-Wild Image Synthesis & Manipulation
We introduce a data-driven approach for interactively synthesizing
in-the-wild images from semantic label maps. Our approach is dramatically
different from recent work in this space, in that we make use of no learning.
Instead, our approach uses simple but classic tools for matching scene context,
shapes, and parts to a stored library of exemplars. Though simple, this
approach has several notable advantages over recent work: (1) because nothing
is learned, it is not limited to specific training data distributions (such as
cityscapes, facades, or faces); (2) it can synthesize arbitrarily
high-resolution images, limited only by the resolution of the exemplar library;
(3) by appropriately composing shapes and parts, it can generate an
exponentially large set of viable candidate output images (that can, say, be
interactively searched by a user). We present results on the diverse COCO
dataset, significantly outperforming learning-based approaches on standard
image synthesis metrics. Finally, we explore user-interaction and
user-controllability, demonstrating that our system can be used as a platform
for user-driven content creation.
Comment: Project Page: http://www.cs.cmu.edu/~aayushb/OpenShapes
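The learning-free retrieval step (matching a query label map against a stored exemplar library) can be sketched in miniature. Pixel-wise label agreement here is a deliberately crude stand-in for the paper's richer scene-context, shape, and part matching, and all names are illustrative:

```python
import numpy as np

def best_exemplar(query_labels, library):
    """Return the index of the library exemplar whose semantic label
    map best matches the query, scored by pixel-wise label agreement
    (a toy proxy for context/shape/part matching; nothing is learned)."""
    def score(lib_labels):
        # Fraction of pixels whose semantic labels agree with the query.
        return np.mean(lib_labels == query_labels)
    return max(range(len(library)), key=lambda i: score(library[i]))
```

Because retrieval only compares label maps, the library can be swapped or extended at any time without retraining, which is the flexibility the abstract claims over learning-based synthesis.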