Laplacian-Steered Neural Style Transfer
Neural Style Transfer based on Convolutional Neural Networks (CNN) aims to
synthesize a new image that retains the high-level structure of a content
image, rendered in the low-level texture of a style image. This is achieved by
constraining the new image to have high-level CNN features similar to the
content image, and lower-level CNN features similar to the style image.
However, in the traditional optimization objective, the low-level features of
the content image are absent, so the low-level features of the style image
dominate the low-level detail structures of the new image. Hence, in the
synthesized image, many details of the content image are lost, and many
inconsistent and unpleasant artifacts appear. As a remedy, we propose to steer image synthesis
with a novel loss function: the Laplacian loss. The Laplacian matrix
("Laplacian" in short), produced by a Laplacian operator, is widely used in
computer vision to detect edges and contours. The Laplacian loss measures the
difference of the Laplacians, and correspondingly the difference of the detail
structures, between the content image and a new image. It is flexible and
compatible with the traditional style transfer constraints. By incorporating
the Laplacian loss, we obtain a new optimization objective for neural style
transfer named Lapstyle. Minimizing this objective will produce a stylized
image that better preserves the detail structures of the content image and
eliminates the artifacts. Experiments show that Lapstyle produces more
appealing stylized images with fewer artifacts, without compromising their
"stylishness".

Comment: Accepted by the ACM Multimedia Conference (MM) 2017. 9 pages, 65
figures.
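To make the Laplacian loss concrete, here is a minimal sketch in NumPy. It assumes grayscale images as 2D arrays, the standard 3x3 Laplacian kernel, and a sum-of-squared-differences loss; the paper's exact formulation (e.g. how color channels or pooling are handled) may differ.

```python
import numpy as np

# Standard 3x3 discrete Laplacian kernel used for edge/contour detection.
LAPLACIAN_KERNEL = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=np.float64)

def laplacian(img):
    """Apply the 3x3 Laplacian operator (valid region only, no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * LAPLACIAN_KERNEL)
    return out

def laplacian_loss(content, synthesized):
    """Sum of squared differences between the two images' Laplacians."""
    d = laplacian(content) - laplacian(synthesized)
    return float(np.sum(d ** 2))
```

One useful property this sketch illustrates: since the Laplacian of a constant image is zero, the loss is invariant to global brightness shifts, so it constrains detail structure without fixing absolute pixel values.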
Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation
Joint segmentation and classification of fine-grained actions is important
for applications of human-robot interaction, video surveillance, and human
skill evaluation. However, despite substantial recent progress in large-scale
action classification, the performance of state-of-the-art fine-grained action
recognition approaches remains low. We propose a model for action segmentation
which combines low-level spatiotemporal features with a high-level segmental
classifier. Our spatiotemporal CNN comprises a spatial component that
uses convolutional filters to capture information about objects and their
relationships, and a temporal component that uses large 1D convolutional
filters to capture information about how object relationships change across
time. These features are used in tandem with a semi-Markov model that models
transitions from one action to another. We introduce an efficient constrained
segmental inference algorithm for this model that is orders of magnitude faster
than the current approach. We highlight the effectiveness of our Segmental
Spatiotemporal CNN on cooking and surgical action datasets for which we observe
substantially improved performance relative to recent baseline methods.

Comment: Updated from the ECCV 2016 version. We fixed an important
mathematical error and made the section on segmental inference clearer.
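The segmental (semi-Markov) inference idea can be sketched as a Viterbi-style dynamic program over segments rather than frames. This is an illustrative sketch, not the paper's constrained algorithm: `frame_scores[t, c]` (per-frame log-score for class `c`), `trans[a, b]` (log-score for a class-`b` segment following a class-`a` segment), and `max_len` (maximum segment duration) are assumed inputs.

```python
import numpy as np

def segmental_viterbi(frame_scores, trans, max_len):
    """Return the best per-frame labeling under a segmental (semi-Markov) model."""
    T, C = frame_scores.shape
    # cum[t, c] = sum of frame_scores[:t, c]; a segment [s, t) of class c
    # then scores cum[t, c] - cum[s, c].
    cum = np.vstack([np.zeros((1, C)), np.cumsum(frame_scores, axis=0)])
    NEG = -1e18
    # best[t, c]: best score of a segmentation of frames [0, t)
    # whose last segment has class c.
    best = np.full((T + 1, C), NEG)
    back = {}
    for t in range(1, T + 1):
        for c in range(C):
            for d in range(1, min(max_len, t) + 1):  # duration of last segment
                seg = cum[t, c] - cum[t - d, c]
                if t - d == 0:
                    cand, prev = seg, None
                else:
                    scores_prev = best[t - d] + trans[:, c]
                    prev = int(np.argmax(scores_prev))
                    cand = scores_prev[prev] + seg
                if cand > best[t, c]:
                    best[t, c] = cand
                    back[(t, c)] = (t - d, prev)
    # Backtrack from the best final class to recover per-frame labels.
    c = int(np.argmax(best[T]))
    labels = np.empty(T, dtype=int)
    t = T
    while t > 0:
        t0, prev = back[(t, c)]
        labels[t0:t] = c
        t = t0
        if prev is not None:
            c = prev
    return labels
```

The naive version above enumerates every (end frame, class, duration) triple, which is O(T * C * max_len * C); the paper's contribution is a constrained inference algorithm that is orders of magnitude faster than this kind of exhaustive search.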
- …