Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting
This paper proposes a weakly- and self-supervised deep convolutional neural
network (WSSDCNN) for content-aware image retargeting. Our network takes a
source image and a target aspect ratio, and then directly outputs a retargeted
image. Retargeting is performed through a shift map, which is a pixel-wise
mapping from the source to the target grid. Our method implicitly learns an
attention map, which leads to a content-aware shift map for image retargeting.
As a result, discriminative parts in an image are preserved, while background
regions are adjusted seamlessly. In the training phase, pairs of an image and
its image-level annotation are used to compute content and structure losses. We
demonstrate the effectiveness of our proposed method for a retargeting
application with insightful analyses. Comment: 10 pages, 11 figures. To appear in ICCV 2017, Spotlight Presentation
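The shift-map idea can be illustrated without the network. The following is a minimal sketch, assuming a one-dimensional (width-only) shift map and an attention map that is simply given as input; in the paper the attention map is learned implicitly. The helpers `column_shift_map` and `apply_shift_map` are hypothetical names. Each target column is assigned a source x-coordinate, so high-attention columns keep roughly their width while low-attention background columns are compressed.

```python
import numpy as np

def column_shift_map(attention, target_width, eps=1e-3):
    """Hypothetical 1-D shift map: for each target column, (source x) - (target x).

    attention: (H, W) array of non-negative importance scores (assumed given,
    not learned as in the paper).
    """
    _, w = attention.shape
    # Column importance, with a small floor so no column gets zero width.
    importance = attention.mean(axis=0) + eps
    # Width that each source column occupies in the target grid.
    widths = importance / importance.sum() * target_width
    # Target-grid positions of the source column edges (monotonic).
    edges_t = np.concatenate(([0.0], np.cumsum(widths)))
    edges_s = np.arange(w + 1, dtype=float)
    # Source x-coordinate sampled by each target pixel center.
    src = np.interp(np.arange(target_width) + 0.5, edges_t, edges_s) - 0.5
    return src - np.arange(target_width)

def apply_shift_map(image, shift):
    """Resample an (H, W, C) image with a per-column shift map (linear interpolation)."""
    w = image.shape[1]
    src = np.clip(np.arange(shift.shape[0]) + shift, 0, w - 1)
    x0 = np.floor(src).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    a = (src - x0)[None, :, None]  # weights broadcast over rows and channels
    return (1 - a) * image[:, x0] + a * image[:, x1]
```

For example, `apply_shift_map(img, column_shift_map(att, 6))` narrows a 10-column image to 6 columns; with a uniform attention map this reduces to plain uniform resizing, which is what a content-aware map is meant to improve on.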
One-Sided Competition in Two-Sided Social Platform Markets? An Organizational Ecology Perspective
Similar to love, competition can often be unrequited. This study explores the asymmetric pattern of competition driven by membership overlap in two-sided mobile social app (MSA) markets. Building on the niche-width dynamics framework, we theorize and validate the relative prevalence and survival capabilities of messaging apps and SNS apps, especially when membership overlap fosters current or potential competition between the two app categories. The analyses, based on a panel dataset recording the exact time that 8,483 panel members spent on 21 mobile social apps, show that competition between SNS and messaging apps can be asymmetric in favor of messaging apps. This asymmetric pattern is more pronounced for membership-based competition than for usage-based competition. In addition, different MSAs developed by the same platform providers exhibit synergistic rather than destructive effects on each other’s growth. The findings reveal the complex nature of within-category and between-category competition in MSA markets.
Partial Sum Minimization of Singular Values in Robust PCA: Algorithm and Applications
Robust Principal Component Analysis (RPCA) via rank minimization is a
powerful tool for recovering the underlying low-rank structure of clean data
corrupted with sparse noise/outliers. In many low-level vision problems, not
only is it known that the underlying structure of clean data is low-rank, but
the exact rank of clean data is also known. Yet, when applying conventional
rank minimization to those problems, the objective function is formulated in a
way that does not fully utilize a priori target rank information about the
problems. This observation motivates us to investigate whether there is a
better alternative solution when using rank minimization. In this paper,
instead of minimizing the nuclear norm, we propose to minimize the partial sum
of singular values, which implicitly encourages the target rank constraint. Our
experimental analyses show that, when the number of samples is deficient, our
approach leads to a higher success rate than conventional rank minimization,
while the solutions obtained by the two approaches are almost identical when
the number of samples is more than sufficient. We apply our approach to various
low-level vision problems, e.g. high dynamic range imaging, motion edge
detection, photometric stereo, image alignment and recovery, and show that our
results outperform those obtained by the conventional nuclear norm rank
minimization method. Comment: Accepted in Transactions on Pattern Analysis and Machine Intelligence
(TPAMI). To appear
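For illustration, here is a sketch assuming the formulation in which the partial sum P_N(A) = sum over i > N of sigma_i(A) replaces the nuclear norm in min P_N(A) + lambda ||E||_1 subject to D = A + E. Its proximal step leaves the N largest singular values untouched and soft-thresholds only the remaining ones. The outer loop is a generic inexact augmented Lagrange multiplier (ALM) scheme with commonly used default parameters; these choices, and the names `partial_svt` and `rpca_pssv`, are assumptions rather than the paper's code.

```python
import numpy as np

def partial_svt(Y, tau, target_rank):
    """Proximal step for the partial sum of singular values (hypothetical helper).

    Keeps the `target_rank` largest singular values and soft-thresholds the rest by tau.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = s.copy()
    s_new[target_rank:] = np.maximum(s[target_rank:] - tau, 0.0)
    return (U * s_new) @ Vt  # scales column j of U by s_new[j]

def rpca_pssv(D, target_rank, lam=None, tol=1e-7, max_iter=1000):
    """Sketch: min P_N(A) + lam * ||E||_1  s.t.  D = A + E, via inexact ALM."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm2 = np.linalg.norm(D, 2)  # largest singular value of D
    Y = D / max(norm2, np.abs(D).max() / lam)  # dual variable initialization
    mu, rho = 1.25 / norm2, 1.5
    mu_max = mu * 1e7
    E = np.zeros_like(D)
    A = np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank update: partial (not full) singular value thresholding.
        A = partial_svt(D - E + Y / mu, 1.0 / mu, target_rank)
        # Sparse update: elementwise soft-thresholding.
        T = D - A + Y / mu
        E = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint residual, then increase the penalty.
        R = D - A - E
        Y = Y + mu * R
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R) / np.linalg.norm(D) < tol:
            break
    return A, E
```

Setting `target_rank=0` turns `partial_svt` into ordinary singular value thresholding, which recovers the conventional nuclear-norm variant for comparison.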
Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages
Neural text-to-speech (TTS) models can synthesize natural human speech when
trained on large amounts of transcribed speech. However, collecting such
large-scale transcribed data is expensive. This paper proposes an unsupervised
pre-training method for a sequence-to-sequence TTS model by leveraging large amounts of
untranscribed speech data. With our pre-training, we can substantially reduce the
amount of paired transcribed data required to train the model for the target
downstream TTS task. The main idea is to pre-train the model to reconstruct
de-warped mel-spectrograms from warped ones, which may allow the model to learn
the proper temporal assignment relation between input and output sequences. In
addition, we propose a data augmentation method that further improves the data
efficiency in fine-tuning. We empirically demonstrate the effectiveness of our
proposed method in low-resource language scenarios, where it outperforms
competing methods. The code and audio samples are available at
https://github.com/cnaigithub/SpeechDewarping. Comment: ICASSP 202
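The abstract does not say how the mel-spectrograms are warped. As a hedged sketch, assume a random piecewise-linear time stretch: the original timeline is split into segments, each segment is stretched or compressed by a random factor, and frames are linearly interpolated. A pre-training pair would then be (warped input, original target). `random_time_warp`, `num_segments`, and `max_scale` are hypothetical names.

```python
import numpy as np

def random_time_warp(mel, num_segments=4, max_scale=1.5, rng=None):
    """Hypothetical piecewise-linear time warp of a (frames, bins) mel-spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    n_frames = mel.shape[0]
    # Segment boundaries on the original time axis.
    bounds = np.linspace(0, n_frames - 1, num_segments + 1)
    # Random stretch factor per segment (>1 stretches, <1 compresses).
    scales = rng.uniform(1.0 / max_scale, max_scale, num_segments)
    warped_bounds = np.concatenate(([0.0], np.cumsum(np.diff(bounds) * scales)))
    n_out = int(round(warped_bounds[-1])) + 1
    # For every output frame, the fractional source frame it reads from.
    src = np.interp(np.arange(n_out), warped_bounds, bounds)
    i0 = np.floor(src).astype(int)
    i1 = np.minimum(i0 + 1, n_frames - 1)
    a = (src - i0)[:, None]
    # Linear interpolation between neighboring source frames, for all bins at once.
    return (1 - a) * mel[i0] + a * mel[i1]
```

A training example would be `(random_time_warp(mel), mel)`: the model sees the warped spectrogram and is asked to reconstruct the original, which forces it to learn an input-to-output time alignment without any transcripts.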
Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning
Cinemagraphs are a compelling way to convey dynamic aspects of a scene. In
these media, dynamic and still elements are juxtaposed to create an artistic
and narrative experience. Creating a high-quality, aesthetically pleasing
cinemagraph requires isolating objects in a semantically meaningful way and
then selecting good start times and looping periods for those objects to
minimize visual artifacts (such as tearing). To achieve this, we present a new
technique that uses object recognition and semantic segmentation as part of an
optimization method to automatically create cinemagraphs from videos that are
both visually appealing and semantically meaningful. Given a scene with
multiple objects, there are many cinemagraphs one could create. Our method
evaluates these multiple candidates and presents the best one, as determined by
a model trained to predict human preferences in a collaborative way. We
demonstrate the effectiveness of our approach with multiple results and a user
study. Comment: To appear in ICCV 2017. Total 17 pages including the supplementary
material
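The start-time and looping-period selection can be illustrated with a simplified loop-transition cost: a loop that plays frames s through s+p-1 and then jumps back to s looks seamless when frame s resembles frame s+p. The brute-force search below is a sketch under that assumption only; the paper's optimization additionally uses semantic segmentation and a learned preference model, which are not modeled here, and `best_loop` is a hypothetical name.

```python
import numpy as np

def best_loop(frames, mask, min_period=2):
    """Brute-force (start, period) search minimizing a seam cost inside `mask`.

    frames: (T, H, W) or (T, H, W, C) float array; mask: (H, W) boolean object mask.
    Returns (cost, start, period).
    """
    n = frames.shape[0]
    best = None
    for p in range(min_period, n):
        for s in range(0, n - p):
            # Jumping from frame s+p-1 back to s should look like the real
            # transition to s+p, so compare frame s+p with frame s in the mask.
            diff = frames[s + p] - frames[s]
            cost = float(np.mean(diff[mask] ** 2))
            if best is None or cost < best[0]:
                best = (cost, s, p)
    return best
```

Running this once per segmented object, with that object's mask, gives each object its own start time and period, which is the per-object setting the abstract describes.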
The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation
Recent work on dense optical flow has shown significant progress, primarily
in a supervised learning manner that requires a large amount of labeled data. Due
to the high cost of obtaining large-scale real-world data, computer
graphics are typically used to construct datasets. However, there is a
common belief that synthetic-to-real domain gaps limit generalization to real
scenes. In this paper, we show that the required characteristics in an optical
flow dataset are rather simple and present a simpler synthetic data generation
method that achieves a certain level of realism with compositions of elementary
operations. With 2D motion-based datasets, we systematically analyze the
simplest yet critical factors for generating synthetic datasets. Furthermore,
we propose a novel method of utilizing occlusion masks in supervised training
and observe that suppressing gradients on occluded regions serves as a powerful
initial state in the curriculum learning sense. The RAFT network initially
trained on our dataset outperforms the original RAFT on the two most
challenging online benchmarks, MPI Sintel and KITTI 2015.
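Two of the ingredients above can be sketched in a few lines, assuming (a) that an elementary 2D motion is an affine map x' = Ax + t applied to the pixel grid, and (b) that suppressing gradients on occluded regions amounts to excluding occluded pixels from the end-point-error (EPE) loss. In an autodiff framework the masked-out pixels would then receive zero gradient; NumPy is used here only to show the arithmetic, and `affine_flow` and `masked_epe` are hypothetical names, not the paper's code.

```python
import numpy as np

def affine_flow(h, w, A, t):
    """Ground-truth flow induced by the 2-D affine motion x' = A x + t on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([xs, ys], axis=-1)  # (h, w, 2) in (x, y) order
    moved = coords @ A.T + t              # apply A to every pixel, then translate
    return moved - coords                 # flow = displacement of each pixel

def masked_epe(pred, gt, occluded):
    """Mean end-point error over non-occluded pixels only.

    Occluded pixels do not enter the loss, so under autodiff they would
    contribute no gradient.
    """
    epe = np.linalg.norm(pred - gt, axis=-1)
    valid = ~occluded
    if not valid.any():
        return 0.0
    return float(epe[valid].mean())
```

For example, `affine_flow(h, w, np.eye(2), np.array([2.0, -1.0]))` is a constant translation field, while a rotation matrix for `A` gives a flow that grows with distance from the origin; compositions of such layers are one simple way to build 2D-motion training pairs.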