Character Generation through Self-Supervised Vectorization
The prevalent approach in self-supervised image generation is to operate on
pixel level representations. While this approach can produce high quality
images, it cannot benefit from the simplicity and innate quality of
vectorization. Here we present a drawing agent that operates on stroke-level
representation of images. At each time step, the agent first assesses the
current canvas and decides whether to stop or keep drawing. When a 'draw'
decision is made, the agent outputs a program indicating the stroke to be
drawn. As a result, it produces a final raster image by drawing the strokes on
a canvas, using a minimal number of strokes and dynamically deciding when to
stop. We train our agent through reinforcement learning on the MNIST and Omniglot
datasets for unconditional generation and parsing (reconstruction) tasks. We
utilize our parsing agent for exemplar generation and type conditioned concept
generation in the Omniglot challenge without any further training. We present
successful results on all three generation tasks and the parsing task.
Crucially, we do not need any stroke-level or vector supervision; we only use
raster images for training.
Cosine Similarity Measure According to a Convex Cost Function
In this paper, we describe a new vector similarity measure associated with a
convex cost function. Given two vectors, we determine the surface normals of
the convex function at the vectors. The angle between the two surface normals
is the similarity measure. The convex cost function can be the negative entropy
function, the total variation (TV) function, or the filtered variation function. The
convex cost function need not be differentiable everywhere. In general, we need
to compute the gradient of the cost function to compute the surface normals. If
the gradient does not exist at a given vector, it is possible to use the
subgradients; the normal producing the smallest angle between the two
vectors is then used to compute the similarity measure.
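The idea in this abstract can be illustrated with a minimal sketch. It assumes the negative entropy cost f(x) = sum_i x_i log(x_i) on strictly positive vectors, and it assumes that the surface normal of the graph of f at x is taken as (grad f(x), -1). Both choices, and all function names, are illustrative assumptions and are not taken from the paper.

```python
import math

def neg_entropy_grad(x):
    """Gradient of f(x) = sum_i x_i * log(x_i); requires every x_i > 0."""
    return [math.log(xi) + 1.0 for xi in x]

def surface_normal(grad):
    """Assumed normal to the graph of f at x: (grad f(x), -1)."""
    return list(grad) + [-1.0]

def normal_angle(x, y, grad_fn=neg_entropy_grad):
    """Angle in radians between the surface normals at x and y."""
    n1 = surface_normal(grad_fn(x))
    n2 = surface_normal(grad_fn(y))
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.acos(cos)

x = [0.2, 0.3, 0.5]
y = [0.6, 0.3, 0.1]
angle_xy = normal_angle(x, y)  # smaller angle means more similar
angle_xx = normal_angle(x, x)  # approximately zero for identical vectors
```

A non-differentiable cost such as total variation would need a subgradient in place of `neg_entropy_grad`; that case is not covered by this sketch.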
Self-Supervised Learning of 3D Human Pose using Multi-view Geometry
Training accurate 3D human pose estimators requires a large amount of 3D
ground-truth data, which is costly to collect. Various weakly or self-supervised
pose estimation methods have been proposed due to the lack of 3D data.
Nevertheless, these methods, in addition to 2D ground-truth poses, require
either additional supervision in various forms (e.g. unpaired 3D ground truth
data, a small subset of labels) or the camera parameters in multiview settings.
To address these problems, we present EpipolarPose, a self-supervised learning
method for 3D human pose estimation, which does not need any 3D ground-truth
data or camera extrinsics. During training, EpipolarPose estimates 2D poses
from multi-view images and then utilizes epipolar geometry to obtain a 3D
pose and camera geometry which are subsequently used to train a 3D pose
estimator. We demonstrate the effectiveness of our approach on standard
benchmark datasets, i.e., Human3.6M and MPI-INF-3DHP, where we set a new
state of the art among weakly/self-supervised methods. Furthermore, we propose
a new performance measure, Pose Structure Score (PSS), which is a
scale-invariant, structure-aware measure of the structural plausibility of a
pose with respect to its ground truth. Code and pretrained models are available
at https://github.com/mkocabas/EpipolarPose
Comment: CVPR 2019 camera ready. Code is available at
https://github.com/mkocabas/EpipolarPos
Representation Recycling for Streaming Video Analysis
We present StreamDEQ, a method that aims to infer frame-wise representations
on videos with minimal per-frame computation. Conventional deep networks do
feature extraction from scratch at each frame in the absence of ad-hoc
solutions. We instead aim to build streaming recognition models that can
natively exploit temporal smoothness between consecutive video frames. We
observe that the recently emerging implicit layer models provide a convenient
foundation to construct such models, as they define representations as the
fixed-points of shallow networks, which need to be estimated using iterative
methods. Our main insight is to distribute the inference iterations over the
temporal axis by using the most recent representation as a starting point at
each frame. This scheme effectively recycles the recent inference computations
and greatly reduces the needed processing time. Through extensive experimental
analysis, we show that StreamDEQ is able to recover near-optimal
representations in a few frames' time and maintain an up-to-date representation
throughout the video duration. Our experiments on video semantic segmentation,
video object detection, and human pose estimation in videos show that StreamDEQ
achieves on-par accuracy with the baseline while being more than 2-4x faster.
Comment: v3: ECCV2022 paper. This version: extended version under review at
TPAM
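The warm-start idea described in this abstract can be illustrated with a toy sketch. It is not StreamDEQ itself: it assumes a scalar contraction z = tanh(w*z + x) in place of a deep equilibrium layer, plain fixed-point iteration in place of a real solver, and a slowly drifting scalar input in place of video frames. All names and constants are hypothetical.

```python
import math

def f(z, x, w=0.9):
    """Toy implicit layer: its fixed point z* satisfies z* = tanh(w*z* + x)."""
    return math.tanh(w * z + x)

def solve(x, z0=0.0, iters=200):
    """Plain fixed-point iteration starting from z0."""
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

def run_stream(frames, iters_per_frame, warm_start):
    """Return the per-frame error against a fully converged fixed point."""
    z = 0.0
    errors = []
    for x in frames:
        if not warm_start:
            z = 0.0  # cold start: discard the previous frame's state
        z = solve(x, z0=z, iters=iters_per_frame)
        errors.append(abs(z - solve(x)))
    return errors

# Slowly varying input stands in for temporally smooth video frames.
frames = [0.5 + 0.01 * t for t in range(20)]
cold = run_stream(frames, iters_per_frame=2, warm_start=False)
warm = run_stream(frames, iters_per_frame=2, warm_start=True)
```

With the same budget of two iterations per frame, the warm-started run ends much closer to the converged fixed point than the cold-started run, because each frame continues from the previous frame's estimate.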
Approximate Computation of DFT without Performing Any Multiplications: Applications to Radar Signal Processing
In many practical problems, including some radar problems, it is not necessary
to compute the DFT exactly. In this article a new
multiplication free algorithm for approximate computation of the DFT is
introduced. All multiplications in DFT are replaced by an
operator which computes . The new transform is
especially useful when the signal processing algorithm requires correlations.
The ambiguity function in radar signal processing requires a large number of
multiplications to compute the correlations. This new additive operator is used
to decrease the number of multiplications. Simulation examples involving
passive radars are presented.
Improving Sketch Colorization using Adversarial Segmentation Consistency
We propose a new method for producing color images from sketches. Current
solutions in sketch colorization either necessitate additional user instruction
or are restricted to the "paired" translation strategy. We leverage semantic
image segmentation from a general-purpose panoptic segmentation network to
generate an additional adversarial loss function. The proposed loss function is
compatible with any GAN model. Our method is not restricted to datasets with
segmentation labels and can be applied to unpaired translation tasks as well.
Using qualitative and quantitative analysis, as well as a user study, we
demonstrate the efficacy of our method on four distinct image datasets. On the
FID metric, our model improves the baseline by up to 35 points. Our code,
pretrained models, scripts to produce newly introduced datasets and
corresponding sketch images are available at
https://github.com/giddyyupp/AdvSegLoss.
Comment: Under review at Pattern Recognition Letters. arXiv admin note:
substantial text overlap with arXiv:2102.0619