70 research outputs found
Character Generation through Self-Supervised Vectorization
The prevalent approach in self-supervised image generation is to operate on
pixel level representations. While this approach can produce high quality
images, it cannot benefit from the simplicity and innate quality of
vectorization. Here we present a drawing agent that operates on stroke-level
representation of images. At each time step, the agent first assesses the
current canvas and decides whether to stop or keep drawing. When a 'draw'
decision is made, the agent outputs a program indicating the stroke to be
drawn. As a result, it produces a final raster image by drawing the strokes on
a canvas, using a minimal number of strokes and dynamically deciding when to
stop. We train our agent through reinforcement learning on the MNIST and
Omniglot datasets for unconditional generation and parsing (reconstruction)
tasks. We utilize our parsing agent for exemplar generation and
type-conditioned concept generation in the Omniglot challenge without any
further training. We present
successful results on all three generation tasks and the parsing task.
Crucially, we do not need any stroke-level or vector supervision; we only use
raster images for training.
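The stop-or-draw loop described above can be sketched minimally as follows. This is a toy illustration, not the paper's implementation: the policy here is a hand-written stand-in for the learned RL policy, and `draw_stroke` simply marks cells instead of rendering a real stroke program.

```python
def draw_stroke(canvas, stroke):
    """Rasterize a stroke program onto the canvas (toy stand-in:
    mark the cells listed in the stroke)."""
    for (r, c) in stroke:
        canvas[r][c] = 1
    return canvas

def drawing_agent_episode(policy, canvas, max_steps=10):
    """Run one generation episode: at each step the policy inspects the
    current canvas and either stops or emits a stroke program to draw."""
    strokes = []
    for _ in range(max_steps):
        action = policy(canvas)  # 'stop' or a stroke program
        if action == "stop":
            break
        canvas = draw_stroke(canvas, action)
        strokes.append(action)
    return canvas, strokes

# Hypothetical policy: draw one diagonal stroke, then stop.
def toy_policy(canvas):
    if canvas[0][0] == 0:
        return [(0, 0), (1, 1), (2, 2)]
    return "stop"

blank = [[0] * 3 for _ in range(3)]
final, strokes = drawing_agent_episode(toy_policy, blank)
```

The key design point the abstract emphasizes is that stopping is itself an action, so the number of strokes is decided dynamically per image rather than fixed in advance.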
Cosine Similarity Measure According to a Convex Cost Function
In this paper, we describe a new vector similarity measure associated with a
convex cost function. Given two vectors, we determine the surface normals of
the convex function at the vectors. The angle between the two surface normals
is the similarity measure. The convex cost function can be the negative
entropy function, the total variation (TV) function, or the filtered variation
function. The convex cost function need not be differentiable everywhere. In
general, we need to compute the gradient of the cost function to obtain the
surface normals. If the gradient does not exist at a given vector, it is
possible to use subgradients; the subgradient producing the smallest angle
between the two normals is used to compute the similarity measure.
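The measure can be sketched as follows for the differentiable case, using the negative entropy cost named in the abstract. The gradient formula is standard calculus; the numerical floor `eps` and the choice to return the cosine of the angle (rather than the angle itself) are implementation assumptions.

```python
import math

def grad_negative_entropy(v, eps=1e-12):
    # Gradient of f(x) = sum_i x_i * log(x_i): df/dx_i = log(x_i) + 1.
    return [math.log(max(x, eps)) + 1.0 for x in v]

def convex_cosine_similarity(x, y, grad=grad_negative_entropy):
    """Cosine of the angle between the surface normals (gradients) of a
    convex cost function evaluated at the two input vectors."""
    gx, gy = grad(x), grad(y)
    dot = sum(a * b for a, b in zip(gx, gy))
    nx = math.sqrt(sum(a * a for a in gx))
    ny = math.sqrt(sum(b * b for b in gy))
    return dot / (nx * ny)

# Identical vectors give parallel normals, i.e. similarity 1.
s = convex_cosine_similarity([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
```

Swapping `grad` for a (sub)gradient of TV or filtered variation yields the other variants described in the abstract.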
Self-Supervised Learning of 3D Human Pose using Multi-view Geometry
Training accurate 3D human pose estimators requires a large amount of 3D
ground-truth data, which is costly to collect. Various weakly or
self-supervised pose estimation methods have been proposed to address the lack
of 3D data.
Nevertheless, these methods, in addition to 2D ground-truth poses, require
either additional supervision in various forms (e.g. unpaired 3D ground truth
data, a small subset of labels) or the camera parameters in multiview settings.
To address these problems, we present EpipolarPose, a self-supervised learning
method for 3D human pose estimation, which does not need any 3D ground-truth
data or camera extrinsics. During training, EpipolarPose estimates 2D poses
from multi-view images, and then, utilizes epipolar geometry to obtain a 3D
pose and camera geometry which are subsequently used to train a 3D pose
estimator. We demonstrate the effectiveness of our approach on standard
benchmark datasets, i.e., Human3.6M and MPI-INF-3DHP, where we set the new
state-of-the-art among weakly/self-supervised methods. Furthermore, we propose
a new performance measure Pose Structure Score (PSS) which is a scale
invariant, structure aware measure to evaluate the structural plausibility of a
pose with respect to its ground truth. Code and pretrained models are available
at https://github.com/mkocabas/EpipolarPose
Comment: CVPR 2019 camera ready.
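The lifting step — turning matched multi-view 2D poses into a 3D pose — can be sketched with standard linear (DLT) triangulation. Note that this sketch assumes known projection matrices for clarity, whereas EpipolarPose itself recovers camera geometry from the 2D poses rather than requiring extrinsics; the toy cameras below are hypothetical.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel coords."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the null vector of A (last right-singular vector).
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Two toy cameras observing a joint at (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_rec = triangulate_point(P1, P2, x1, x2)
```

Triangulating every joint this way produces the pseudo-3D labels that supervise the monocular 3D pose estimator during training.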
Approximate Computation of DFT without Performing Any Multiplications: Applications to Radar Signal Processing
In many practical problems, including some radar problems, it is not necessary
to compute the DFT exactly. In this article, a new multiplication-free
algorithm for approximate computation of the DFT is introduced. All
multiplications in the DFT are replaced by an additive operator. The new
transform is
especially useful when the signal processing algorithm requires correlations.
The ambiguity function in radar signal processing requires a large number of
multiplications to compute the correlations. This new additive operator is
used to decrease the number of multiplications. Simulation examples involving
passive radars are presented.
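A sketch of how an additive operator can stand in for multiplication inside a correlation sum is shown below. The specific operator used here, sign(a·b)·(|a|+|b|), is a common choice in the multiplication-free transform literature and is an assumption on our part; the paper's exact operator definition is not given in this abstract.

```python
def mf_op(a, b):
    """Multiplication-free stand-in for the product a*b:
    sign(a*b) * (|a| + |b|). Preserves the sign of the true product
    and requires only additions and sign checks (assumed operator)."""
    if a == 0 or b == 0:
        return 0.0
    sign = 1.0 if (a > 0) == (b > 0) else -1.0
    return sign * (abs(a) + abs(b))

def mf_correlation(x, y):
    """Approximate correlation: each multiply in the usual inner
    product is replaced by the additive operator."""
    return sum(mf_op(a, b) for a, b in zip(x, y))

c = mf_correlation([1.0, -2.0, 3.0], [1.0, 2.0, -1.0])
```

Because the operator preserves the sign of each product term, aligned signals still accumulate large positive scores, which is why such operators are useful where the algorithm only needs correlation-like peaks, as in ambiguity-function computation.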
Improving Sketch Colorization using Adversarial Segmentation Consistency
We propose a new method for producing color images from sketches. Current
solutions in sketch colorization either necessitate additional user instruction
or are restricted to the "paired" translation strategy. We leverage semantic
image segmentation from a general-purpose panoptic segmentation network to
generate an additional adversarial loss function. The proposed loss function is
compatible with any GAN model. Our method is not restricted to datasets with
segmentation labels and can be applied to unpaired translation tasks as well.
Using qualitative and quantitative analysis, as well as a user study, we
demonstrate the efficacy of our method on four distinct image datasets. On the
FID metric, our model improves the baseline by up to 35 points. Our code,
pretrained models, scripts to produce newly introduced datasets and
corresponding sketch images are available at
https://github.com/giddyyupp/AdvSegLoss.
Comment: Under review at Pattern Recognition Letters. arXiv admin note:
substantial text overlap with arXiv:2102.0619
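The way an extra adversarial term on segmentation outputs composes with a base GAN objective can be sketched as follows. The hinge loss form and the weighting scheme here are illustrative assumptions, not the paper's exact formulation; in the method, the scores would come from a discriminator fed with panoptic segmentations of real and generated images.

```python
def hinge_d_loss(d_real, d_fake):
    """Hinge discriminator loss over segmentation-map scores: the extra
    discriminator rates segmentations of real vs. generated images."""
    real_term = sum(max(0.0, 1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(max(0.0, 1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(base_gan_loss, d_fake_seg, lam=1.0):
    """Total generator objective: base GAN loss plus the adversarial
    segmentation-consistency term (weight lam is hypothetical)."""
    seg_term = -sum(d_fake_seg) / len(d_fake_seg)
    return base_gan_loss + lam * seg_term

d = hinge_d_loss([2.0], [-2.0])
g = generator_loss(1.0, [1.0], lam=0.5)
```

Because the extra term only consumes segmentations produced by a frozen, general-purpose network, it needs no segmentation labels in the training set and attaches to any GAN backbone, which is the compatibility claim in the abstract.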
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection
We propose average Localisation-Recall-Precision (aLRP), a unified, bounded,
balanced and ranking-based loss function for both classification and
localisation tasks in object detection. aLRP extends the
Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018)
inspired from how Average Precision (AP) Loss extends precision to a
ranking-based loss function for classification (Chen et al., 2020). aLRP has
the following distinct advantages: (i) aLRP is the first ranking-based loss
function for both classification and localisation tasks. (ii) Thanks to using
ranking for both tasks, aLRP naturally enforces high-quality localisation for
high-precision classification. (iii) aLRP provides provable balance between
positives and negatives. (iv) Compared to, on average, 6 hyperparameters in
the loss functions of state-of-the-art detectors, aLRP Loss has only one
hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP
Loss improves on its ranking-based predecessor, AP Loss, and outperforms all
one-stage detectors without test-time augmentation. Code is available at
https://github.com/kemaloksuz/aLRPLoss.
Comment: NeurIPS 2020 spotlight paper.
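The core idea of a ranking-based classification loss can be sketched in the spirit of AP Loss and LRP: each positive is penalized by the fraction of negatives ranked at or above it. This is a toy illustration only; the actual aLRP formulation also folds localisation quality into the ranking and uses an error-driven update, which this sketch omits.

```python
def ranking_loss(pos_scores, neg_scores):
    """For each positive detection score, compute the fraction of
    higher-or-equal-scoring negatives among everything ranked at or
    above it, then average over positives. Equals 0 when every
    positive outranks every negative."""
    losses = []
    for s in pos_scores:
        negs_above = sum(1 for n in neg_scores if n >= s)
        rank = negs_above + sum(1 for p in pos_scores if p >= s)
        losses.append(negs_above / rank)
    return sum(losses) / len(losses)

perfect = ranking_loss([0.9, 0.8], [0.1, 0.2])  # positives on top
mixed = ranking_loss([0.5], [0.9])              # one negative on top
```

Because the penalty depends only on relative ranks, not on raw score magnitudes, the loss stays bounded and balanced regardless of how many negatives the detector produces, which is the balance property (iii) claims.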