PointWise: An Unsupervised Point-wise Feature Learning Network
We present a novel approach to learning a point-wise, meaningful embedding
for point-clouds in an unsupervised manner, through the use of neural-networks.
The domain of point-cloud processing via neural-networks is rapidly evolving,
with novel architectures and applications frequently emerging. Within this
field of research, the abundance of unlabeled point-clouds, together with their
possible applications, makes finding ways of characterizing this type of data
appealing. Although significant advances have been made in unsupervised
learning, adapting it to the point-cloud representation is not trivial.
Previous research focuses on the embedding of
entire point-clouds representing an object in a meaningful manner. We present a
deep learning framework to learn point-wise description from a set of shapes
without supervision. Our approach leverages self-supervision to define a
relevant loss function to learn rich per-point features. We train a
neural-network with objectives based on context derived directly from the raw
data, with no added annotation. We use local structures of point-clouds to
incorporate geometric information into each point's latent representation. In
addition to using local geometric information, we encourage adjacent points to
have similar representations and, conversely, distant points to have dissimilar
ones, creating a smoother, more descriptive representation. We demonstrate the
ability of our method to capture
meaningful point-wise features through three applications. By clustering the
learned embedding space, we perform unsupervised part-segmentation on point
clouds. By calculating Euclidean distances in the latent space, we derive
semantic point-analogies. Finally, by retrieving nearest-neighbors in our
learned latent space, we present meaningful point-correspondences within and
among point-clouds.
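As a rough sketch of how such per-point embeddings can be consumed downstream, the clustering and nearest-neighbor applications above reduce to standard operations on the embedding matrix. The `point_features` array and helper names below are placeholders for illustration, not the paper's code:

```python
# Illustrative only: consuming learned per-point embeddings for the
# applications above. `point_features` stands in for the network's output.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def segment_parts(point_features: np.ndarray, n_parts: int = 4) -> np.ndarray:
    """Unsupervised part segmentation: k-means over the (N, D) embeddings."""
    return KMeans(n_clusters=n_parts, n_init=10).fit_predict(point_features)

def correspond(features_a: np.ndarray, features_b: np.ndarray) -> np.ndarray:
    """Point correspondence: for each point of shape A, the index of its
    nearest neighbor (in Euclidean latent distance) among shape B's points."""
    nn = NearestNeighbors(n_neighbors=1).fit(features_b)
    return nn.kneighbors(features_a, return_distance=False)[:, 0]
```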
Generative Low-Shot Network Expansion
Conventional deep learning classifiers are static in the sense that they are
trained on a predefined set of classes and learning to classify a novel class
typically requires re-training. In this work, we address the problem of
Low-Shot network expansion learning. We introduce a learning framework which
enables expanding a pre-trained (base) deep network to classify novel classes
when the number of examples for the novel classes is particularly small. We
present a simple yet powerful hard distillation method where the base network
is augmented with additional weights to classify the novel classes, while
keeping the weights of the base network unchanged. We show that because only a
small number of weights needs to be trained, hard distillation excels in
low-shot training scenarios. Furthermore, hard distillation avoids degrading
classification performance on the base classes. Finally, we show that low-shot
network expansion can be done with a very small memory footprint by using a
compact generative model of the base classes' training data, with only
negligible degradation relative to learning with the full training set.
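A minimal PyTorch sketch of the hard-distillation expansion described above, assuming a base network that exposes separate feature and classifier stages; all names here are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ToyBase(nn.Module):
    """Stand-in for a pre-trained base network with 10 base classes."""
    def __init__(self, feat_dim=64, n_base=10):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(784, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, n_base)

class ExpandedClassifier(nn.Module):
    def __init__(self, base, feat_dim, n_novel):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # base weights stay unchanged
            p.requires_grad = False
        self.novel_head = nn.Linear(feat_dim, n_novel)  # the only trainable weights

    def forward(self, x):
        feats = self.base.features(x)
        return torch.cat([self.base.classifier(feats), self.novel_head(feats)], dim=1)

model = ExpandedClassifier(ToyBase(), feat_dim=64, n_novel=2)
logits = model(torch.randn(8, 1, 28, 28))      # (8, 12): 10 base + 2 novel scores
```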
Clusterplot: High-dimensional Cluster Visualization
We present Clusterplot, a multi-class high-dimensional data visualization
tool designed to visualize cluster-level information, offering an intuitive
understanding of inter-cluster relations. Our unique plots leverage 2D
blobs devised to convey the geometrical and topological characteristics of
clusters within the high-dimensional data, and their pairwise relations, such
that general inter-cluster behavior is easily interpretable in the plot.
Class-identity supervision is used to drive the measurement of relations among
clusters in high dimension, particularly proximity and overlap, which are then
reflected spatially through the 2D blobs. We demonstrate the strength of our
clusterplots and their ability to deliver a clear, intuitive, and informative
exploration experience for high-dimensional clusters characterized by complex
structure and significant overlap.
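The abstract does not pin down a formula, but one plausible way to quantify the supervised proximity/overlap relations it visualizes is a k-nearest-neighbor label-mixing matrix; the sketch below is an assumption for illustration, not Clusterplot's actual measure:

```python
# Assumed overlap proxy: overlap[i, j] is the mean fraction of cluster i's
# k nearest neighbors that carry label j; off-diagonal mass means overlap.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def overlap_matrix(X: np.ndarray, y: np.ndarray, k: int = 10) -> np.ndarray:
    classes = np.unique(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self-neighbor
    neigh_labels = y[idx]                                  # (N, k) labels
    M = np.zeros((len(classes), len(classes)))
    for a, ca in enumerate(classes):
        rows = neigh_labels[y == ca]
        for b, cb in enumerate(classes):
            M[a, b] = np.mean(rows == cb)
    return M
```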
P2P-NET: Bidirectional Point Displacement Net for Shape Transform
We introduce P2P-NET, a general-purpose deep neural network which learns
geometric transformations between point-based shape representations from two
domains, e.g., meso-skeletons and surfaces, partial and complete scans, etc.
The architecture of the P2P-NET is that of a bi-directional point displacement
network, which transforms a source point set to a target point set with the
same cardinality, and vice versa, by applying point-wise displacement vectors
learned from data. P2P-NET is trained on paired shapes from the source and
target domains, but without relying on point-to-point correspondences between
the source and target point sets. The training loss combines two
uni-directional geometric losses, each enforcing a shape-wise similarity
between the predicted and the target point sets, and a cross-regularization
term to encourage consistency between displacement vectors going in opposite
directions. We develop and present several different applications enabled by
our general-purpose bidirectional P2P-NET to highlight the effectiveness,
versatility, and potential of our network in solving a variety of point-based
shape transformation problems.
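A hedged PyTorch sketch of this loss structure, using Chamfer distance as one common choice of shape-wise geometric loss and one plausible form of the cross-regularization term; the paper's exact formulation may differ:

```python
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3)."""
    d = torch.cdist(a, b)                                  # (N, M) distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def p2p_loss(src, tgt, disp_fwd, disp_bwd, lam=0.1):
    """src, tgt: (N, 3) point sets of equal cardinality (no pairing assumed);
    disp_fwd displaces src toward tgt, disp_bwd displaces tgt toward src."""
    geo = chamfer(src + disp_fwd, tgt) + chamfer(tgt + disp_bwd, src)
    # Assumed consistency term: opposite displacements should roughly cancel.
    cross = (disp_fwd + disp_bwd).pow(2).mean()
    return geo + lam * cross
```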
Face Identity Disentanglement via Latent Space Mapping
Learning disentangled representations of data is a fundamental problem in
artificial intelligence. Specifically, disentangled latent representations
allow generative models to control and compose the disentangled factors in the
synthesis process. Current methods, however, require extensive supervision and
training, or instead, noticeably compromise quality. In this paper, we present
a method that learns how to represent data in a disentangled way, with minimal
supervision, using only available pre-trained networks. Our key
insight is to decouple the processes of disentanglement and synthesis, by
employing a leading pre-trained unconditional image generator, such as
StyleGAN. By learning to map into its latent space, we leverage both its
state-of-the-art generative power and its rich, expressive latent space,
without the burden of training it. We demonstrate our approach on the complex
and high-dimensional domain of human heads. We evaluate our method
qualitatively and quantitatively, and exhibit its success with
de-identification operations and with temporal identity coherency in image
sequences. Through this extensive experimentation, we show that our method
successfully disentangles identity from other facial attributes, surpassing
existing methods, even though they require more training and supervision.
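A minimal sketch of the decoupling idea: a small trainable mapper places an identity code and an attribute code into the latent space of a frozen pre-trained generator. All modules and dimensions below are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    """Trainable mapper; the generator and identity encoder stay frozen."""
    def __init__(self, id_dim=512, attr_dim=512, w_dim=512):
        super().__init__()
        self.attr_enc = nn.Sequential(nn.Linear(attr_dim, 512), nn.ReLU(),
                                      nn.Linear(512, 512))
        self.mix = nn.Sequential(nn.Linear(id_dim + 512, 512), nn.ReLU(),
                                 nn.Linear(512, w_dim))

    def forward(self, id_code, attr_code):
        z_attr = self.attr_enc(attr_code)
        return self.mix(torch.cat([id_code, z_attr], dim=1))  # latent w

# Conceptual usage: w = mapper(id_net(face_a), attr_feats(face_b));
# image = frozen_generator(w)  -> identity from A, other attributes from B.
```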
Image Resizing by Reconstruction from Deep Features
Traditional image resizing methods usually work in pixel space and use
various saliency measures. The challenge is to adjust the image shape while
trying to preserve important content. In this paper we perform image resizing
in feature space where the deep layers of a neural network contain rich
important semantic information. We directly adjust the image feature maps,
extracted from a pre-trained classification network, and reconstruct the
resized image using a neural-network-based optimization. This novel approach
leverages the hierarchical encoding of the network, and in particular the
high-level discriminative power of its deeper layers, which recognize semantic
objects and regions and allow their aspect ratios to be maintained. Our use of
reconstruction from deep features diminishes the artifacts introduced by
image-space resizing operators. We evaluate our method on benchmarks, compare
to alternative approaches, and demonstrate its strength on challenging images.
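A hedged sketch of the feature-space resizing loop, using a pre-trained VGG-16 as the classification network and an L2 feature-matching loss; the layer choice and optimization details are assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

feats = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in feats.parameters():           # the classification net stays frozen
    p.requires_grad = False

def resize_by_reconstruction(img, out_hw, steps=300, lr=0.05):
    """img: (1, 3, H, W) tensor in [0, 1]; out_hw: target (H', W')."""
    with torch.no_grad():              # resize the deep feature maps directly
        target = F.interpolate(feats(img), size=(out_hw[0] // 4, out_hw[1] // 4),
                               mode="bilinear", align_corners=False)
    out = F.interpolate(img, size=out_hw, mode="bilinear",
                        align_corners=False).clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=lr)
    for _ in range(steps):             # optimize the image to match the maps
        opt.zero_grad()
        F.mse_loss(feats(out), target).backward()
        opt.step()
    return out.detach().clamp(0, 1)
```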
Implicit Pairs for Boosting Unpaired Image-to-Image Translation
In image-to-image translation the goal is to learn a mapping from one image
domain to another. In the case of supervised approaches the mapping is learned
from paired samples. However, collecting large sets of image pairs is often
either prohibitively expensive or not possible. As a result, in recent years
more attention has been given to techniques that learn the mapping from
unpaired sets.
In our work, we show that injecting implicit pairs into unpaired sets
strengthens the mapping between the two domains, improves the compatibility of
their distributions, and boosts the performance of unsupervised techniques by
over 14% across several measurements.
The effectiveness of the implicit pairs is further demonstrated through the use
of pseudo-pairs, i.e., paired samples that only approximate a real pair. We
demonstrate the effect of the approximated implicit samples on image-to-image
translation problems, where such pseudo-pairs may be synthesized in one
direction, but not in the other. We further show that pseudo-pairs are
significantly more effective as implicit pairs in an unpaired setting than when
used explicitly in a paired setting.
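Conceptually, injecting a pseudo-pair amounts to adding a direct reconstruction term on top of an otherwise unpaired objective; the loss form and weighting below are assumptions for illustration:

```python
# Conceptual sketch: a pseudo-pair (x, pseudo_y) contributes an L1 term in
# addition to whatever unpaired objective (adversarial, cycle, ...) is in use.
import torch.nn.functional as F

def loss_with_implicit_pair(G, x, pseudo_y, unpaired_loss, lam=1.0):
    """G: X -> Y generator; pseudo_y approximates the unavailable true pair."""
    return unpaired_loss + lam * F.l1_loss(G(x), pseudo_y)
```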
Outlier Detection for Robust Multi-dimensional Scaling
Multi-dimensional scaling (MDS) plays a central role in data-exploration,
dimensionality reduction and visualization. State-of-the-art MDS algorithms are
not robust to outliers, yielding significant errors in the embedding even when
only a handful of outliers are present. In this paper, we introduce a technique
to detect and filter outliers based on geometric reasoning. We test the
validity of triangles formed by three points, and mark a triangle as broken if
the triangle inequality does not hold. The premise of our work is that unlike
inliers, outlier distances tend to break many triangles. We test our method and
evaluate its performance on various datasets and distributions of outliers. We
demonstrate that for a reasonable fraction of outliers our method is effective
and leads to a high embedding quality.
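The triangle test itself is straightforward to sketch; the sampling budget and the final filtering threshold below are illustrative choices, not the paper's tuned values:

```python
import numpy as np

def broken_triangle_scores(D: np.ndarray, n_samples: int = 100_000,
                           rng=np.random.default_rng(0)) -> np.ndarray:
    """D: (N, N) symmetric distance matrix. Returns per-pair break counts;
    pairs with anomalously high counts are candidate outlier distances."""
    n = D.shape[0]
    breaks = np.zeros_like(D)
    for i, j, k in rng.integers(0, n, size=(n_samples, 3)):
        if len({i, j, k}) < 3:
            continue                       # need three distinct points
        a, b, c = D[i, j], D[j, k], D[i, k]
        # Triangle inequality: no side may exceed the sum of the other two.
        if a > b + c or b > a + c or c > a + b:
            for p, q in ((i, j), (j, k), (i, k)):
                breaks[p, q] += 1
                breaks[q, p] += 1
    return breaks  # filter high-count pairs before running MDS
```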
LOGAN: Unpaired Shape Transform in Latent Overcomplete Space
We introduce LOGAN, a deep neural network aimed at learning general-purpose
shape transforms from unpaired domains. The network is trained on two sets of
shapes, e.g., tables and chairs, while there is neither a pairing between
shapes from the domains as supervision nor any point-wise correspondence
between any shapes. Once trained, LOGAN takes a shape from one domain and
transforms it into the other. Our network consists of an autoencoder to encode
shapes from the two input domains into a common latent space, where the latent
codes concatenate multi-scale shape features, resulting in an overcomplete
representation. The translator is based on a generative adversarial network
(GAN), operating in the latent space, where an adversarial loss enforces
cross-domain translation while a feature preservation loss ensures that the
right shape features are preserved for a natural shape transform. We conduct
ablation studies to validate each of our key network designs and demonstrate
superior capabilities in unpaired shape transforms on a variety of examples
over baselines and state-of-the-art approaches. We show that LOGAN is able to
learn what shape features to preserve during shape translation, either local or
non-local, whether content or style, depending solely on the input domains for
training.
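A schematic rendering of the translator objective in the latent space: an adversarial term pushing translated codes toward the target domain, plus a feature-preservation term on the overcomplete code. The loss forms and weight are assumptions for illustration:

```python
# T and D are the latent translator and a latent-space discriminator;
# z_src holds the overcomplete multi-scale codes of a source-domain shape.
import torch.nn.functional as F

def translator_loss(T, D, z_src, lam=20.0):
    z_trans = T(z_src)
    adv = -D(z_trans).mean()              # e.g., a WGAN-style generator term
    preserve = F.l1_loss(z_trans, z_src)  # keep the right features unchanged
    return adv + lam * preserve
```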
Blind Visual Motif Removal from a Single Image
Many images shared over the web include overlaid objects, or visual motifs,
such as text, symbols or drawings, which add a description or decoration to the
image. For example, decorative text that specifies where the image was taken
repeatedly appears across a variety of different images. Often, the recurring
visual motif is semantically similar yet differs in location, style, and
content (e.g., text placement, font, and letters). This work proposes a deep
learning based technique for blind removal of such objects. In the blind
setting, the location and exact geometry of the motif are unknown. Our approach
simultaneously estimates which pixels contain the visual motif, and synthesizes
the underlying latent image. It is applied to a single input image, without any
user assistance in specifying the location of the motif, achieving
state-of-the-art results for blind removal of both opaque and semi-transparent
visual motifs.
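A minimal sketch of the simultaneous estimation described above: a shared encoder with two decoder heads, one predicting per-pixel motif membership and one synthesizing the underlying latent image. The architecture below is a placeholder, not the paper's network:

```python
import torch
import torch.nn as nn

class BlindMotifRemover(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.mask_head = nn.Conv2d(64, 1, 1)   # per-pixel motif probability
        self.image_head = nn.Conv2d(64, 3, 1)  # reconstructed clean image

    def forward(self, x):
        h = self.enc(x)
        return torch.sigmoid(self.mask_head(h)), torch.sigmoid(self.image_head(h))

mask, clean = BlindMotifRemover()(torch.rand(1, 3, 64, 64))  # no user-given mask
```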