Global Adaptive Filtering Layer for Computer Vision
We devise a universal adaptive neural layer to "learn" an optimal frequency
filter for each image together with the weights of the base neural network that
performs some computer vision task. The proposed approach takes the source
image in the spatial domain, automatically selects the best frequencies from
the frequency domain, and transmits the inverse-transform image to the main
neural network. Remarkably, such a simple add-on layer dramatically improves
the performance of the main network regardless of its design. We observe that
the light networks gain a noticeable boost in the performance metrics, whereas
the training of the heavy ones converges faster when our adaptive layer is
allowed to "learn" alongside the main architecture. We validate the idea in
four classical computer vision tasks: classification, segmentation, denoising,
and erasing, considering popular natural and medical data benchmarks.
Comment: 28 pages, 25 figures (main article and supplementary material). V.S.
and I.B. contributed equally; D.V.D. is the corresponding author.
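
The pipeline described in the abstract above (spatial image, frequency-domain selection, inverse transform) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `apply_frequency_filter` and the fixed masks are assumptions, and in the paper the filter is learned jointly with the base network rather than set by hand.

```python
import numpy as np

def apply_frequency_filter(image, mask):
    """Multiply the 2-D spectrum of `image` by `mask` and transform back.

    `mask` is a hypothetical stand-in for the learned filter; it is fixed
    here, whereas a trainable layer would update it by gradient descent.
    """
    spectrum = np.fft.fft2(image)        # spatial domain -> frequency domain
    filtered = spectrum * mask           # keep or attenuate each frequency
    return np.real(np.fft.ifft2(filtered))  # back to the spatial domain

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# An all-pass mask leaves the image unchanged.
all_pass = np.ones((8, 8))

# Keeping only the zero-frequency term yields a constant image equal to the mean.
dc_only = np.zeros((8, 8))
dc_only[0, 0] = 1.0

restored = apply_frequency_filter(image, all_pass)
flattened = apply_frequency_filter(image, dc_only)
```

The filtered output would then be passed to the main network in place of the original image.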
BRULÈ: Barycenter-Regularized Unsupervised Landmark Extraction
Unsupervised retrieval of image features is vital for many computer vision
tasks where the annotation is missing or scarce. In this work, we propose a new
unsupervised approach to detect the landmarks in images, validating it on the
popular task of human face key-points extraction. The method is based on the
idea of auto-encoding the wanted landmarks in the latent space while discarding
the non-essential information (and effectively preserving the
interpretability). The interpretable latent space representation (the
bottleneck containing nothing but the wanted key-points) is achieved by a new
two-step regularization approach. The first regularization step evaluates
the transport distance from a given set of landmarks to an average set (the
Wasserstein barycenter). The second regularization step controls
deviations from the barycenter by applying random geometric deformations
synchronously to the initial image and to the encoded landmarks. We demonstrate
the effectiveness of the approach both in unsupervised and semi-supervised
training scenarios using 300-W, CelebA, and MAFL datasets. The proposed
regularization paradigm is shown to prevent overfitting, and the detection
quality is shown to improve beyond the state-of-the-art face models.
Comment: 10 main pages with 6 figures and 1 table, 14 pages total with 6
supplementary figures. I.B. and N.B. contributed equally. D.V.D. is the
corresponding author.
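
One plausible reading of the second regularization step above is a consistency check: deforming the image and then extracting landmarks should agree with extracting landmarks and then deforming them. The sketch below illustrates that idea under loud assumptions: `toy_detector` is a hypothetical stand-in for the encoder, and a fixed horizontal flip replaces the random geometric deformations mentioned in the abstract.

```python
def hflip_image(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]

def hflip_point(point, width):
    """Apply the same mirror to a (row, col) landmark."""
    row, col = point
    return (row, width - 1 - col)

def toy_detector(image):
    """Hypothetical encoder: returns the location of the brightest pixel."""
    best = max(
        (value, r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
    )
    return (best[1], best[2])

image = [
    [0, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]
width = len(image[0])

# Path 1: detect on the original image, then deform the landmark.
deformed_landmark = hflip_point(toy_detector(image), width)

# Path 2: deform the image, then detect.
landmark_of_deformed = toy_detector(hflip_image(image))

# A penalty of this kind is zero when the two paths agree.
consistency_gap = sum(
    abs(a - b) for a, b in zip(deformed_landmark, landmark_of_deformed)
)
```

In a training loop, a term like `consistency_gap` would be added to the loss so that the encoder's output follows the deformation applied to its input.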
Landmarks Augmentation with Manifold-Barycentric Oversampling
The training of Generative Adversarial Networks (GANs) requires a large
amount of data, stimulating the development of new augmentation methods to
alleviate the challenge. Oftentimes, these methods either fail to produce
enough new data or expand the dataset beyond the original manifold. In this
paper, we propose a new augmentation method that is guaranteed to keep the new
data within the original data manifold thanks to optimal transport theory.
The proposed algorithm finds cliques in the nearest-neighbors graph and, at
each sampling iteration, randomly draws one clique to compute the Wasserstein
barycenter with random uniform weights. These barycenters then become the new
natural-looking elements that one could add to the dataset. We apply this
approach to the problem of landmark detection and augment the available
annotation in both unpaired and semi-supervised scenarios. Additionally, the
idea is validated on cardiac data for the task of medical segmentation. Our
approach reduces overfitting and improves the quality metrics beyond the
original data outcome and beyond the result obtained with popular modern
augmentation methods.
Comment: 11 pages, 4 figures, 3 tables. I.B. and N.B. contributed equally.
D.V.D. is the corresponding author.
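
The sampling step described above (draw a clique, then compute a barycenter with random weights) can be sketched in a simplified setting. The example assumes one-dimensional point sets of equal size, where the 2-Wasserstein barycenter reduces to a weighted average of sorted values; the paper works with landmark sets, which need a genuine optimal transport solver. The cliques are given as input rather than found in a nearest-neighbors graph, the weights are assumed to be uniform draws normalized to sum to one, and all function names are hypothetical.

```python
import random

def barycenter_1d(samples, weights):
    """W2 barycenter of equal-size 1-D point sets via sorted averaging."""
    sorted_sets = [sorted(s) for s in samples]
    n = len(sorted_sets[0])
    assert all(len(s) == n for s in sorted_sets), "point sets must match in size"
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_sets))
        for i in range(n)
    ]

def sample_new_element(cliques, rng):
    """Draw one clique at random and return a randomly weighted barycenter."""
    clique = rng.choice(cliques)
    raw = [rng.random() for _ in clique]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalize so the weights sum to 1
    return barycenter_1d(clique, weights)

# One clique containing two point sets (assumed to be precomputed neighbors).
cliques = [[[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]]

midpoint = barycenter_1d(cliques[0], [0.5, 0.5])
new_element = sample_new_element(cliques, random.Random(0))
```

Because the weights are non-negative and sum to one, each coordinate of the sampled element lies between the corresponding sorted coordinates of the clique members, which is the one-dimensional analogue of staying within the original data manifold.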