Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or
semantic part segmentation. We start by adapting a state-of-the-art semantic
segmentation system to this task, and show that a fully-convolutional deep CNN
coupled with Dense CRF labelling provides
excellent results for a broad range of object categories. Still, this approach
remains agnostic to high-level constraints between object parts. We introduce
such prior information by means of a Restricted Boltzmann Machine, adapted to
our task, and train our model in a discriminative fashion, as a hidden CRF,
demonstrating that prior information can yield additional improvements. We also
investigate the performance of our approach "in the wild", without
information concerning the objects' bounding boxes, using an object detector to
guide a multi-scale segmentation scheme. We evaluate the performance of our
approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing
and face labelling respectively. We show superior performance with respect to
competitive methods that have been extensively engineered on these benchmarks,
as well as realistic qualitative results on part segmentation, even for
occluded or deformable objects. We also provide quantitative and extensive
qualitative results on three classes from the PASCAL Parts dataset. Finally, we
show that our multi-scale segmentation scheme can boost accuracy, recovering
segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 tables
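For illustration, here is a minimal sketch (not the authors' code) of a detector-guided, multi-scale segmentation scheme of the kind described above; `detector` and `segmenter` are hypothetical callables standing in for an object detector and a fully-convolutional part-segmentation network.

```python
import numpy as np

def multiscale_part_segmentation(image, detector, segmenter,
                                 scales=(0.8, 1.0, 1.2)):
    h, w = image.shape[:2]
    x0, y0, x1, y1 = detector(image)              # object bounding box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0     # box centre
    bw, bh = x1 - x0, y1 - y0
    acc = None
    for s in scales:
        # rescale the box about its centre and clip to the image
        nx0 = int(max(0, cx - s * bw / 2)); nx1 = int(min(w, cx + s * bw / 2))
        ny0 = int(max(0, cy - s * bh / 2)); ny1 = int(min(h, cy + s * bh / 2))
        probs = segmenter(image[ny0:ny1, nx0:nx1])    # (ch, cw, num_parts)
        full = np.zeros((h, w, probs.shape[-1]))
        full[ny0:ny1, nx0:nx1] = probs                # paste back in place
        acc = full if acc is None else acc + full
    return acc.argmax(axis=-1)                        # merged part labels
```

Averaging the pasted-back posteriors over scales and taking the per-pixel argmax is one simple way to merge the multi-scale predictions; the paper's scheme may differ in details.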
Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection
We present a novel approach for vanishing point detection from uncalibrated
monocular images. In contrast to state-of-the-art methods, we make no a priori
assumptions about the observed scene. Our method is based on a convolutional
neural network (CNN) which does not use natural images, but a Gaussian sphere
representation arising from an inverse gnomonic projection of lines detected in
an image. This allows us to rely on synthetic data for training, eliminating
the need for labelled images. Our method achieves competitive performance on
three horizon estimation benchmark datasets. We further highlight some
additional use cases for which our vanishing point detection algorithm can be
used.
Comment: Accepted for publication at the German Conference on Pattern Recognition
(GCPR) 2017. This research was supported by the German Research Foundation (DFG)
within Priority Research Programme 1894 "Volunteered Geographic Information:
Interpretation, Visualisation and Social Computing".
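As an illustration of the representation, here is a minimal numpy sketch of lifting detected image lines onto the Gaussian sphere via an inverse gnomonic projection. The focal length `f` and the line endpoints are assumed inputs, and the accumulator grid is only one plausible discretisation, not the authors' network input.

```python
import numpy as np

def line_to_great_circle_normal(p1, p2, f=1.0):
    """Endpoints (u, v) in image coordinates -> unit normal of the
    interpretation plane; the line's vanishing points lie on the
    great circle {x on S^2 : n . x = 0}."""
    a = np.array([p1[0], p1[1], f])
    b = np.array([p2[0], p2[1], f])
    n = np.cross(a, b)
    return n / np.linalg.norm(n)

def sphere_accumulator(lines, f=1.0, bins=64, tol=0.02):
    """Rasterise great circles into a coarse (azimuth, elevation) grid;
    peaks in the accumulator indicate candidate vanishing directions."""
    az = np.linspace(-np.pi, np.pi, bins)
    el = np.linspace(-np.pi / 2, np.pi / 2, bins)
    A, E = np.meshgrid(az, el)
    dirs = np.stack([np.cos(E) * np.cos(A),
                     np.cos(E) * np.sin(A),
                     np.sin(E)], axis=-1)          # unit sphere directions
    acc = np.zeros((bins, bins))
    for p1, p2 in lines:
        n = line_to_great_circle_normal(p1, p2, f)
        acc += np.abs(dirs @ n) < tol              # near the great circle
    return acc
```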
Deep Filter Banks for Texture Recognition, Description, and Segmentation
Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. We obtain in this manner state-of-the-art performance on numerous datasets well beyond textures, an efficient method to apply deep features to image regions, as well as benefits in transferring features from one domain to another.
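A minimal sketch of the "conv layers as filter banks" idea: treat each spatial position of a convolutional feature map as a local descriptor and pool the descriptors orderlessly. A bag-of-visual-words encoder is shown for brevity; the Fisher-vector encoder discussed in the paper follows the same pattern with a GMM in place of k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_encode(feature_maps, n_words=64):
    """feature_maps: list of (H, W, C) conv activations, one per image.
    Returns one L1-normalised codeword histogram per image."""
    descs = [fm.reshape(-1, fm.shape[-1]) for fm in feature_maps]
    codebook = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(descs))
    hists = []
    for d in descs:
        words = codebook.predict(d)                       # assign codewords
        h = np.bincount(words, minlength=n_words).astype(float)
        hists.append(h / h.sum())                         # orderless pooling
    return np.stack(hists)
```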
The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices
This paper proposes scalable and fast algorithms for solving the Robust PCA
problem, namely recovering a low-rank matrix with an unknown fraction of its
entries being arbitrarily corrupted. This problem arises in many applications,
such as image processing, web data ranking, and bioinformatic data analysis. It
was recently shown that under surprisingly broad conditions, the Robust PCA
problem can be exactly solved via convex optimization that minimizes a
combination of the nuclear norm and the $\ell_1$-norm. In this paper, we apply
the method of augmented Lagrange multipliers (ALM) to solve this convex
program. As the objective function is non-smooth, we show how to extend the
classical analysis of ALM to such objective functions, prove the optimality of
the proposed algorithms, and characterize their convergence rate.
Empirically, the proposed new algorithms can be more than five times faster
than the previous state-of-the-art algorithms for Robust PCA, such as the
accelerated proximal gradient (APG) algorithm. Moreover, the new algorithms
achieve higher precision while demanding less storage/memory. We also show
that the ALM technique can be used to solve the (related but somewhat simpler)
matrix completion problem and obtain rather promising results too. We further
prove the necessary and sufficient condition for the inexact ALM to converge
globally. Matlab code for all algorithms discussed is available at
http://perception.csl.illinois.edu/matrix-rank/home.html
Comment: Please cite "Zhouchen Lin, Risheng Liu, and Zhixun Su, Linearized
Alternating Direction Method with Adaptive Penalty for Low Rank
Representation, NIPS 2011." (available at arXiv:1109.0367) instead, for a more
general method called the Linearized Alternating Direction Method. This
manuscript first appeared as University of Illinois at Urbana-Champaign
technical report #UILU-ENG-09-2215 in October 2009.
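For reference, a minimal numpy sketch of the inexact ALM iteration for Robust PCA, min ||A||_* + lam*||E||_1 s.t. D = A + E: singular-value thresholding updates A, soft-thresholding updates E, and the multiplier Y and penalty mu are updated each sweep. The parameter choices below follow common defaults and are not necessarily those of the released Matlab code.

```python
import numpy as np

def shrink(X, tau):                      # elementwise soft-thresholding
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500):
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)
    E = np.zeros_like(D)
    mu, rho = 1.25 / np.linalg.norm(D, 2), 1.5
    for _ in range(max_iter):
        # A-step: singular-value thresholding of D - E + Y/mu
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-step: soft-thresholding of the residual
        E = shrink(D - A + Y / mu, lam / mu)
        R = D - A - E
        Y += mu * R                       # dual ascent on the multiplier
        mu *= rho                         # increase the penalty
        if np.linalg.norm(R) / np.linalg.norm(D) < tol:
            break
    return A, E
```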
Unsupervised learning of probably symmetric deformable 3D objects from images in the wild (extended abstract)
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least approximately, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover the 3D shape of human faces, cat faces and cars from single-view images with high accuracy, without any supervision or a prior shape model. Code and demo available at https://github.com/elliottwu/unsup3d
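A minimal torch sketch of the symmetry idea above: reconstruct once from the predicted depth/albedo and once from their horizontal flips, and weight the two photometric errors with predicted per-pixel confidence maps. `render` is a hypothetical differentiable renderer; the actual model also factors in viewpoint and illumination.

```python
import torch

def symmetric_reconstruction_loss(img, depth, albedo, conf, render):
    """img: (B,3,H,W); depth: (B,1,H,W); albedo: (B,3,H,W);
    conf: (B,2,H,W) per-pixel uncertainty for the two branches."""
    recon = render(depth, albedo)                          # direct branch
    recon_flip = render(torch.flip(depth, dims=[3]),       # mirrored branch
                        torch.flip(albedo, dims=[3]))
    losses = []
    for r, sigma in ((recon, conf[:, 0:1]), (recon_flip, conf[:, 1:2])):
        # Laplacian negative log-likelihood with per-pixel scale sigma
        sigma = sigma.clamp(min=1e-3)
        l1 = (img - r).abs().mean(dim=1, keepdim=True)
        losses.append((l1 / sigma + sigma.log()).mean())
    return sum(losses)
```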
PASS: An ImageNet replacement for self-supervised pretraining without humans
Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear license usage, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining. We thus propose an unlabelled dataset, PASS: Pictures without humAns for Self-Supervision. PASS contains only images with a CC-BY license and complete attribution metadata, addressing the copyright issue. Most importantly, it contains no images of people at all, and also avoids other types of images that are problematic for data protection or ethics. We show that PASS can be used for pretraining with methods such as MoCo-v2, SwAV and DINO. In the transfer learning setting, it yields downstream performance similar to ImageNet pretraining, even on tasks that involve humans, such as human pose estimation. PASS does not make existing datasets obsolete; for instance, it is insufficient for benchmarking. However, it shows that model pretraining is often possible while using safer data, and it also provides the basis for a more robust evaluation of pretraining methods.
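A minimal sketch of how an unlabelled dataset such as PASS is consumed by self-supervised pretraining: MoCo-v2, SwAV and DINO all draw two random augmented views per image and use no labels. Paths and transforms here are illustrative, not part of the PASS release.

```python
from PIL import Image
from torch.utils.data import Dataset

class TwoViewUnlabelled(Dataset):
    def __init__(self, image_paths, transform):
        self.paths = image_paths          # e.g. a glob over the image files
        self.transform = transform        # random crop/flip/colour jitter

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img), self.transform(img)   # two views, no label
```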
HoloFusion: Towards Photo-realistic 3D Generative Modeling
Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism. We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples while learning from a collection of multi-view 2D images only. The method first generates coarse 3D samples using a variant of the recently proposed HoloDiffusion generator. Then, it independently renders and upsamples a large number of views of the coarse 3D model, super-resolves them to add detail, and distills those into a single, high-fidelity implicit 3D representation, which also ensures view-consistency of the final renders. The super-resolution network is trained as an integral part of HoloFusion, end-to-end, and the final distillation uses a new sampling scheme to capture the space of super-resolved signals. We compare our method against existing baselines, including DreamFusion, Get3D, EG3D, and HoloDiffusion, and achieve, to the best of our knowledge, the most realistic results on the challenging CO3Dv2 dataset
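A highly schematic torch sketch of the render, super-resolve, and distill data flow described above; `coarse_model`, `superres`, and `implicit3d` are hypothetical stand-ins used to illustrate the pipeline, not the HoloFusion implementation.

```python
import torch

def distill_step(coarse_model, superres, implicit3d, optimizer, cameras):
    views = [coarse_model.render(cam) for cam in cameras]   # coarse 3D renders
    targets = [superres(v).detach() for v in views]         # add 2D detail
    # fit one high-fidelity implicit 3D model to all super-resolved views,
    # which enforces view-consistency of the final renders
    loss = sum(((implicit3d.render(cam) - t) ** 2).mean()
               for cam, t in zip(cameras, targets))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```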
Simultaneous object detection and ranking with weak supervision
A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs and on the PASCAL VOC dataset, and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state-of-the-art) results.
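A minimal sketch of the two ingredients above: in weakly labelled positives the ROI is latent, so the best-scoring candidate window stands in for it, and a ranking hinge loss pushes every positive image above every negative one. `score` and `candidates` are hypothetical helpers, not the paper's code.

```python
def ranking_hinge_loss(pos_images, neg_images, candidates, score, margin=1.0):
    # latent-variable step: best-scoring candidate window per positive image
    pos_scores = [max(score(img, roi) for roi in candidates(img))
                  for img in pos_images]
    neg_scores = [max(score(img, roi) for roi in candidates(img))
                  for img in neg_images]
    # every (positive, negative) pair contributes a hinge term,
    # so negatives directly shape the ranking that AP measures
    return sum(max(0.0, margin - sp + sn)
               for sp in pos_scores for sn in neg_scores)
```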
Supervised Versus Unsupervised Deep Learning Based Methods for Skin Lesion Segmentation in Dermoscopy Images
Image segmentation is considered a crucial step in automatic dermoscopic image analysis as it affects the accuracy of subsequent steps. The huge progress in deep learning has recently revolutionized the image recognition and computer vision domains. In this paper, we compare a supervised deep learning based approach with an unsupervised deep learning based approach for the task of skin lesion segmentation in dermoscopy images. Results show that, using the default parameter settings and network configurations proposed in the original approaches, although the unsupervised approach could detect fine structures of skin lesions on some occasions, the supervised approach shows much higher accuracy in terms of Dice coefficient and Jaccard index (77.7% vs. 40% and 67.2% vs. 30.4%, respectively). With a proposed modification to the unsupervised approach, the Dice and Jaccard values improved to 54.3% and 44%, respectively.
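For reference, the two overlap measures quoted above, as minimal numpy functions over binary masks:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard(pred, gt):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```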
Taking visual motion prediction to new heightfields
While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and estimating the associated parameters. In order to leverage the approximation capabilities of artificial intelligence techniques in such physics-related contexts, researchers have handcrafted relevant states and then used neural networks to learn the state transitions using simulation runs as training data. Unfortunately, such approaches are unsuited for modeling complex real-world scenarios, where manually authoring relevant state spaces tends to be tedious and challenging. In this work, we investigate whether neural networks can implicitly learn physical states of real-world mechanical processes based only on visual data, while internally modeling non-homogeneous environments, and in the process enable long-term physical extrapolation. We develop a recurrent neural network architecture for this task and also characterize the resultant uncertainties in the form of evolving variance estimates. We evaluate our setup by extrapolating the motion of rolling ball(s) on bowls of varying shape and orientation, and on arbitrary heightfields, using only images as input. We report significant improvements over existing image-based methods, both in terms of accuracy of predictions and complexity of scenarios, and report competitive performance with approaches that, unlike ours, assume access to internal physical states.
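A minimal torch sketch of the idea above: a recurrent network that predicts both a mean trajectory and an evolving variance, trained with a Gaussian negative log-likelihood. Sizes and architecture are illustrative, not the authors' network.

```python
import torch
import torch.nn as nn

class MotionRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, state_dim=2):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.mean = nn.Linear(hidden, state_dim)      # predicted position
        self.logvar = nn.Linear(hidden, state_dim)    # predicted uncertainty

    def forward(self, feats):                         # feats: (B, T, feat_dim)
        h, _ = self.rnn(feats)
        return self.mean(h), self.logvar(h)

def gaussian_nll(mean, logvar, target):
    # 0.5 * (log sigma^2 + (x - mu)^2 / sigma^2), averaged over all dims
    return 0.5 * (logvar + (target - mean) ** 2 / logvar.exp()).mean()
```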
