Discrete Visual Perception
Computational vision and biomedical image analysis have made tremendous progress over the past decade. This is mostly due to the development of efficient learning and inference algorithms, which allow better, faster, and richer modeling of visual perception tasks. Graph-based representations are among the most prominent tools for addressing such tasks by casting perception as a graph optimization problem. In this paper, we briefly introduce the interest of such representations, discuss their strengths and limitations, and present their application to a variety of problems in computer vision and biomedical image analysis.
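The "perception as graph optimization" idea above can be sketched minimally with a Potts-model MRF solved by iterated conditional modes (ICM). This is an illustrative toy, not the paper's actual formulation: the grid topology, unit data cost, and ICM solver are all assumptions chosen for brevity.

```python
import numpy as np

def icm_denoise(obs, n_labels=2, beta=1.0, n_iters=5):
    """Greedily minimize a Potts energy on a 4-connected grid:
    E(x) = sum_i [x_i != y_i] + beta * sum_{ij} [x_i != x_j]."""
    labels = obs.copy()
    h, w = obs.shape
    for _ in range(n_iters):
        for i in range(h):
            for j in range(w):
                costs = np.zeros(n_labels)
                for lab in range(n_labels):
                    cost = float(lab != obs[i, j])  # data term
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            cost += beta * float(lab != labels[ni, nj])  # smoothness term
                    costs[lab] = cost
                labels[i, j] = int(np.argmin(costs))
    return labels

# A clean square corrupted by one flipped pixel: the smoothness prior restores it.
obs = np.zeros((5, 5), dtype=int)
obs[1:4, 1:4] = 1
obs[2, 2] = 0  # noise
print(icm_denoise(obs))
```

The same energy structure underlies segmentation, stereo, and registration; only the data and smoothness terms change.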
Modeling the structure of multivariate manifolds: Shape maps
We propose a shape population metric that reflects the interdependencies between points observed in a set of examples. It provides a notion of topology for shape and appearance models that represents the behavior of individual observations in a metric space, in which distances between points correspond to their joint modeling properties. A Markov chain is learned from the description lengths of models that describe subsets of the entire data. The resulting diffusion map, or shape map, provides the metric that reflects the behavior of the training population. With this metric, functional clustering, deformation or motion segmentation, sparse sampling, and the treatment of outliers can be handled in a unified and transparent manner. We report experimental results on synthetic and real-world data and compare the framework with existing specialized approaches.
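The diffusion-map construction the abstract refers to can be sketched as follows: a Markov transition matrix is built from pairwise affinities, and its leading non-trivial eigenvectors embed the points so that Euclidean distances approximate diffusion distances. This is a generic sketch; the paper derives its affinities from description lengths, whereas here a Gaussian kernel is assumed purely for illustration.

```python
import numpy as np

def diffusion_map(affinity, n_coords=2, t=1):
    """Embed points via eigenvectors of the row-normalized affinity
    (Markov) matrix, scaled by eigenvalues to the power t."""
    P = affinity / affinity.sum(axis=1, keepdims=True)  # transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # drop the trivial constant eigenvector (eigenvalue 1)
    return vecs[:, 1:n_coords + 1] * (vals[1:n_coords + 1] ** t)

# Two loose clusters of 1-D points; affinities from a Gaussian kernel.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
aff = np.exp(-(pts[:, None] - pts[None, :]) ** 2)
emb = diffusion_map(aff, n_coords=1)
# Points in the same cluster land close together in the diffusion coordinate.
```

Clustering or outlier detection can then run directly on the embedding coordinates.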
Building detection in very high resolution multispectral data with deep learning features
Automated man-made object detection and building extraction from single satellite images remains one of the most challenging tasks for various urban planning and monitoring engineering applications. To this end, in this paper we propose an automated building detection framework for very high resolution remote sensing data based on deep convolutional neural networks. The core of the developed method is a supervised classification procedure employing a very large training dataset. An MRF model is then responsible for obtaining the optimal labels for the detection of scene buildings. The experimental results and the quantitative validation indicate the promising potential of the developed approach.
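The supervised classification stage described above consumes local patches of the multispectral raster together with per-pixel building labels. A minimal sketch of that training-pair extraction, with all sizes and names being illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def extract_patches(image, mask, size=3):
    """Cut size x size multispectral patches and their centre-pixel labels:
    the supervised training pairs for a patch-wise building classifier."""
    h, w, _ = image.shape
    r = size // 2
    patches, labels = [], []
    for i in range(r, h - r):
        for j in range(r, w - r):
            patches.append(image[i - r:i + r + 1, j - r:j + r + 1])
            labels.append(mask[i, j])
    return np.stack(patches), np.array(labels)

# A toy 6x6 scene with 4 spectral bands and a square "building" mask.
img = np.random.rand(6, 6, 4)
msk = np.zeros((6, 6), dtype=int)
msk[2:5, 2:5] = 1
X, y = extract_patches(img, msk, size=3)
print(X.shape, y.shape)  # (16, 3, 3, 4) (16,)
```

The CNN scores produced from such patches would then feed the MRF labeling step as unary terms.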
Learning Grammars for Architecture-Specific Facade Parsing
Parsing facade images requires an optimal grammar for a given class of buildings. Such a grammar is often designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained by running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules so that rules corresponding to the same complex pattern are grouped together, leading to a rich yet compact grammar. Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar converges much faster while producing equally or more accurate parsing results than handcrafted grammars as well as grammars learned by other methods. In addition, we release a new dataset of facade images from Paris in the Art Deco style and demonstrate the general applicability and strong potential of the proposed framework.
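The subtree-merging step can be sketched by hashing every subtree of a parse tree so that identical derivations receive a shared id, after which only one rule per id needs to be kept. The tree encoding and labels below are illustrative assumptions, not the paper's representation.

```python
def share_subtrees(tree, table=None):
    """Assign each distinct subtree an id; identical subtrees get the same id,
    so repeated derivations can be merged into a single shared grammar rule."""
    if table is None:
        table = {}
    if isinstance(tree, str):  # terminal symbol
        key = tree
    else:
        label, children = tree[0], tree[1:]
        key = (label,) + tuple(share_subtrees(c, table)[0] for c in children)
    if key not in table:
        table[key] = len(table)
    return table[key], table

# Two floors that repeat the same "Window" derivation three times.
t = ("Facade",
     ("Floor", ("Window", "glass"), ("Window", "glass")),
     ("Floor", ("Window", "glass")))
root, table = share_subtrees(t)
# The three identical Window subtrees collapse onto one shared id,
# so the merged grammar keeps a single Window rule.
```

Clustering would then further group near-identical rules rather than only exact repeats.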
Cooperative Object Segmentation and Behavior Inference in Image Sequences
In this paper, we propose a general framework for fusing bottom-up segmentation with top-down object behavior inference over an image sequence. This approach is beneficial for both tasks, since it enables them to cooperate so that knowledge relevant to each can aid in the resolution of the other, thus enhancing the final result. In particular, the behavior inference process offers dynamic probabilistic priors to guide segmentation. At the same time, segmentation supplies its results to the inference process, ensuring that they are consistent both with prior knowledge and with new image information. The prior models are learned from training data and adapt dynamically, based on newly analyzed images. We demonstrate the effectiveness of our framework via particular implementations that we have employed in the resolution of two hand gesture recognition applications. Our experimental results illustrate the robustness of our joint approach to segmentation and behavior inference in challenging conditions involving complex backgrounds and occlusions of the target object.
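The cooperative loop above resembles a recursive Bayesian filter: the behavior belief is propagated by its dynamics into a prior for the next frame, then re-weighted by the likelihood the segmentation module reports for each behavior hypothesis. The sketch below is a generic filter under that reading, with made-up numbers; it is not the paper's actual model.

```python
import numpy as np

def fuse_step(behavior_belief, transition, seg_likelihood):
    """One frame of the loop: propagate the behavior belief (top-down prior),
    then fuse it with the segmentation likelihood (bottom-up evidence)."""
    predicted = transition.T @ behavior_belief   # dynamic prior for this frame
    posterior = predicted * seg_likelihood       # re-weight by image evidence
    return posterior / posterior.sum()

# Two gesture states; the segmentation evidence increasingly supports state 1.
belief = np.array([0.5, 0.5])
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
for lik in ([0.6, 0.4], [0.3, 0.7], [0.2, 0.8]):
    belief = fuse_step(belief, T, np.array(lik))
print(belief)  # probability mass shifts toward the second behavior state
```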
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques have become increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15-fold since 1999, which enabled the development of models that aim to predict enzymatic function from amino acid composition. The amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
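The binary voxel representation the abstract relies on can be sketched as mapping centred 3-D atom coordinates into a fixed occupancy grid. Grid size and physical extent below are illustrative assumptions, not EnzyNet's actual preprocessing parameters.

```python
import numpy as np

def voxelize(coords, grid=32, extent=10.0):
    """Map 3-D atom coordinates into a binary grid x grid x grid occupancy
    volume: the shape-only input an EnzyNet-style 3D CNN would consume."""
    coords = coords - coords.mean(axis=0)               # centre the molecule
    idx = ((coords / extent + 0.5) * grid).astype(int)  # scale into voxel indices
    idx = np.clip(idx, 0, grid - 1)                     # keep atoms inside the box
    vol = np.zeros((grid, grid, grid), dtype=np.uint8)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vol

atoms = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
v = voxelize(atoms, grid=8, extent=8.0)
print(v.sum())  # 3 occupied voxels
```

Replacing the binary write with a per-atom property value would give the complementary biochemical channels the abstract mentions.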
Pose Invariant Deformable Shape Priors Using L1 Higher Order Sparse Graphs
In this paper, we propose a novel method for knowledge-based segmentation. We adopt a point distribution graphical model formulation that encodes pose-invariant shape priors through L1 sparse higher-order cliques. Local shape deformation properties of the model can be captured and learned optimally from a training set using dual decomposition. These higher-order shape terms are combined with conventional visual ones, aiming at maximizing the posterior segmentation likelihood. The resulting graphical model is optimized using dual decomposition and is applied to 2D (computer vision) and 3D (medical imaging) object segmentation with promising results.
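One way a pose-invariant higher-order term can work is by scoring each 3-point clique through the triangle's inner angles, which are unchanged by translation, rotation, and scale, and penalizing their L1 deviation from statistics learned on training shapes. The feature choice and the single-clique setup below are illustrative assumptions, not the paper's exact energy.

```python
import numpy as np

def triplet_angles(p):
    """Inner angles of the triangle formed by a 3-point clique: invariant
    to translation, rotation, and scale, hence a pose-invariant feature."""
    a, b, c = p
    def angle(u, v, w):
        d1, d2 = v - u, w - u
        cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
        return np.arccos(np.clip(cos, -1.0, 1.0))
    return np.array([angle(a, b, c), angle(b, c, a), angle(c, a, b)])

def clique_energy(points, mean_angles):
    """L1 higher-order prior: absolute deviation of the clique's angles
    from their training-set mean."""
    return np.abs(triplet_angles(points) - mean_angles).sum()

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mean = triplet_angles(tri)  # stands in for statistics learned from training shapes
# A rotated, scaled, and translated copy of the same triangle costs nothing.
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
moved = 2.0 * tri @ rot.T + np.array([5.0, 3.0])
print(clique_energy(moved, mean))  # ~0: the prior is pose invariant
```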