    Interpretable Transformations with Encoder-Decoder Networks

    Full text link
    Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature-space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer, acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks, such as classification. Comment: Accepted at ICCV 2017
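
    A minimal NumPy sketch of the idea of a feature transform layer as described above: an explicit, parameterized rotation applied to pairs of hidden channels, so that composing two transforms in feature space behaves like composing the underlying rotations. The function name, the pairing of channels, and the toy check are illustrative assumptions, not the paper's implementation.

    # Minimal sketch (NumPy), not the authors' implementation: a "feature transform
    # layer" that applies an explicit 2D rotation to pairs of hidden-feature channels,
    # so a known input rotation maps to a known, interpretable feature-space rotation.
    import numpy as np

    def feature_transform(z, theta):
        """Rotate a hidden code z (even length) by angle theta,
        treating consecutive channel pairs as 2D vectors."""
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])           # 2x2 rotation block
        pairs = z.reshape(-1, 2)                  # (num_pairs, 2)
        return (pairs @ R.T).reshape(-1)          # rotate every pair, flatten back

    # Toy usage: encode(x) and decode(z) would be the learned networks; here we just
    # check that composing two feature-space transforms matches one combined transform.
    z = np.random.randn(8)
    np.testing.assert_allclose(
        feature_transform(feature_transform(z, 0.3), 0.4),
        feature_transform(z, 0.7), atol=1e-12)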

    CubeNet: Equivariance to 3D Rotation and Translation

    Full text link
    3D Convolutional Neural Networks are sensitive to transformations applied to their input. This is a problem because a voxelized version of a 3D object, and its rotated clone, will look unrelated to each other after passing through to the last layer of a network. Instead, an idealized model would preserve a meaningful representation of the voxelized object, while explaining the pose-difference between the two inputs. An equivariant representation vector has two components: the invariant identity part, and a discernible encoding of the transformation. Models that cannot explain pose-differences risk "diluting" the representation, in pursuit of optimizing a classification or regression loss function. We introduce a Group Convolutional Neural Network with linear equivariance to translations and right-angle rotations in three dimensions. We call this network CubeNet, reflecting its cube-like symmetry. By construction, this network helps preserve a 3D shape's global and local signature, as it is transformed through successive layers. We apply this network to a variety of 3D inference problems, achieving state-of-the-art on the ModelNet10 classification challenge, and comparable performance on the ISBI 2012 Connectome Segmentation Benchmark. To the best of our knowledge, this is the first 3D rotation-equivariant CNN for voxel representations. Comment: Preprint
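
    A rough NumPy/SciPy illustration of the group-convolution idea, restricted to the four right-angle rotations about a single axis (a small subgroup of the cube's 24 rotations); the helper z4_group_conv, the toy volume, and the filter size are assumptions for illustration, not CubeNet's actual architecture.

    # Rough illustration, not the CubeNet code: a group convolution over the four
    # right-angle rotations about the z-axis. Rotating the input permutes and rotates
    # the stack of responses instead of destroying it, which is the equivariance
    # property the abstract describes.
    import numpy as np
    from scipy.signal import correlate

    def z4_group_conv(volume, filt):
        """Correlate `volume` with `filt` rotated by 0/90/180/270 degrees in the xy-plane."""
        return np.stack([
            correlate(volume, np.rot90(filt, k, axes=(0, 1)), mode='valid')
            for k in range(4)
        ])

    vol = np.random.randn(8, 8, 8)
    filt = np.random.randn(3, 3, 3)
    out = z4_group_conv(vol, filt)            # shape (4, 6, 6, 6)

    # Rotating the input by 90 degrees rotates each response map and shifts the
    # rotation channel -- identity information is preserved, pose is explained.
    out_rot = z4_group_conv(np.rot90(vol, 1, axes=(0, 1)), filt)
    np.testing.assert_allclose(out_rot[1], np.rot90(out[0], 1, axes=(0, 1)), atol=1e-10)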

    Quantum Eigenfaces: Linear Feature Mapping and Nearest Neighbor Classification with Outlier Detection

    Get PDF
    We propose a quantum machine learning algorithm for data classification, inspired by the seminal computer vision approach of eigenfaces for face recognition. The algorithm enhances nearest neighbor/centroid classifiers with concepts from principal component analysis, enabling the automatic detection of outliers and finding use in anomaly detection domains beyond face recognition. Assuming classical input data, we formalize how to implement the algorithm using a quantum random access memory and state-of-the-art quantum linear algebra, discussing the complexity of performing the classification algorithm on a fault-tolerant quantum device. The asymptotic time complexity analysis shows that the quantum classification algorithm can be more efficient than its classical counterpart. We showcase an application of this algorithm for face recognition and image classification datasets with anomalies, obtaining promising results for the running time parameters. This work contributes to the growing field of quantum machine learning applications, and the algorithm's simplicity makes it easily adoptable by future quantum machine learning practitioners
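
    The sketch below shows the classical pipeline the abstract takes as its starting point, not the quantum algorithm itself: PCA "eigenfaces", nearest-centroid classification in the reduced space, and outlier detection via the reconstruction residual. All names, the number of components k, and the threshold tau are illustrative assumptions.

    # Hedged sketch of the *classical* counterpart the abstract builds on (not the
    # quantum algorithm): PCA "eigenfaces" + nearest-centroid classification, with
    # outliers flagged by a large reconstruction error in the principal subspace.
    import numpy as np

    def fit_eigenfaces(X, k):
        """X: (n_samples, n_pixels). Returns mean and top-k principal directions."""
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k]                               # (n_pixels,), (k, n_pixels)

    def classify(x, mu, W, centroids, labels, tau):
        """Project x, compare to class centroids in eigenspace, flag outliers."""
        z = W @ (x - mu)                                # coordinates in the eigenface basis
        residual = np.linalg.norm((x - mu) - W.T @ z)   # distance from the subspace
        if residual > tau:
            return "outlier"
        d = np.linalg.norm(centroids - z, axis=1)
        return labels[np.argmin(d)]

    # Toy usage with random "images"; tau and k are illustrative choices.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(40, 64)), np.repeat([0, 1], 20)
    mu, W = fit_eigenfaces(X, k=5)
    Z = (X - mu) @ W.T
    centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
    print(classify(X[0], mu, W, centroids, labels=[0, 1], tau=10.0))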

    Réalité Enrichie par Synthèse

    Get PDF
    In this paper, a technical solution is presented to automate the mixing of real and synthetic objects in the same animated video sequence. We aim at achieving a close binding between 3D-based analysis and synthesis techniques to compute the interaction between a real scene, captured in a sequence of calibrated images, and a computer-generated environment.
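
    A minimal sketch of one ingredient such a system relies on, under assumed calibration values that are not from the paper: with a calibrated 3x4 camera matrix for each real frame, synthetic 3D points can be projected into the image so that rendered objects are composited in the right place.

    # Illustrative projection of synthetic 3D points with a calibrated camera matrix;
    # the camera values below are assumptions, not the paper's calibration.
    import numpy as np

    def project(P, X_world):
        """P: 3x4 camera matrix; X_world: (n, 3) synthetic points. Returns (n, 2) pixels."""
        Xh = np.hstack([X_world, np.ones((len(X_world), 1))])   # homogeneous coordinates
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]                              # perspective division

    # Toy calibrated camera: focal length 500 px, principal point (320, 240), identity pose.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    print(project(P, np.array([[0.1, 0.0, 2.0]])))               # pixel of one synthetic point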

    Detecção e agrupamento de contornos

    Get PDF
    Contour detection from digital images yields information that is essential to many computer vision algorithms. The nature of two-dimensional digital images (their relatively low resolution, spatial and amplitude sampling, the presence of noise, the lack of depth information, occlusions, etc.), together with the importance of contours as basic input to many downstream algorithms, means that contour detection remains an only partially solved problem, with multiple approaches and several decades of extensive publication. It is still an active research topic, as attested by the quantity and quality of recent scientific publications in this area. The thesis discusses contour detection in its classical stages: estimating the amplitude of the signal that indicates the presence of a contour point; pre-classifying image points based on the estimated signals; and subsequently grouping individual contour points into contour curve segments. This thesis proposes: a design method for contour-point presence estimators based on Fredholm integral equations; a non-linear classifier that uses information from neighbouring points for decision making; and a contour-point grouping methodology with iterative growth driven by a locally supported cost function. The feature-extraction methodology based on the Fredholm integral equation of the first kind allows a unifying analysis of several methods previously proposed in the literature on the subject. The contour-point classification procedure is based on analysing the ordered sequences of gradient amplitudes in the neighbourhood of a contour point. The procedure is studied using the distribution density functions of the order statistics of neighbouring contour points, under the assumption that points on the same contour have similar ordered distributions. The final stage of contour detection is carried out with a contour-grouping procedure in which a neighbourhood hypothesis is built for possible growth of the contour and the best point to aggregate to the contour is estimated. Experimental results for the proposed methods are presented and analysed on real and synthetic images
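
    A deliberately simplified Python sketch of the classical pipeline outlined above (contour-point strength estimation, point classification, grouping); the thesis' Fredholm-equation estimator design and ordered-statistics classifier are replaced here by a plain Sobel magnitude, a global threshold, and connected-component grouping, purely for illustration.

    # Simplified stand-in for the classical contour pipeline, not the thesis' estimators.
    import numpy as np
    from scipy import ndimage

    def detect_and_group(image, thresh):
        gx = ndimage.sobel(image, axis=1, output=float)
        gy = ndimage.sobel(image, axis=0, output=float)
        magnitude = np.hypot(gx, gy)               # contour-point strength estimate
        edges = magnitude > thresh                 # crude point classification
        labeled, n = ndimage.label(edges)          # group connected edge points
        return magnitude, labeled, n

    # Toy usage: a bright square on a dark background yields one grouped contour band.
    img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
    mag, groups, n_groups = detect_and_group(img, thresh=1.0)
    print(n_groups, "contour group(s)")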

    Automatic visual recognition using parallel machines

    Get PDF
    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines can be demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n²). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach that uses epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. The hypothesis-generation-testing scheme is then implemented on the hypercube parallel architecture
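
    A sequential Python sketch, not the AP1000 parallel implementation, of the kind of dynamic-programming matching the dissertation parallelizes: aligning two boundary chain codes (sequences of 8-connected direction symbols) with an edit-distance recurrence whose substitution cost is the angular difference between directions. The cost weights are illustrative assumptions.

    # Illustrative DP alignment of two chain codes (direction symbols 0-7).
    def chain_code_distance(a, b):
        """Edit distance where substituting two directions costs their angular difference."""
        def sub(d1, d2):
            diff = abs(d1 - d2) % 8
            return min(diff, 8 - diff) / 4.0       # 0 for equal, 1 for opposite directions
        m, n = len(a), len(b)
        D = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1): D[i][0] = i
        for j in range(1, n + 1): D[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D[i][j] = min(D[i-1][j] + 1,                      # delete
                              D[i][j-1] + 1,                      # insert
                              D[i-1][j-1] + sub(a[i-1], b[j-1]))  # substitute
        return D[m][n]

    # Two nearly identical square-like boundaries differ by a small angular cost.
    print(chain_code_distance([0, 0, 2, 2, 4, 4, 6, 6], [0, 0, 2, 2, 4, 4, 6, 7]))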

    Interactive specification and acquisition of depth from single images

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Architecture, 2001. Includes bibliographical references (p. 99-101). We describe a system for interactively acquiring depth for an image-based representation consisting of a single input image. We use layers of images with depth to represent the scene. Unlike traditional 3D modeling and rendering systems, which require precise information that is usually difficult to model and manipulate, our system's emphasis is on ease of use, comprehensiveness, and use of potentially crude depth information. Depth is extracted by the user through intuitive, interactive tools using the powerful notion of selection. Once a set of pixels is selected, the user can assign depth by painting and chiseling, using shape from shading, applying filters, aligning and extracting shape from geometry primitives, or using level set methods. The ground plane tool provides an intuitive depth reference for all other tools and serves as an initial step in depth specification. Our system is based on pixels and selections, and therefore does not impose any restriction on the scene geometry. Our system allows the user to interactively perform high quality editing operations on SGI O2s and Octanes. We demonstrate the application of our system in architectural design (relighting, sketching, 3D walkthroughs from images), complex photocompositing, and fine art exploration contexts. by Max Chen. S.M.
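
    A small sketch of the ground-plane idea mentioned above, under an assumed pinhole-camera model with illustrative focal length, camera height, and horizon row (none of these values come from the thesis): pixels selected as ground receive depth from their image row relative to the horizon.

    # Illustrative ground-plane depth assignment under assumed pinhole-camera values.
    import numpy as np

    def ground_plane_depth(rows, cols, horizon_row, focal_px, cam_height):
        """Depth (distance along the optical axis) for ground pixels below the horizon."""
        ys = np.arange(rows, dtype=float)[:, None] * np.ones((1, cols))
        depth = np.full((rows, cols), np.inf)
        below = ys > horizon_row                       # only ground pixels get finite depth
        depth[below] = focal_px * cam_height / (ys[below] - horizon_row)
        return depth

    depth = ground_plane_depth(rows=240, cols=320, horizon_row=100.0,
                               focal_px=400.0, cam_height=1.6)
    print(depth[239, 0], depth[101, 0])                # bottom rows are closer than rows near the horizon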

    Learning and recovering 3D surface deformations

    Get PDF
    Recovering the 3D deformations of a non-rigid surface from a single viewpoint has applications in many domains such as sports, entertainment, and medical imaging. Unfortunately, without any knowledge of the possible deformations that the object of interest can undergo, the problem is severely under-constrained, and extremely different shapes can have very similar appearances when reprojected onto an image plane. In this thesis, we first exhibit the ambiguities of the reconstruction problem when relying on correspondences between a reference image for which we know the shape and an input image. We then propose several approaches to overcoming these ambiguities. The core idea is that some a priori knowledge about how a surface can deform must be introduced to solve them. We therefore present different ways to formulate that knowledge, ranging from very generic constraints to models specifically designed for a particular object or material. First, we propose generally applicable constraints formulated as motion models. Such models simply link the deformations of the surface from one image to the next in a video sequence. The obvious advantage is that they can be used independently of the physical properties of the object of interest. However, to be effective, they require the presence of texture over the whole surface, and, additionally, do not prevent error accumulation from frame to frame. To overcome these weaknesses, we propose to introduce statistical learning techniques that let us build a model from a large set of training examples, that is, in our case, known 3D deformations. The resulting model then essentially performs linear or non-linear interpolation between the training examples. Following this approach, we first propose a linear global representation that models the behavior of the whole surface. As is the case with all statistical learning techniques, the applicability of this representation is limited by the fact that acquiring training data is far from trivial. A large surface can undergo many subtle deformations, and thus a large amount of training data must be available to build an accurate model. We therefore propose an automatic way of generating such training examples in the case of inextensible surfaces. Furthermore, we show that the resulting linear global models can be incorporated into a closed-form solution to the shape recovery problem. This lets us not only track deformations from frame to frame, but also reconstruct surfaces from individual images. The major drawback of global representations is that they can only model the behavior of a specific surface, which forces us to re-train a new model for every new shape, even though it is made of a material observed before. To overcome this issue, and simultaneously reduce the amount of required training data, we propose local deformation models. Such models describe the behavior of small portions of a surface, and can be combined to form arbitrary global shapes. For this purpose, we study both linear and non-linear statistical learning methods, and show that, whereas the latter are better suited for tracking deformations from frame to frame, the former can also be used for reconstruction from a single image
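
    A hedged sketch of a linear global deformation model in the spirit described above: a PCA basis learned from example mesh deformations, fitted to sparse observations by least squares. The function names, data shapes, and the plain least-squares fit are illustrative assumptions; the thesis' closed-form, correspondence-based formulation is more involved.

    # Illustrative linear deformation model (PCA basis + least-squares fit), not the thesis' method.
    import numpy as np

    def learn_linear_model(training_shapes, k):
        """training_shapes: (n_examples, 3*n_vertices) stacked mesh coordinates."""
        mean = training_shapes.mean(axis=0)
        _, _, Vt = np.linalg.svd(training_shapes - mean, full_matrices=False)
        return mean, Vt[:k].T                          # mean shape and (3V, k) deformation basis

    def fit_to_observations(obs_idx, obs_vals, mean, basis):
        """Recover modal coefficients from a sparse set of observed coordinates."""
        A, b = basis[obs_idx], obs_vals - mean[obs_idx]
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        return mean + basis @ coeffs                   # full reconstructed shape

    # Toy usage with synthetic data: 50 training shapes of a 100-vertex mesh.
    rng = np.random.default_rng(1)
    train = rng.normal(size=(50, 300))
    mean, basis = learn_linear_model(train, k=5)
    obs_idx = rng.choice(300, size=30, replace=False)
    shape = fit_to_observations(obs_idx, train[0, obs_idx], mean, basis)
    print(shape.shape)                                 # (300,)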