    Interpretable Transformations with Encoder-Decoder Networks

    Full text link
    Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature-space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer, acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks, such as classification. Comment: Accepted at ICCV 2017
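
    A minimal NumPy sketch of the idea of a feature transform layer as described above: an explicit, parameterized rotation applied to pairs of hidden channels, so that composing two transforms in feature space behaves like composing the underlying rotations. The function name, the pairing of channels, and the toy check are illustrative assumptions, not the paper's implementation.

    # Minimal sketch (NumPy), not the authors' implementation: a "feature transform
    # layer" that applies an explicit 2D rotation to pairs of hidden-feature channels,
    # so a known input rotation maps to a known, interpretable feature-space rotation.
    import numpy as np

    def feature_transform(z, theta):
        """Rotate a hidden code z (even length) by angle theta,
        treating consecutive channel pairs as 2D vectors."""
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])           # 2x2 rotation block
        pairs = z.reshape(-1, 2)                  # (num_pairs, 2)
        return (pairs @ R.T).reshape(-1)          # rotate every pair, flatten back

    # Toy usage: encode(x) and decode(z) would be the learned networks; here we just
    # check that composing two feature-space transforms matches one combined transform.
    z = np.random.randn(8)
    np.testing.assert_allclose(
        feature_transform(feature_transform(z, 0.3), 0.4),
        feature_transform(z, 0.7), atol=1e-12)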

    CubeNet: Equivariance to 3D Rotation and Translation

    Full text link
    3D Convolutional Neural Networks are sensitive to transformations applied to their input. This is a problem because a voxelized version of a 3D object, and its rotated clone, will look unrelated to each other after passing through to the last layer of a network. Instead, an idealized model would preserve a meaningful representation of the voxelized object, while explaining the pose-difference between the two inputs. An equivariant representation vector has two components: the invariant identity part, and a discernible encoding of the transformation. Models that cannot explain pose-differences risk "diluting" the representation, in pursuit of optimizing a classification or regression loss function. We introduce a Group Convolutional Neural Network with linear equivariance to translations and right-angle rotations in three dimensions. We call this network CubeNet, reflecting its cube-like symmetry. By construction, this network helps preserve a 3D shape's global and local signature, as it is transformed through successive layers. We apply this network to a variety of 3D inference problems, achieving state-of-the-art on the ModelNet10 classification challenge, and comparable performance on the ISBI 2012 Connectome Segmentation Benchmark. To the best of our knowledge, this is the first 3D rotation-equivariant CNN for voxel representations. Comment: Preprint
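
    A rough NumPy/SciPy illustration of the group-convolution idea, restricted to the four right-angle rotations about a single axis (a small subgroup of the cube's 24 rotations); the helper z4_group_conv, the toy volume, and the filter size are assumptions for illustration, not CubeNet's actual architecture.

    # Rough illustration, not the CubeNet code: a group convolution over the four
    # right-angle rotations about the z-axis. Rotating the input permutes and rotates
    # the stack of responses instead of destroying it, which is the equivariance
    # property the abstract describes.
    import numpy as np
    from scipy.signal import correlate

    def z4_group_conv(volume, filt):
        """Correlate `volume` with `filt` rotated by 0/90/180/270 degrees in the xy-plane."""
        return np.stack([
            correlate(volume, np.rot90(filt, k, axes=(0, 1)), mode='valid')
            for k in range(4)
        ])

    vol = np.random.randn(8, 8, 8)
    filt = np.random.randn(3, 3, 3)
    out = z4_group_conv(vol, filt)            # shape (4, 6, 6, 6)

    # Rotating the input by 90 degrees rotates each response map and shifts the
    # rotation channel -- identity information is preserved, pose is explained.
    out_rot = z4_group_conv(np.rot90(vol, 1, axes=(0, 1)), filt)
    np.testing.assert_allclose(out_rot[1], np.rot90(out[0], 1, axes=(0, 1)), atol=1e-10)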

    Quantum Eigenfaces: Linear Feature Mapping and Nearest Neighbor Classification with Outlier Detection

    Get PDF
    We propose a quantum machine learning algorithm for data classification, inspired by the seminal computer vision approach of eigenfaces for face recognition. The algorithm enhances nearest neighbor/centroid classifiers with concepts from principal component analysis, enabling the automatic detection of outliers and finding use in anomaly detection domains beyond face recognition. Assuming classical input data, we formalize how to implement the algorithm using a quantum random access memory and state-of-the-art quantum linear algebra, discussing the complexity of performing the classification algorithm on a fault-tolerant quantum device. The asymptotic time complexity analysis shows that the quantum classification algorithm can be more efficient than its classical counterpart. We showcase an application of this algorithm for face recognition and image classification datasets with anomalies, obtaining promising results for the running time parameters. This work contributes to the growing field of quantum machine learning applications, and the algorithm's simplicity makes it easily adoptable by future quantum machine learning practitioners
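
    The sketch below shows the classical pipeline the abstract takes as its starting point, not the quantum algorithm itself: PCA "eigenfaces", nearest-centroid classification in the reduced space, and outlier detection via the reconstruction residual. All names, the number of components k, and the threshold tau are illustrative assumptions.

    # Hedged sketch of the *classical* counterpart the abstract builds on (not the
    # quantum algorithm): PCA "eigenfaces" + nearest-centroid classification, with
    # outliers flagged by a large reconstruction error in the principal subspace.
    import numpy as np

    def fit_eigenfaces(X, k):
        """X: (n_samples, n_pixels). Returns mean and top-k principal directions."""
        mu = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        return mu, Vt[:k]                               # (n_pixels,), (k, n_pixels)

    def classify(x, mu, W, centroids, labels, tau):
        """Project x, compare to class centroids in eigenspace, flag outliers."""
        z = W @ (x - mu)                                # coordinates in the eigenface basis
        residual = np.linalg.norm((x - mu) - W.T @ z)   # distance from the subspace
        if residual > tau:
            return "outlier"
        d = np.linalg.norm(centroids - z, axis=1)
        return labels[np.argmin(d)]

    # Toy usage with random "images"; tau and k are illustrative choices.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(40, 64)), np.repeat([0, 1], 20)
    mu, W = fit_eigenfaces(X, k=5)
    Z = (X - mu) @ W.T
    centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
    print(classify(X[0], mu, W, centroids, labels=[0, 1], tau=10.0))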

    Réalité Enrichie par Synthèse

    Get PDF
    In this paper, a technical solution is presented to automate the mixing of real and synthetic objects in the same animated video sequence. We aim at achieving a close binding between 3D-based analysis and synthesis techniques to compute the interaction between a real scene, captured in a sequence of calibrated images, and a computer-generated environment.
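
    A minimal sketch of one ingredient such a system relies on, under assumed calibration values that are not from the paper: with a calibrated 3x4 camera matrix for each real frame, synthetic 3D points can be projected into the image so that rendered objects are composited in the right place.

    # Illustrative projection of synthetic 3D points with a calibrated camera matrix;
    # the camera values below are assumptions, not the paper's calibration.
    import numpy as np

    def project(P, X_world):
        """P: 3x4 camera matrix; X_world: (n, 3) synthetic points. Returns (n, 2) pixels."""
        Xh = np.hstack([X_world, np.ones((len(X_world), 1))])   # homogeneous coordinates
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]                              # perspective division

    # Toy calibrated camera: focal length 500 px, principal point (320, 240), identity pose.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    print(project(P, np.array([[0.1, 0.0, 2.0]])))               # pixel of one synthetic point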

    Detecção e agrupamento de contornos

    Get PDF
    Contour detection from digital images yields information that is essential to many computer vision algorithms. The nature of two-dimensional digital images (their relatively low resolution, spatial and amplitude sampling, the presence of noise, the lack of depth information, occlusions, etc.), together with the importance of contours as basic input to many downstream algorithms, means that contour detection remains an only partially solved problem, with multiple approaches and several decades of extensive publication. It is still an active research topic, as attested by the quantity and quality of recent scientific publications in this area. The thesis discusses contour detection in its classical stages: estimating the amplitude of the signal that indicates the presence of a contour point; pre-classifying image points based on the estimated signals; and subsequently grouping individual contour points into contour curve segments. This thesis proposes: a design method for contour-point presence estimators based on Fredholm integral equations; a non-linear classifier that uses information from neighbouring points for decision making; and a contour-point grouping methodology with iterative growth driven by a locally supported cost function. The feature-extraction methodology based on the Fredholm integral equation of the first kind allows a unifying analysis of several methods previously proposed in the literature on the subject. The contour-point classification procedure is based on analysing the ordered sequences of gradient amplitudes in the neighbourhood of a contour point. The procedure is studied using the distribution density functions of the order statistics of neighbouring contour points, under the assumption that points on the same contour have similar ordered distributions. The final stage of contour detection is carried out with a contour-grouping procedure in which a neighbourhood hypothesis is built for possible growth of the contour and the best point to aggregate to the contour is estimated. Experimental results for the proposed methods are presented and analysed on real and synthetic images
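
    A deliberately simplified Python sketch of the classical pipeline outlined above (contour-point strength estimation, point classification, grouping); the thesis' Fredholm-equation estimator design and ordered-statistics classifier are replaced here by a plain Sobel magnitude, a global threshold, and connected-component grouping, purely for illustration.

    # Simplified stand-in for the classical contour pipeline, not the thesis' estimators.
    import numpy as np
    from scipy import ndimage

    def detect_and_group(image, thresh):
        gx = ndimage.sobel(image, axis=1, output=float)
        gy = ndimage.sobel(image, axis=0, output=float)
        magnitude = np.hypot(gx, gy)               # contour-point strength estimate
        edges = magnitude > thresh                 # crude point classification
        labeled, n = ndimage.label(edges)          # group connected edge points
        return magnitude, labeled, n

    # Toy usage: a bright square on a dark background yields one grouped contour band.
    img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
    mag, groups, n_groups = detect_and_group(img, thresh=1.0)
    print(n_groups, "contour group(s)")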

    Automatic visual recognition using parallel machines

    Get PDF
    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines can be demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n²). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach that uses epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. The hypothesis-generation-testing scheme is then implemented on the hypercube parallel architecture
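
    A sequential Python sketch, not the AP1000 parallel implementation, of the kind of dynamic-programming matching the dissertation parallelizes: aligning two boundary chain codes (sequences of 8-connected direction symbols) with an edit-distance recurrence whose substitution cost is the angular difference between directions. The cost weights are illustrative assumptions.

    # Illustrative DP alignment of two chain codes (direction symbols 0-7).
    def chain_code_distance(a, b):
        """Edit distance where substituting two directions costs their angular difference."""
        def sub(d1, d2):
            diff = abs(d1 - d2) % 8
            return min(diff, 8 - diff) / 4.0       # 0 for equal, 1 for opposite directions
        m, n = len(a), len(b)
        D = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1): D[i][0] = i
        for j in range(1, n + 1): D[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D[i][j] = min(D[i-1][j] + 1,                      # delete
                              D[i][j-1] + 1,                      # insert
                              D[i-1][j-1] + sub(a[i-1], b[j-1]))  # substitute
        return D[m][n]

    # Two nearly identical square-like boundaries differ by a small angular cost.
    print(chain_code_distance([0, 0, 2, 2, 4, 4, 6, 6], [0, 0, 2, 2, 4, 4, 6, 7]))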

    Interactive specification and acquisition of depth from single images

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Architecture, 2001. Includes bibliographical references (p. 99-101). We describe a system for interactively acquiring depth for an image-based representation consisting of a single input image. We use layers of images with depth to represent the scene. Unlike traditional 3D modeling and rendering systems, which require precise information that is usually difficult to model and manipulate, our system's emphasis is on ease of use, comprehensiveness, and use of potentially crude depth information. Depth is extracted by the user through intuitive, interactive tools using the powerful notion of selection. Once a set of pixels is selected, the user can assign depth by painting and chiseling, using shape from shading, applying filters, aligning and extracting shape from geometry primitives, or using level set methods. The ground plane tool provides an intuitive depth reference for all other tools and serves as an initial step in depth specification. Our system is based on pixels and selections, and therefore does not impose any restriction on the scene geometry. Our system allows the user to interactively perform high quality editing operations on SGI O2s and Octanes. We demonstrate the application of our system in architectural design (relighting, sketching, 3D walkthroughs from images), complex photocompositing, and fine art exploration contexts. by Max Chen. S.M.
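
    A small sketch of the ground-plane idea mentioned above, under an assumed pinhole-camera model with illustrative focal length, camera height, and horizon row (none of these values come from the thesis): pixels selected as ground receive depth from their image row relative to the horizon.

    # Illustrative ground-plane depth assignment under assumed pinhole-camera values.
    import numpy as np

    def ground_plane_depth(rows, cols, horizon_row, focal_px, cam_height):
        """Depth (distance along the optical axis) for ground pixels below the horizon."""
        ys = np.arange(rows, dtype=float)[:, None] * np.ones((1, cols))
        depth = np.full((rows, cols), np.inf)
        below = ys > horizon_row                       # only ground pixels get finite depth
        depth[below] = focal_px * cam_height / (ys[below] - horizon_row)
        return depth

    depth = ground_plane_depth(rows=240, cols=320, horizon_row=100.0,
                               focal_px=400.0, cam_height=1.6)
    print(depth[239, 0], depth[101, 0])                # bottom rows are closer than rows near the horizon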

    Learning and recovering 3D surface deformations

    Get PDF
    Recovering the 3D deformations of a non-rigid surface from a single viewpoint has applications in many domains such as sports, entertainment, and medical imaging. Unfortunately, without any knowledge of the possible deformations that the object of interest can undergo, the problem is severely under-constrained, and extremely different shapes can have very similar appearances when reprojected onto an image plane. In this thesis, we first exhibit the ambiguities of the reconstruction problem when relying on correspondences between a reference image for which we know the shape and an input image. We then propose several approaches to overcoming these ambiguities. The core idea is that some a priori knowledge about how a surface can deform must be introduced to solve them. We therefore present different ways to formulate that knowledge, ranging from very generic constraints to models specifically designed for a particular object or material. First, we propose generally applicable constraints formulated as motion models. Such models simply link the deformations of the surface from one image to the next in a video sequence. The obvious advantage is that they can be used independently of the physical properties of the object of interest. However, to be effective, they require the presence of texture over the whole surface, and, additionally, do not prevent error accumulation from frame to frame. To overcome these weaknesses, we propose to introduce statistical learning techniques that let us build a model from a large set of training examples, that is, in our case, known 3D deformations. The resulting model then essentially performs linear or non-linear interpolation between the training examples. Following this approach, we first propose a linear global representation that models the behavior of the whole surface. As is the case with all statistical learning techniques, the applicability of this representation is limited by the fact that acquiring training data is far from trivial. A large surface can undergo many subtle deformations, and thus a large amount of training data must be available to build an accurate model. We therefore propose an automatic way of generating such training examples in the case of inextensible surfaces. Furthermore, we show that the resulting linear global models can be incorporated into a closed-form solution to the shape recovery problem. This lets us not only track deformations from frame to frame, but also reconstruct surfaces from individual images. The major drawback of global representations is that they can only model the behavior of a specific surface, which forces us to re-train a new model for every new shape, even though it is made of a material observed before. To overcome this issue, and simultaneously reduce the amount of required training data, we propose local deformation models. Such models describe the behavior of small portions of a surface, and can be combined to form arbitrary global shapes. For this purpose, we study both linear and non-linear statistical learning methods, and show that, whereas the latter are better suited for tracking deformations from frame to frame, the former can also be used for reconstruction from a single image
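
    A hedged sketch of a linear global deformation model in the spirit described above: a PCA basis learned from example mesh deformations, fitted to sparse observations by least squares. The function names, data shapes, and the plain least-squares fit are illustrative assumptions; the thesis' closed-form, correspondence-based formulation is more involved.

    # Illustrative linear deformation model (PCA basis + least-squares fit), not the thesis' method.
    import numpy as np

    def learn_linear_model(training_shapes, k):
        """training_shapes: (n_examples, 3*n_vertices) stacked mesh coordinates."""
        mean = training_shapes.mean(axis=0)
        _, _, Vt = np.linalg.svd(training_shapes - mean, full_matrices=False)
        return mean, Vt[:k].T                          # mean shape and (3V, k) deformation basis

    def fit_to_observations(obs_idx, obs_vals, mean, basis):
        """Recover modal coefficients from a sparse set of observed coordinates."""
        A, b = basis[obs_idx], obs_vals - mean[obs_idx]
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        return mean + basis @ coeffs                   # full reconstructed shape

    # Toy usage with synthetic data: 50 training shapes of a 100-vertex mesh.
    rng = np.random.default_rng(1)
    train = rng.normal(size=(50, 300))
    mean, basis = learn_linear_model(train, k=5)
    obs_idx = rng.choice(300, size=30, replace=False)
    shape = fit_to_observations(obs_idx, train[0, obs_idx], mean, basis)
    print(shape.shape)                                 # (300,)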