
    Multi-manifold Attention for Vision Transformers

    Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive patch embeddings and hierarchical structures, there is still limited research on utilizing additional data representations to refine the self-attention map of a Transformer. To address this problem, a novel attention mechanism, called multi-manifold multi-head attention, is proposed in this work to substitute the vanilla self-attention of a Transformer. The proposed mechanism models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, thus leveraging different statistical and geometrical properties of the input for the computation of a highly descriptive attention map. In this way, the proposed attention mechanism can guide a Vision Transformer to become more attentive towards important appearance, color and texture features of an image, leading to improved classification and segmentation results, as shown by the experimental results on well-known datasets. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    Non-Rigid Structure from Motion

    This thesis revisits a challenging classical problem in geometric computer vision known as "Non-Rigid Structure-from-Motion" (NRSfM). It is a well-known problem where the task is to recover the 3D shape and motion of a non-rigidly moving object from image data. A reliable solution to this problem is valuable in several industrial applications such as virtual reality, medical surgery, animation movies, etc. Nevertheless, to date, no algorithm exists that can solve NRSfM for all kinds of conceivable motion. As a result, additional constraints and assumptions are often employed to solve NRSfM. The task is challenging due to the inherently unconstrained nature of the problem itself, as many varying 3D configurations can have similar image projections. The problem becomes even more challenging if the camera is moving along with the object. The thesis takes a modern view of this challenging problem and proposes a few algorithms that have set a new performance benchmark for solving NRSfM. The thesis not only discusses the classical work in NRSfM but also proposes some powerful elementary modifications to it. The foundation of this thesis goes beyond the traditional single-object NRSfM and, for the first time, provides an effective formulation to realise multi-body NRSfM. Most techniques for NRSfM under factorisation can only handle sparse feature correspondences. These sparse features are then used to construct a scene using the organisation of points, lines, planes or other elementary geometric primitives. Nevertheless, a sparse representation of the scene provides incomplete information about it. This thesis goes from sparse NRSfM to dense NRSfM for a single object, and then slowly lifts the intuition to realise dense 3D reconstruction of the entire dynamic scene as a global as-rigid-as-possible deformation problem. The core of this work goes beyond the traditional approach to dealing with deformation.
It shows that relative scales for multiple deforming objects can be recovered under some mild assumptions about the scene. The work proposes a new approach for dense detailed 3D reconstruction of a complex dynamic scene from two perspective frames. Since the method needs no depth information and assumes no template prior, per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios, including YouTube videos. Lastly, this thesis provides a new way to perceive the depth of a dynamic scene, which essentially trivialises the notion of motion estimation as a compulsory step to solve this problem. Conventional geometric methods for depth estimation require a reliable estimate of motion parameters for each moving object, which is difficult to obtain and validate. In contrast, this thesis introduces a new motion-free approach to estimate the dense depth map of a complex dynamic scene for successive/multiple frames. The work shows that, given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we can recover the dense depth map for the successive frames without solving for motion parameters. By assigning a locally rigid structure to the piece-wise planar approximation of a dynamic scene, which transforms as rigidly as possible over frames, we can bypass the motion estimation step. Experimental results and MATLAB code for relevant examples are provided to validate the motion-free idea.

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence

    In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide simulation results to corroborate our theoretical findings.
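
As a rough illustration of the problem setting only (not of the RGN method itself), the following NumPy sketch builds a tensor with low Tucker rank and draws noisy linear measurements of it. The dimensions, ranks, noise level, Gaussian design tensors, and the helper names `tucker_tensor` and `unfold` are all assumptions chosen for the example; the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def tucker_tensor(core, factors):
    """Multiply a core tensor by one factor matrix per mode (Tucker format)."""
    x = core
    for mode, u in enumerate(factors):
        # Contract axis `mode` of x with the columns of u, then restore axis order.
        x = np.moveaxis(np.tensordot(u, x, axes=(1, mode)), 0, mode)
    return x

def unfold(x, mode):
    """Mode-`mode` matricization: rows index the chosen mode."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

# Hypothetical sizes: a 6 x 5 x 4 tensor with Tucker rank (2, 2, 2).
dims, ranks = (6, 5, 4), (2, 2, 2)
core = rng.standard_normal(ranks)
factors = [np.linalg.qr(rng.standard_normal((p, r)))[0] for p, r in zip(dims, ranks)]
x_true = tucker_tensor(core, factors)

# Noisy linear measurements y_i = <A_i, X> + noise, with assumed Gaussian designs A_i.
n, sigma = 50, 0.1
designs = rng.standard_normal((n, *dims))
y = np.tensordot(designs, x_true, axes=3) + sigma * rng.standard_normal(n)

# The rank of each unfolding gives the Tucker rank along that mode.
mode_ranks = [np.linalg.matrix_rank(unfold(x_true, k)) for k in range(3)]
print(mode_ranks)
```

An estimator such as the one studied in the paper would take `designs` and `y` as input and try to recover `x_true`; that step is deliberately left out here.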

    Nonlinear eigenvalue problems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1998. Includes bibliographical references (p. 211-217). By Ross Adams Lippert. Ph.D.

    Scalable Low-rank Matrix and Tensor Decomposition on Graphs

    In many signal processing, machine learning and computer vision applications, one often has to deal with high dimensional and big datasets such as images, videos, web content, etc. The data can come in various forms, such as univariate or multivariate time series, matrices or high dimensional tensors. The goal of the data mining community is to reveal the hidden linear or non-linear structures in the datasets. Over the past couple of decades matrix factorization, owing to its intrinsic association with dimensionality reduction, has been adopted as one of the key methods in this context. One can either use a single linear subspace to approximate the data (the standard Principal Component Analysis (PCA) approach) or a union of low dimensional subspaces where each data class belongs to a different subspace. In many cases, however, the low dimensional data follows some additional structure. Knowledge of such structure is beneficial, as we can use it to enhance the representativity of our models by adding structured priors. A now-standard way to represent pairwise affinity between objects is by using graphs. The introduction of graph-based priors to enhance matrix factorization models has recently brought them back to the highest attention of the data mining community. Representation of a signal on a graph is well motivated by the emerging field of signal processing on graphs, based on notions of spectral graph theory. The underlying assumption is that high-dimensional data samples lie on or close to a smooth low-dimensional manifold. Interestingly, the underlying manifold can be represented by its discrete proxy, i.e. a graph. A primary limitation of the state-of-the-art low-rank approximation methods is that they do not generalize to the case of non-linear low-rank structures. Furthermore, the standard low-rank extraction methods for many applications, such as low-rank and sparse decomposition, are computationally cumbersome.
We argue that for many machine learning and signal processing applications involving big data, an approximate low-rank recovery suffices. Thus, in this thesis, we present solutions to the above two limitations by presenting a new framework for scalable but approximate low-rank extraction which exploits the hidden structure in the data using the notion of graphs. First, we present a novel signal model, called 'Multilinear low-rank tensors on graphs (MLRTG)', which states that a tensor can be encoded as a multilinear combination of the low-frequency graph eigenvectors, where the graphs are constructed along the various modes of the tensor. Since the graph eigenvectors have the interpretation of a non-linear embedding of a dataset on the low-dimensional manifold, we propose a method called 'Graph Multilinear SVD (GMLSVD)' to recover PCA based linear subspaces from these eigenvectors. Finally, we propose a plethora of highly scalable matrix- and tensor-based problems for low-rank extraction which implicitly or explicitly make use of the GMLSVD framework. The core idea is to replace the expensive iterative SVD operations by updating the linear subspaces from the fixed non-linear ones via low-cost operations. We present applications in low-rank and sparse decomposition and clustering of the low-rank features to evaluate all the proposed methods. Our theoretical analysis shows that the approximation error of the proposed framework depends on the spectral properties of the graph Laplacian.
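
The signal model described above can be sketched for the simplest case, a matrix (a two-mode tensor), in NumPy. This is a minimal illustration under stated assumptions, not the thesis's GMLSVD algorithm: the k-nearest-neighbour graph with Gaussian weights, the combinatorial Laplacian, the number of retained eigenvectors, and the helper names `knn_laplacian` and `low_freq_eigvecs` are all choices made for the example.

```python
import numpy as np

def knn_laplacian(points, k=5):
    """Combinatorial Laplacian of a symmetrized k-nearest-neighbour graph."""
    sq = np.sum(points**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * points @ points.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches from the neighbour search
    idx = np.argsort(d2, axis=1)[:, :k]
    n = points.shape[0]
    rows = np.repeat(np.arange(n), k)
    neighbour_d2 = d2[rows, idx.ravel()]
    w = np.zeros((n, n))
    # Gaussian edge weights, scaled by the mean squared neighbour distance.
    w[rows, idx.ravel()] = np.exp(-neighbour_d2 / np.mean(neighbour_d2))
    w = np.maximum(w, w.T)  # symmetrize the adjacency matrix
    return np.diag(w.sum(axis=1)) - w

def low_freq_eigvecs(lap, r):
    """Eigenvectors for the r smallest Laplacian eigenvalues."""
    _, vecs = np.linalg.eigh(lap)  # eigh returns eigenvalues in ascending order
    return vecs[:, :r]

rng = np.random.default_rng(0)
# Hypothetical data: a noisy rank-3 matrix of size 40 x 30.
x = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
x = x + 0.05 * rng.standard_normal(x.shape)

# One graph per mode: rows are the nodes of one graph, columns of the other.
p_rows = low_freq_eigvecs(knn_laplacian(x), 10)
p_cols = low_freq_eigvecs(knn_laplacian(x.T), 10)

# Encode the data as a multilinear combination of low-frequency eigenvectors.
core = p_rows.T @ x @ p_cols
x_hat = p_rows @ core @ p_cols.T
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(core.shape, rel_err)
```

The small `core` matrix is where a cheap SVD could then be computed instead of decomposing the full data; how well `x_hat` approximates `x` depends on the graphs and on how many eigenvectors are kept.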

    System- and Data-Driven Methods and Algorithms

    The increasing complexity of models used to predict real-world systems creates a need for algorithms that replace complex models with far simpler ones while preserving the accuracy of the predictions. This two-volume handbook covers methods as well as applications. This first volume focuses on real-time control theory, data assimilation, real-time visualization, high-dimensional state spaces and interaction of different reduction techniques.