
    Reconstruction, Analysis, and Editing of Dynamically Deforming 3D Surfaces

    Dynamically deforming 3D surfaces play a major role in computer graphics. However, producing time-varying dynamic geometry at ever-increasing levels of detail is a time-consuming and costly process, so a recent trend is to capture geometry data directly from the real world. In the first part of this thesis, I propose novel approaches for this research area that capture dense dynamic 3D surfaces from multi-camera systems in a particularly robust and accurate way, providing highly realistic dynamic surface models for phenomena such as moving garments and bulging muscles. However, re-using, editing, or otherwise analyzing dynamic 3D surface data is not yet conveniently possible. To close this gap, the second part of this dissertation develops novel data-driven modeling and animation approaches. I first present a supervised data-driven approach for modeling human muscle deformations that scales to huge datasets and provides fine-scale, anatomically realistic deformations at a quality not attainable by previous methods. I then extend data-driven modeling to the unsupervised case, providing editing tools for a wider set of input data ranging from facial performance capture and full-body motion to muscle and cloth deformation. To this end, I introduce the concepts of sparsity and locality within a mathematical optimization framework. I also explore these concepts for constructing shape-aware functions that are useful for static geometry processing, registration, and localized editing.
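
    As a concrete illustration of the sparsity and locality concepts, the following sketch decomposes a mesh-animation matrix into a few sparse components by alternating minimization with soft-thresholding. It is a toy rendition of the general idea, not the thesis implementation; all names, shapes, and parameters are assumptions.

```python
# Minimal sketch of the sparsity idea (assumed setup, not the thesis
# code): decompose per-frame vertex displacements X into a few sparse
# components via alternating least squares with an L1 shrinkage step.
import numpy as np

def soft_threshold(A, lam):
    """Proximal operator of the L1 norm: shrink entries toward zero."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def sparse_components(X, k=10, lam=0.1, iters=50):
    """X: (frames, 3 * num_vertices) displacement matrix.
    Returns weights W (frames, k) and components C (k, 3 * num_vertices)
    with X ~ W @ C; the L1 penalty drives components toward zero on most
    vertices, which encourages spatially localized deformations."""
    rng = np.random.default_rng(0)
    C = 0.01 * rng.standard_normal((k, X.shape[1]))
    for _ in range(iters):
        # Least-squares update of the per-frame weights for fixed C.
        W = np.linalg.lstsq(C.T, X.T, rcond=None)[0].T
        # One proximal-gradient (ISTA) step on the components.
        step = 1.0 / (np.linalg.norm(W.T @ W, 2) + 1e-8)
        C = soft_threshold(C - step * (W.T @ (W @ C - X)), step * lam)
    return W, C
```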

    Learning from Multiple Views of Data

    This dissertation takes inspiration from the ability of the human brain to extract information and learn from multiple sources of data, and tries to mimic this ability for some practical problems. It explores the hypothesis that the brain can extract and store information from raw data in a form, termed a common representation, suitable for cross-modal content matching. Human-level performance on this task requires (a) the ability to extract sufficient information from raw data and (b) algorithms to obtain a task-specific common representation from multiple sources of extracted information. This dissertation addresses both requirements and develops novel content extraction and cross-modal content matching architectures. The first part proposes a learning-based visual information extraction approach, the Recursive Context Propagation Network (RCPN), for semantic segmentation of images. It is a deep neural network that utilizes contextual information from the entire image through bottom-up followed by top-down context propagation, improving the feature representation of every super-pixel for better classification into semantic categories. Analysis of RCPN reveals that bypass-error paths can hinder effective context propagation; it is shown that these errors can be mitigated by also including a classification loss on the internal nodes. In addition, a novel tree-MRF structure is developed on the parse trees to model the hierarchical dependency present in the output. The second part develops algorithms to obtain and match common representations across different modalities. A novel Partial Least Squares (PLS) based framework is proposed to learn a common subspace from multiple modalities of data; it is used for multi-modal face biometric problems such as pose-invariant face recognition and sketch-face recognition. The sensitivity of this approach to noise under pose variation is analyzed, and a two-stage discriminative model is developed to tackle it. Finally, a generalized framework, termed Generalized Multiview Analysis (GMA), extends popular feature extraction techniques that can be solved as a generalized eigenvalue problem to their multi-modal counterparts; it is applied to pose- and lighting-invariant face recognition and text-image retrieval.
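
    To make the common-subspace idea concrete, the sketch below learns PLS-style projections for two paired modalities from the SVD of their cross-covariance and ranks gallery items by cosine similarity in the shared space. It is a simplified illustration under assumed names and shapes, not the dissertation's implementation.

```python
# Hedged sketch of a PLS-style common subspace for cross-modal matching:
# directions maximizing cross-covariance between two paired views are the
# leading singular vectors of the centered cross-covariance matrix.
# Function names and feature shapes are illustrative assumptions.
import numpy as np

def pls_common_subspace(X, Y, k=16):
    """X: (n, dx) features of modality A (e.g., photos); Y: (n, dy)
    paired features of modality B (e.g., sketches). Returns projections
    Wx (dx, k) and Wy (dy, k) into a shared k-dimensional space."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :k], Vt[:k].T

def cross_modal_rank(queries_x, gallery_y, Wx, Wy):
    """Project both modalities, L2-normalize, and rank gallery items for
    each query by cosine similarity (best match first)."""
    qx = queries_x @ Wx
    gy = gallery_y @ Wy
    qx /= np.linalg.norm(qx, axis=1, keepdims=True)
    gy /= np.linalg.norm(gy, axis=1, keepdims=True)
    return np.argsort(-(qx @ gy.T), axis=1)
```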

    Multi-view Data Analysis

    Multi-view data analysis is a key technology for making effective decisions by leveraging information from multiple data sources. Data acquisition across various sensory modalities gives rise to heterogeneous data. In my thesis, multi-view data representations are studied to exploit the enriched information encoded in different domains or feature types, and novel algorithms are formulated to enhance feature discriminability. Extracting an informative data representation is a critical step in visual recognition and data mining tasks. Multi-view embeddings provide a new way of representation learning to bridge the semantic gap between low-level observations and high-level human-comprehensible knowledge, benefiting from the enriched information in multiple modalities. Recent advances in multi-view learning have introduced a new paradigm for jointly modeling cross-modal data. Subspace learning, which extracts compact features by exploiting a common latent space and fusing multi-view information, has emerged as prominent among the different categories of multi-view learning techniques. This thesis provides novel solutions for learning compact and discriminative multi-view data representations by exploiting the data structures in a low-dimensional subspace, and demonstrates the performance of the learned representations on a number of challenging recognition, retrieval, and ranking tasks. The major contribution of the thesis is a unified solution for subspace learning methods that is extensible to multiple views, supervised learning, and non-linear transformations. Traditional statistical learning techniques, including Canonical Correlation Analysis, Partial Least Squares regression, and Linear Discriminant Analysis, are studied by constructing graphs of specific forms under the same framework; methods using non-linear transforms based on kernels and (deep) neural networks are derived, leading to superior performance compared to the linear ones. First, a novel multi-view discriminant embedding method is proposed that takes view differences into consideration. Second, a multi-view nonparametric discriminant analysis method is introduced that exploits the class boundary structure and the discrepancy information of the available views; it allows for multiple projection directions by relaxing the Gaussian distribution assumption of related methods. Third, a composite ranking method is proposed that keeps a close correlation with the individual rankings for optimal rank fusion, together with a multi-objective solution to ranking problems that captures inter-view and intra-view information using autoencoder-like networks. Finally, a novel end-to-end solution is introduced to enhance joint ranking with minimum view-specific ranking loss, achieving maximum global view agreement within a single optimization process. In summary, this thesis addresses the challenges of representing multi-view data across different tasks; the proposed solutions show superior performance in numerous tasks, including object recognition, cross-modal image retrieval, face recognition, and object ranking.
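
    As a concrete instance of the unified view, classical two-view Canonical Correlation Analysis reduces to a generalized eigenvalue problem, which the hedged sketch below solves with SciPy; the ridge term and all names are illustrative assumptions rather than the thesis code.

```python
# Hedged sketch: classical two-view CCA posed as a generalized
# eigenvalue problem, the common thread behind the unified subspace
# framework described above. The ridge term `eps` and all names are
# assumptions for illustration.
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, k=8, eps=1e-4):
    """Return Wx, Wy maximizing the correlation of X @ Wx with Y @ Wy."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n, dx = X.shape
    Cxx = Xc.T @ Xc / n + eps * np.eye(dx)          # regularized covariances
    Cyy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Solve Cxy Cyy^{-1} Cyx wx = rho^2 Cxx wx (a generalized eigenproblem).
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    _, Wx = eigh(M, Cxx, subset_by_index=[dx - k, dx - 1])
    Wx = Wx[:, ::-1]                       # strongest correlations first
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)  # paired directions for Y
    return Wx, Wy
```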

    Robust density modelling using the Student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM that uses mixtures of t-distributions as observation probabilities and show, through experiments on two well-known datasets (Weizmann, MuHAVi), a remarkable improvement in classification accuracy. © 2011 IEEE
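
    A small toy comparison illustrates why the heavier tails of the t-distribution matter; the numbers and setup below are assumptions for illustration, not taken from the paper.

```python
# Toy comparison (assumed numbers, not the paper's code): the Student's
# t-distribution's heavier tails bound an outlier's log-likelihood, while
# the Gaussian penalizes it quadratically and lets it dominate estimates.
import numpy as np
from scipy.stats import multivariate_normal, multivariate_t

mu, cov, dof = np.zeros(2), np.eye(2), 3.0  # small dof -> heavy tails
points = {"inlier": np.array([0.5, 0.5]), "outlier": np.array([8.0, 8.0])}

for name, x in points.items():
    lg = multivariate_normal(mu, cov).logpdf(x)
    lt = multivariate_t(mu, cov, df=dof).logpdf(x)
    print(f"{name}: Gaussian logpdf = {lg:.1f}, Student-t logpdf = {lt:.1f}")
# Expected qualitative outcome: both densities agree near the mode, but the
# Gaussian assigns the outlier a far lower log-likelihood than the t does.
```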