8 research outputs found

    Spherical matching for temporal correspondence of non-rigid surfaces

    Simplicial Complex based Point Correspondence between Images warped onto Manifolds

    The recent increase in the availability of warped images projected onto a manifold (e.g., omnidirectional spherical images), coupled with the success of higher-order assignment methods, has sparked interest in improved higher-order matching algorithms for images warped by projection. Several existing methods currently "flatten" such 3D images in order to apply planar graph / hypergraph matching methods, but they still suffer from severe distortions and other undesired artifacts, which result in inaccurate matching. Moreover, current planar methods cannot be trivially extended to effectively match points on images warped onto manifolds. Hence, matching on these warped images remains a formidable challenge. In this paper, we pose the assignment problem as finding a bijective map between two graph-induced simplicial complexes, which are higher-order analogues of graphs. We propose a constrained quadratic assignment problem (QAP) that matches each p-skeleton of the simplicial complexes, iterating from the highest to the lowest dimension. The accuracy and robustness of our approach are illustrated on both synthetic and real-world spherical / warped (projected) images with known ground-truth correspondences. We significantly outperform existing state-of-the-art spherical matching methods on a diverse set of datasets. Comment: Accepted at ECCV 202
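The abstract builds on simplicial complexes induced by a graph. One common construction, assumed here because the abstract does not say which one the paper uses, is the clique (flag) complex: every set of p+1 pairwise-connected vertices forms a p-simplex. The hypothetical `clique_complex` helper below enumerates the p-skeleta up to a chosen dimension; the constrained QAP that matches them is not shown.

```python
from itertools import combinations  # noqa: F401  (kept for readers extending to brute force)


def clique_complex(edges, max_dim=2):
    """Simplices of the clique (flag) complex of a graph, grouped by dimension p.

    A p-simplex is a sorted tuple of p+1 vertices that are pairwise adjacent.
    Only vertices that appear in at least one edge are included.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    simplices = {0: [(v,) for v in sorted(adj)]}
    for p in range(1, max_dim + 1):
        simplices[p] = []
        for face in simplices[p - 1]:
            # Vertices adjacent to every vertex of the (p-1)-simplex.
            common = set.intersection(*(adj[v] for v in face))
            for w in sorted(common):
                # Extend only by larger labels so each simplex is listed once.
                if w > face[-1]:
                    simplices[p].append(face + (w,))
    return simplices
```

For a square with one diagonal, `clique_complex([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])` yields four vertices, five edges and the two triangles `(0, 1, 2)` and `(0, 2, 3)`.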

    Time-slice analysis of dyadic human activity

    Recognizing human activities from video data is routinely leveraged for surveillance and human-computer interaction applications. The main focus has been on classifying fully observed videos into one of k action classes. However, intelligent systems must make decisions under uncertainty and on the basis of incomplete information. This need motivates us to introduce the problem of analysing the uncertainty associated with human activities and to move to a new level of generality in the action analysis problem. We also present the problem of time-slice activity recognition, which aims to explore human activity at a small temporal granularity. Time-slice recognition is able to infer human behaviours from a short temporal window. It has been shown that temporal slice analysis is helpful for motion characterization and for video content representation in general. These studies motivate us to consider time-slices for analysing the uncertainty associated with human activities. We report to what degree of certainty each activity is occurring throughout the video, from definitely not occurring to definitely occurring. In this research, we propose three frameworks for time-slice analysis of dyadic human activity under uncertainty.
    i) We present a new family of spatio-temporal descriptors which are optimized for early prediction with time-slice action annotations. Our predictive spatiotemporal interest point (Predict-STIP) representation is based on the intuition of temporal contingency between time-slices. ii) We exploit state-of-the-art techniques to extract interest points in order to represent time-slices. We also present an accumulative uncertainty to depict the uncertainty associated with partially observed videos for the task of early activity recognition. iii) We use Convolutional Neural Network-based unary and pairwise relations between human body joints in each time-slice. The unary term captures the local appearance of the joints, while the pairwise term captures the local contextual relations between the parts. We extract these features from each frame in a time-slice and examine different temporal aggregations to generate a descriptor for the whole time-slice. Furthermore, we create a novel dataset which is annotated at multiple short temporal windows, allowing the modelling of the inherent uncertainty in time-slice activity recognition. All three methods are evaluated on the TAP dataset. Experimental results demonstrate the effectiveness of our framework in the analysis of dyadic activities under uncertainty.
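The third framework extracts per-frame features and compares temporal aggregations to obtain one descriptor per time-slice. The abstract does not name the aggregations; the sketch below assumes mean and max pooling (and their concatenation), which are common choices, and the function name `aggregate_time_slice` is hypothetical.

```python
from statistics import fmean


def aggregate_time_slice(frame_features, method="mean"):
    """Pool per-frame feature vectors of equal length into one time-slice descriptor."""
    if not frame_features:
        raise ValueError("time slice contains no frames")
    if len({len(f) for f in frame_features}) != 1:
        raise ValueError("all frame features must have the same length")

    columns = list(zip(*frame_features))  # one tuple per feature dimension
    if method == "mean":
        return [fmean(c) for c in columns]
    if method == "max":
        return [max(c) for c in columns]
    if method == "mean+max":  # concatenate both poolings
        return [fmean(c) for c in columns] + [max(c) for c in columns]
    raise ValueError(f"unknown aggregation: {method}")
```

For two frames `[[1.0, 2.0], [3.0, 0.0]]`, `"mean+max"` returns `[2.0, 1.0, 3.0, 2.0]`. Mean pooling smooths over the slice, while max pooling keeps the strongest response of each feature, so comparing them is one simple way to probe which aggregation suits short windows.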

    Spherical Matching for Temporal Correspondence of Non-Rigid Surfaces

    This paper introduces spherical matching to estimate dense temporal correspondence of non-rigid surfaces with genus-zero topology. The spherical domain gives a consistent 2D parameterisation of non-rigid surfaces for matching. Non-rigid 3D surface correspondence is formulated as the recovery of a bijective mapping between two surfaces in the 2D domain. Formulating matching as a 2D bijection guarantees a continuous one-to-one surface correspondence without overfolding, which overcomes the limitations of estimating non-rigid surface correspondence directly in the 3D domain. A multi-resolution, coarse-to-fine algorithm is introduced to robustly estimate the dense correspondence that minimises the disparity in shape and appearance between the two surfaces.

    Processing and tracking human motions using optical, inertial, and depth sensors

    The processing of human motion data constitutes an important strand of research with many applications in computer animation, sport science and medicine. Currently, there exist various systems for recording human motion data that employ sensors of different modalities, such as optical, inertial and depth sensors. Each of these sensor modalities has intrinsic advantages and disadvantages that make it suitable for capturing specific aspects of human motion, for example, the overall course of a motion, the shape of the human body, or the kinematic properties of motions. In this thesis, we contribute algorithms that exploit the respective strengths of these different modalities for comparing, classifying, and tracking human motion in various scenarios. First, we show how our proposed techniques can be employed, for example, for real-time motion reconstruction using efficient cross-modal retrieval techniques. Then, we discuss a practical application of inertial-sensor-based features to the classification of trampoline motions. As a further contribution, we elaborate on estimating the human body shape from depth data, with applications to personalized motion tracking. Finally, we introduce methods to stabilize a depth tracker in challenging situations, such as in the presence of occlusions. Here, we exploit the availability of complementary inertial-sensor information.

    Fehlerkaschierte Bildbasierte Darstellungsverfahren (Error-Concealing Image-Based Rendering Methods)

    Creating photo-realistic images has been one of the major goals in computer graphics since its early days. Instead of modeling the complexity of nature with standard modeling tools, image-based approaches aim at exploiting real-world footage directly, as it is photo-realistic by definition. A drawback of these approaches has always been that the composition or combination of different sources is a non-trivial task, often resulting in annoying visible artifacts. In this thesis, we focus on different techniques to diminish visible artifacts when combining multiple images in a common image domain. The results are either novel images, when dealing with the composition of multiple images, or novel video sequences rendered in real time, when dealing with video footage from multiple cameras.
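The abstract does not describe the thesis's artifact-reduction techniques. As a point of reference only, the sketch below shows feathering, a standard baseline for hiding seams when two images overlap: across the overlap the output cross-fades linearly from one source to the other. It works on single rows of grey values for brevity, and `feather_blend_rows` is a hypothetical name, not the thesis's method.

```python
def feather_blend_rows(left, right, overlap):
    """Join two pixel rows whose last/first `overlap` samples show the same region.

    Inside the overlap the weight of `right` rises linearly, so there is no hard seam.
    """
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be positive and fit inside both rows")

    start = len(left) - overlap  # index in `left` where the overlap begins
    out = list(left[:start])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `right`, strictly between 0 and 1
        out.append((1 - w) * left[start + i] + w * right[i])
    out.extend(right[overlap:])
    return out
```

Blending `[10, 10, 10, 10]` and `[20, 20, 20, 20]` with an overlap of 3 gives `[10, 12.5, 15.0, 17.5, 20]`: a smooth ramp instead of a step. Feathering hides intensity seams but can produce ghosting when the overlapping content is misaligned, which is one kind of artifact more elaborate methods target.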

    Qualifying 4D Deforming Surfaces by Registered Differential Features

    Institute of Perception, Action and Behaviour
    Recent advances in 4D data acquisition systems in the field of computer vision have opened up many exciting new possibilities for the interpretation of complex moving surfaces. However, a fundamental problem is that they have also led to a huge increase in the volume of data to be handled. Making sense of this wealth of information is therefore a core issue that must be addressed if such data are to be applied to more complex tasks. Similar problems have historically been encountered in the analysis of static 3D surfaces, leading to the extraction of higher-level features based on the analysis of differential geometry.
    Our central hypothesis is that there exists a compact set of similarly useful descriptors for the analysis of dynamic 4D surfaces. The primary advantage of considering localised changes is that they provide a naturally useful set of invariant characteristics. We seek a constrained set of terms, a vocabulary, for describing all types of deformation. Using this vocabulary, we show how to describe what the surface is doing more effectively, thereby enabling better characterisation and, consequently, more effective visualisation and comparison.
    This thesis investigates this claim. We adopt a bottom-up approach to the problem, in which we acquire raw data from a newly constructed commercial 4D data capture system developed by our industrial partners. A crucial first step resolves the temporal non-linear registration between instances of the captured surface. We employ a combined optical/range flow to guide a conformation over a sequence. By using aligned colour information alongside the depth data, we improve this estimation in the case of local surface motion ambiguities. By employing a KLT/thin-plate-spline method, we also seek to preserve global deformation for regions with no estimate.
    We then extend aspects of differential geometry theory for existing static surface analysis to the temporal domain. Our initial formulation considers the possible intrinsic transitions from the set of shapes defined by the variations in the magnitudes of the principal curvatures. This gives rise to a total of 15 basic types of deformation. The change in the combined magnitudes also gives an indication of the extent of change. We then extend this to surface characteristics associated with expanding, rotating and shearing, to derive a full set of differential features.
    Our experimental results include a qualitative assessment of deformations for short episodic registered sequences of both synthetic and real data. The higher-level distinctions extracted are, furthermore, a useful first step for parsimonious feature extraction, which we then demonstrate can be used as a basis for further analysis. We ultimately evaluate this approach by considering shape transition features occurring within the human face, and their applicability to identification and expression analysis tasks.
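The 15 deformation types come from transitions between shapes defined by the principal curvatures, but the abstract does not list them. The sketch below therefore substitutes the classic HK sign classification (mean curvature H, Gaussian curvature K) and labels a deformation by its (before, after) shape pair, using the change in curvedness as a rough measure of its extent. Type names follow one common sign convention in which H < 0 marks convex regions; all function names are hypothetical, and using a single threshold `eps` for both H and K is a simplification, since K scales with the square of curvature.

```python
import math

# HK sign classes keyed by (sign of H, sign of K); (0, +1) cannot occur.
HK_TYPES = {
    (-1, 1): "peak", (1, 1): "pit",
    (-1, 0): "ridge", (1, 0): "valley", (0, 0): "flat",
    (-1, -1): "saddle ridge", (1, -1): "saddle valley", (0, -1): "minimal",
}


def _sign(x, eps):
    return 0 if abs(x) <= eps else (1 if x > 0 else -1)


def hk_type(k1, k2, eps=1e-6):
    """Shape class of a surface point from its principal curvatures k1, k2."""
    h = (k1 + k2) / 2  # mean curvature
    k = k1 * k2        # Gaussian curvature
    return HK_TYPES.get((_sign(h, eps), _sign(k, eps)), "undefined")


def curvedness(k1, k2):
    """Koenderink's curvedness: overall magnitude of bending at a point."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2)


def deformation(before, after, eps=1e-6):
    """(shape transition, change in curvedness) of one point between two frames."""
    transition = (hk_type(*before, eps=eps), hk_type(*after, eps=eps))
    return transition, curvedness(*after) - curvedness(*before)
```

For example, `deformation((-1.0, -1.0), (-2.0, 0.0))` labels a dome that folds into a cylinder-like crease as `("peak", "ridge")`, with curvedness increasing by about 0.41.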

    Learning and recovering 3D surface deformations

    Recovering the 3D deformations of a non-rigid surface from a single viewpoint has applications in many domains such as sports, entertainment, and medical imaging. Unfortunately, without any knowledge of the possible deformations that the object of interest can undergo, the problem is severely under-constrained, and extremely different shapes can have very similar appearances when reprojected onto an image plane. In this thesis, we first exhibit the ambiguities of the reconstruction problem when relying on correspondences between an input image and a reference image for which the shape is known. We then propose several approaches to overcoming these ambiguities. The core idea is that some a priori knowledge about how a surface can deform must be introduced to resolve them. We therefore present different ways to formulate that knowledge, ranging from very generic constraints to models specifically designed for a particular object or material. First, we propose generally applicable constraints formulated as motion models. Such models simply link the deformations of the surface from one image to the next in a video sequence. The obvious advantage is that they can be used independently of the physical properties of the object of interest. However, to be effective, they require the presence of texture over the whole surface and, additionally, do not prevent error accumulation from frame to frame. To overcome these weaknesses, we propose to introduce statistical learning techniques that let us build a model from a large set of training examples, that is, in our case, known 3D deformations. The resulting model then essentially performs linear or non-linear interpolation between the training examples. Following this approach, we first propose a linear global representation that models the behavior of the whole surface.
As is the case with all statistical learning techniques, the applicability of this representation is limited by the fact that acquiring training data is far from trivial. A large surface can undergo many subtle deformations, and thus a large amount of training data must be available to build an accurate model. We therefore propose an automatic way of generating such training examples in the case of inextensible surfaces. Furthermore, we show that the resulting linear global models can be incorporated into a closed-form solution to the shape recovery problem. This lets us not only track deformations from frame to frame, but also reconstruct surfaces from individual images. The major drawback of global representations is that they can only model the behavior of a specific surface, which forces us to re-train a new model for every new shape, even if it is made of a material observed before. To overcome this issue, and simultaneously reduce the amount of required training data, we propose local deformation models. Such models describe the behavior of small portions of a surface and can be combined to form arbitrary global shapes. For this purpose, we study both linear and non-linear statistical learning methods, and show that, whereas the latter are better suited for tracking deformations from frame to frame, the former can also be used for reconstruction from a single image.
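The linear global model expresses a surface as an interpolation between training deformations. A minimal sketch, assuming a PCA-style model (mean shape plus orthonormal modes, with shapes stored as flattened vertex-coordinate vectors; the abstract does not specify this), reconstructs a shape from model weights and projects an observed shape back onto the model. The function names are hypothetical, and the closed-form recovery from a single image is not shown.

```python
def reconstruct(mean, modes, weights):
    """Shape = mean + sum_i weights[i] * modes[i], on flattened vertex coordinates."""
    shape = list(mean)
    for w, mode in zip(weights, modes):
        for j, m in enumerate(mode):
            shape[j] += w * m
    return shape


def project(shape, mean, modes):
    """Model weights of `shape`; valid only if `modes` are orthonormal (e.g. PCA components)."""
    diff = [s - mu for s, mu in zip(shape, mean)]
    return [sum(d * m for d, m in zip(diff, mode)) for mode in modes]
```

Projecting and then reconstructing discards any component of the shape outside the span of the modes, which is how such a model restricts the result to deformations resembling the training data. With `mean = [0.0] * 4` and modes along the first two axes, the shape `[2.0, 3.0, 5.0, 0.0]` projects to weights `[2.0, 3.0]` and reconstructs as `[2.0, 3.0, 0.0, 0.0]`.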