26 research outputs found

    Generalized Rank Pooling for Activity Recognition

    Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity. Usually, this pooling step discards the temporal order of the frames, which could otherwise be used for better recognition. Towards this end, we propose a novel pooling method, generalized rank pooling (GRP), that takes as input features from the intermediate layers of a CNN trained on tiny sub-sequences, and produces as output the parameters of a subspace which (i) provides a low-rank approximation to the features and (ii) preserves their temporal order. We propose to use these parameters as a compact representation for the video sequence, which is then used in a classification setup. We formulate an objective for computing this subspace as a Riemannian optimization problem on the Grassmann manifold, and propose an efficient conjugate gradient scheme for solving it. Experiments on several activity recognition datasets show that our scheme leads to state-of-the-art performance. Comment: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 201
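
    The claim that ordinary pooling discards temporal order can be illustrated with a small sketch. It is not the GRP method from the abstract: `mean_pool` is plain average pooling, and `time_weighted_pool` is a hypothetical order-aware alternative introduced here only for contrast. The frame features are made-up numbers.

```python
def mean_pool(frames):
    """Average per-frame feature vectors; the result ignores frame order."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def time_weighted_pool(frames):
    """Hypothetical order-aware pooling: weight frame t (1-based) by t."""
    total = sum(range(1, len(frames) + 1))
    return [
        sum(t * x for t, x in enumerate(col, start=1)) / total
        for col in zip(*frames)
    ]


# Three frames with 2-D features (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
reversed_frames = frames[::-1]  # same frames, opposite temporal order

# Mean pooling cannot tell the two orderings apart ...
order_blind = mean_pool(frames) == mean_pool(reversed_frames)
# ... while the time-weighted variant can.
order_aware = time_weighted_pool(frames) != time_weighted_pool(reversed_frames)
```

    Any pooling that is a symmetric function of the frames (mean, max, sum) behaves like `mean_pool` here, which is the limitation an order-preserving representation is meant to address.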

    Probabilistic 3D Pose Reconstruction and Action Recognition (확률적인 3차원 자세 복원과 행동인식)

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2016. 오성회. Computer vision technology has become popular and plays an important role in intelligent systems such as augmented reality and video and image analysis, to name a few. Although cost-effective depth cameras, such as the Microsoft Kinect, have recently been developed, most computer vision algorithms assume that observations are obtained from RGB cameras, which produce 2D observations. If we can estimate 3D information from 2D observations, it may give better solutions to many computer vision problems. In this dissertation, we focus on estimating 3D information from 2D observations, a problem known as non-rigid structure from motion (NRSfM). More formally, NRSfM finds the three-dimensional structure of an object by analyzing image streams under the assumption that the object lies in a low-dimensional space. However, a human body observed over a long period of time can exhibit complex shape variations, which makes NRSfM challenging because of the increased degrees of freedom. To handle complex shape variations, we propose a Procrustean normal distribution mixture model (PNDMM) by extending the recently proposed Procrustean normal distribution (PND), which captures the distribution of non-rigid variations of an object by excluding the effects of rigid motion. Unlike existing methods, which use a single model to solve an NRSfM problem, the proposed PNDMM decomposes complex shape variations into a collection of simpler ones, making model learning more tractable and accurate. We perform experiments showing that the proposed method outperforms existing methods on highly complex and long human motion sequences. In addition, we extend the PNDMM to the single-view 3D human pose estimation problem. While recovering the 3D structure of a human body from an image is important, it is a highly ambiguous problem due to the deformation of an articulated human body. 
Moreover, before estimating a 3D human pose from a 2D human pose, it is important to obtain an accurate 2D human pose. To address both the inaccuracy of 2D pose estimation on a single image and 3D human pose ambiguities, we estimate multiple 2D and 3D human pose candidates and select the one best explained by a 2D human pose detector and a 3D shape model. We also introduce a model transformation, incorporated into the 3D shape prior model, so that the proposed method can be applied to a novel test image. Experimental results show that the proposed method provides good 3D reconstruction results on a novel test image, despite inaccuracies of 2D part detections and 3D shape ambiguities. Finally, we address action recognition from a video clip. Current studies show that high-level features obtained from estimated 2D human poses enable action recognition performance beyond current state-of-the-art methods using low- and mid-level features based on appearance and motion, despite the inaccuracy of human pose estimation. Based on these findings, we propose an action recognition method using estimated 3D human pose information, since the proposed PNDMM is able to reconstruct 3D shapes from 2D shapes. Experimental results show that 3D pose based descriptors are better than 2D pose based descriptors for action recognition, regardless of the classification method. 
Considering that we use simple 3D pose descriptors based on a 3D shape model learned from 2D shapes, the results reported in this dissertation are promising, and obtaining accurate 3D information from 2D observations remains an important research issue for reliable computer vision systems.

    Contents:
    Chapter 1 Introduction: 1.1 Motivation; 1.2 Research Issues; 1.3 Organization of the Dissertation
    Chapter 2 Preliminary: 2.1 Generalized Procrustes Analysis (GPA); 2.2 EM-GPA Algorithm (2.2.1 Objective function, 2.2.2 E-step, 2.2.3 M-step); 2.3 Implementation Considerations for EM-GPA (2.3.1 Preprocessing stage, 2.3.2 Small update rate for the covariance matrix); 2.4 Experiments (2.4.1 Shape alignment with the missing information, 2.4.2 3D shape modeling, 2.4.3 2D+3D active appearance models); 2.5 Chapter Summary and Discussion
    Chapter 3 Procrustean Normal Distribution Mixture Model: 3.1 Non-Rigid Structure from Motion; 3.2 Procrustean Normal Distribution (PND); 3.3 PND Mixture Model; 3.4 Learning a PNDMM (3.4.1 E-step, 3.4.2 M-step); 3.5 Learning an Adaptive PNDMM; 3.6 Experiments (3.6.1 Experimental setup, 3.6.2 CMU Mocap database, 3.6.3 UMPM dataset, 3.6.4 Simple and short motions, 3.6.5 Real sequence - qualitative representation); 3.7 Chapter Summary
    Chapter 4 Recovering a 3D Human Pose from a Novel Image: 4.1 Single View 3D Human Pose Estimation; 4.2 Candidate Generation (4.2.1 Initial pose generation, 4.2.2 Part recombination); 4.3 3D Shape Prior Model (4.3.1 Procrustean mixture model learning, 4.3.2 Procrustean mixture model fitting); 4.4 Model Transformation (4.4.1 Model normalization, 4.4.2 Model adaptation); 4.5 Result Selection; 4.6 Experiments (4.6.1 Implementation details, 4.6.2 Evaluation of the joint 2D and 3D pose estimation, 4.6.3 Evaluation of the 2D pose estimation, 4.6.4 Evaluation of the 3D pose estimation); 4.7 Chapter Summary
    Chapter 5 Application to Action Recognition: 5.1 Appearance and Motion Based Descriptors; 5.2 2D Pose Based Descriptors; 5.3 Bag-of-Features with a Multiple Kernel Method; 5.4 Classification - Kernel Group Sparse Representation (5.4.1 Group sparse representation for classification, 5.4.2 Kernel group sparse (KGS) representation for classification); 5.5 Experiment on sub-JHMDB Dataset (5.5.1 Experimental setup, 5.5.2 3D pose based descriptor, 5.5.3 Experimental results); 5.6 Chapter Summary
    Chapter 6 Conclusion and Future Work
    Appendices: A Proof of Propositions in Chapter 2 (A.1 Proof of Proposition 1, A.2 Proof of Proposition 3, A.3 Proof of Proposition 4); B Calculation of p(XijDii) in Chapter 3 (B.1 Without the Dirac-delta term, B.2 With the Dirac-delta term); C Procrustean Mixture Model Learning and Fitting in Chapter 4 (C.1 Procrustean Mixture Model Learning, C.2 Procrustean Mixture Model Fitting)
    Bibliography
    Abstract in Korean (초록)
    Docto

    A geometric framework to predict structure from function in neural networks

    Neural computation in biological and artificial networks relies on the nonlinear summation of many inputs. The structural connectivity matrix of synaptic weights between neurons is a critical determinant of overall network function, but quantitative links between neural network structure and function are complex and subtle. For example, many networks can give rise to similar functional responses, and the same network can function differently depending on context. Whether certain patterns of synaptic connectivity are required to generate specific network-level computations is largely unknown. Here we introduce a geometric framework for identifying synaptic connections required by steady-state responses in recurrent networks of threshold-linear neurons. Assuming that the number of specified response patterns does not exceed the number of input synapses, we analytically calculate the solution space of all feedforward and recurrent connectivity matrices that can generate the specified responses from the network inputs. A generalization accounting for noise further reveals that the solution space geometry can undergo topological transitions as the allowed error increases, which could provide insight into both neuroscience and machine learning. We ultimately use this geometric characterization to derive certainty conditions guaranteeing a non-zero synapse between neurons. Our theoretical framework could thus be applied to neural activity data to make rigorous anatomical predictions that follow generally from the model architecture. Comment: 45 pages, 12 figures, major revision, reorganized sections, altered notations, additional main text material, additional appendix materia
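
    To make "steady-state responses in recurrent networks of threshold-linear neurons" concrete, the sketch below assumes the common form r = max(0, W_rec r + W_ff x) and finds a steady state by simple fixed-point iteration. The model form, the weights, and the function names are assumptions chosen for illustration, not taken from the paper, and the iteration is only expected to converge when the recurrent weights are small enough.

```python
def relu(v):
    """Threshold-linear nonlinearity applied element-wise."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def update(w_rec, w_ff, x, r):
    """One application of the assumed map r -> max(0, W_rec r + W_ff x)."""
    drive = matvec(w_ff, x)
    return relu([d + s for d, s in zip(drive, matvec(w_rec, r))])


def steady_state(w_rec, w_ff, x, steps=200):
    """Iterate the map from r = 0; converges here because W_rec is weak."""
    r = [0.0] * len(w_rec)
    for _ in range(steps):
        r = update(w_rec, w_ff, x, r)
    return r


# Hypothetical two-neuron network and input.
w_rec = [[0.0, -0.5], [0.25, 0.0]]
w_ff = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.5]

r = steady_state(w_rec, w_ff, x)
# A steady state is unchanged by one more update.
residual = max(abs(a - b) for a, b in zip(r, update(w_rec, w_ff, x, r)))
```

    The abstract poses the inverse question: given the inputs and the steady-state responses, which weight matrices could have produced them.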

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 page
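
    The storage saving behind "alleviating the curse of dimensionality" can be checked with simple arithmetic. The sketch below compares the entry count of a full tensor with that of a tensor train whose k-th core has shape (r_{k-1}, n_k, r_k). The mode sizes and TT ranks are illustrative assumptions, not figures from the monograph.

```python
from math import prod


def full_size(dims):
    """Number of entries in a dense tensor with the given mode sizes."""
    return prod(dims)


def tt_size(dims, ranks):
    """Number of entries across all TT cores.

    ranks has len(dims) + 1 entries, with boundary ranks equal to 1;
    core k has shape (ranks[k], dims[k], ranks[k + 1]).
    """
    if len(ranks) != len(dims) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have len(dims) + 1 entries and start/end with 1")
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(dims))


# Assumed example: an order-8 tensor with mode size 10 and all inner TT ranks 5.
dims = [10] * 8
ranks = [1] + [5] * 7 + [1]

full = full_size(dims)    # grows exponentially with the order
tt = tt_size(dims, ranks)  # grows linearly with the order for fixed ranks
```

    For fixed ranks the TT count grows linearly in the tensor order while the dense count grows exponentially, which is the scaling argument the abstract alludes to. Whether small ranks suffice depends on the data.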

    Towards Comprehensive Foundations of Computational Intelligence

    Abstract. Although computational intelligence (CI) covers a vast variety of different methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction, and to generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to intuitive solving of such problems is presented. Throughout the paper neurocognitive inspirations are frequently used and are especially important in modeling of the higher cognitive functions. Promising directions such as liquid and laminar computing are identified and many open problems presented.

    Learning from complex networks

    Graph Theory has proven to be a universal language for describing modern complex systems. The elegant theoretical framework of graphs has drawn researchers' attention for decades. As a result, graphs have emerged as a ubiquitous data structure in various applications where a relational characteristic is evident. Graph-driven applications are found, e.g., in social network analysis, telecommunication networks, logistic processes, recommendation systems, modeling kinetic interactions in protein networks, or the 'Internet of Things' (IoT) where modeling billions of interconnected web-enabled devices is of paramount importance. This thesis dives deep into the challenges of modern graph applications. It proposes a robustified and accelerated spectral clustering model in homogeneous graphs and novel transformer-driven graph shell models for attributed graphs. A new data structure is introduced for probabilistic graphs to compute the information flow efficiently. Moreover, a metaheuristic algorithm is designed to find a good solution to an optimization problem composed of an extended vehicle routing problem. The thesis closes with an analysis of trend flows in social media data. Detecting communities within a graph is a fundamental data mining task of interest in virtually all areas and also serves as an unsupervised preprocessing step for many downstream tasks. One of the most well-established clustering methods is Spectral Clustering. However, standard spectral clustering is highly sensitive to noisy input data, and the eigendecomposition has a high, cubic runtime complexity O(n^3). Tackling one of these problems often exacerbates the other. This thesis presents a new model which accelerates the eigendecomposition step by replacing it with a Nyström approximation. Robustness is achieved by iteratively separating the data into a cleansed part and a noisy part. 
In this process, representing the input data as a graph is vital to identify parts of the data being well connected by analyzing the vertices' distances in the eigenspace. With the advances in deep learning architectures, we also observe a surge in research on graph representation learning. The message-passing paradigm in Graph Neural Networks (GNNs) formalizes a predominant heuristic for multi-relational and attributed graph data to learn node representations. In downstream applications, we can use the representations to tackle theoretical problems known as node classification, graph classification/regression, and relation prediction. However, a common issue in GNNs is known as over-smoothing. By increasing the number of iterations within the message-passing, the nodes' representations of the input graph align and become indiscernible. This thesis shows an efficient way of relaxing the GNN architecture by employing a routing heuristic in the general workflow. Specifically, an additional layer routes the nodes' representations to dedicated experts. Each expert calculates the representations according to their respective GNN workflow. The definitions of distinguishable GNNs result from k-localized views starting from a central node. This procedure is referred to as Graph Shell Attention (SEA), where experts process different subgraphs in a transformer-motivated fashion. Reliable propagation of information through large communication networks, social networks, or sensor networks is relevant to applications concerning marketing, social analysis, or monitoring physical or environmental conditions. However, social ties of friendship may be obsolete, and communication links may fail, inducing the notion of uncertainty in such networks. This thesis addresses the problem of optimizing information propagation in uncertain networks given a constrained budget of edges. 
A specialized data structure, called F-tree, addresses two NP-hard subproblems: the computation of the expected information flow and the optimal choice of edges. The F-tree identifies independent components of a probabilistic input graph for which the information flow can either be computed analytically and efficiently or for which traditional Monte-Carlo sampling can be applied independently of the remaining network. The next part of the thesis covers a graph problem from the Operations Research point of view. A new variant of the well-known vehicle routing problem (VRP) is introduced, in which customers are served within a specific time window (TW) at flexible delivery locations (FL) that are subject to capacity constraints. The latter implies that each customer is scheduled in one out of a set of capacitated delivery service locations. Practically, the VRPTW-FL problem is relevant for applications in parcel delivery, routing with limited parking space, or, for example, in the scope of hospital-wide scheduling of physical therapists. This thesis presents a metaheuristic built upon a hybrid Adaptive Large Neighborhood Search (ALNS). Moreover, a backtracking mechanism in the construction phase is introduced to alter unsatisfactory decisions at early stages. In the computational study, hospital data is used to evaluate the utility of flexible delivery locations and various cost functions. In the last part of the thesis, social media trends are analyzed, which yields insights into user sentiment and newsworthy topics. Such trends consist of bursts of messages concerning a particular topic within a time frame, significantly deviating from the average appearance frequency of the same subject. This thesis presents a method to classify trend archetypes to predict future dissemination by investigating the dissemination of such trends in space and time. 
Generally, with the ever-increasing scale and complexity of graph-structured datasets and artificial intelligence advances, AI-backed models will inevitably play an important role in analyzing, modeling, and enhancing knowledge extraction from graph data.Die Graphentheorie hat sich zur einer universellen Sprache entwickelt, mit Hilfe derer sich moderne und komplexe Systeme und Zusammenhänge beschreiben lassen. Diese theoretisch elegante und gut fundierte Rahmenstruktur attrahierte über Dekaden hinweg die Aufmerksamkeit von Wissenschaftlern/-innen. In der heutigen Informationstechnologie-Landschaft haben sich Graphen längst zu einer allgegenwärtigen Datenstruktur in Anwendungen etabliert, innerhalb derer charakteristische Zusammenhangskomponenten eine zentrale Rolle spielen. Anwendungen, die über Graphen unterstützt werden, finden sich u.a. in der Analyse von sozialen Netzwerken, Telekommunikationsnetwerken, logistische Prozessverwaltung, Analyse von Empfehlungsdiensten, in der Modellierung kinetischer Interaktionen von Proteinstrukturen, oder auch im "Internet der Dinge" (engl.: 'Internet Of Things' (IoT)), welches das Zusammenspiel von abermillionen web-unterstützte Endgeräte abbildet und eine prädominierende Rolle für große IT-Unternehmen spielt. Diese Dissertation beleuchtet die Herausforderungen moderner Graphanwendungen. Im Bereich homogener Netzwerken wird ein beschleunigtes und robustes spektrales Clusteringverfahren, sowie ein Modell zur Untersuchung von Teilgraphen mittels Transformer-Architekturen für attribuierte Graphen vorgestellt. Auf wahrscheinlichkeitsbasierten homogenen Netzwerken wird eine neue Datenstruktur eingeführt, die es erlaubt einen effizienten Informationsfluss innerhalb eines Graphen zu berechnen. Darüber hinaus wird ein Optimierungsproblem in Transportnetzwerken beleuchtet, sowie eine Untersuchung von Trendflüssen in sozialen Medien diskutiert. 
Die Untersuchung von Verbünden (engl.: 'Clusters') von Graphdaten stellt einen Eckpfeiler im Bereich der Datengewinnung dar. Die Erkenntnisse sind nahezu in allen praktischen Bereichen von Relevanz und dient im Bereich des unüberwachten Lernens als Vorverarbeitungsschritt für viele nachgeschaltete Aufgaben. Einer der weit verbreitetsten Methodiken zur Verbundanalyse ist das spektrale Clustering. Die Qualität des spektralen Clusterings leidet, wenn die Eingabedaten sehr verrauscht sind und darüber hinaus ist die Eigenwertzerlegung mit O(n^3) eine teure Operation und damit wesentlich für die hohe, kubische Laufzeitkomplexität verantwortlich. Die Optimierung von einem dieser Kriterien exazerbiert oftmals das verbleibende Kriterium. In dieser Dissertation wird ein neues Modell vorgestellt, innerhalb dessen die Eigenwertzerlegung über eine Nyström Annäherung beschleunigt wird. Die Robustheit wird über ein iteratives Verfahren erreicht, das die gesäuberten und die verrauschten Daten voneinander trennt. Die Darstellung der Eingabedaten über einen Graphen spielt hierbei die zentrale Rolle, die es erlaubt die dicht verbundenen Teile des Graphen zu identifizieren. Dies wird über eine Analyse der Distanzen im Eigenraum erreicht. Parallel zu neueren Erkenntnissen im Bereich des Deep Learnings lässt sich auch ein Forschungsdrang im repräsentativen Lernen von Graphen erkennen. Graph Neural Networks (GNN) sind eine neue Unterform von künstlich neuronalen Netzen (engl.: 'Artificial Neural Networks') auf der Basis von Graphen. Das Paradigma des sogenannten 'message-passing' in neuronalen Netzen, die auf Graphdaten appliziert werden, hat sich hierbei zur prädominierenden Heuristik entwickelt, um Vektordarstellungen von Knoten aus (multi-)relationalen, attribuierten Graphdaten zu lernen. 
Am Ende der Prozesskette können wir somit theoretische Probleme angehen und lösen, die sich mit Fragestellungen über die Klassifikation von Knoten oder Graphen, über regressive Ausdrucksmöglichkeiten bis hin zur Vorhersage von relationaler Verbindungen beschäftigen. Ein klassisches Problem innerhalb graphischer neuronaler Netze ist bekannt unter der Terminologie des 'over-smoothing' (dt.: 'Überglättens'). Es beschreibt, dass sich mit steigender Anzahl an Iterationen des wechselseitigen Informationsaustausches, die Knotenrepräsentationen im vektoriellen Raum angleichen und somit nicht mehr unterschieden werden können. In dieser Forschungsarbeit wird eine effiziente Methode vorgestellt, die die klassische GNN Architektur aufbricht und eine Vermittlerschicht in den herkömmlichen Verarbeitungsfluss einarbeitet. Konkret gesprochen werden hierbei Knotenrepräsentationen an ausgezeichnete Experten geschickt. Jeder Experte verarbeitet auf idiosynkratischer Basis die Knoteninformation. Ausgehend von einem Anfrageknoten liegt das Kriterium für die Unterscheidbarkeit von Experten in der restriktiven Verarbeitung lokaler Information. Diese neue Heuristik wird als 'Graph Shell Attention' (SEA) bezeichnet und beschreibt die Informationsverarbeitung unterschiedlicher Teilgraphen von Experten unter der Verwendung der Transformer-technologie. Eine zuverlässige Weiterleitung von Informationen über größere Kommunikationsnetzwerken, sozialen Netzwerken oder Sensorennetzwerken spielen eine wichtige Rolle in Anwendungen der Marktanalyse, der Analyse eines sozialen Gefüges, oder der Überwachung der physischen und umweltorientierten Bedingungen. Innerhalb dieser Anwendungen können Fälle auftreten, wo Freundschaftsbeziehungen nicht mehr aktuell sind, wo die Kommunikation zweier Endpunkte zusammenbricht, welches mittels einer Unsicherheit des Informationsaustausches zweier Endpunkte ausgedrückt werden kann. 
Diese Arbeit untersucht die Optimierung des Informationsflusses in Netzwerken, deren Verbindungen unsicher sind, hinsichtlich der Bedingung, dass nur ein Bruchteil der möglichen Kanten für den Informationsaustausch benutzt werden dürfen. Eine eigens entwickelte Datenstruktur - der F-Baum - wird eingeführt, die 2 NP-harte Teilprobleme auf einmal adressiert: zum einen die Berechnung des erwartbaren Informationsflusses und zum anderen die Auswahl der optimalen Kanten. Der F-Baum unterscheidet hierbei unabhängige Zusammenhangskomponenten der wahrscheinlichkeitsbasierten Eingabedaten, deren Informationsfluss entweder analytisch korrekt und effizient berechnet werden können, oder lokal über traditionelle Monte-Carlo sampling approximiert werden können. Der darauffolgende Abschnitt dieser Arbeit befasst sich mit einem Graphproblem aus Sicht der Optimierungsforschung angewandter Mathematik. Es wird eine neue Variante der Tourenplanung vorgestellt, welches neben kundenspezifischer Zeitfenster auch flexible Zustellstandorte beinhaltet. Darüber hinaus obliegt den Zielorten, an denen Kunden bedient werden können, weiteren Kapazitätslimitierungen. Aus praktischer Sicht ist das VRPTW-FL (engl.: "Vehicle Routing Problem with Time Windows and Flexible Locations") eine bedeutende Problemstellung für Paketdienstleister, Routenplanung mit eingeschränkten Stellplätzen oder auch für die praktische Planung der Arbeitsaufteilung von behandelnden Therapeuten/-innen und Ärzten/-innen in einem Krankenhaus. In dieser Arbeit wird für die Bewältigung dieser Problemstellung eine Metaheuristik vorgestellt, die einen hybriden Ansatz mit der sogenannten Adaptive Large Neighborhood Search (ALNS) impliziert. Darüber hinaus wird als Konstruktionsheuristik ein 'Backtracking'-Mechanismus (dt.: Rückverfolgung) angewandt, um initiale Startlösungen aus dem Lösungssuchraum auszuschließen, die weniger vielversprechend sind. 
In der Evaluierung dieses neuen Ansatz werden Krankenhausdaten untersucht, um auch die Nützlichkeit von flexiblen Zielorten unter verschiedenen Kostenfunktionen herauszuarbeiten. Im letzten Kapitel dieser Dissertation werden Trends in sozialen Daten analysiert, die Auskunft über die Stimmung der Benutzer liefern, sowie Einblicke in tagesaktuelle Geschehnisse gewähren. Ein Kennzeichen solcher Trends liegt in dem Aufbraußen von inhaltsspezifischen Themen innerhalb eines Zeitfensters, die von der durchschnittlichen Erscheinungshäufigkeit desselben Themas signifikant abweichen. Die Untersuchung der Verbreitung solches Trends über die zeitliche und örtliche Dimension erlaubt es, Trends in Archetypen zu klassifizieren, um somit die Ausbreitung zukünftiger Trends hervorzusagen. Mit der immerwährenden Skalierung von Graphdaten und deren Komplexität, und den Fortschritten innerhalb der künstlichen Intelligenz, wird das maschinelle Lernen unweigerlich weiterhin eine wesentliche Rolle spielen, um Graphdaten zu modellieren, analysieren und schlussendlich die Wissensextraktion aus derartigen Daten maßgeblich zu fördern.La théorie des graphes s'est révélée être une langue universel pour décrire les systèmes complexes modernes. L'élégant cadre théorique des graphes a attiré l'attention des chercheurs pendant des décennies. Par conséquent, les graphes sont devenus une structure de données omniprésente dans diverses applications où une caractéristique relationnelle est évidente. Les applications basées sur les graphes se retrouvent, par exemple, dans l'analyse des réseaux sociaux, les réseaux de télécommunication, les processus logistiques, les systèmes de recommandation, la modélisation des interactions cinétiques dans les réseaux de protéines, ou l'"Internet des objets" (IoT) où la modélisation de milliards de dispositifs interconnectés basés sur le web est d'une importance capitale. Cette thèse se penche sur les défis posés par les applications modernes des graphes. 
Elle propose un modèle de regroupement spectral robuste et accéléré dans les graphes homogènes et de nouveaux modèles d'enveloppe de graphe pilotés par transformateur pour les graphes attribués. Une nouvelle structure de données est introduite pour les graphes probabilistes afin de calculer efficacement le flux d'informations. De plus, un algorithme métaheuristique est conçu pour trouver une bonne solution à un problème d'optimisation composé d'un problème étendu de routage de véhicules. La thèse se termine par une analyse des flux de tendances dans les données des médias sociaux. La détection de communautés au sein d'un graphe est une tâche fondamentale d'exploration de données qui présente un intérêt dans pratiquement tous les domaines et sert également d'étape de prétraitement non supervisé pour de nombreuses tâches en aval. L'une des méthodes de regroupement les mieux établies est le regroupement spectral. Cependant, le regroupement spectral standard est très sensible aux données d'entrée bruitées, et l'eigendecomposition a une complexité d'exécution cubique élevée O(n^3). S'attaquer à l'un de ces problèmes exacerbe souvent l'autre. Cette thèse présente un nouveau modèle qui accélère l'étape d'eigendecomposition en la remplaçant par une approximation de Nyström. La robustesse est obtenue en séparant itérativement les données en une partie nettoyée et une partie bruyante. Dans ce processus, la représentation des données d'entrée sous forme de graphe est essentielle pour identifier les parties des données qui sont bien connectées en analysant les distances des sommets dans l'espace propre. Avec les progrès des architectures de Deep Learning, nous observons également une poussée de la recherche sur l'apprentissage de la représentation graphique. 
Le paradigme du passage de messages dans les réseaux neuronaux graphiques (GNN) formalise une heuristique prédominante pour les données graphiques multi-relationnelles et attribuées afin d'apprendre les représentations des nœuds. Dans les applications en aval, nous pouvons utiliser les représentations pour résoudre des problèmes théoriques tels que la classification des nœuds, la classification/régression des graphes et la prédiction des relations. Cependant, un problème courant dans les GNN est connu sous le nom de lissage excessif. En augmentant le nombre d'itérations dans le passage de messages, les représentations des nœuds du graphe d'entrée s'alignent et deviennent indiscernables. Cette thèse montre un moyen efficace d'assouplir l'architecture GNN en employant une heuristique de routage dans le flux de travail général. Plus précisément, une couche supplémentaire achemine les représentations des nœuds vers des experts spécialisés. Chaque expert calcule les représentations en fonction de son flux de travail GNN respectif. Les définitions de GNN distincts résultent de k vues localisées à partir d'un nœud central. Cette procédure est appelée Graph Shell Attention (SEA), dans laquelle les experts traitent différents sous-graphes à l'aide d'un transformateur. La propagation fiable d'informations par le biais de grands réseaux de communication, de réseaux sociaux ou de réseaux de capteurs est importante pour les applications concernant le marketing, l'analyse sociale ou la surveillance des conditions physiques ou environnementales. Cependant, les liens sociaux d'amitié peuvent être obsolètes, et les liens de communication peuvent échouer, induisant la notion d'incertitude dans de tels réseaux. Cette thèse aborde le problème de l'optimisation de la propagation de l'information dans les réseaux incertains compte tenu d'un budget contraint d'arêtes. 
A specialized data structure, called the F-tree, addresses two NP-hard subproblems: computing the expected information flow and selecting the optimal edges. The F-tree identifies independent components of a probabilistic input graph for which the information flow can either be computed analytically and efficiently or be estimated by traditional Monte Carlo sampling independently of the rest of the network. The next part of the thesis covers a graph problem from the perspective of operations research. A new variant of the well-known vehicle routing problem (VRP) is introduced, in which customers are served within a specific time window (TW) and at flexible delivery locations (FL) subject to capacity constraints. The latter imply that each customer is scheduled at one of the capacitated delivery service locations. In practice, the VRPTW-FL problem is relevant for applications such as parcel delivery, routing with limited parking space, or, for example, the hospital-wide scheduling of physiotherapists. This thesis presents a metaheuristic built on a hybrid adaptive large neighborhood search (ALNS). In addition, a backtracking mechanism is introduced in the construction phase to revise unsatisfactory decisions made at early stages. In the computational study, hospital data is used to evaluate the utility of flexible delivery locations and of various cost functions. In the last part of the thesis, social media trends are analyzed, which gives insight into user sentiment and newsworthy topics. Such trends consist of bursts of messages about a particular topic within a given time frame that deviate significantly from the average frequency of occurrence of the same topic. 
This thesis presents a method for classifying trend archetypes in order to predict their future dissemination by studying how these trends spread in space and time. More generally, with the steadily increasing scale and complexity of graph-structured datasets and the advances in artificial intelligence, AI-backed models will inevitably play an important role in analyzing, modeling, and improving knowledge extraction from graph data.
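
The Monte Carlo estimation of expected information flow mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's F-tree; it assumes, for illustration only, that edges are undirected, that each edge exists independently with its given probability, and that "flow" means the number of vertices reachable from a source. The names `sample_reach` and `expected_flow` are hypothetical.

```python
import random
from collections import deque

def sample_reach(edges, source, rng):
    """Sample one possible world and count vertices reachable from `source`."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:  # the edge exists in this sampled world
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first search over the surviving edges
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) - 1  # do not count the source itself

def expected_flow(edges, source, n_samples=10000, seed=0):
    """Average the reach over many independently sampled worlds."""
    rng = random.Random(seed)
    total = sum(sample_reach(edges, source, rng) for _ in range(n_samples))
    return total / n_samples

# Path q - a - b, each edge present with probability 0.5 (illustrative data).
edges = [("q", "a", 0.5), ("a", "b", 0.5)]
estimate = expected_flow(edges, "q")
```

On this small path the expectation can also be computed by hand as 0.5 + 0.5 * 0.5 = 0.75, which is the kind of component where an analytic computation can replace sampling.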

    High-quality face capture, animation and editing from monocular video

    Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and under unknown illumination. In particular, advances in three main areas offer solutions for the lack of depth and the overall ambiguity of video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation are video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing are dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.

    Proceedings of the fifth international workshop on Mathematical Foundations of Computational Anatomy (MFCA 2015)

    Computational anatomy is an emerging discipline at the interface of geometry, statistics and image analysis which aims at modeling and analyzing the biological shape of tissues and organs. The goal is to estimate representative organ anatomies across diseases, populations, species or ages, to model the organ development across time (growth or aging), to establish their variability, and to correlate this variability information with other functional, genetic or structural information. The Mathematical Foundations of Computational Anatomy (MFCA) workshop aims at fostering the interactions between the mathematical community around shapes and the MICCAI community in view of computational anatomy applications. It targets more particularly researchers investigating the combination of statistical and geometrical aspects in the modeling of the variability of biological shapes. The workshop is a forum for the exchange of theoretical ideas and aims at being a source of inspiration for new methodological developments in computational anatomy. A special emphasis is put on theoretical developments, with applications and results being welcomed as illustrations. Following the first edition of this workshop in 2006, the second edition in New York in 2008, the third edition in Toronto in 2011, and the fourth edition in Nagoya, Japan, on September 22, 2013, the fifth edition was held in Munich on October 9, 2015. Contributions were solicited in Riemannian, sub-Riemannian and group theoretical methods, advanced statistics on deformations and shapes, metrics for computational anatomy, statistics of surfaces, time-evolving geometric processes, stratified spaces, optimal transport, approximation methods in statistical learning and related subjects. Among the submitted papers, 14 were selected and organized in 4 oral sessions.

    Toward sparse and geometry adapted video approximations

    Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on the rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require adaptive signal decompositions able to capture the structure and redundancy appearing in video signals. Adaptivity must allow signals to be modeled properly so that they can be represented at the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion information) as well as spatial geometry. Most past and present strategies used to represent video signals do not properly exploit their spatial geometry. As in the case of images, a very interesting approach seems to be the decomposition of video using large over-complete libraries of basis functions able to represent salient geometric features of the signal. In the framework of video, these features should model 2D geometric video components as well as their temporal evolution, forming spatio-temporal 3D geometric primitives. In this PhD dissertation, different aspects of the use of adaptivity in video representation are studied, with a view to exploiting both aspects of video: its piecewise nature and its geometry. The first part of this work studies the use of localized temporal adaptivity in subband video coding. This is done considering two transformation schemes used for video coding: 3D wavelet representations and motion compensated temporal filtering. 
A theoretical R-D analysis as well as empirical results demonstrate how temporal adaptivity improves the coding performance of moving edges in 3D transform based video coding (without motion compensation). At the same time, adaptivity allows redundancy in non-moving video areas to be exploited equally well. The analogy between motion compensated video and 1D piecewise-smooth signals is studied as well. This motivates the introduction of local length adaptivity within frame-adaptive motion compensated lifted wavelet decompositions. This allows an optimal rate-distortion performance when video motion trajectories are shorter than the transformation "Group Of Pictures", or when efficient motion compensation cannot be ensured. After studying temporal adaptivity, the second part of this thesis is dedicated to understanding how temporal and spatial geometry can be jointly exploited. This work builds on previous results that considered the representation of spatial geometry in video (but not temporal geometry, i.e., without motion). In order to obtain flexible and efficient (sparse) signal representations using redundant dictionaries, highly non-linear decomposition algorithms, like Matching Pursuit, are required. General signal representation using these techniques is still quite unexplored. For this reason, prior to the study of video representation, some aspects of non-linear decomposition algorithms and the efficient decomposition of images using Matching Pursuit and a geometric dictionary are investigated. A part of this investigation concerns the influence of using a priori models within non-linear approximation algorithms. Dictionaries with a high internal coherence have difficulty yielding optimally sparse signal representations when used with Matching Pursuit. 
It is proved, theoretically and empirically, that inserting a priori models into this algorithm improves its capacity to obtain sparse signal approximations, mainly when coherent dictionaries are used. Another point discussed in this preliminary study of Matching Pursuit concerns the approach used in this work for the decomposition of video frames and images. The technique proposed in this thesis improves on a previous work, whose authors had to resort to sub-optimal Matching Pursuit strategies (using Genetic Algorithms), given the size of the function library. In this work, full search strategies are made possible, while approximation efficiency is significantly improved and computational complexity is reduced. Finally, a priori based Matching Pursuit geometric decompositions are investigated for geometric video representations. Regularity constraints are taken into account to recover the temporal evolution of spatial geometric signal components. The results obtained for coding and multi-modal (audio-visual) signal analysis clarify many unknowns, are promising, and encourage further research on the subject.
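
The Matching Pursuit decomposition that this abstract builds on can be sketched as follows. This is the plain greedy algorithm with a full search over the dictionary, not the a priori based variant proposed in the thesis; the random dictionary, the helper names (`dot`, `unit`, `matching_pursuit`) and the test signal are assumptions made purely for illustration.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(signal, atoms, n_iter):
    """Greedy Matching Pursuit; `atoms` is a list of unit-norm vectors."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Full search: correlate every atom with the current residual.
        corr = [dot(atom, residual) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        coeffs[best] += corr[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Illustrative redundant dictionary: 64 random unit-norm atoms in 16 dimensions.
random.seed(0)
dim, n_atoms = 16, 64
atoms = [unit([random.gauss(0, 1) for _ in range(dim)]) for _ in range(n_atoms)]

# A signal that is exactly a combination of two atoms.
signal = [2.0 * a - 1.5 * b for a, b in zip(atoms[5], atoms[40])]
coeffs, residual = matching_pursuit(signal, atoms, n_iter=20)
```

At every iteration the signal equals the weighted sum of the selected atoms plus the residual, and the residual energy never increases. With a coherent dictionary the greedy choice may still spread energy over more atoms than necessary, which is the weakness the a priori models are meant to address.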