
    Simplicial variances, potentials and Mahalanobis distances

    The average squared volume of simplices formed by k independent copies from the same probability measure µ on R^d defines an integral measure of dispersion ψ_k(µ), which is a concave functional of µ after suitable normalization. When k = 1 it corresponds to tr(Σ_µ) and when k = d we obtain the usual generalized variance det(Σ_µ), with Σ_µ the covariance matrix of µ. The dispersion ψ_k(µ) generates a notion of simplicial potential at any x ∈ R^d, dependent on µ. We show that this simplicial potential is a quadratic convex function of x, with minimum value at the mean a_µ of µ, and that the potential at a_µ defines a central measure of scatter similar to ψ_k(µ), thereby generalizing results by Wilks (1960) and van der Vaart (1965) for the generalized variance. Simplicial potentials define generalized Mahalanobis distances, expressed as weighted sums of such distances in every k-margin, and we show that the matrix involved in the generalized distance is a particular generalized inverse of Σ_µ, constructed from its characteristic polynomial, when k = rank(Σ_µ). Finally, we show how simplicial potentials can be used to define simplicial distances between two distributions, depending on their means and covariances, with interesting features when the distributions are close to singularity.
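    As a compact formal sketch of these definitions (notation ours; the paper's normalizing constants, and its exact convention for the number of i.i.d. vertices, may differ): the squared k-volume of a simplex is a Gram determinant, and the dispersion is its expectation under independent copies of µ, recovering tr(Σ_µ) at k = 1 and det(Σ_µ) at k = d.

    \[
    \mathrm{Vol}_k^2(v_0,\dots,v_k) \;=\; \frac{\det G}{(k!)^2}, \qquad
    G_{ij} = (v_i - v_0)^\top (v_j - v_0), \quad 1 \le i,j \le k,
    \]
    \[
    \psi_k(\mu) \;\propto\; \mathbb{E}\big[\mathrm{Vol}_k^2(x_0,\dots,x_k)\big],
    \qquad x_0,\dots,x_k \ \text{i.i.d.} \sim \mu,
    \]
    so that \(\psi_1(\mu) \propto \operatorname{tr}(\Sigma_\mu)\) and \(\psi_d(\mu) \propto \det(\Sigma_\mu)\).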

    Simplicial and minimal-variance distances in multivariate data analysis

    In this paper, we study the behaviour of the so-called k-simplicial distances and k-minimal-variance distances between a point and a sample. The family of k-simplicial distances includes the Euclidean distance, the Mahalanobis distance, Oja’s simplex distance and many others. We give recommendations about the choice of parameters used to calculate the distances, including the size of the sub-sample of simplices used to improve computation time, if needed. We introduce a new family of distances which we call k-minimal-variance distances. Each of these distances is constructed using polynomials in the sample covariance matrix, with the aim of providing an alternative to the inverse covariance matrix that is applicable when the data are degenerate. We explore some applications of the considered distances, including outlier detection and clustering, and compare how the behaviour of the distances is affected by different parameter choices.
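    A minimal sketch of the k-simplicial distance idea, assuming it averages squared volumes of simplices with vertices at the query point x and k sample points, estimated over a random sub-sample of simplices for speed; the function name and normalisation here are ours, not the paper's.

    import math
    import numpy as np

    def k_simplicial_distance(x, X, k, n_simplices=1000, seed=None):
        # Monte Carlo estimate: average squared k-volume of simplices whose
        # vertices are x together with k distinct rows of the sample X.
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        total = 0.0
        for _ in range(n_simplices):
            idx = rng.choice(X.shape[0], size=k, replace=False)
            E = X[idx] - x                      # edge vectors from x, shape (k, d)
            G = E @ E.T                         # k x k Gram matrix
            total += np.linalg.det(G) / math.factorial(k) ** 2
        return math.sqrt(total / n_simplices)

    For k = 1 this reduces to the root-mean-square Euclidean distance from x to the sample, consistent with the Euclidean distance being a member of the family.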

    Distance measures and whitening procedures for high dimensional data

    The need to effectively analyse high dimensional data is increasingly crucial to many fields as data collection and storage capabilities continue to grow. Working with high dimensional data is fraught with difficulties, making many data analysis methods inadvisable, unstable or entirely unavailable. The Mahalanobis distance and data whitening are two methods that are integral to multivariate data analysis. These methods are reliant on the inverse of the covariance matrix, which is often non-existent or unstable in high dimensions. The methods that are currently used to circumvent singularity in the covariance matrix often impose structural assumptions on the data, which are not always appropriate or known. In this thesis, three novel methods are proposed. Two of these methods are distance measures which measure the proximity of a point x to a set of points X. The simplicial distances find the average volume of all k-dimensional simplices between x and vertices of X. The minimal-variance distances aim to minimize the variance of the distances produced, while adhering to a constraint ensuring similar behaviour to the Mahalanobis distance. Finally, the minimal-variance whitening method is detailed. This is a method of data whitening, constructed by minimizing the total variation of the transformed data subject to a constraint. All of these novel methods are shown to behave similarly to the Mahalanobis distances and data whitening methods that are used for full-rank data. Furthermore, unlike the methods that rely on the inverse covariance matrix, these new methods are well-defined for degenerate data and do not impose structural assumptions. This thesis explores the aims, constructions and limitations of these new methods, and offers many empirical examples and comparisons of their performances when used with high dimensional data.
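    Schematically, and in our own notation (the thesis's exact constraint may differ), a minimal-variance distance replaces the inverse covariance in the Mahalanobis form with a polynomial in the sample covariance S whose coefficients minimise the variance of the squared distances over the sample:

    \[
    d_P^2(x) = (x - \bar{x})^\top P(S)\,(x - \bar{x}), \qquad
    P(S) = \sum_{j=0}^{m} a_j S^j,
    \]
    \[
    \min_{a_0,\dots,a_m} \ \operatorname{Var}_i\!\big[d_P^2(x_i)\big]
    \quad \text{subject to a normalisation tying } d_P \text{ to the Mahalanobis distance on full-rank data.}
    \]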

    Polynomial whitening for high-dimensional data

    The inverse square root of a covariance matrix is often desirable for performing data whitening in the process of applying many common multivariate data analysis methods. Direct calculation of the inverse square root is not available when the covariance matrix is either singular or nearly singular, as often occurs in high dimensions. We develop new methods, which we broadly call polynomial whitening, to construct a low-degree polynomial in the empirical covariance matrix which has similar properties to the true inverse square root of the covariance matrix (should it exist). Our method does not suffer in singular or near-singular settings, and is computationally tractable in high dimensions. We demonstrate that our construction of low-degree polynomials provides a good substitute for high-dimensional inverse square root covariance matrices.
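    One way to realise this idea, given here as a hedged sketch rather than the authors' construction: least-squares fit a low-degree polynomial p to λ ↦ λ^(−1/2) on the significant eigenvalues of the empirical covariance S, then use W = p(S) as the whitening matrix. Since p(S) involves only matrix products, it stays well defined when S is singular.

    import numpy as np

    def polynomial_whitener(X, degree=3, tol=1e-8):
        # Empirical covariance of the centred data.
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)
        S = np.cov(Xc, rowvar=False)
        # Fit p(lambda) ~ lambda**-0.5 on eigenvalues above the tolerance,
        # so near-zero directions are never blown up by an explicit inverse.
        lam = np.linalg.eigvalsh(S)
        lam = lam[lam > tol]
        coef = np.polyfit(lam, lam ** -0.5, degree)   # highest degree first
        # Evaluate W = p(S) by Horner's rule, using only matrix products.
        W = np.zeros_like(S)
        for c in coef:
            W = W @ S + c * np.eye(S.shape[0])
        return W

    # Usage: rows of Z are approximately whitened observations.
    # Z = (X - X.mean(axis=0)) @ polynomial_whitener(X)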

    Modeling of Craniofacial Anatomy, Variation, and Growth


    Man-made Surface Structures from Triangulated Point Clouds

    Photogrammetry aims at reconstructing shape and dimensions of objects captured with cameras, 3D laser scanners or other spatial acquisition systems. While many acquisition techniques deliver triangulated point clouds with millions of vertices within seconds, the interpretation is usually left to the user. Especially when reconstructing man-made objects, one is interested in the underlying surface structure, which is not inherently present in the data. This includes the geometric shape of the object, e.g. cubical or cylindrical, as well as corresponding surface parameters, e.g. width, height and radius. Applications are manifold and range from industrial production control to architectural on-site measurements to large-scale city models. The goal of this thesis is to automatically derive such surface structures from triangulated 3D point clouds of man-made objects. They are defined as a compound of planar or curved geometric primitives. Model knowledge about typical primitives and relations between adjacent pairs of them should affect the reconstruction positively. After formulating a parametrized model for man-made surface structures, we develop a reconstruction framework with three processing steps: During a fast pre-segmentation exploiting local surface properties we divide the given surface mesh into planar regions. Making use of a model selection scheme based on minimizing the description length, this surface segmentation is free of control parameters and automatically yields an optimal number of segments. A subsequent refinement introduces a set of planar or curved geometric primitives and hierarchically merges adjacent regions based on their joint description length. A global classification and constraint parameter estimation combines the data-driven segmentation with high-level model knowledge. Therefore, we represent the surface structure with a graphical model and formulate factors based on likelihood as well as prior knowledge about parameter distributions and class probabilities. We infer the most probable setting of surface and relation classes with belief propagation and estimate an optimal surface parametrization with constraints induced by inter-regional relations. The process is specifically designed to work on noisy data with outliers and a few exceptional freeform regions not describable with geometric primitives. It yields full 3D surface structures with watertightly connected surface primitives of different types. The performance of the proposed framework is experimentally evaluated on various data sets. On small synthetically generated meshes we analyze the accuracy of the estimated surface parameters, the sensitivity w.r.t. various properties of the input data and w.r.t. model assumptions as well as the computational complexity. Additionally we demonstrate the flexibility w.r.t. different acquisition techniques on real data sets. The proposed method turns out to be accurate, reasonably fast and little sensitive to defects in the data or imprecise model assumptions.
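    As a toy illustration of the description-length criterion driving the hierarchical merging step (a deliberate simplification, not the thesis's actual coder: regions are 3D point sets, and a region's cost is a fixed parameter cost plus a residual term for its best-fit plane):

    import numpy as np

    def plane_description_length(points, param_bits=64.0):
        # Fit a plane through the centroid via SVD; the smallest singular
        # value squared equals the sum of squared point-to-plane distances.
        c = points.mean(axis=0)
        s = np.linalg.svd(points - c, compute_uv=False)
        residual = s[-1] ** 2
        n = len(points)
        # Toy coding cost: plane-parameter bits plus residual coding bits.
        return param_bits + 0.5 * n * np.log2(1.0 + residual / n)

    def should_merge(region_a, region_b):
        # Merge two adjacent regions only if one joint plane describes their
        # union more compactly than two separate planes (shorter total DL).
        joint = np.vstack([region_a, region_b])
        return plane_description_length(joint) < (
            plane_description_length(region_a) + plane_description_length(region_b)
        )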

    Advancements in latent space network modelling

    The ubiquity of relational data has motivated an extensive literature on network analysis, and over the last two decades the latent space approach has become a popular network modelling framework. In this approach, the nodes of a network are represented in a low-dimensional latent space and the probability of an interaction occurring is modelled as a function of the associated latent coordinates. This thesis focuses on computational and modelling aspects of the latent space approach, and we present two main contributions. First, we consider estimation of temporally evolving latent space networks in which interactions among a fixed population are observed through time. The latent coordinates of each node evolve over time, and this presents a natural setting for the application of sequential Monte Carlo (SMC) methods. This facilitates online inference, which allows estimation for dynamic networks in which the number of observations in time is large. Since the performance of SMC methods degrades as the dimension of the latent state space increases, we explore the high-dimensional SMC literature to allow estimation of networks with a larger number of nodes. Second, we develop a latent space model for network data in which the interactions occur between sets of the population and, as a motivating example, we consider a coauthorship network in which it is typical for more than two authors to contribute to an article. This type of data can be represented as a hypergraph, and we extend the latent space framework to this setting. Modelling the nodes in a latent space provides a convenient visualisation of the data and allows properties to be imposed on the hypergraph relationships. We develop a parsimonious model with a computationally convenient likelihood. Furthermore, we theoretically consider the degree distribution of our model and further explore its properties via simulation.
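    For concreteness, a small sketch of the classic latent distance model this framework builds on (the Hoff, Raftery and Handcock formulation; the thesis's dynamic and hypergraph models extend variants of it, and the parameter names here are ours):

    import numpy as np

    def simulate_latent_distance_network(n=50, dim=2, alpha=1.0, seed=0):
        # logit P(A_ij = 1) = alpha - ||z_i - z_j||, with i.i.d. Gaussian
        # latent coordinates; nearby nodes are more likely to connect.
        rng = np.random.default_rng(seed)
        Z = rng.standard_normal((n, dim))
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        P = 1.0 / (1.0 + np.exp(-(alpha - D)))
        A = np.triu((rng.random((n, n)) < P).astype(int), k=1)
        return Z, A + A.T        # symmetric adjacency, zero diagonal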

    Intelligent Sensor Networks

    In the last decade, wireless and wired sensor networks have attracted much attention. However, most designs target general sensor-network issues, including the protocol stack (routing, MAC, etc.) and security. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.

    Automatic Reconstruction of Textured 3D Models

    Three-dimensional modeling and visualization of environments is an increasingly important problem. This work addresses automatic 3D reconstruction, and we present a system for unsupervised reconstruction of textured 3D models in the context of modeling indoor environments. We present solutions to all aspects of the modeling process and an integrated system for the automatic creation of large-scale 3D models.