    Preserving Measured Structure During Generation and Reduction of Multivariate Point Configurations

    Inherent in any multivariate data is structure, which describes the general shape and distribution of the underlying point configuration. While many types of structure could be of interest, consider restricting attention to two general types: geometric structure, the general shape of a point configuration, and probabilistic structure, the general distribution of points within the configuration. The ability to quantify geometric structure is an important step in many common statistical analyses. For instance, general neighbourhood structure is captured using a k-nearest neighbour graph in dimension reduction techniques such as isomap and locally linear embedding. Neighbourhood graphs are also used in sensor network localization, which has applications in fields such as environmental habitat monitoring and wildlife monitoring. Another geometric graph, the convex hull, is also used in wildlife monitoring as a rough estimate of an animal's home range. Identifying areas of high and low density is one example of measuring the probabilistic structure of a configuration, and a wide variety of methods can do this. One such method is kernel density estimation, which estimates the density at a location as a weighted sum of contributions from nearby points. Kernel density estimation has widely varying applications, including regression analysis, and is generally used to assess features of the data such as modality and skewness. Related to the idea of measuring structure is the concept of "Cognostics", which has been formalized as scatterplot diagnostics (or scagnostics). Scagnostics provides a framework through which interesting structure in a configuration can be measured. The central idea is to numerically summarize the structure of a large number of two-dimensional point configurations via measures calculated on geometric graphs.
This allows interesting views to be identified quickly and then examined visually, while views deemed uninteresting are simply discarded. While a good starting point, the current framework has several issues that need to be addressed. For instance, while each measure is designed to take values in [0,1], some measures, when computed over tens of thousands of configurations, fail to attain this full range. In addition, much structure that could be considered interesting is not captured by the current framework. These issues, among others, will be addressed so that the scagnostic framework can continue to be built upon. With tools to measure structure in hand, attention turns to making use of the structural information contained in a configuration. Consider the problem of preserving measured structure during data aggregation, more commonly known as binning. Existing methods of data aggregation tend to sit at opposite ends of the structure-retention spectrum. Experiments will show that methods such as equal-width and hexagonal binning tend to retain the shape of the configuration at the expense of its density, while methods such as equal-frequency binning and random sampling tend to retain relative density at the expense of overall shape. Tree-based binning, a general binning framework inspired by classification and regression trees, is proposed to bridge the gap between these specialist algorithms. GapBin, a specially designed tree-based binning algorithm, will be shown experimentally to provide a trade-off between geometric and probabilistic structure retention in low-dimensional spaces. In higher dimensions, it will be shown to retain structure best among the algorithms considered. Next, the general problem of constructing a configuration with a given underlying structure is considered.
For example, the minimal spanning tree is known to carry important clustering information. Of interest, then, is the generation of configurations with a given minimal spanning tree structure. Generating a configuration with a known minimal spanning tree is equivalent to completing a Euclidean distance matrix whose only known entries are those of the minimal spanning tree. Several solutions to this problem exist, including those of Alfakih et al., Fang & O'Leary, and Trosset. None of these algorithms, however, is designed to retain the structure of the minimal spanning tree. In addition, because a Euclidean distance matrix containing only the minimal spanning tree is so sparse, the resulting completions are inaccurate compared with the known completion, which in turn causes problems in the resulting point configurations. To resolve these issues, two new algorithms are proposed that are designed to retain the structure of the minimal spanning tree, leading to more accurate completions of these sparse matrices. To complement these algorithms, their implementation in the statistical programming language R will also be discussed. In particular, the R packages treebinr, for tree-based binning, and edmcr, for Euclidean distance matrix completion, will be presented.
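To make the shape-versus-density trade-off concrete, here is a minimal one-dimensional sketch of equal-width and equal-frequency binning in plain Python. It is our own illustration, not the treebinr API, and it does not implement tree-based binning or GapBin; the function names and the sample data are hypothetical.

```python
from collections import Counter


def equal_width_edges(xs, n_bins):
    """Edges splitting [min, max] into n_bins intervals of identical width."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_frequency_edges(xs, n_bins):
    """Edges at empirical quantiles, so each bin gets about len(xs) / n_bins points."""
    s = sorted(xs)
    n = len(s)
    return [s[0]] + [s[(i * n) // n_bins] for i in range(1, n_bins)] + [s[-1]]


def assign(xs, edges):
    """Bin index for each x: bins are [e_k, e_{k+1}), the last one closed on the right."""
    last = len(edges) - 2
    out = []
    for x in xs:
        k = 0
        while k < last and x >= edges[k + 1]:
            k += 1
        out.append(k)
    return out


# Hypothetical data: a dense cluster near 0 and a sparse tail out to 10.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 7.0, 8.0, 9.0, 10.0]
ew = Counter(assign(xs, equal_width_edges(xs, 5)))
ef = Counter(assign(xs, equal_frequency_edges(xs, 5)))
print([ew[k] for k in range(5)])  # [6, 0, 0, 1, 3]
print([ef[k] for k in range(5)])  # [2, 2, 2, 2, 2]
```

Equal-width bins keep the empty stretch between 0.5 and 7 visible as empty bins, preserving the overall extent, but their counts are very uneven; equal-frequency bins equalize the counts but absorb that gap into the single wide bin [0.4, 7.0).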

    Noise-Stable Rigid Graphs for Euclidean Embedding

    We proposed a new criterion, noise-stability, which revises classical rigidity theory, for evaluating MDS algorithms in a way that faithfully reflects the fidelity of global structure reconstruction. We then proved the noise-stability of the cMDS algorithm under generic conditions, which provides a rigorous theoretical guarantee on precision, and theoretical bounds, for Euclidean embedding and its applications in fields including wireless sensor network localization and satellite positioning. Furthermore, we examined previous work on minimum-cost globally rigid spanning subgraphs and proposed an algorithm to construct a minimum-cost noise-stable spanning graph in Euclidean space, which enables reliable localization on sparse graphs of noisy distance constraints with a linear number of edges and sublinear cost in terms of total edge length. Additionally, this algorithm suggests a scheme to reconstruct point clouds from pairwise distances with time complexity as low as O(n), down from O(n^3) for cMDS.
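The abstract benchmarks against classical MDS (cMDS), whose O(n^3) cost is usually attributed to the eigendecomposition of an n-by-n matrix. As a hedged, self-contained sketch of cMDS itself (not the authors' noise-stable construction), the code below double-centers the squared distances into a Gram matrix and extracts its top eigenpairs. The function name, the use of power iteration with deflation in place of a library eigensolver, and the assumption of exact (noise-free) Euclidean distances, which keeps the Gram matrix positive semidefinite, are all ours.

```python
import math
import random


def classical_mds(D, dim, iters=500, seed=0):
    """Embed n points in R^dim from an n x n matrix of exact Euclidean distances D."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(r) / n for r in sq]  # row means (= column means, D is symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of the centered points.
    B = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for c in range(dim):
        # Power iteration from a random unit vector; converges to the dominant
        # eigenvector of B (the largest eigenvalue, since B is PSD here).
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T B v gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenpair.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical example: the corners of a 3 x 1 rectangle.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
D = [[math.dist(p, q) for q in pts] for p in pts]
X = classical_mds(D, 2)
err = max(abs(math.dist(X[i], X[j]) - D[i][j]) for i in range(4) for j in range(4))
print(err < 1e-9)  # True
```

The recovered coordinates match the originals only up to a rigid motion (rotation, reflection, translation), which is why the check compares pairwise distances rather than coordinates.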

    Barycentric Subspace Analysis on Manifolds

    This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and general family of subspaces in manifolds that we call barycentric subspaces. They are implicitly defined as the locus of points which are weighted means of k+1 reference points. As this definition relies on points and not on tangent vectors, it can also be extended to geodesic spaces which are not Riemannian. For instance, in stratified spaces, it naturally allows principal subspaces that span several strata, which is impossible in previous generalizations of PCA. We show that barycentric subspaces locally define a submanifold of dimension k which generalizes geodesic subspaces. Second, we rephrase PCA in Euclidean spaces as an optimization on flags of linear subspaces (a hierarchy of properly embedded linear subspaces of increasing dimension). We show that Euclidean PCA minimizes the Accumulated Unexplained Variances (AUV) of all the subspaces of the flag. Barycentric subspaces are naturally nested, allowing the construction of hierarchically nested subspaces. Optimizing the AUV criterion to optimally approximate data points with flags of affine spans in Riemannian manifolds leads to a particularly appealing generalization of PCA on manifolds called Barycentric Subspace Analysis (BSA). Comment: Annals of Statistics, Institute of Mathematical Statistics, to appear
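In flat Euclidean space the definition reduces to something familiar: the weighted means of k+1 reference points, with weights allowed to be negative but not summing to zero, sweep out the affine span of those points. The sketch below is our own illustration of that flat case for k = 1 in R^3, not code from the paper; the function names and points are hypothetical. On a curved manifold the weighted mean is defined implicitly, as the abstract notes, and this code does not attempt that case.

```python
from fractions import Fraction


def barycenter(refs, weights):
    """Weighted mean sum(w_i * x_i) / sum(w_i) of reference points in R^d.

    Weights may be negative; only their sum must be non-zero. In flat
    Euclidean space the result is a point of the affine span of refs.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    d = len(refs[0])
    return tuple(sum(w * x[c] for w, x in zip(weights, refs)) / total for c in range(d))


def on_line(p, a, b):
    """True if p lies on the line through a and b in R^3 (cross product is zero)."""
    u = [p[c] - a[c] for c in range(3)]
    v = [b[c] - a[c] for c in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return all(c == 0 for c in cross)


# Exact rational arithmetic avoids floating-point noise in the collinearity test.
a = (Fraction(0), Fraction(0), Fraction(0))
b = (Fraction(2), Fraction(4), Fraction(6))
mid = barycenter([a, b], [1, 1])   # midpoint of a and b
ext = barycenter([a, b], [-1, 3])  # negative weight: extrapolates past b
print(mid, ext, on_line(ext, a, b))
```

With k = 1 the "barycentric subspace" of two points is the whole line through them, not just the segment, because negative weights are allowed.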

    Euclidean distance geometry and applications

    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consist of an incomplete set of distances and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of its most important applications: molecular conformation, localization of sensor networks, and statics. Comment: 64 pages, 21 figures
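A very small instance of realizing points from distances, as it appears in sensor network localization, is planar trilateration: given exact distances from an unknown point to three anchors with known positions, the point is determined by a linear system. This is a textbook sketch under the assumptions of noise-free distances and non-collinear anchors; it is not taken from the survey, and the function name and coordinates are hypothetical.

```python
import math


def trilaterate(anchors, dists):
    """Locate a point in R^2 from exact distances to three non-collinear anchors.

    Subtracting the circle equation of anchor 0 from those of anchors 1 and 2
    cancels the quadratic terms, leaving a 2x2 linear system in (x, y).
    """
    (x0, y0), (x1, y1), (x2, y2) = anchors
    r0, r1, r2 = dists
    # Row i: 2(xi - x0) x + 2(yi - y0) y = r0^2 - ri^2 + xi^2 - x0^2 + yi^2 - y0^2
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = r0**2 - r1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = r0**2 - r2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
target = (1.0, 2.0)
dists = [math.dist(target, a) for a in anchors]
print(trilaterate(anchors, dists))  # approximately (1.0, 2.0)
```

With noisy distances or more than three anchors, the same differencing trick gives an overdetermined linear system that is typically solved by least squares instead.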

    Well-solvable special cases of the TSP : a survey

    The Traveling Salesman Problem is one of the most important and most thoroughly investigated problems in combinatorial optimization. Although it is NP-hard, many of its special cases can be solved efficiently. We survey these special cases, with emphasis on results obtained during the decade 1985-1995. This survey complements an earlier survey from 1985 compiled by Gilmore, Lawler and Shmoys. Keywords: Traveling Salesman Problem, Combinatorial optimization, Polynomial time algorithm, Computational complexity
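One classical example of a well-solvable special case (our own illustration; we do not claim the survey presents it in this form) is a Euclidean instance whose cities are in convex position: visiting them in convex-hull order is then an optimal tour, so a sort replaces the exponential search. The sketch below checks this against brute force on a small hypothetical instance; all function names are ours.

```python
import math
from itertools import permutations


def tour_length(pts, order):
    """Length of the closed tour visiting pts in the given order."""
    n = len(order)
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % n]]) for i in range(n))


def convex_position_tour(pts):
    """Assumes pts are in convex position: visit them by angle around the centroid."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return sorted(range(len(pts)),
                  key=lambda i: math.atan2(pts[i][1] - cy, pts[i][0] - cx))


def brute_force_tour(pts):
    """Exact optimum by fixing city 0 and trying every order of the rest, O(n!)."""
    rest = range(1, len(pts))
    return min(((0,) + p for p in permutations(rest)),
               key=lambda order: tour_length(pts, order))


# Hypothetical instance: six points on the unit circle, listed in scrambled order.
pts = [(math.cos(t), math.sin(t)) for t in (0.3, 2.5, 1.1, 4.0, 5.5, 3.2)]
fast = tour_length(pts, convex_position_tour(pts))
exact = tour_length(pts, brute_force_tour(pts))
print(abs(fast - exact) < 1e-9)  # True
```

The angular sort runs in O(n log n) time, while the brute-force check grows factorially and is only there to confirm the result on a tiny instance.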

    Tensor network and (p-adic) AdS/CFT

    We use the tensor network living on the Bruhat-Tits tree to give a concrete realization of the recently proposed p-adic AdS/CFT correspondence (a holographic duality based on the p-adic number field Q_p). Instead of assuming the p-adic AdS/CFT correspondence, we show how important features of AdS/CFT, such as bulk operator reconstruction and the holographic computation of boundary correlators, are automatically implemented in this tensor network. Comment: 59 pages, 18 figures; v3: improved presentation, added figures and references
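For readers unfamiliar with Q_p, the completion of the rational numbers under the p-adic absolute value, a small sketch of that absolute value on rationals may help. It is background only and says nothing about the tensor network construction; the function names are ours. The final check exercises the strong (ultrametric) triangle inequality, which distinguishes the p-adic absolute value from the ordinary one.

```python
from fractions import Fraction


def p_adic_valuation(x, p):
    """v_p(x): the exponent of the prime p in the nonzero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:  # factors of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # factors of p in the denominator lower it
        den //= p
        v -= 1
    return v


def p_adic_abs(x, p):
    """|x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** p_adic_valuation(x, p)


print(p_adic_abs(18, 3))               # 18 = 2 * 3^2, so 1/9
print(p_adic_abs(Fraction(5, 27), 3))  # 3^-3 in the denominator, so 27

# Strong triangle inequality: |x + y|_p <= max(|x|_p, |y|_p).
ok = all(p_adic_abs(x + y, 3) <= max(p_adic_abs(x, 3), p_adic_abs(y, 3))
         for x in range(-20, 21) for y in range(-20, 21))
print(ok)  # True
```

Numbers divisible by high powers of p are p-adically small, so 27 is closer to 0 than 9 is in Q_3.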