
    A Nyström method with missing distances

    We study the problem of determining the configuration of n points, referred to as mobile nodes, from pairwise distances to m fixed points known as anchor nodes. In the standard setting, the distances between anchors (anchor-anchor) and between anchors and mobile nodes (anchor-mobile) are known, while the distances between mobile nodes (mobile-mobile) are not. For this setup, the Nyström method is a viable technique for estimating the positions of the mobile nodes. This study focuses on the setting where the anchor-mobile block of the distance matrix contains only partial distance information. First, we establish a relationship between the columns of the anchor-mobile block of the distance matrix and the columns of the corresponding block of the Gram matrix via a graph Laplacian. Exploiting this connection, we introduce a novel sampling model that frames the position estimation problem as low-rank recovery of an inner product matrix, given a subset of its expansion coefficients in a special non-orthogonal basis. This basis and its dual basis, the central elements of our model, are explicitly derived. Our analysis is grounded in a specific centering of the points that is unique to the Nyström method. With this in mind, we extend previous work in Euclidean distance geometry by providing a general dual basis approach for points centered anywhere.
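The Nyström localization idea in the fully observed case can be sketched as follows: classical MDS on the anchor-anchor block recovers anchor coordinates, and each mobile node is then placed by solving a small least-squares system against those coordinates. This is a minimal NumPy sketch of the standard setting only, not of the paper's partial-information model; all variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 2, 5, 3                      # dimension, anchors, mobile nodes
A = rng.standard_normal((m, d))        # true anchor positions (for simulation)
M = rng.standard_normal((n, d))        # true mobile positions

def sqdist(P, Q):
    return ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)

D_aa = sqdist(A, A)                    # anchor-anchor squared distances (known)
D_am = sqdist(A, M)                    # anchor-mobile squared distances (known)

# Classical MDS on the anchor block: double-center to get the anchor Gram matrix.
J = np.eye(m) - np.ones((m, m)) / m
G = -0.5 * J @ D_aa @ J
w, V = np.linalg.eigh(G)
w, V = w[-d:], V[:, -d:]               # top-d spectral factors
X = V * np.sqrt(np.clip(w, 0, None))   # anchor coordinates, centroid at origin

# Nystrom extension: recover each mobile node's inner products with the anchors
# (using sum_i x_i = 0 to eliminate the unknown squared norm of the mobile node),
# then solve the small least-squares system X @ y = b for its coordinates.
diag = np.diag(G)
B = -0.5 * (D_am - D_am.mean(0) + diag.mean() - diag[:, None])
Y = np.linalg.lstsq(X, B, rcond=None)[0].T   # estimated mobile positions (n x d)

# The estimate matches the truth only up to a rigid transform, so compare distances.
print(np.allclose(sqdist(Y, Y), sqdist(M, M), atol=1e-6))
```

Since the embedding is determined only up to rotation and reflection about the anchor centroid, the check compares interpoint distances rather than coordinates.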

    Machine learning in space forms: Embeddings, classification, and similarity comparisons

    We take a non-Euclidean view of three classical machine learning subjects: low-dimensional embedding, classification, and similarity comparisons. We first introduce kinetic Euclidean distance matrices to solve kinetic distance geometry problems. In distance geometry problems (DGPs), the task is to find a geometric representation, that is, an embedding, for a collection of entities consistent with pairwise distance (metric) or similarity (nonmetric) measurements. In kinetic DGPs, the twist is that the points are dynamic, and our goal is to localize them by exploiting information about their trajectory class. We show that a semidefinite relaxation can reconstruct trajectories from incomplete, noisy, time-varying distance observations. We then introduce another distance-geometric object: hyperbolic distance matrices. Recent works have focused on hyperbolic embedding methods for low-distortion embedding of distance measurements associated with hierarchical data. We derive a semidefinite relaxation to estimate the missing distance measurements and denoise them. Further, we formalize hyperbolic Procrustes analysis, which uses extraneous information in the form of anchor points to uniquely identify the embedded points. Next, we address the design of learning algorithms in mixed-curvature spaces. Learning algorithms in low-dimensional mixed-curvature spaces have so far been limited to certain non-Euclidean neural networks. Here, we study the problem of learning a linear classifier (a perceptron) in products of Euclidean, spherical, and hyperbolic spaces, i.e., space forms. We introduce a notion of linear separation surfaces in Riemannian manifolds and use a metric that renders distances in different space forms compatible with each other and integrates them into one classifier. Lastly, we show how similarity comparisons carry information about the underlying space of geometric graphs. We introduce the ordinal spread of a distance list and relate it to the ordinal capacity of the underlying space, a notion that quantifies the space's ability to host extreme patterns in nonmetric measurements. We then use the distribution of random ordinal spread variables as a practical tool to identify the underlying space form.
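The three space forms enter through their closed-form geodesic distances, and a product metric is one standard way to make them compatible in a single classifier. The sketch below assumes the hyperboloid ('Loid) model and a simple product-metric combination; these are common conventions, not necessarily the exact formulation used in the thesis.

```python
import numpy as np

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)

def spherical_dist(x, y):
    # Points on the unit sphere; distance is the great-circle arc length.
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def hyperbolic_dist(x, y):
    # 'Loid (hyperboloid) model: points satisfy [x,x]_L = -1 under the
    # Lorentzian form [x,y]_L = -x0*y0 + x1*y1 + ...; the geodesic
    # distance is arccosh(-[x,y]_L).
    lorentz = -x[0] * y[0] + np.dot(x[1:], y[1:])
    return np.arccosh(np.clip(-lorentz, 1.0, None))

def lift(u):
    # Lift a Euclidean point u onto the hyperboloid: x0 = sqrt(1 + |u|^2).
    return np.concatenate(([np.sqrt(1.0 + np.dot(u, u))], u))

def product_dist(pe, ps, ph, qe, qs, qh):
    # Product-space ("mixed-curvature") distance: combine the per-factor
    # geodesic distances in quadrature.
    return np.sqrt(euclidean_dist(pe, qe) ** 2
                   + spherical_dist(ps, qs) ** 2
                   + hyperbolic_dist(ph, qh) ** 2)

# Example: one coordinate from each factor.
pe, qe = np.array([0.0]), np.array([1.0])
ps, qs = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ph, qh = lift(np.zeros(1)), lift(np.array([1.0]))
print(product_dist(pe, ps, ph, qe, qs, qh))
```

The quadrature combination makes the product space a Riemannian manifold in its own right, which is what allows a single separation surface to span all three factors.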

    Efficient Optimization Algorithms for Nonlinear Data Analysis

    Identifying low-dimensional structures and the main sources of variation in multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve the solution of an optimization problem. The objective of this thesis is thus to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered the relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust-region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential-equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically using Gaussian kernels, which allows the application of ridge-based methods under only mild assumptions on the underlying structure of the data. The statistical model and the ridge-finding methods are adapted to two different applications. The first is the extraction of curvilinear structures from noisy data mixed with background clutter. The second is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications where most earlier approaches are inadequate; examples include the identification of faults from seismic data and of filaments from cosmological data. The applicability of the nonlinear PCA to climate analysis and to the reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include the development of an efficient semidefinite optimization method for embedding graphs into Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but it also has potential applications in graph theory and various areas of physics, chemistry, and engineering. The asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated as the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, a typical problem in visual object tracking.
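Projecting a point onto a density ridge can be illustrated with subspace-constrained mean shift, a simpler relative of the trust-region Newton method developed in the thesis: the iterate moves only along the Gaussian-KDE Hessian's smallest eigenvectors, climbing the density transversally to the ridge. Bandwidth, step count, and the synthetic data are all illustrative choices.

```python
import numpy as np

def kde_shift_and_hessian(x, data, h):
    """Mean-shift vector and (unnormalized) KDE Hessian at x, Gaussian kernel."""
    diff = data - x                                       # (n, d)
    w = np.exp(-(diff ** 2).sum(1) / (2 * h ** 2))        # kernel weights
    shift = (w[:, None] * diff).sum(0) / w.sum()          # mean-shift vector
    s = diff / h ** 2
    H = (w[:, None, None] * s[:, :, None] * s[:, None, :]).sum(0) \
        - w.sum() * np.eye(x.size) / h ** 2               # Hessian of the KDE
    return shift, H

def project_to_ridge(x, data, h, steps=100):
    # Subspace-constrained mean shift: step only along the Hessian's smallest
    # d-1 eigenvectors, so the iterate converges onto a one-dimensional ridge.
    for _ in range(steps):
        shift, H = kde_shift_and_hessian(x, data, h)
        vecs = np.linalg.eigh(H)[1]       # eigenvectors, ascending eigenvalues
        V = vecs[:, :-1]                  # directions transverse to the ridge
        x = x + V @ (V.T @ shift)
    return x

# Noisy samples along the x-axis: the density ridge is approximately y = 0.
rng = np.random.default_rng(1)
data = np.c_[rng.uniform(-3, 3, 500), 0.1 * rng.standard_normal(500)]
p = project_to_ridge(np.array([0.0, 0.4]), data, h=0.5)
print(p)   # second coordinate is driven toward the ridge at y = 0
```

A ridge point is characterized by the gradient lying in the span of the Hessian's leading eigenvector, which is exactly the fixed point of this constrained iteration.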

    Probabilistic Prediction Using Embedded Random Projections of High Dimensional Data

    The explosive growth of digital data collection and processing demands a new approach to the historical engineering methods of data correlation and model creation. A new prediction methodology based on high-dimensional data has been developed. Since most high-dimensional data resides on a low-dimensional manifold, the new prediction methodology is one of dimensionality reduction with embedding into a diffusion space that allows optimal distribution along the manifold. The resulting data manifold space is then used to produce a probability density function which uses spatial weighting to influence predictions, i.e., data nearer the query have greater importance than data further away. The methodology also allows data of differing phenomenology (e.g., color, shape, temperature) to be handled by regression or by clustering classification. The new methodology is first developed and validated, then applied to common engineering situations, such as critical heat flux prediction and shuttle pitch angle determination. A number of illustrative examples are given, with a significant focus placed on the objective identification of two-phase flow regimes. It is shown that the new methodology is robust, yielding accurate predictions even with a small number of data points in the diffusion space, and flexible in its ability to handle a wide range of engineering problems.
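The embedding step described above follows the general pattern of diffusion maps: build a Gaussian affinity kernel, normalize it into a Markov transition matrix, and use its leading nontrivial eigenvectors as coordinates. The sketch below uses one standard normalization convention among several; the kernel scale and the demo data are arbitrary choices.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, t=1):
    # Gaussian affinity kernel on pairwise squared distances.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)
    # Eigen-decompose the symmetric conjugate S = D^{-1/2} K D^{-1/2} of the
    # Markov matrix P = D^{-1} K, which shares its eigenvalues and is stabler.
    d = K.sum(1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]          # descending order
    psi = vecs / np.sqrt(d)[:, None]                # right eigenvectors of P
    # Skip the trivial constant eigenvector; scale by eigenvalue^t (diffusion time).
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1] ** t

# Noisy circle: the first two diffusion coordinates recover the circular structure.
rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((300, 2))
Y = diffusion_map(X, eps=0.5)
```

Distances in the embedding approximate diffusion distances on the manifold, which is what makes the subsequent spatially weighted density estimation meaningful.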

    Persistent mutual information

    We study Persistent Mutual Information (PMI), the information about the past that persists into the future as a function of the length of an intervening time interval. Particularly relevant is the limit of an infinite intervening interval, which we call Permanently Persistent MI (PPMI). In the logistic and tent maps, PPMI is found to be the logarithm of the global periodicity, both for periodic attractors and for multi-band chaos. This leads us to suggest that PPMI can be a good candidate for a measure of strong emergence, by which we mean behaviour that can be forecast only by examining a specific realisation. We develop the phenomenology to interpret PMI in systems where it increases indefinitely with resolution; among these are area-preserving maps. The scaling factor r for how PMI grows with resolution can be written in terms of a combination of the information dimensions of the underlying spaces. We identify r with the extent of causality recoverable at a certain resolution and compute it numerically for the standard map, where it is found to reflect a variety of map features, such as the number of degrees of freedom, the scaling related to the existence of different types of trajectories, and even an apparent peak which we conjecture to be a direct consequence of the stickiness phenomenon. We show that, in general, only a certain degree of mixing between regular and chaotic orbits can result in the observed values of r. Using the same techniques, we also develop a method to compute PMI through local sampling of the joint distribution of past and future. Preliminary results indicate that the PMI of the double pendulum shows some similar features, and that in area-preserving dynamical systems there may be regimes where the joint distribution is multifractal.
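The dependence of mutual information on the intervening interval can be illustrated with a simple plug-in histogram estimator on the logistic map. This is a stand-in for the paper's local-sampling estimator; bin count, orbit length, and parameter values are arbitrary choices. In the chaotic regime (r = 4) the information shared across a gap decays with the gap length, while for a period-4 attractor (r = 3.5) exactly log2(4) = 2 bits persist for any gap, matching the "PPMI equals log of the global periodicity" statement.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    # Plug-in estimate of I(X;Y) in bits from a 2-D histogram.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(1, keepdims=True)
    py = pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def logistic_orbit(r, n, x0=0.123, burn=1000):
    # Iterate x -> r x (1 - x), discarding an initial transient.
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1 - x)
        out[i] = x
    return out

# Chaotic regime: information about the present decays as the gap tau grows.
orbit = logistic_orbit(4.0, 200_000)
for tau in (1, 5, 20):
    print(tau, mutual_information(orbit[:-tau], orbit[tau:]))

# Periodic regime: r = 3.5 has an attracting period-4 orbit, so past and
# future share log2(4) = 2 bits regardless of the gap.
per = logistic_orbit(3.5, 50_000)
print(mutual_information(per[:-100], per[100:]))   # ~ 2.0 bits
```

The plug-in estimator carries a positive bias of roughly (bins - 1)^2 / (2 N ln 2) bits, which is why the long-gap chaotic values settle near a small positive floor rather than exactly zero.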

    Detecting clusters and contours: a twofold study on shape representation

    Shape plays a key role in our cognitive system: in the perception of shape lies the beginning of concept formation. Following this line of thought, the Gestalt school has extensively studied shape perception as the grasping of structural features found in or imposed upon the stimulus material. In summary, we have two models for shapes: they can exist physically or be a product of our cognitive processes. The first group is formed by shapes that can be defined by extracting contours from solid objects. In this work we restrict ourselves to the two-dimensional case, and we say that shapes of this first type are planar shapes. We address the problem of detecting and recognizing planar shapes. A few theoretical and practical restrictions lead us to define a planar shape as any piece of meaningful level line of an image. We begin by noting that previous a contrario methods for detecting level lines are often too restrictive: a curve must be entirely salient to be detected. This is in clear contradiction with the observation that pieces of level lines coincide with object boundaries. We therefore propose a modification in which the detection criterion is relaxed by permitting the detection of partially salient level lines. As a second approach, we study the interaction between two different ways of determining level line saliency: contrast and regularity. We propose a feature competition scheme in which contrast and regularity contend with each other, so that only contrasted and regular level lines are considered salient. A third contribution is a clean-up algorithm that analyzes salient level lines, discarding the non-salient pieces and keeping the salient ones. It is based on an algorithm for multisegment detection, which we extended to work with periodic inputs. Finally, we propose a shape descriptor to encode the detected shapes, based on the global Shape Context. Each level line is encoded using shape contexts, thus generating a new semi-local descriptor. We then adapt an existing a contrario shape matching algorithm to our particular case.
The second group is composed of shapes that do not correspond to a single solid object but are formed by integrating several solid objects. The simplest shapes in this group are arrangements of points in two dimensions, and clustering techniques can be helpful in these situations. In a seminal work from 1971, Zahn faced the problem of finding perceptual clusters according to the proximity gestalt and proposed three basic principles for clustering algorithms: (1) only inter-point distances matter, (2) results must be stable across executions, and (3) they must be independent of the exploration strategy. A last, implicit requirement is crucial: clusters may have arbitrary shapes, and detection algorithms must be capable of dealing with this. In this part we focus on designing clustering methods that fully satisfy the aforementioned requirements while imposing minimal assumptions on the data to be clustered. We begin by addressing the problem of validating clusters in a hierarchical structure. Based on nonparametric density estimation methods, we propose to compute the saliency of a given cluster, making it possible to select the most salient clusters in the hierarchy. In practice, the method shows a preference toward compact clusters, and we propose a simple heuristic to correct this issue. In general, graph-based hierarchical methods require first computing the complete graph of interpoint distances; for this reason, hierarchical methods are often considered slow. The most widely used, and fastest, hierarchical clustering algorithm is based on the Minimum Spanning Tree (MST). We therefore propose an algorithm that computes the MST while avoiding the intermediate step of computing the complete set of interpoint distances; moreover, the algorithm can easily be fully parallelized. It exhibits good performance for low-dimensional datasets and provides an approximate but robust solution in higher dimensions. Finally, we propose a method to select clustered subtrees from the MST by computing simple edge statistics. The method naturally retrieves clusters with arbitrary shapes. It also works well in noisy situations, where noise is regarded as unclustered data and can be separated from clustered data. We also show that iterative application of the algorithm solves a phenomenon called masking, where highly populated clusters prevent the detection of less populated ones.
Fil: Tepper, Mariano. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
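The MST-plus-edge-statistics idea can be sketched as follows: build the tree, then cut edges that are unusually long relative to the rest. Prim's algorithm and a simple mean-plus-k-standard-deviations threshold stand in here for the thesis's actual construction and edge statistics; both are illustrative choices.

```python
import numpy as np

def mst_edges(X):
    # Prim's algorithm on the complete Euclidean graph, O(n^2) time,
    # touching only one distance row per iteration.
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = ((X - X[0]) ** 2).sum(1)        # squared distance to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        best[in_tree] = np.inf
        j = int(np.argmin(best))
        edges.append((parent[j], j, np.sqrt(best[j])))
        in_tree[j] = True
        d = ((X - X[j]) ** 2).sum(1)
        closer = d < best
        best = np.where(closer, d, best)
        parent = np.where(closer, j, parent)
    return edges

def mst_clusters(X, k=2.0):
    # Cut edges much longer than typical (mean + k std); the connected
    # components of what remains are the clusters.
    edges = mst_edges(X)
    w = np.array([e[2] for e in edges])
    keep = w <= w.mean() + k * w.std()
    labels = np.arange(len(X))             # union-find over kept edges
    def find(a):
        while labels[a] != a:
            labels[a] = labels[labels[a]]  # path halving
            a = labels[a]
        return a
    for (u, v, _), ok in zip(edges, keep):
        if ok:
            labels[find(u)] = find(v)
    return np.array([find(i) for i in range(len(X))])

# Two well-separated blobs should come back as two clusters.
rng = np.random.default_rng(3)
X = np.r_[rng.standard_normal((50, 2)), rng.standard_normal((50, 2)) + 8]
lab = mst_clusters(X)
print(len(np.unique(lab)))   # expect 2
```

Because only edge lengths are inspected, the components can have arbitrary shapes, which is exactly the property Zahn's principles call for.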

    Beyond the horizon of measurement: Festschrift in honor of Ingwer Borg

    "Modesty and academic excellence paired with trustfulness and truthfulness: these are the descriptions we would choose if asked to describe Ingwer Borg in a nutshell. A glance at his oeuvre reveals a multi-talented, innovative, and cross-disciplinary scientist who, by all means, could fill his walls with eminent names, topics, positions, and publications. This is in contrast to the frugality of his office, a scientific workbench, not a celebrity's showroom. In addition to his academic pursuits he likes to venture into real life, too. This volume is organized in two parts. The first part deals with measurement issues, including the application of multidimensional scaling to substantive issues, but where the method is center-stage. The second part is substantive in focus and deals with questions of the organization of firms and employee attitudes." (author's abstract). Contents: Peter Ph. Mohler: Sampling from a universe of items and the De-Machiavellization of questionnaire design (9-14); Hubert Feger: Some analytical foundations of multidimensional scaling for ordinal data (15-40); Patrick J.F. Groenen, Ivo A. van der Lans: Multidimensional scaling with regional restrictions for facet theory: an application to Levy's political protest data (41-64); Arie Cohen: A comparison between factor analysis and smallest space analysis of the comprehensive scoring system of the Rorschach (65-72); Wolfgang Bilsky: On the structure of motives: beyond the 'big three' (73-84); Shlomit Levy, Dov Elizur: Values of veteran Israelis and new immigrants from the former Soviet Union: a facet analysis (85-104); Simon L. Dolan, Christian Acosta-Flamma: Values and propensity to adopt new HRM web-based technologies as determinants of HR efficiency and effectiveness: a firm-level resource-based analysis (105-124); Sanjay T. Menon: Non-hierarchical emergent structure: a case study in alternative management (125-138); Christiane Spitzmüller, Dana M. Glenn: Organizational survey response: previous findings and an integrative framework (139-162); Thomas Staufenbiel, Maren Kroll, Cornelius J. König: Could job insecurity (also) be a motivator? (163-174); Michael Braun, Miriam Baumgärtner: The effects of work values and job characteristics on job satisfaction (175-188).

    Statistical Computational Topology and Geometry for Understanding Data

    Here we describe three projects involving data analysis which focus on engaging statistics with the geometry and/or topology of the data. The first project involves the development and implementation of kernel density estimation for persistence diagrams. These kernel densities consider neighborhoods for every feature in the center diagram and give each feature an independent, orthogonal direction. The creation of kernel densities in this realm yields a previously unavailable full characterization of the (random) geometry of a dataspace or data distribution. In the second project, cohomology is used to guide a search for kidney exchange cycles within a kidney paired donation pool. The same technique also produces a score function that helps to predict a patient-donor pair's a priori advantage within a donation pool. The resulting allocation of cycles is determined to be equitable according to a strict analysis of the allocation distribution. In the last project, a previously formulated metric between surfaces, called continuous Procrustes distance (CPD), is applied to species discrimination in fossils. This project involves both the application and a rigorous comparison of the metric with its primary competitor, discrete Procrustes distance. Besides comparing the separation power of discrete and continuous Procrustes distances, the effect of surface resolution on CPD is investigated in this study.