
    High throughput powder diffraction: II Applications of clustering methods and multivariate data analysis

    In high throughput crystallography it is possible to accumulate over 1000 powder diffraction patterns on a series of related compounds, often polymorphs. We present a method that can analyse such data, automatically sort the patterns into related clusters or classes, characterise each cluster and identify any unusual samples containing, for example, unknown or unexpected polymorphs. Mixtures may be analysed quantitatively if a database of pure phases is available. A key component of the method is a set of visualisation tools based on dendrograms, cluster analysis, pie charts, principal component based score plots and metric multidimensional scaling. Applications to pharmaceutical data and inorganic compounds are presented. The procedures have been incorporated into the PolySNAP commercial computer software.
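
As a rough illustration of sorting patterns into related clusters, the sketch below groups intensity profiles by pairwise similarity. It is not the PolySNAP procedure: the use of Pearson correlation as the similarity measure, the single-linkage rule, the `min_corr` threshold, and the toy patterns are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length intensity profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def cluster_patterns(patterns, min_corr=0.9):
    """Single-linkage grouping: two patterns share a cluster when a chain of
    pairwise correlations >= min_corr connects them (threshold is hypothetical)."""
    parent = list(range(len(patterns)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            if pearson(patterns[i], patterns[j]) >= min_corr:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(patterns)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Toy data: two patterns with a peak in bin 1, two with a peak in bin 3.
toy_patterns = [
    [0, 10, 0, 0, 1, 0],
    [0, 9, 1, 0, 1, 0],
    [0, 0, 0, 8, 0, 3],
    [1, 0, 0, 9, 0, 3],
]
groups = cluster_patterns(toy_patterns)
```

A pattern that ends up alone in its own group is the kind of sample the abstract describes as unusual, for example an unexpected polymorph.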

    A Statistical Toolbox For Mining And Modeling Spatial Data

    Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation, based on Moran’s and Geary’s coefficients. The adequacy of these coefficients is rarely challenged, even though many users seem likely to make mistakes and foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods of dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easily implementable in existing (free) software and are useful for exploiting the proposed corrections in spatial data analysis practice. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they possess qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), which are valuable in exploratory spatial data analysis.
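
For reference, the classical coefficients that the abstract critiques can be computed as in the sketch below. These are the standard textbook definitions of Moran's I and Geary's C, not the corrected coefficients proposed in the paper, which the abstract does not spell out. The function name and the four-region example with binary neighbour weights are assumptions for illustration.

```python
def moran_geary(values, weights):
    """Classical Moran's I and Geary's C for one attribute.

    values  : list of n observations, one per spatial unit
    weights : n x n spatial weight matrix (weights[i][j] links unit i to unit j)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)        # total weight
    ss = sum(d * d for d in dev)                 # sum of squared deviations

    # Moran's I: weighted cross-products of deviations from the mean.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    moran = (n / s0) * cross / ss

    # Geary's C: weighted squared differences between paired values.
    sqdiff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                 for i in range(n) for j in range(n))
    geary = ((n - 1) / (2 * s0)) * sqdiff / ss
    return moran, geary

# Hypothetical example: four regions in a row, each linked to its neighbours.
x = [1.0, 2.0, 3.0, 4.0]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran_i, geary_c = moran_geary(x, w)
```

In this example the values rise steadily along the row, so Moran's I is positive and Geary's C falls below 1, both indicating positive spatial autocorrelation.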

    Visualization of Categorical Response Models - from Data Glyphs to Parameter Glyphs

    The multinomial logit model is the most widely used model for nominal multi-category responses. One problem with the model is that many parameters are involved; another is that interpretation of the parameters is much harder than for linear models because the model is non-linear. Both problems can be eased by graphical representations. We propose to visualize the effect strengths by star plots, where one star collects all the parameters connected to one explanatory variable. In contrast to conventional star plots, which are used to represent data, the plots represent parameters and are considered as parameter glyphs. The set of stars for a fitted model makes the main features of the effects of explanatory variables on the response variable easily accessible. The method is extended to ordinal models and illustrated by several data sets.
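
To make the parameter count concrete, here is a minimal sketch of how a multinomial logit model turns coefficients into category probabilities. It assumes the common reference-category parameterisation, in which the first category has a linear predictor of zero; the function name and the coefficient values are hypothetical and do not come from the paper.

```python
from math import exp

def multinomial_logit_probs(x, betas):
    """Category probabilities for one observation.

    x     : list of covariate values (include 1.0 for an intercept)
    betas : one coefficient vector per non-reference category
    Returns probabilities ordered as [reference, category 1, category 2, ...].
    """
    # Linear predictors; the reference category is fixed at 0.
    etas = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(etas)
    expo = [exp(e - top) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical fit: intercept plus one covariate, three response categories.
example_betas = [[0.5, 1.0], [-0.2, 2.0]]
probs = multinomial_logit_probs([1.0, 0.3], example_betas)
```

With k response categories and p covariates the model carries (k - 1) * p coefficients, which is why the abstract groups all parameters tied to one explanatory variable into a single glyph.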

    DIMAL: Deep Isometric Manifold Learning Using Sparse Geodesic Sampling

    This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances. By training with only a few landmarks, we show a significantly improved local and nonlocal generalization of the isometric mapping as compared to analogous non-parametric counterparts. Importantly, the combination of a deep-learning framework with a multidimensional scaling objective enables a numerical analysis of network architectures to aid in understanding their representation power. This provides a geometric perspective to the generalizability of deep learning. Comment: 10 pages, 11 figures
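
The least squares multidimensional scaling objective mentioned above can be written down in a few lines. The sketch below only evaluates a raw stress value for a candidate embedding; it does not reproduce the paper's Siamese network, landmark sampling, or geodesic distance computation. The function name `raw_stress` and the one-dimensional example are assumptions for illustration.

```python
from math import dist

def raw_stress(embedding, target):
    """Sum of squared differences between embedded Euclidean distances and
    target distances, taken over each unordered pair of points once.

    embedding : list of points (tuples of coordinates)
    target    : n x n matrix of distances to preserve (e.g. geodesic ones)
    """
    n = len(embedding)
    return sum(
        (dist(embedding[i], embedding[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

# Target distances of three points lying on a line at positions 0, 1 and 3.
target_d = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
exact = raw_stress([(0.0,), (1.0,), (3.0,)], target_d)      # distances preserved
distorted = raw_stress([(0.0,), (1.0,), (2.0,)], target_d)  # last point misplaced
```

A parametric approach such as the one the abstract describes would minimise a quantity of this kind with respect to network weights rather than with respect to the embedded coordinates directly.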