
    Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues

    We undertake an extensive numerical investigation of the graph spectra of thousands of regular graphs, a set of random Erdős-Rényi graphs, the two most popular types of complex networks, and an evolving genetic network, using novel conceptual and experimental tools. Our objective is to contribute to an understanding of the meaning of the eigenvalues of a graph relative to its topological and information-theoretic properties. We introduce a technique for identifying the most informative eigenvalues of evolving networks by comparing graph spectra behavior to their algorithmic complexity. We suggest that extensions of these techniques can be used to further investigate the behavior of evolving biological networks. In the extended version of this paper we apply these techniques to seven tissue-specific regulatory networks as a static example and to the network of a naïve pluripotent immune cell in the process of differentiating towards a Th17 cell as an evolving example, finding the most and least informative eigenvalues at every stage.
    Comment: Forthcoming in 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Lecture Notes in Bioinformatics, 201

    Metrics for Graph Comparison: A Practitioner's Guide

    Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data from these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and a large distance indicating dissimilarity. Common choices include spectral distances (also known as λ distances) and distances based on node affinities. However, there has as yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problems based on this multi-scale view. Finally, we introduce the Python library NetComp, which implements the graph distances used in this work.

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.

    Morphology of three-body quantum states from machine learning

    The relative motion of three impenetrable particles on a ring, in our case two identical fermions and one impurity, is isomorphic to a triangular quantum billiard. Depending on the ratio κ of the impurity and fermion masses, the billiards can be integrable or non-integrable (also referred to in the main text as chaotic). To set the stage, we first investigate the energy level distributions of the billiards as a function of 1/κ ∈ [0, 1] and find no evidence of integrable cases beyond the limiting values 1/κ = 1 and 1/κ = 0. Then, we use machine learning tools to analyze properties of probability distributions of individual quantum states. We find that convolutional neural networks can correctly classify integrable and non-integrable states. The decisive features of the wave functions are the normalization and a large number of zero elements, corresponding to the existence of a nodal line. The network achieves typical accuracies of 97%, suggesting that machine learning tools can be used to analyze and classify the morphology of probability densities obtained in theory or experiment

    Selection of principal variables through a modified Gram–Schmidt process with and without supervision

    In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low-dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA and a similar alternative for deriving sparse regression models closely related to the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
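
As a rough illustration of how a pivoted modified Gram-Schmidt process can drive variable selection, the sketch below picks, at each step, the variable with the largest remaining (residual) squared norm and then deflates every variable against it. This pivot rule is ordinary QR-style column pivoting chosen for illustration; it is an assumption and not the voting criterion proposed in the paper. The function name `select_principal_variables` and the toy data are hypothetical.

```python
def select_principal_variables(columns, n_select):
    """Greedy variable selection by modified Gram-Schmidt with pivoting.

    columns: list of equal-length lists, one list per measured variable.
    Returns the indices of the selected variables in order of selection.
    Pivot rule (an assumption): largest residual squared norm.
    """
    # Centre each variable so squared norms are proportional to variances.
    residual = []
    for col in columns:
        mean = sum(col) / len(col)
        residual.append([x - mean for x in col])

    selected = []
    for _ in range(min(n_select, len(columns))):
        norms = [sum(x * x for x in col) for col in residual]
        pivot = max(range(len(residual)), key=lambda j: norms[j])
        if norms[pivot] <= 1e-12:
            break  # no variance left to explain
        selected.append(pivot)
        # Normalise the pivot column to get the next orthonormal direction.
        q = [x / norms[pivot] ** 0.5 for x in residual[pivot]]
        # Deflate: remove the component along q from every residual column.
        for col in residual:
            proj = sum(a * b for a, b in zip(q, col))
            for i in range(len(col)):
                col[i] -= proj * q[i]
    return selected


# Toy data: the second variable is an exact multiple of the first.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
]
chosen = select_principal_variables(data, 2)
```

In this toy case the redundant first variable is never selected, because it carries no variance once its multiple has been chosen; asking for three variables still returns only two.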

    Contribuições multivariadas na decomposição de uma série temporal (Multivariate contributions to the decomposition of a time series)

    One of the goals of time series analysis is to extract essential features from the series for exploratory or predictive purposes. Singular Spectrum Analysis (SSA) is a method used for this purpose, transforming the original series into a Hankel matrix, also called a trajectory matrix. Its only parameter is the so-called window length. The singular value decomposition of the trajectory matrix allows the separation of the series components, since the structure in terms of singular values and vectors is associated in some way with the trend, oscillatory component, and noise. However, the visualization of the steps of the method is little explored or lacks interpretability. In this work, we take advantage of the results of a particular singular value decomposition using the NIPALS algorithm to implement a graphical display of the principal components using HJ-biplots, naming the method SSA-HJ-biplot. It is an exploratory tool whose main objective is to increase the visual interpretability of SSA, facilitating the grouping step and, consequently, the identification of characteristics of the time series. By exploring the properties of HJ-biplots and setting the window length to half the series length, rows and columns of the trajectory matrix can be represented in the same SSA-HJ-biplot simultaneously and optimally. To circumvent the potential problem of structural changes in the time series, which can make it challenging to visualize the separation of the components, we propose a methodology for the detection of change points and the application of the SSA-HJ-biplot in homogeneous intervals, that is, between change points. This detection approach is based on sudden changes in the direction of the principal components, which are evaluated by a distance metric created for this purpose. Finally, we developed another visualization method based on SSA to estimate the dominant periodicities of a time series through geometric patterns, which we call the SSA Biplot Area. In this part of the research, we implemented a package in R called areabiplot, available on the Comprehensive R Archive Network (CRAN).
    Programa Doutoral em Matemátic
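
The embedding step of SSA described in this abstract can be sketched in a few lines. The example below builds the Hankel trajectory matrix for a given window length and shows the reverse mapping (diagonal averaging) that turns a matrix back into a series. It deliberately omits the singular value decomposition, NIPALS, and the biplot display; the function names `trajectory_matrix` and `diagonal_average` are hypothetical and are not taken from the areabiplot package.

```python
def trajectory_matrix(series, window):
    """Return the L x K Hankel (trajectory) matrix with K = N - L + 1.

    Row i holds series[i], series[i + 1], ..., series[i + K - 1], so every
    anti-diagonal of the matrix contains a single value of the series.
    """
    n = len(series)
    if not 1 <= window <= n:
        raise ValueError("window length must be between 1 and len(series)")
    k = n - window + 1
    return [[series[i + j] for j in range(k)] for i in range(window)]


def diagonal_average(matrix):
    """Average each anti-diagonal to map an L x K matrix back to a series."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Window length set to half the series length, as suggested in the abstract.
X = trajectory_matrix(series, window=len(series) // 2)
recovered = diagonal_average(X)
```

Applying `diagonal_average` to an unmodified trajectory matrix returns the original series; in a full SSA pipeline it would instead be applied to each grouped low-rank component obtained from the decomposition.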