Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues
We undertake an extensive numerical investigation of the graph spectra of
thousands of regular graphs, a set of random Erdős–Rényi graphs, the two most
popular types of complex networks and an evolving genetic network, using
novel conceptual and experimental tools. Our objective in doing so is to
contribute to an understanding of the meaning of the eigenvalues of a graph
relative to its topological and information-theoretic properties. We introduce
a technique for identifying the most informative eigenvalues of evolving
networks by comparing the behavior of their graph spectra to their algorithmic
complexity. We suggest that these techniques can be extended to further
investigate the behavior of evolving biological networks. In the extended
version of this paper we apply these techniques to seven tissue-specific
regulatory networks as a static example, and to the network of a naïve
pluripotent immune cell in the process of differentiating towards a Th17 cell
as an evolving example, finding the most and least informative eigenvalues at
every stage.

Comment: Forthcoming in 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Lecture Notes in Bioinformatics, 201
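A rough illustration of the objects under study (not the authors' pipeline or their complexity-based analysis): the Python sketch below uses networkx and numpy to generate an Erdős–Rényi graph and a random regular graph and compute their adjacency spectra; the graph sizes and parameters are assumptions chosen for illustration.

```python
# Minimal sketch: adjacency spectra of an Erdos-Renyi graph and a regular graph.
# Graph sizes and parameters are illustrative assumptions, not values from the paper.
import networkx as nx
import numpy as np

n = 100
er = nx.erdos_renyi_graph(n, p=0.05, seed=1)
reg = nx.random_regular_graph(d=4, n=n, seed=1)

def adjacency_spectrum(g):
    """Sorted eigenvalues of the graph's adjacency matrix."""
    a = nx.to_numpy_array(g)
    return np.sort(np.linalg.eigvalsh(a))

spec_er = adjacency_spectrum(er)
spec_reg = adjacency_spectrum(reg)
print("largest Erdos-Renyi eigenvalue:", spec_er[-1])
print("largest regular-graph eigenvalue:", spec_reg[-1])  # equals d for a connected d-regular graph
```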
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems, based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
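As an illustration of what a spectral comparison involves, the sketch below computes a simple λ distance (the Euclidean distance between truncated Laplacian spectra) directly with numpy and networkx; it does not assume NetComp's actual API, and the graph models and truncation level k are illustrative choices.

```python
# Minimal sketch of a spectral (lambda) distance between two graphs,
# computed directly with numpy/networkx; NetComp's own API is not assumed here.
import networkx as nx
import numpy as np

def laplacian_spectrum(g, k):
    """Return the k smallest eigenvalues of the graph Laplacian."""
    lap = nx.laplacian_matrix(g).toarray().astype(float)
    return np.sort(np.linalg.eigvalsh(lap))[:k]

def lambda_distance(g1, g2, k=10):
    """Euclidean distance between truncated Laplacian spectra."""
    return float(np.linalg.norm(laplacian_spectrum(g1, k) - laplacian_spectrum(g2, k)))

# Illustrative comparison: a modular (community-structured) graph vs. an unstructured random graph.
modular = nx.planted_partition_graph(l=4, k=25, p_in=0.2, p_out=0.01, seed=1)
random_g = nx.erdos_renyi_graph(100, p=0.06, seed=1)
print("lambda distance:", lambda_distance(modular, random_g))
```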
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall-and-skinny, which enables the algorithms to map
conveniently onto Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
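A simplified sketch of why tall-and-skinny matrices map well onto a data-parallel model (plain numpy standing in for the paper's Spark or C+MPI code): each block of rows contributes an independent Gram-matrix term, and only the small d × d sum needs a centralized eigendecomposition.

```python
# Simplified sketch of data-parallel PCA for a tall-and-skinny matrix:
# each row block contributes an independent Gram-matrix term, and only the small
# d x d sum needs a centralized eigendecomposition. This mirrors the general
# pattern, not the paper's actual Spark or C+MPI implementation.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 20                        # tall (n rows) and skinny (d columns), illustrative sizes
A = rng.normal(size=(n, d))

# "Map" step: per-partition Gram matrices (each could be computed on a separate worker).
blocks = np.array_split(A, 8)             # 8 partitions standing in for 8 workers
partial_grams = [block.T @ block for block in blocks]

# "Reduce" step: sum the small d x d contributions and eigendecompose centrally.
gram = sum(partial_grams)
eigvals, eigvecs = np.linalg.eigh(gram)
k = 5
top_components = eigvecs[:, ::-1][:, :k]  # leading k principal directions (of the uncentered data)

print("top singular values:", np.sqrt(eigvals[::-1][:k]))
```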
Morphology of three-body quantum states from machine learning
The relative motion of three impenetrable particles on a ring, in our case two identical fermions and one impurity, is isomorphic to a triangular quantum billiard. Depending on the ratio κ of the impurity and fermion masses, the billiards can be integrable or non-integrable (also referred to in the main text as chaotic). To set the stage, we first investigate the energy level distributions of the billiards as a function of 1/κ ∈ [0, 1] and find no evidence of integrable cases beyond the limiting values 1/κ = 1 and 1/κ = 0. Then, we use machine learning tools to analyze properties of probability distributions of individual quantum states. We find that convolutional neural networks can correctly classify integrable and non-integrable states. The decisive features of the wave functions are the normalization and a large number of zero elements, corresponding to the existence of a nodal line. The network achieves typical accuracies of 97%, suggesting that machine learning tools can be used to analyze and classify the morphology of probability densities obtained in theory or experiment
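A minimal sketch of the kind of classifier described (a small PyTorch CNN for binary classification of discretized probability densities); the architecture, input resolution, and data below are placeholders, not the network or dataset used in the paper.

```python
# Minimal sketch: a small CNN that classifies 2D probability densities as
# integrable vs. non-integrable. Architecture and input size are illustrative
# assumptions, not the network used in the paper.
import torch
import torch.nn as nn

class BilliardStateCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 2),                      # two classes: integrable / non-integrable
        )

    def forward(self, x):                          # x: (batch, 1, 64, 64) discretized |psi|^2
        return self.classifier(self.features(x))

model = BilliardStateCNN()
densities = torch.rand(8, 1, 64, 64)               # placeholder batch of probability densities
logits = model(densities)
print(logits.shape)                                # (8, 2)
```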
Selection of principal variables through a modified Gram–Schmidt process with and without supervision
In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low-dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA, and a similar alternative for deriving sparse regression models closely related to the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
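A simplified sketch of the underlying idea, greedy column pivoting in a Gram–Schmidt/QR-style pass that repeatedly selects the variable explaining the most remaining variance; this is a generic unsupervised illustration, not the authors' voting procedure or its supervised PLSR-related variant.

```python
# Simplified sketch: greedy, unsupervised selection of "principal variables" by
# column pivoting in a Gram-Schmidt / QR-style pass. At each step the column with
# the largest remaining sum of squares is chosen and the others are deflated
# against it. Illustrative only; not the authors' voting procedure.
import numpy as np

def select_principal_variables(X, n_select):
    X = X - X.mean(axis=0)              # center the columns
    R = X.copy()                        # residual matrix, deflated as variables are picked
    selected = []
    for _ in range(n_select):
        ss = (R ** 2).sum(axis=0)       # remaining variance per variable
        ss[selected] = -np.inf          # never re-select a chosen variable
        j = int(np.argmax(ss))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])   # normalized pivot column
        R = R - np.outer(q, q @ R)      # Gram-Schmidt deflation of all columns
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))           # 50 samples, 12 candidate variables (illustrative)
print(select_principal_variables(X, n_select=4))
```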
Contribuições multivariadas na decomposição de uma série temporal (Multivariate contributions to the decomposition of a time series)
One of the goals of time series analysis is to extract essential features from the
series for exploratory or predictive purposes. Singular spectrum analysis (SSA) is a
method used for this purpose: it transforms the original series into a Hankel matrix,
also called a trajectory matrix, whose only parameter is the so-called window length.
The singular value decomposition of the trajectory matrix allows the separation of
the series components, since the structure in terms of singular values and vectors is
somehow associated with the trend, oscillatory component, and noise. However, the
visualization of the steps of this method is little explored and often lacks interpretability. In this
work, we take advantage of the results of a particular decomposition into singular
values using the NIPALS algorithm to implement a graphical display of the principal
components using HJ-biplots, naming the method SSA-HJ-biplot. It is an
exploratory tool whose main objective is to increase the visual interpretability of the
SSA, facilitating the grouping step and, consequently, the identification of
characteristics of the time series. By exploring the properties of the HJ-biplots and adjusting the
window length to half the series length, rows and columns of the trajectory matrix
can be represented in the same SSA-HJ-biplot simultaneously and optimally. To
circumvent the potential problem of structural changes in the time series, which
can make it challenging to visualize the separation of the components, we propose
a methodology for the detection of change points and the application of the
SSA-HJ-biplot in homogeneous intervals, that is, between change points. This
detection approach is based on sudden changes in the direction of the principal
components, which are evaluated by a distance metric created for this purpose.
Finally, we developed another visualization method based on SSA to estimate the
dominant periodicities of a time series through geometric patterns, which we call
the SSA Biplot Area. In this part of the research, we implemented a package in R
called areabiplot, available on the Comprehensive R Archive Network (CRAN).
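A minimal sketch of the decomposition these biplots are built on, constructing the trajectory matrix with window length equal to half the series length and taking its SVD; plain numpy for illustration, not the NIPALS-based implementation or the areabiplot package itself.

```python
# Minimal SSA sketch: embed a series in its Hankel (trajectory) matrix with
# window length L = N/2, take the SVD, and diagonally average a rank-1 term back
# into a series. Plain numpy for illustration; the thesis builds HJ-biplots on a
# NIPALS-based decomposition rather than this direct SVD.
import numpy as np

rng = np.random.default_rng(0)
N = 200
t = np.arange(N)
series = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=N)  # trend + cycle + noise

L = N // 2                             # window length: half the series length
K = N - L + 1
trajectory = np.column_stack([series[i:i + L] for i in range(K)])          # L x K trajectory matrix

U, s, Vt = np.linalg.svd(trajectory, full_matrices=False)
print("leading singular values:", np.round(s[:5], 2))

# Reconstruct the first elementary component by diagonal (Hankel) averaging.
X1 = s[0] * np.outer(U[:, 0], Vt[0])
component1 = np.array([np.mean(np.diag(X1[:, ::-1], K - 1 - n)) for n in range(N)])
print("first component, first 5 values:", np.round(component1[:5], 2))
```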
Interpretable Deep Learning: Beyond Feature-Importance with Concept-based Explanations
Deep Neural Network (DNN) models are challenging to interpret because of their highly complex and non-linear nature. This lack of interpretability (1) inhibits adoption within safety-critical applications, (2) makes it challenging to debug existing models, and (3) prevents us from extracting valuable knowledge. Explainable AI (XAI) research aims to increase the transparency of DNN model behaviour to improve interpretability. Feature importance explanations are the most popular interpretability approaches. They show the importance of each input feature (e.g., pixel, patch, word vector) to the model’s prediction. However, we hypothesise that feature importance explanations have two main shortcomings: they are unable to describe the complexity of a DNN's behaviour with sufficient (1) fidelity and (2) richness. Fidelity and richness are essential because different tasks, users, and data types require specific levels of trust and understanding.
The goal of this thesis is to showcase the shortcomings of feature importance explanations and to develop explanation techniques that describe the DNN behaviour with greater richness. We design an adversarial explanation attack to highlight the infidelity and inadequacy of feature importance explanations. Our attack modifies the parameters of a pre-trained model. It uses fairness as a proxy measure for the fidelity of an explanation method to demonstrate that the apparent importance of a feature does not reveal anything reliable about the fairness of a model. Hence, regulators or auditors should not rely on feature importance explanations to measure or enforce standards of fairness.
As one solution, we formulate five levels of semantic richness against which explanations can be evaluated, and propose two function decomposition frameworks (DGINN and CME) to extract explanations from DNNs at a semantically higher level than feature importance explanations. Concept-based approaches provide explanations in terms of atomic human-understandable units (e.g., wheel or door) rather than individual raw features (e.g., pixels or characters). Our function decomposition frameworks can extract specific class representations from 5% of the network parameters and concept representations with an average per-concept F1 score of 86%. Finally, the CME framework makes it possible to compare concept-based explanations, contributing to the scientific rigour of evaluating interpretability methods.

The author gratefully acknowledges the generous sponsorship of the Engineering and Physical Sciences Research Council (EPSRC), the Department of Computer Science and Technology at the University of Cambridge, and Tenyks, Inc.
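For context, a minimal sketch of the kind of feature-importance explanation being critiqued (a plain gradient saliency map in PyTorch); the model and input are placeholders, and this is not one of the thesis's DGINN or CME methods.

```python
# Minimal sketch of a feature-importance explanation: a plain gradient saliency
# map, i.e. |d(score)/d(pixel)| for the predicted class. The model and input are
# placeholders; this illustrates the critiqued baseline, not DGINN or CME.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)   # untrained placeholder network
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)
logits = model(image)
top_class = logits.argmax(dim=1)

# Backpropagate the top-class score to the input pixels.
logits[0, top_class.item()].backward()
saliency = image.grad.abs().max(dim=1).values   # per-pixel importance, shape (1, 224, 224)
print(saliency.shape)
```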