Numerical Investigation of Graph Spectra and Information Interpretability of Eigenvalues
We undertake an extensive numerical investigation of the graph spectra of
thousands of regular graphs, a set of random Erdős–Rényi graphs, the two most
popular types of complex networks and an evolving genetic network, using
novel conceptual and experimental tools. Our objective in doing so is to
contribute to an understanding of the meaning of the eigenvalues of a graph
relative to its topological and information-theoretic properties. We introduce
a technique for identifying the most informative eigenvalues of evolving
networks by comparing the behavior of their graph spectra to their algorithmic
complexity. We suggest that these techniques can be extended to further
investigate the behavior of evolving biological networks. In the extended
version of this paper we apply these techniques to seven tissue-specific
regulatory networks as a static example, and to the network of a naïve
pluripotent immune cell in the process of differentiating towards a Th17 cell
as an evolving example, finding the most and least informative eigenvalues at
every stage.

Comment: Forthcoming in 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), Lecture Notes in Bioinformatics, 201
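A rough illustration of the objects under study (not the authors' pipeline or their complexity-based analysis): the Python sketch below uses networkx and numpy to generate an Erdős–Rényi graph and a random regular graph and compute their adjacency spectra; the graph sizes and parameters are assumptions chosen for illustration.

```python
# Minimal sketch: adjacency spectra of an Erdos-Renyi graph and a regular graph.
# Graph sizes and parameters are illustrative assumptions, not values from the paper.
import networkx as nx
import numpy as np

n = 100
er = nx.erdos_renyi_graph(n, p=0.05, seed=1)
reg = nx.random_regular_graph(d=4, n=n, seed=1)

def adjacency_spectrum(g):
    """Sorted eigenvalues of the graph's adjacency matrix."""
    a = nx.to_numpy_array(g)
    return np.sort(np.linalg.eigvalsh(a))

spec_er = adjacency_spectrum(er)
spec_reg = adjacency_spectrum(reg)
print("largest Erdos-Renyi eigenvalue:", spec_er[-1])
print("largest regular-graph eigenvalue:", spec_reg[-1])  # equals d for a connected d-regular graph
```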
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems, based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
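As an illustration of what a spectral comparison involves, the sketch below computes a simple λ distance (the Euclidean distance between truncated Laplacian spectra) directly with numpy and networkx; it does not assume NetComp's actual API, and the graph models and truncation level k are illustrative choices.

```python
# Minimal sketch of a spectral (lambda) distance between two graphs,
# computed directly with numpy/networkx; NetComp's own API is not assumed here.
import networkx as nx
import numpy as np

def laplacian_spectrum(g, k):
    """Return the k smallest eigenvalues of the graph Laplacian."""
    lap = nx.laplacian_matrix(g).toarray().astype(float)
    return np.sort(np.linalg.eigvalsh(lap))[:k]

def lambda_distance(g1, g2, k=10):
    """Euclidean distance between truncated Laplacian spectra."""
    return float(np.linalg.norm(laplacian_spectrum(g1, k) - laplacian_spectrum(g2, k)))

# Illustrative comparison: a modular (community-structured) graph vs. an unstructured random graph.
modular = nx.planted_partition_graph(l=4, k=25, p_in=0.2, p_out=0.01, seed=1)
random_g = nx.erdos_renyi_graph(100, p=0.06, seed=1)
print("lambda distance:", lambda_distance(modular, random_g))
```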
Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall-and-skinny, which enables the algorithms to map
conveniently onto Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
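A simplified sketch of why tall-and-skinny matrices map well onto a data-parallel model (plain numpy standing in for the paper's Spark or C+MPI code): each block of rows contributes an independent Gram-matrix term, and only the small d × d sum needs a centralized eigendecomposition.

```python
# Simplified sketch of data-parallel PCA for a tall-and-skinny matrix:
# each row block contributes an independent Gram-matrix term, and only the small
# d x d sum needs a centralized eigendecomposition. This mirrors the general
# pattern, not the paper's actual Spark or C+MPI implementation.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 20                        # tall (n rows) and skinny (d columns), illustrative sizes
A = rng.normal(size=(n, d))

# "Map" step: per-partition Gram matrices (each could be computed on a separate worker).
blocks = np.array_split(A, 8)             # 8 partitions standing in for 8 workers
partial_grams = [block.T @ block for block in blocks]

# "Reduce" step: sum the small d x d contributions and eigendecompose centrally.
gram = sum(partial_grams)
eigvals, eigvecs = np.linalg.eigh(gram)
k = 5
top_components = eigvecs[:, ::-1][:, :k]  # leading k principal directions (of the uncentered data)

print("top singular values:", np.sqrt(eigvals[::-1][:k]))
```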
Morphology of three-body quantum states from machine learning
The relative motion of three impenetrable particles on a ring, in our case two identical fermions and one impurity, is isomorphic to a triangular quantum billiard. Depending on the ratio κ of the impurity and fermion masses, the billiards can be integrable or non-integrable (also referred to in the main text as chaotic). To set the stage, we first investigate the energy level distributions of the billiards as a function of 1/κ ∈ [0, 1] and find no evidence of integrable cases beyond the limiting values 1/κ = 1 and 1/κ = 0. Then, we use machine learning tools to analyze properties of probability distributions of individual quantum states. We find that convolutional neural networks can correctly classify integrable and non-integrable states. The decisive features of the wave functions are the normalization and a large number of zero elements, corresponding to the existence of a nodal line. The network achieves typical accuracies of 97%, suggesting that machine learning tools can be used to analyze and classify the morphology of probability densities obtained in theory or experiment
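A minimal sketch of the kind of classifier described (a small PyTorch CNN for binary classification of discretized probability densities); the architecture, input resolution, and data below are placeholders, not the network or dataset used in the paper.

```python
# Minimal sketch: a small CNN that classifies 2D probability densities as
# integrable vs. non-integrable. Architecture and input size are illustrative
# assumptions, not the network used in the paper.
import torch
import torch.nn as nn

class BilliardStateCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 2),                      # two classes: integrable / non-integrable
        )

    def forward(self, x):                          # x: (batch, 1, 64, 64) discretized |psi|^2
        return self.classifier(self.features(x))

model = BilliardStateCNN()
densities = torch.rand(8, 1, 64, 64)               # placeholder batch of probability densities
logits = model(densities)
print(logits.shape)                                # (8, 2)
```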
Selection of principal variables through a modified Gram–Schmidt process with and without supervision
In various situations requiring empirical model building from highly multivariate measurements, modelling based on partial least squares regression (PLSR) may often provide efficient low-dimensional model solutions. In unsupervised situations, the same may be true for principal component analysis (PCA). In both cases, however, it is also of interest to identify subsets of the measured variables useful for obtaining sparser but still comparable models without significant loss of information and performance. In the present paper, we propose a voting approach for sparse overall maximisation of variance analogous to PCA, and a similar alternative for deriving sparse regression models closely related to the PLSR method. Both cases yield pivoting strategies for a modified Gram–Schmidt process and its corresponding (partial) QR factorisation of the underlying data matrix to manage the variable selection process. The proposed methods include score and loading plot possibilities that are acknowledged for providing efficient interpretations of the related PCA and PLS models in chemometric applications.
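A simplified sketch of the underlying idea, greedy column pivoting in a Gram–Schmidt/QR-style pass that repeatedly selects the variable explaining the most remaining variance; this is a generic unsupervised illustration, not the authors' voting procedure or its supervised PLSR-related variant.

```python
# Simplified sketch: greedy, unsupervised selection of "principal variables" by
# column pivoting in a Gram-Schmidt / QR-style pass. At each step the column with
# the largest remaining sum of squares is chosen and the others are deflated
# against it. Illustrative only; not the authors' voting procedure.
import numpy as np

def select_principal_variables(X, n_select):
    X = X - X.mean(axis=0)              # center the columns
    R = X.copy()                        # residual matrix, deflated as variables are picked
    selected = []
    for _ in range(n_select):
        ss = (R ** 2).sum(axis=0)       # remaining variance per variable
        ss[selected] = -np.inf          # never re-select a chosen variable
        j = int(np.argmax(ss))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])   # normalized pivot column
        R = R - np.outer(q, q @ R)      # Gram-Schmidt deflation of all columns
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))           # 50 samples, 12 candidate variables (illustrative)
print(select_principal_variables(X, n_select=4))
```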
Contribuições multivariadas na decomposição de uma série temporal (Multivariate contributions to the decomposition of a time series)
One of the goals of time series analysis is to extract essential features from the
series for exploratory or predictive purposes. Singular spectrum analysis (SSA) is a
method used for this purpose: it transforms the original series into a Hankel matrix,
also called a trajectory matrix, whose only parameter is the so-called window length.
The singular value decomposition of the trajectory matrix allows the separation of
the series components, since the structure in terms of singular values and vectors is
somehow associated with the trend, oscillatory component, and noise. However, the
visualization of the steps of this method is little explored and often lacks interpretability. In this
work, we take advantage of the results of a particular decomposition into singular
values using the NIPALS algorithm to implement a graphical display of the principal
components using HJ-biplots, naming the method SSA-HJ-biplot. It is an
exploratory tool whose main objective is to increase the visual interpretability of the
SSA, facilitating the grouping step and, consequently, the identification of
characteristics of the time series. By exploring the properties of the HJ-biplots and adjusting the
window length to half the series length, rows and columns of the trajectory matrix
can be represented in the same SSA-HJ-biplot simultaneously and optimally. To
circumvent the potential problem of structural changes in the time series, which
can make it challenging to visualize the separation of the components, we propose
a methodology for the detection of change points and the application of the
SSA-HJ-biplot in homogeneous intervals, that is, between change points. This
detection approach is based on sudden changes in the direction of the principal
components, which are evaluated by a distance metric created for this purpose.
Finally, we developed another visualization method based on SSA to estimate the
dominant periodicities of a time series through geometric patterns, which we call
the SSA Biplot Area. In this part of the research, we implemented a package in R
called areabiplot, available on the Comprehensive R Archive Network (CRAN).
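A minimal sketch of the decomposition these biplots are built on, constructing the trajectory matrix with window length equal to half the series length and taking its SVD; plain numpy for illustration, not the NIPALS-based implementation or the areabiplot package itself.

```python
# Minimal SSA sketch: embed a series in its Hankel (trajectory) matrix with
# window length L = N/2, take the SVD, and diagonally average a rank-1 term back
# into a series. Plain numpy for illustration; the thesis builds HJ-biplots on a
# NIPALS-based decomposition rather than this direct SVD.
import numpy as np

rng = np.random.default_rng(0)
N = 200
t = np.arange(N)
series = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=N)  # trend + cycle + noise

L = N // 2                             # window length: half the series length
K = N - L + 1
trajectory = np.column_stack([series[i:i + L] for i in range(K)])          # L x K trajectory matrix

U, s, Vt = np.linalg.svd(trajectory, full_matrices=False)
print("leading singular values:", np.round(s[:5], 2))

# Reconstruct the first elementary component by diagonal (Hankel) averaging.
X1 = s[0] * np.outer(U[:, 0], Vt[0])
component1 = np.array([np.mean(np.diag(X1[:, ::-1], K - 1 - n)) for n in range(N)])
print("first component, first 5 values:", np.round(component1[:5], 2))
```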
Interpretable Deep Learning: Beyond Feature-Importance with Concept-based Explanations
Deep Neural Network (DNN) models are challenging to interpret because of their highly complex and non-linear nature. This lack of interpretability (1) inhibits adoption within safety-critical applications, (2) makes it challenging to debug existing models, and (3) prevents us from extracting valuable knowledge. Explainable AI (XAI) research aims to increase the transparency of DNN model behaviour to improve interpretability. Feature importance explanations are the most popular interpretability approaches. They show the importance of each input feature (e.g., pixel, patch, word vector) to the model’s prediction. However, we hypothesise that feature importance explanations have two main shortcomings: they are unable to describe the complexity of a DNN's behaviour with sufficient (1) fidelity and (2) richness. Fidelity and richness are essential because different tasks, users, and data types require specific levels of trust and understanding.
The goal of this thesis is to showcase the shortcomings of feature importance explanations and to develop explanation techniques that describe the DNN behaviour with greater richness. We design an adversarial explanation attack to highlight the infidelity and inadequacy of feature importance explanations. Our attack modifies the parameters of a pre-trained model. It uses fairness as a proxy measure for the fidelity of an explanation method to demonstrate that the apparent importance of a feature does not reveal anything reliable about the fairness of a model. Hence, regulators or auditors should not rely on feature importance explanations to measure or enforce standards of fairness.
As one solution, we formulate five levels of semantic richness against which explanations can be evaluated, and propose two function decomposition frameworks (DGINN and CME) to extract explanations from DNNs at a semantically higher level than feature importance explanations. Concept-based approaches provide explanations in terms of atomic human-understandable units (e.g., wheel or door) rather than individual raw features (e.g., pixels or characters). Our function decomposition frameworks can extract specific class representations from 5% of the network parameters and concept representations with an average per-concept F1 score of 86%. Finally, the CME framework makes it possible to compare concept-based explanations, contributing to the scientific rigour of evaluating interpretability methods.

The author gratefully acknowledges the generous sponsorship of the Engineering and Physical Sciences Research Council (EPSRC), the Department of Computer Science and Technology at the University of Cambridge, and Tenyks, Inc.
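For context, a minimal sketch of the kind of feature-importance explanation being critiqued (a plain gradient saliency map in PyTorch); the model and input are placeholders, and this is not one of the thesis's DGINN or CME methods.

```python
# Minimal sketch of a feature-importance explanation: a plain gradient saliency
# map, i.e. |d(score)/d(pixel)| for the predicted class. The model and input are
# placeholders; this illustrates the critiqued baseline, not DGINN or CME.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)   # untrained placeholder network
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)
logits = model(image)
top_class = logits.argmax(dim=1)

# Backpropagate the top-class score to the input pixels.
logits[0, top_class.item()].backward()
saliency = image.grad.abs().max(dim=1).values   # per-pixel importance, shape (1, 224, 224)
print(saliency.shape)
```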