Network Specialization to Explain the Performance of Sparse Neural Networks
Recently it has been shown that sparse neural networks perform better than dense networks with a similar number of parameters. In addition, large overparameterized networks have been shown to contain sparse subnetworks which, when trained in isolation, reach or exceed the performance of the large model. However, methods to explain the success of sparse networks are still lacking. In this work I study the performance of sparse networks using a network's activation regions and patterns, concepts from the neural network expressivity literature.
I define network specialization, a novel concept that considers how distinctly a feedforward neural network (FFNN) has learned to process high-level features in the data. I propose the Minimal Blanket Hypervolume (MBH) algorithm to measure the specialization of an FFNN. It finds parts of the input space that the network associates with some user-defined high-level feature and compares their hypervolume to the hypervolume of the input space. My hypothesis is that sparse networks specialize more to high-level features than dense networks with the same number of hidden network parameters.
Network specialization and MBH also contribute to the interpretability of deep neural networks (DNNs). The capability to learn representations on several levels of abstraction is at the core of deep learning, and MBH enables numerical evaluation of how specialized an FFNN is w.r.t. any abstract concept (a high-level feature) that can be embodied in an input. MBH can be applied to FFNNs in any problem domain, e.g. visual object recognition, natural language processing, or speech recognition. It also enables comparison between FFNNs with different architectures, since the metric is calculated in the common input space.
I test different pruning and initialization scenarios on the MNIST Digits and Fashion datasets. I find that sparse networks approximate more complex functions, exploit redundancy in the data, and specialize to high-level features better than dense, fully parameterized networks with the same number of hidden network parameters.
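The hypervolume comparison at the heart of MBH can be illustrated with a simple Monte Carlo estimate. This is a hedged sketch, not the paper's algorithm: the `feature_region` indicator below is a hypothetical stand-in for the input-space regions that MBH would extract from a trained network's activation patterns.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_region(x):
    # Stand-in indicator: "feature present" if the point lies in a ball
    # around a prototype input (purely illustrative, not MBH's construction).
    prototype = np.full(x.shape[-1], 0.5)
    return np.linalg.norm(x - prototype, axis=-1) < 0.2

def hypervolume_ratio(indicator, dim, n_samples=100_000):
    # Monte Carlo estimate: fraction of the unit hypercube on which the
    # indicator holds, i.e., the feature region's relative hypervolume.
    samples = rng.uniform(0.0, 1.0, size=(n_samples, dim))
    return indicator(samples).mean()

ratio = hypervolume_ratio(feature_region, dim=2)
```

A smaller ratio would indicate a more tightly localized (more specialized) feature region relative to the whole input space.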
MaxCorrMGNN: A Multi-Graph Neural Network Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction
With the emergence of multimodal electronic health records, the evidence for
an outcome may be captured across multiple modalities ranging from clinical to
imaging and genomic data. Predicting outcomes effectively requires fusion
frameworks capable of modeling fine-grained and multi-faceted complex
interactions between modality features within and across patients. We develop
an innovative fusion approach called MaxCorr MGNN that models non-linear
modality correlations within and across patients through
Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, resulting
in a multi-layered graph that preserves the identities of the modalities and
patients. We then design, for the first time, a generalized multi-layered graph
neural network (MGNN) for task-informed reasoning in multi-layered graphs, that
learns the parameters defining patient-modality graph connectivity and message
passing in an end-to-end fashion. We evaluate our model on an outcome prediction
task on a Tuberculosis (TB) dataset, consistently outperforming several
state-of-the-art neural, graph-based, and traditional fusion techniques.

Comment: To appear in the ML4MHD workshop at ICML 202
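The multi-layered patient-modality graph the abstract describes can be sketched structurally: one node per (patient, modality) pair, with intra-patient edges across modalities and intra-modality edges across patients. The uniform edge weights below are placeholders; in MaxCorrMGNN the connectivity would be shaped by learned MaxCorr embeddings and task-informed training.

```python
import numpy as np

n_patients, n_modalities = 3, 2   # illustrative sizes
n_nodes = n_patients * n_modalities

def node(p, m):
    # Flatten a (patient, modality) pair into a single node index.
    return p * n_modalities + m

adj = np.zeros((n_nodes, n_nodes))
for p in range(n_patients):
    for m1 in range(n_modalities):
        for m2 in range(n_modalities):
            if m1 != m2:          # intra-patient, cross-modality edges
                adj[node(p, m1), node(p, m2)] = 1.0
for m in range(n_modalities):
    for p1 in range(n_patients):
        for p2 in range(n_patients):
            if p1 != p2:          # intra-modality, cross-patient edges
                adj[node(p1, m), node(p2, m)] = 1.0
```

Because node indices encode both patient and modality, the graph preserves both identities, which is the property the abstract emphasizes.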
A distributed neural network architecture for dynamic sensor selection with application to bandwidth-constrained body-sensor networks
We propose a dynamic sensor selection approach for deep neural networks
(DNNs), which is able to derive an optimal sensor subset selection for each
specific input sample instead of a fixed selection for the entire dataset. This
dynamic selection is jointly learned with the task model in an end-to-end way,
using the Gumbel-Softmax trick to allow the discrete decisions to be learned
through standard backpropagation. We then show how we can use this dynamic
selection to increase the lifetime of a wireless sensor network (WSN) by
imposing constraints on how often each node is allowed to transmit. We further
improve performance by including a dynamic spatial filter that makes the
task-DNN more robust against the fact that it now needs to be able to handle a
multitude of possible node subsets. Finally, we explain how the selection of
the optimal channels can be distributed across the different nodes in a WSN. We
validate this method on a use case in the context of body-sensor networks,
where we use real electroencephalography (EEG) sensor data to emulate an EEG
sensor network. We analyze the resulting trade-offs between transmission load
and task accuracy.
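The Gumbel-Softmax trick mentioned above can be sketched in a few lines. This is a minimal, framework-free illustration (NumPy instead of an autodiff library, with made-up logits for four hypothetical sensor nodes), showing how a relaxed, differentiable sample over discrete sensor choices is drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature=1.0):
    # Sample from the Gumbel-Softmax (Concrete) distribution: add Gumbel
    # noise to the logits and apply a temperature-scaled softmax, yielding
    # a differentiable relaxation of a one-hot selection.
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)  # eps guards against log(0)
    y = (logits + gumbel) / temperature
    y = y - y.max()                               # numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

# Learnable selection logits over four hypothetical sensor nodes.
logits = np.array([2.0, 0.5, -1.0, 0.1])
soft_choice = gumbel_softmax(logits, temperature=0.5)
hard_choice = int(np.argmax(soft_choice))  # straight-through: discrete forward pass
```

In training, the soft sample keeps gradients flowing to the logits, while the argmax gives the discrete subset used at inference.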
Homological Neural Networks: A Sparse Architecture for Multivariate Complexity
The rapid progress of Artificial Intelligence research came with the
development of increasingly complex deep learning models, leading to growing
challenges in terms of computational complexity, energy efficiency and
interpretability. In this study, we apply advanced network-based information
filtering techniques to design a novel deep neural network unit characterized
by a sparse higher-order graphical architecture built over the homological
structure of underlying data. We demonstrate its effectiveness in two
application domains which are traditionally challenging for deep learning:
tabular data and time series regression problems. Results demonstrate the
advantages of this novel design, which can match or exceed the results of
state-of-the-art machine learning and deep learning models using only a
fraction of the parameters.
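The idea of fixing a sparse connectivity pattern before training can be sketched as a masked linear unit. The mask here is random and purely illustrative; in the paper it would be derived from the homological structure of the underlying data, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 6, 4

# Fixed binary mask (~30% density here) that zeroes out most connections,
# so only a fraction of the dense parameters are active.
mask = (rng.uniform(size=(n_out, n_in)) < 0.3).astype(float)
weights = rng.normal(size=(n_out, n_in))

def sparse_linear(x):
    # Only masked-in weights contribute to the output; masked-out weights
    # carry no signal and need not be stored or trained.
    return (weights * mask) @ x

x = rng.normal(size=n_in)
y = sparse_linear(x)
active_fraction = mask.mean()
```

The parameter saving is exactly `1 - active_fraction` of the dense layer's weight count.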
On the Robustness of Sparse Counterfactual Explanations to Adverse Perturbations
Counterfactual explanations (CEs) are a powerful means for understanding how
decisions made by algorithms can be changed. Researchers have proposed a number
of desiderata that CEs should meet to be practically useful, such as requiring
minimal effort to enact, or complying with causal models. We consider a further
aspect to improve the usability of CEs: robustness to adverse perturbations,
which may naturally happen due to unfortunate circumstances. Since CEs
typically prescribe a sparse form of intervention (i.e., only a subset of the
features should be changed), we study the effect of addressing robustness
separately for the features that are recommended to be changed and those that
are not. Our definitions are workable in that they can be incorporated as
penalty terms in the loss functions that are used for discovering CEs. To
experiment with robustness, we create and release code where five data sets
(commonly used in the field of fair and explainable machine learning) have been
enriched with feature-specific annotations that can be used to sample
meaningful perturbations. Our experiments show that CEs are often not robust
and, if adverse perturbations take place (even if not worst-case), the
intervention they prescribe may require a much larger cost than anticipated, or
even become impossible. However, accounting for robustness in the search
process, which can be done rather easily, allows discovering robust CEs
systematically. Robust CEs make additional intervention to contrast
perturbations much less costly than non-robust CEs. We also find that
robustness is easier to achieve for the features to change, posing an important
point of consideration for the choice of what counterfactual explanation is
best for the user. Our code is available at:
https://github.com/marcovirgolin/robust-counterfactuals
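The idea of incorporating robustness as a penalty term in the CE search loss can be sketched as follows. Everything here is an illustrative stand-in, not the paper's formulation: a toy linear classifier, an L1 proximity term for sparse interventions, and a penalty that counts how often the counterfactual loses its target class under small random perturbations.

```python
import numpy as np

def classifier(x):
    # Toy linear decision: class 1 iff x1 + x2 > 1 (hypothetical model).
    return float(x[0] + x[1] > 1.0)

def robustness_penalty(x_cf, n_perturb=100, scale=0.05, rng=None):
    # Fraction of small random perturbations under which the counterfactual
    # loses its target class; 0 means fully robust at this noise scale.
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(scale=scale, size=(n_perturb, x_cf.size))
    return np.mean([classifier(x_cf + d) != 1.0 for d in noise])

def ce_loss(x, x_cf, lam=2.0):
    validity = 0.0 if classifier(x_cf) == 1.0 else 1.0
    proximity = np.abs(x_cf - x).sum()   # L1 cost of a sparse intervention
    return validity + proximity + lam * robustness_penalty(x_cf)

x = np.array([0.2, 0.3])
fragile = np.array([0.5, 0.51])   # barely crosses the decision boundary
robust = np.array([0.8, 0.8])     # comfortably inside the target class
```

A counterfactual sitting just past the boundary is cheap but fragile; the penalty term steers the search toward points that stay valid under perturbation.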
Generalization and Estimation Error Bounds for Model-based Neural Networks
Model-based neural networks provide unparalleled performance for various
tasks, such as sparse coding and compressed sensing problems. Due to the strong
connection with the sensing model, these networks are interpretable and inherit
prior structure of the problem. In practice, model-based neural networks
exhibit higher generalization capability compared to ReLU neural networks.
However, this phenomenon was not addressed theoretically. Here, we leverage
complexity measures including the global and local Rademacher complexities, in
order to provide upper bounds on the generalization and estimation errors of
model-based networks. We show that the generalization abilities of model-based
networks for sparse recovery outperform those of regular ReLU networks, and
derive practical design rules that allow constructing model-based networks with
guaranteed high generalization. We demonstrate through a series of experiments
that our theoretical insights shed light on a few behaviours experienced in
practice, including the fact that ISTA and ADMM networks exhibit higher
generalization abilities (especially for a small number of training samples)
compared to ReLU networks.
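The ISTA networks mentioned above are built by unrolling iterations of the Iterative Shrinkage-Thresholding Algorithm into layers. Below is a hedged sketch of the underlying iteration, plain ISTA on a toy sparse-recovery problem; the dictionary, dimensions, and hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(z, theta):
    # Proximal operator of the l1 norm: shrink each entry toward zero by
    # theta, which is what promotes sparse codes.
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def ista_step(x, y, A, step, theta):
    # One ISTA iteration for min_x 0.5*||Ax - y||^2 + theta*||x||_1:
    # gradient step on the data-fidelity term, then soft thresholding.
    grad = A.T @ (A @ x - y)
    return soft_threshold(x - step * grad, step * theta)

A = rng.normal(size=(10, 20)) / np.sqrt(10)   # underdetermined dictionary
x_true = np.zeros(20)
x_true[[2, 7]] = [1.5, -2.0]                  # 2-sparse ground truth
y = A @ x_true

x = np.zeros(20)
L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
for _ in range(200):
    x = ista_step(x, y, A, step=1.0 / L, theta=0.05)
```

A learned (LISTA-style) network replaces the fixed matrices and thresholds with trainable per-layer parameters, which is the model-based architecture the bounds apply to.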
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems
Single neurons in neural networks are often interpretable in that they
represent individual, intuitively meaningful features. However, many neurons
exhibit mixed selectivity, i.e., they represent multiple unrelated
features. A recent hypothesis proposes that features in deep networks may be
represented in superposition, i.e., on non-orthogonal axes by
multiple neurons, since the number of possible interpretable features in
natural data is generally larger than the number of neurons in a given network.
Accordingly, we should be able to find meaningful directions in activation
space that are not aligned with individual neurons. Here, we propose (1) an
automated method for quantifying visual interpretability that is validated
against a large database of human psychophysics judgments of neuron
interpretability, and (2) an approach for finding meaningful directions in
network activation space. We leverage these methods to discover directions in
convolutional neural networks that are more intuitively meaningful than
individual neurons, as we confirm and investigate in a series of analyses.
Moreover, we apply the same method to three recent datasets of visual neural
responses in the brain and find that our conclusions largely transfer to real
neural data, suggesting that superposition might be deployed by the brain. This
also provides a link with disentanglement and raises fundamental questions
about robust, efficient and factorized representations in both artificial and
biological neural systems.
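The core idea of reading out directions rather than single neurons can be illustrated on synthetic activations. This sketch is not the paper's method: it plants a "feature" along a direction that mixes two neurons, then compares a projection onto that direction against the best single neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 8

# A non-axis-aligned unit direction mixing neurons 0 and 1.
direction = np.zeros(d)
direction[[0, 1]] = [1.0, 1.0]
direction /= np.linalg.norm(direction)

feature = rng.normal(size=n)                 # ground-truth feature value
acts = rng.normal(scale=0.5, size=(n, d))    # per-neuron noise
acts += np.outer(feature, direction)         # feature stored in superposition

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

proj_score = corr(acts @ direction, feature)                      # direction readout
neuron_score = max(corr(acts[:, j], feature) for j in range(d))   # best single neuron
```

Because the feature lives on a non-orthogonal axis, the direction readout recovers it more cleanly than any individual neuron, mirroring the superposition hypothesis.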