14 research outputs found

    Causal connectivity of evolved neural networks during behavior

    To show how causal interactions in neural dynamics are modulated by behavior, it is valuable to analyze these interactions without perturbing or lesioning the neural mechanism. This paper proposes a method, based on a graph-theoretic extension of vector autoregressive modeling and 'Granger causality,' for characterizing causal interactions generated within intact neural mechanisms. This method, called 'causal connectivity analysis', is illustrated via model neural networks optimized for controlling target fixation in a simulated head-eye system, in which the structure of the environment can be experimentally varied. Causal connectivity analysis of this model yields novel insights into neural mechanisms underlying sensorimotor coordination. In contrast to networks supporting comparatively simple behavior, networks supporting rich adaptive behavior show a higher density of causal interactions, as well as a stronger causal flow from sensory inputs to motor outputs. They also show different arrangements of 'causal sources' and 'causal sinks': nodes that differentially affect, or are affected by, the remainder of the network. Finally, analysis of causal connectivity can predict the functional consequences of network lesions. These results suggest that causal connectivity analysis may have useful applications in the analysis of neural dynamics.
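
    As a rough illustration of the kind of analysis described above (a minimal sketch under simplifying assumptions, not the authors' implementation), pairwise Granger causality can be estimated by comparing the residual variance of an autoregressive model of one node's activity with and without lagged terms from another node. The sketch below assumes stationary, continuous-valued time series held in NumPy arrays; the three-node data and lag order are placeholders.

        import numpy as np

        def lagged(series, p):
            # Stack p lagged copies of a 1-D series as regressor columns.
            T = len(series)
            return np.column_stack([series[p - k - 1:T - k - 1] for k in range(p)])

        def granger_causality(x, y, p=2):
            # Granger causality from y to x at lag order p: log ratio of the
            # residual variance of the restricted model (past of x only) to
            # that of the full model (past of x and y). Values > 0 suggest
            # that the past of y helps predict x.
            target = x[p:]

            def resid_var(*blocks):
                design = np.column_stack((np.ones(len(target)),) + blocks)
                beta, *_ = np.linalg.lstsq(design, target, rcond=None)
                return np.var(target - design @ beta)

            restricted = resid_var(lagged(x, p))
            full = resid_var(lagged(x, p), lagged(y, p))
            return np.log(restricted / full)

        # Hypothetical usage: a causal connectivity matrix over three recorded
        # nodes (entry [i, j] = estimated influence of node i on node j),
        # which can then be thresholded into a directed graph.
        rng = np.random.default_rng(0)
        data = rng.standard_normal((3, 500))
        gc = np.array([[granger_causality(data[j], data[i]) for j in range(3)]
                       for i in range(3)])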

    Analysing Neural Network Topologies: a Game Theoretic Approach

    Artificial Neural Networks have shown impressive success in very different application cases. Choosing a proper network architecture is a critical decision for a network's success, and is usually done manually. As a straightforward strategy, large, mostly fully connected architectures are selected, thereby relying on a good optimization strategy to find proper weights while at the same time avoiding overfitting. However, large parts of the final network are redundant. In the best case, large parts of the network simply become irrelevant for later inference. In the worst case, highly parameterized architectures hinder proper optimization and allow the easy creation of adversarial examples that fool the network. A first step in removing irrelevant architectural parts lies in identifying those parts, which requires measuring the contribution of individual components such as neurons. In previous work, heuristics that use the weight distribution of a neuron as a contribution measure have shown some success, but do not provide a proper theoretical understanding. Therefore, in our work we investigate game-theoretic measures, namely the Shapley value (SV), in order to separate relevant from irrelevant parts of an artificial neural network. We begin by designing a coalitional game for an artificial neural network, where neurons form coalitions and the average contributions of neurons to coalitions yield the Shapley value. In order to measure how well the Shapley value captures the contribution of individual neurons, we remove low-contributing neurons and measure the impact on the network performance. In our experiments we show that the Shapley value outperforms other heuristics for measuring the contribution of neurons.
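
    As a hedged sketch of the general idea only (the payoff function below is a toy placeholder, not the paper's experimental setup), the Shapley value of each neuron can be approximated by Monte Carlo sampling of permutations and averaging marginal contributions; value_fn is assumed to return something like validation accuracy with every neuron outside the coalition masked to zero.

        import numpy as np

        def shapley_values(value_fn, n_players, n_samples=200, seed=0):
            # Monte Carlo Shapley estimate: for each sampled permutation,
            # add players one by one and credit each with its marginal
            # contribution to the coalition built so far.
            rng = np.random.default_rng(seed)
            phi = np.zeros(n_players)
            for _ in range(n_samples):
                coalition = set()
                prev = value_fn(coalition)
                for player in rng.permutation(n_players):
                    coalition.add(int(player))
                    cur = value_fn(coalition)
                    phi[player] += cur - prev
                    prev = cur
            return phi / n_samples

        # Toy stand-in for "network performance with only the coalition's
        # neurons left active"; replace with a real masked evaluation.
        true_weights = np.array([0.5, 0.3, 0.0, 0.2, 0.0])
        value_fn = lambda coalition: sum(true_weights[i] for i in coalition)

        phi = shapley_values(value_fn, n_players=5)
        prune_order = np.argsort(phi)   # lowest-contributing neurons first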

    Causal Measures of Structure and Plasticity in Simulated and Living Neural Networks

    A major goal of neuroscience is to understand the relationship between neural structures and their function. Recording of neural activity with arrays of electrodes is a primary tool employed toward this goal. However, the relationships among the neural activity recorded by these arrays are often highly complex, making it problematic to accurately quantify a network's structural information and then relate that structure to its function. Current statistical methods, including cross correlation and coherence, have achieved only modest success in characterizing structural connectivity. Over the last decade an alternative technique known as Granger causality has been emerging within neuroscience. This technique, borrowed from the field of economics, provides a strong mathematical foundation based on linear autoregression to detect and quantify “causal” relationships among different time series. This paper presents a combination of three Granger-based analytical methods that can quickly provide a relatively complete representation of the causal structure within a neural network: a simple pairwise Granger causality metric, a conditional metric, and a little-known, computationally inexpensive subtractive conditional method. Each causal metric is first described and evaluated in a series of biologically plausible neural simulations. We then demonstrate how Granger causality can detect and quantify changes in the strength of those relationships during plasticity, using 60-channel spike train data from an in vitro cortical network measured on a microelectrode array. We show that these metrics can not only detect the presence of causal relationships but also provide crucial information about the strength and direction of each relationship, particularly when that relationship may be changing during plasticity. Although we focus on the analysis of multichannel spike train data, the metrics we describe are applicable to any stationary time series in which causal relationships among multiple measures are of interest. These techniques can be especially useful when the interactions among those measures are highly complex, difficult to untangle, and may be changing over time.
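
    One plausible reading of the conditional metric described above (a minimal sketch under the same least-squares assumptions as the earlier pairwise example, not the authors' code) asks whether the past of one series still improves the prediction of another once the past of a third series is already in the model; binned spike counts would be one assumed input format.

        import numpy as np

        def lagged(series, p):
            # Stack p lagged copies of a 1-D series as regressor columns.
            T = len(series)
            return np.column_stack([series[p - k - 1:T - k - 1] for k in range(p)])

        def conditional_gc(x, y, z, p=2):
            # Granger causality from y to x conditioned on z: compare the
            # residual variance of x[t] regressed on the past of x and z
            # (restricted) with a model that also includes the past of y.
            target = x[p:]

            def resid_var(*blocks):
                design = np.column_stack((np.ones(len(target)),) + blocks)
                beta, *_ = np.linalg.lstsq(design, target, rcond=None)
                return np.var(target - design @ beta)

            restricted = resid_var(lagged(x, p), lagged(z, p))
            full = resid_var(lagged(x, p), lagged(z, p), lagged(y, p))
            return np.log(restricted / full)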

    Feature Selection via Coalitional Game Theory


    A POWER INDEX BASED FRAMEWORK FOR FEATURE SELECTION PROBLEMS

    One of the most challenging tasks in the Machine Learning context is feature selection: selecting the best set of features to use in the training and prediction processes. Pruning the set of actually operational features brings several benefits: reduced computation time, often better prediction quality, and the possibility of using less data to create a good predictor. In its most common form, the problem is called the single-view feature selection problem, to distinguish it from the feature selection task in Multi-view learning. In the latter, each view corresponds to a set of features and one would like to enact feature selection on each view, subject to some global constraints. A related problem in the context of Multi-View Learning is Feature Partitioning: splitting the set of features of a single large view into two or more views so that it becomes possible to create a good predictor based on each view. In this case, the best features must be distributed between the views: each view should contain synergistic features, while features that interfere disruptively must be placed in different views. In the semi-supervised multi-view task known as Co-training, one also requires that each predictor trained on an individual view is able to teach something to the other views: in classification tasks, for instance, one view should learn to classify unlabelled examples based on the guess provided by the other views. There are several ways to address these problems. A set of techniques is inspired by Coalitional Game Theory. Such theory defines several useful concepts, among which two are of high practical importance: the concept of power index and the concept of interaction index. When used in the context of feature selection, they take the following meaning: the power index is a (context-dependent) synthesis measure of the predictive capability of a feature, and the interaction index is a (context-dependent) synthesis measure of the interaction (constructive/disruptive interference) between two features: it can be used to quantify how the collaboration between two features enhances their prediction capabilities. An important point is that the power index of a feature is different from the predicting power of the feature in isolation: it takes into account, by a suitable averaging, the context, i.e. the fact that the feature is acting, together with other features, to train a model. Similarly, the interaction index between two features takes into account the context, by suitably averaging the interaction with all the other features. In this work we address both the single-view and the multi-view problems as follows. The single-view feature selection problem is formalized as the problem of maximization of a pseudo-Boolean function, i.e. a real-valued set function that maps sets of features into a performance metric. Since one has to enact a search over (a considerable portion of) the Boolean lattice, without any special guarantees except, perhaps, positivity, the problem is in general NP-hard. We address the problem by producing candidate maximum coalitions through the selection of the subset of features with the highest power indices and using the coalition to approximate the actual maximum. Although the exact computation of the power indices is an exponential task, estimates of the power indices sufficient for the present problem can be obtained in polynomial time.
    The multi-view feature selection problem is formalized as the generalization of the above set-up to the case of multi-variable pseudo-Boolean functions. The multi-view splitting problem is formalized instead as the problem of maximization of a real function defined over the partition lattice. This problem too is typically NP-hard. However, candidate solutions can be found by suitably partitioning the top power-index features and keeping in different views the pairs of features that are less interactive or negatively interactive. The sum of the power indices of the participating features can be used to approximate the prediction capability of a view (i.e. it can serve as a proxy for the predicting power). The sum of the feature-pair interactivity across views can be used as a proxy for the orthogonality of the views. The capability of a view to pass information to (to teach) other views within a co-training procedure can also benefit from the use of power indices based on a suitable definition of information transfer (a set of features, i.e. a coalition, classifies examples that are subsequently used in the training of a second set of features). As to the feature selection task, we not only demonstrate the use of state-of-the-art power index concepts (e.g. the Shapley Value and the Banzhaf Value, along the two lines described above), but also define new power indices within the more general class of probabilistic power indices, which contains the Shapley and Banzhaf Values as special cases. Since the number of features to select is often a predefined parameter of the problem, we also introduce some novel power indices, namely the k-Power Index (and its specializations, the k-Shapley Value and k-Banzhaf Value): they help select the features in a more efficient way. For feature partitioning, we use the more general class of probabilistic interaction indices, which contains the Shapley and Banzhaf Interaction Indices as members. We also address the problem of evaluating the teaching ability of a view, introducing a suitable teaching capability index. The last contribution of the present work consists in comparing the Game Theory approach to the classical Greedy Forward Selection approach for feature selection. In the latter, the candidate is obtained by aggregating one feature at a time to the current maximal coalition, always choosing the feature with the maximal marginal contribution. In this case we show that in typical cases the two methods are complementary, and that when used in conjunction they reduce one another's error in the estimate of the maximum value. Moreover, the approach based on game theory has two advantages: it samples the space of all possible feature subsets, whereas the greedy algorithm scans a selected subspace and totally excludes the rest of it; and it is able, for each feature, to assign a score that describes a context-aware measure of importance in the prediction process.
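
    As an illustrative sketch only (a generic probabilistic power index estimator, not the thesis' exact k-Power Index machinery), a Banzhaf-style index per feature can be estimated by sampling random coalitions of the remaining features and averaging the marginal gain in a cross-validated score; the scikit-learn classifier and 3-fold evaluation below are assumptions.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def banzhaf_indices(X, y, n_samples=50, seed=0):
            # Monte Carlo Banzhaf-style power index: for each feature i and
            # each sampled coalition S of the other features, accumulate
            # v(S + {i}) - v(S), where v is the cross-validated accuracy of
            # a model restricted to those feature columns.
            rng = np.random.default_rng(seed)
            n = X.shape[1]

            def v(cols):
                if not cols:
                    return 0.0
                model = LogisticRegression(max_iter=1000)
                return cross_val_score(model, X[:, sorted(cols)], y, cv=3).mean()

            phi = np.zeros(n)
            for _ in range(n_samples):
                for i in range(n):
                    S = {j for j in range(n) if j != i and rng.random() < 0.5}
                    phi[i] += v(S | {i}) - v(S)
            return phi / n_samples

        def select_top_k(X, y, k):
            # Candidate coalition: the k features with the highest index.
            return np.argsort(banzhaf_indices(X, y))[::-1][:k]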

    Dimension Reduction and Data Augmentation Methods for the Physical Sciences

    Data is fundamental to how we understand the world around us: it is through data that we develop understanding of, and interact with, our surroundings. With the increasing volume of data in an increasingly data-driven world, the development of Machine Learning tools to analyse diverse and complex data is becoming more important. In the Physical Sciences, data typically contains many observed characteristics yet a low number of total observations, because of the cost associated with gathering such data. One example is Molecular Physics, where highly complex systems of equations must be simulated, or expensive machinery must be used to observe a given nanoparticle. This work seeks to develop tools and workflows that provide effective methods to analyse this sort of data. Dimension reduction and data augmentation are two well-known approaches to the problems associated with learning from high-dimensional data. We demonstrate workflows which make use of well-known methodologies such as PCA for dimension reduction and SMOGN for data augmentation. There are, however, particular requirements associated with data analysis in different fields. We therefore consider the fitness of these methods in context and provide alternative methods with which to analyse this data. Our methodologies demonstrate significant advantages depending on the requirements of a given analysis task. We introduce Hyper-Dimension Reduction methods which significantly outperform PCA when using our tested learning models. When data transformations such as PCA are not sufficiently interpretable, feature selection may be used instead, since it retains the meaning of the original data. However, state-of-the-art feature selection algorithms scale poorly with the number of features; we introduce a method which significantly reduces the computational complexity of the feature selection task on any data. We demonstrate that data augmentation can increase the stability of models by providing regularisation; however, existing methods poorly preserve the meaning of the original data. We explore the idea of using the inherent errors in the data in order to carry out the data augmentation task.
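
    As a minimal sketch of the kind of baseline workflow referred to above (PCA followed by a regularised regressor on a small, high-dimensional dataset; the synthetic data, component count and model choice are assumptions, and neither SMOGN nor the Hyper-Dimension Reduction methods themselves are reproduced here):

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Toy stand-in for a physical-science dataset: few observations,
        # many observed characteristics.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((80, 200))
        y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(80)

        # Baseline: standardise, project onto a handful of principal
        # components, then fit a regularised linear model.
        model = make_pipeline(StandardScaler(), PCA(n_components=10), Ridge())
        print("CV R^2:", cross_val_score(model, X, y, cv=5).mean())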