23 research outputs found

    Understanding polysemanticity in neural networks through coding theory

    Full text link
    Despite substantial efforts, neural network interpretability remains an elusive goal, with previous research failing to provide succinct explanations of most single neurons' impact on the network output. This limitation is due to the polysemantic nature of most neurons, whereby a given neuron is involved in multiple unrelated network states, complicating the interpretation of that neuron. In this paper, we apply tools developed in neuroscience and information theory to propose both a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes. We infer levels of redundancy in the network's code by inspecting the eigenspectrum of the activations' covariance matrix. Furthermore, we show how random projections can reveal whether a network exhibits a smooth or non-differentiable code, and hence how interpretable the code is. This same framework explains the advantages of polysemantic neurons for learning performance and explains trends found in recent results by Elhage et al. (2022). Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
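The redundancy probe described above can be sketched with a toy example. The synthetic activations, the layer size, and the use of the participation ratio as a one-number summary of the eigenspectrum are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations: 1000 samples of a 50-unit layer driven by
# only 5 latent factors, i.e. a highly redundant code.
latents = rng.normal(size=(1000, 5))
activations = latents @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(1000, 50))

# Eigenspectrum of the activations' covariance matrix.
cov = np.cov(activations, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order

# Participation ratio: a common summary of eigenspectrum concentration;
# a value far below the number of units signals a redundant code.
participation_ratio = eigvals.sum() ** 2 / (eigvals ** 2).sum()
```

Here the participation ratio lands near the number of latent factors rather than near the 50 units, which is the kind of redundancy the eigenspectrum exposes.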

    Feature regularization and learning for human activity recognition.

    Get PDF
    Doctoral Degree. University of KwaZulu-Natal, Durban. Feature extraction is an essential component in the design of a human activity recognition model. However, relying on extracted features alone for learning often yields a suboptimal model. This research work therefore addresses that problem by investigating feature regularization, which is used to encapsulate the discriminative patterns needed for better and more efficient model learning. Firstly, a within-class subspace regularization approach is proposed for eigenfeature extraction and regularization in human activity recognition. In this approach, the within-class subspace is modelled using more eigenvalues from the reliable subspace to obtain a four-parameter modelling scheme. This model enables a better and truer estimation of the eigenvalues that are distorted by the small-sample-size effect. The regularization is done in one piece, thereby avoiding the undue complexity of modelling the eigenspectrum piecewise. The whole eigenspace is used for performance evaluation because feature extraction and dimensionality reduction are done at a later stage of the evaluation process. Results show that the proposed approach has better discriminative capacity than several other subspace approaches for human activity recognition. Secondly, using a likelihood prior probability, a new regularization scheme is proposed that improves the loss function of a deep convolutional neural network. The results obtained from this work demonstrate that well-regularized features yield better class discrimination in human activity recognition. The major contribution of the thesis is the development of feature extraction strategies for determining the discriminative patterns needed for efficient model learning.
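One way to make the eigenspectrum-modelling idea concrete is sketched below. The smooth 1/(slope*k + intercept) decay fitted to the reliable leading eigenvalues is a common remedy for small-sample distortion and only a stand-in for the thesis's four-parameter scheme; the dimensions and the reliable-subspace size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small-sample regime: 30 samples in 100 dimensions distorts the tail of
# the covariance eigenspectrum (most tail eigenvalues collapse to ~0).
X = rng.normal(size=(30, 100))
w = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending

# Fit a smooth decay w_k ~ 1/(slope*k + intercept) to the reliable
# leading eigenvalues, then replace the unreliable tail with it.
m = 20                                   # assumed reliable-subspace size
k = np.arange(1, m + 1)
slope, intercept = np.polyfit(k, 1.0 / w[:m], 1)
w_reg = w.copy()
tail_k = np.arange(m + 1, len(w) + 1)
w_reg[m:] = 1.0 / (slope * tail_k + intercept)
```

The regularized tail is positive and decays smoothly, so downstream whitening or subspace projections no longer divide by near-zero eigenvalues.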

    Human sensing indoors in RF utilising unlabeled sensor streams

    Get PDF
    Indoor human sensing in radio frequencies is crucial for non-invasive, privacy-preserving digital healthcare, and machine learning is the backbone of such systems. Changes in the environment negatively affect the quality of learned mappings, which necessitates a semi-supervised approach that makes use of the unlabeled data stream to allow the learner to refine its hypothesis over time. We first explore the ambulation classification problem with frequency modulated continuous wave (FMCW) radar, replacing manual feature engineering with inductive bias in the architectural choices of the neural network. We demonstrate that key ambulations: walk, bend, sit-to-stand and stand-to-sit can be distinguished with high accuracy. We then apply variational autoencoders to explore unsupervised localisation in synthetic grayscale images, finding that the goal is achievable with a choice of encoder that captures temporal structure. Next, we evaluate temporal contrastive learning as a method of using unlabeled sensor streams in fingerprinting localisation, finding that it reliably defines a notion of pairwise distance on the data: it improves nearest-neighbour classification both by reducing the number of other-class items in same-class clusters and by increasing the pairwise distance contrast. Compared to the state of the art in indoor fingerprinting localisation, our contribution is that we successfully address the unsupervised domain adaptation problem. Finally, we raise the hypothesis that some knowledge can be shared between learners in different houses in a privacy-preserving manner. We adapt federated learning (FL) to the multi-residence indoor localisation scenario, which has not been done before, and propose a local fine-tuning algorithm with acceptance based on local validation error improvement. We find that with the tuned FL each client obtains a better personalised model than with benchmark FL, while keeping the learning dynamics smooth for all clients.
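The acceptance rule described above might look like the following sketch. The scalar "model", the local fitting step, and the quadratic validation error are all hypothetical stand-ins for the thesis's setup.

```python
def fine_tune_and_accept(global_model, local_fit, val_error):
    """Fine-tune the received global model locally; keep the tuned model
    only if it improves the local validation error, else fall back."""
    candidate = local_fit(global_model)
    if val_error(candidate) < val_error(global_model):
        return candidate   # accept the personalised model
    return global_model    # reject: keep the global model

# Toy usage with scalar weights and a quadratic validation error.
target = 3.0
val_error = lambda w: (w - target) ** 2
good_step = lambda w: w + 0.5 * (target - w)   # moves toward the target
bad_step = lambda w: w - 2.0                   # moves away from it
```

The fallback branch is what keeps the learning dynamics smooth: a client never replaces its model with an update that its own validation data rejects.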

    Architectural designs of Echo State Network

    Get PDF
    This thesis systematically investigates the reservoir construction of the Echo State Network (ESN). It proposes two very simple deterministic ESN organisations: the Simple Cycle Reservoir (SCR) and the Cycle Reservoir with Jumps (CRJ). The SCR is sufficient to obtain performance comparable to that of the classical ESN, while the CRJ significantly outperforms the classical ESN. The thesis also studies and discusses three reservoir characterisations, namely short-term memory capacity (MC), the eigenspectrum of the reservoir weight matrix and the Lyapunov exponent, and their relation to ESN performance. It also designs and utilises an ensemble of ESNs with diverse reservoirs whose collective readout is obtained through Negative Correlation Learning (NCL) of an ensemble of Multi-Layer Perceptrons (MLPs), where each individual MLP realises the readout from a single ESN. Finally, the thesis investigates the relation between two quantitative measures characterising short-term memory in input-driven dynamical systems, namely the short-term memory capacity (MC) and the Fisher memory curve (FMC).
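A minimal sketch of the SCR construction: every unit feeds the next around a single cycle with one shared weight r (the reservoir size and the value of r here are arbitrary illustrative choices). Because the cycle's eigenvalues are r times the roots of unity, the spectral radius equals r exactly, which makes the echo state property easy to control without random search.

```python
import numpy as np

def simple_cycle_reservoir(n_units, r):
    """Deterministic SCR weight matrix: one cycle, uniform weight r."""
    W = np.zeros((n_units, n_units))
    for i in range(n_units):
        W[(i + 1) % n_units, i] = r   # unit i drives unit i+1
    return W

W = simple_cycle_reservoir(100, 0.9)
spectral_radius = max(abs(np.linalg.eigvals(W)))
```

This determinism is the point of the SCR: the whole reservoir is specified by two numbers (size and r), yet its spectral properties match what a randomly generated ESN reservoir is tuned to achieve.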

    Semantic Models for Machine Learning

    No full text
    In this thesis we present approaches to the creation and usage of semantic models through analysis of how the data are spread in the feature space. We aim to introduce the general notion of using feature selection techniques in machine learning applications. The applied approaches obtain new feature directions on the data, such that machine learning applications show an increase in performance. We review three principal methods that are used throughout the thesis. The first is Canonical Correlation Analysis (CCA), a method for finding correlated linear relationships between two multidimensional variables. CCA can be seen as using complex labels to guide feature selection towards the underlying semantics; it makes use of two views of the same semantic object to extract a representation of the semantics. The second is Partial Least Squares (PLS), a method similar to CCA that selects feature directions useful for the task at hand, though PLS uses only one view of an object together with the label as the corresponding pair. PLS can be thought of as looking for directions that are good for distinguishing the different labels. The third method is the Fisher kernel, which aims to extract more information from a generative model than its output probabilities alone provide; the aim is to analyse how the Fisher score depends on the model and which aspects of the model are important in determining the Fisher score. We focus our theoretical investigation primarily on CCA and its kernel variant, providing an analysis of the method's stability using Rademacher complexity and hence deriving an error bound for new data. We conclude the thesis by applying the described approaches to problems in image, text, music and medical analysis, describing several novel applications on relevant real-world data. The aim of the thesis is to provide a theoretical understanding of semantic models, while also providing a good practical foundation for how these models can be used.
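The two-view idea behind CCA can be sketched directly: build two noisy views of a shared latent variable and recover the canonical correlations via an SVD of the whitened cross-covariance. The synthetic views, the dimensions, and the small ridge term eps are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical "views" of the same semantic object: both are noisy
# linear functions of a shared 2-dimensional latent variable z.
n = 500
z = rng.normal(size=(n, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))

def cca(X, Y, eps=1e-8):
    """Canonical correlations via SVD of the whitened cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / len(X) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)

    def inv_sqrt(C):
        # Inverse matrix square root via the eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(M, compute_uv=False)  # canonical correlations

corrs = cca(X, Y)
```

The two directions corresponding to the shared latent dimensions come out with correlations near 1, while the remaining directions carry only noise-level correlation.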

    Influencers in Dynamic Financial Networks

    Get PDF
    To monitor risk in temporal financial networks, an understanding of how individual behaviours affect the temporal evolution of networks is needed. This is typically achieved using centrality and importance metrics, which rank nodes in terms of their position in the network. This approach works well for static networks, which do not change over time, but does not consider the dynamics of the network. In addition, current methods are often unable to capture the complex, often sparse and disconnected structures of financial transaction networks. This thesis addresses these gaps by considering importance from a dynamical perspective, first by using spectral perturbations to derive measures of importance for nodes and edges, then by adapting these methods to incorporate structural awareness. I complement these methods with a generative model for transaction networks that captures how individual behaviours give rise to the key properties of these networks, offering new methods to add to the regulatory toolkit. My contributions are made across three studies which complement each other in their findings.
    Study 1:
    - I define a structural importance metric for the edges of a network, based on perturbing the adjacency matrix and observing the resultant change in its largest eigenvalues.
    - I combine this with a model of network evolution in which this metric controls the scale and probabilities of subsequent edge changes, allowing me to consider how edge importance relates to subsequent edge behaviour.
    - I use this model alongside an exercise to predict subsequent change from edge importance, demonstrating how the model parameters relate to the capability of predicting whether an edge will change from its importance.
    Study 2:
    - I extend my measure of edge importance to measure the importance of nodes, and to capture complex community structures through the use of additional components of the eigenspectrum.
    - While computed from a static network, my measure of node importance outperforms other centrality measures as a predictor of nodes subsequently transacting. This implies that static representations of temporal networks can contain information about their dynamics.
    Study 3:
    - I contrast the snapshot-based methods used in the first two studies by modelling the dynamics of transactions between counterparties using both univariate and multivariate Hawkes processes, which capture the non-linear 'bursty' behaviour of transaction sequences.
    - I find that the frequency of transactions between counterparties increases the likelihood that they will transact in the future, and that univariate and multivariate Hawkes processes show promise as generative models for transaction sequences.
    - Hawkes processes also perform well when used to model buys and sells through a central clearing counterparty as a bivariate process, but less well when these are modelled as individual univariate processes. This indicates that mutual excitation between buys and sells is present in these markets.
    The observations presented in this thesis provide new insights into the behaviour of equities markets, which until now have mainly been studied via price information. The metrics I propose offer new potential to identify important traders and transactions in complex trading networks. The models I propose provide a null model over which a user could detect outlying transactions, and could also be used to generate synthetic data for sharing purposes.
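The Study 1 edge-importance metric can be sketched on a toy graph. The two-clique-plus-bridge network below is a hypothetical example, and only the single largest eigenvalue is used here rather than the several components the thesis considers.

```python
import itertools
import numpy as np

# Toy undirected network: two 5-cliques joined by a single bridge edge.
n = 10
A = np.zeros((n, n))
for i, j in itertools.combinations(range(5), 2):
    A[i, j] = A[j, i] = 1.0
for i, j in itertools.combinations(range(5, 10), 2):
    A[i, j] = A[j, i] = 1.0
A[4, 5] = A[5, 4] = 1.0   # the bridge

def edge_importance(A, i, j):
    """Drop in the largest adjacency eigenvalue when edge (i, j) is removed."""
    before = np.linalg.eigvalsh(A)[-1]
    B = A.copy()
    B[i, j] = B[j, i] = 0.0
    return before - np.linalg.eigvalsh(B)[-1]

bridge_score = edge_importance(A, 4, 5)
within_score = edge_importance(A, 0, 1)
```

The bridge edge scores higher than a within-clique edge, matching the intuition that structurally critical edges perturb the spectrum most.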

    Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies

    Get PDF
    Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large scale Deep Learning models to field robotics datasets that are label-poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need for improving the model scaling is driven by two factors: performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, the classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the size of each subsequent hidden layer scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks, has prevented adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that parameters or filter elements learnt in Deep Learning systems are typically highly structured, and contain related elements. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high dimensional filter space to a low dimensional manifold.
    This creates a significantly smaller representation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on low quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high effort hand engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low dimensional filter manifold. From this, we develop an encoding technique that allows the high dimensional layer output to be represented as a combination of low dimensional components. This allows the growth of subsequent layers to depend only on the intrinsic dimensionality of the filter manifold and not on the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the embedding methodology are used to produce a new unsupervised learning strategy that is able to encode images in terms of overcomplete filter spaces, without producing an explosion in the size of the intermediate parameter spaces. This model produces classification results on par with state of the art models, yet requires significantly less computational resources and is suitable for use in the constrained computation environment of a field robot.
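The core observation, that learnt filters lie near a low-dimensional manifold, can be illustrated with a linear sketch. The synthetic filter bank below is hypothetical, and a plain SVD stands in for the nonlinear manifold learning used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "filter bank": 256 filters of 64 weights each that actually
# live near a 3-dimensional subspace, plus a little noise.
basis = rng.normal(size=(3, 64))
filters = rng.normal(size=(256, 3)) @ basis + 0.01 * rng.normal(size=(256, 64))

# Singular values reveal the intrinsic dimensionality of the filter space:
# subsequent layers need only grow with this dimension, not with 256.
s = np.linalg.svd(filters - filters.mean(axis=0), compute_uv=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
intrinsic_dim = int(np.searchsorted(energy, 0.99)) + 1
```

The estimated intrinsic dimension comes out near 3 rather than 256, which is exactly the gap that lets subsequent layers scale with the manifold's dimensionality instead of the filter count.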

    Dynamical structure in neural population activity

    Get PDF
    The question of how the collective activity of neural populations in the brain gives rise to complex behaviour is fundamental to neuroscience. At the core of this question lie considerations about how neural circuits can perform computations that enable sensory perception, motor control, and decision making. It is thought that such computations are implemented by the dynamical evolution of distributed activity in recurrent circuits. Thus, identifying and interpreting dynamical structure in neural population activity is a key challenge towards a better understanding of neural computation. In this thesis, I make several contributions in addressing this challenge. First, I develop two novel methods for neural data analysis. Both methods aim to extract trajectories of low-dimensional computational state variables directly from the unbinned spike-times of simultaneously recorded neurons on single trials. The first method separates inter-trial variability in the low-dimensional trajectory from variability in the timing of progression along its path, and thus offers a quantification of inter-trial variability in the underlying computational process. The second method simultaneously learns a low-dimensional portrait of the underlying nonlinear dynamics of the circuit, as well as the system's fixed points and locally linearised dynamics around them. This approach facilitates extracting interpretable low-dimensional hypotheses about computation directly from data. Second, I turn to the question of how low-dimensional dynamical structure may be embedded within a high-dimensional neurobiological circuit with excitatory and inhibitory cell-types. I analyse how such circuit-level features shape population activity, with particular focus on responses to targeted optogenetic perturbations of the circuit. Third, I consider the problem of implementing multiple computations in a single dynamical system. 
I address this in the framework of multi-task learning in recurrently connected networks and demonstrate that a careful organisation of low-dimensional, activity-defined subspaces within the network can help to avoid interference across tasks.
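A stripped-down version of the second method's fixed-point idea: treat the network as a map, minimise the "speed" q(x) = ||F(x) - x||^2, and linearise the dynamics at the optimum. The toy weights, step sizes, and finite-difference gradient are illustrative choices, not the thesis's algorithm (which works directly from spike-time data).

```python
import numpy as np

# Toy recurrent map x_{t+1} = tanh(W x) standing in for a trained network.
W = np.array([[0.3, 0.5],
              [-0.5, 0.3]])

def step(x):
    return np.tanh(W @ x)

def find_fixed_point(x, lr=0.1, iters=2000, h=1e-5):
    """Minimise the speed q(x) = ||step(x) - x||^2 by gradient descent
    with a central finite-difference gradient."""
    def q(x):
        d = step(x) - x
        return d @ d
    for _ in range(iters):
        grad = np.array([(q(x + h * e) - q(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])
        x = x - lr * grad
    return x

x_star = find_fixed_point(np.array([0.3, -0.2]))

# Locally linearised dynamics: Jacobian of the map at the fixed point.
J = (1.0 - np.tanh(W @ x_star) ** 2)[:, None] * W
```

The eigenvalues of J at the fixed point then give the interpretable local portrait: moduli below one indicate a stable point, complex pairs indicate rotation.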

    Intra-host HIV-1 evolution and the co-receptor switch

    Get PDF
    The course of an infection with the human immunodeficiency virus type 1 (HIV-1) is characterised by three phases: primary infection, chronic infection and acquired immunodeficiency syndrome (AIDS). These stages are defined based on the number of CD4-positive T-helper cells (CD4+). This characteristic three-stage classification is also reflected in the course of the viral divergence and in the emergence of viral diversity. It is known that the V3 loop, a region encoded in the HIV envelope gene, is important for T cell infection. The CD4 receptor of the cells is used as the primary receptor for viral cell entry, and CCR5 and CXCR4 are the most important co-receptors necessary for cell entry. In about half of all patients, HIV switches from CCR5 towards CXCR4 usage during the late stage of infection, which hints at the onset of AIDS. Since co-receptor tropism is determined by the V3 loop sequence, an understanding of the mechanisms of its evolution and of the circumstances leading to the co-receptor switch is of high interest. In the first part of the present work, we analysed longitudinal patient data, comprising information on CD4+ cell count, viral load, medication, coinfections and V3 loop sequences. We examined the correlations among the clinical and evolutionary data as well as the co-receptor usage over time, guided by different questions: Is the course of disease one-directional? Can successful drug therapy influence co-receptor usage? What are the genetic differences between CCR5- and CXCR4-tropic viruses? Due to the weak statistical support of our data, we found only few indications that successful HAART therapy influences the course of disease and the direction of the co-receptor switch. We hypothesise that successful therapy can pause or roll back the course of infection, enabling the CD4+ cells to recover and restore high levels of immune pressure.
    A suppression of the viral load can further displace X4-tropic viral variants in the viral population in favour of R5-tropic variants. In the second part of this work, we derived a fitness function to approximate the replication capacity of R5- and X4-tropic viruses. Based on a set of V3 loop sequences gathered from the Los Alamos HIV database, the fitness function is composed of two components: the main fitness term describes the amino acid preferences found in the R5 and the X4 consensus sequences, and the additional epistatic term describes the effects of double mutations. While the relative impact of the main and epistatic fitness contributions can be adjusted by a weighting parameter, an additional parameter controls the importance of available CCR5- and CXCR4-positive target cells. The fitness function enabled us to observe the differences between the underlying R5 and X4 fitness landscapes. A comparison of the sequence data sets showed that the R5-tropic viral sequences were highly conserved, in contrast to the X4 sequences. Network analyses confirmed the higher sequence variability of the X4 sequences, which we found to be distributed over a larger sequence space. Interestingly, our analyses revealed that the most weakly conserved sequence positions of the X4 data set were very sensitive to mutations. Upon an alteration of the most weakly conserved nucleotide positions, the X4 sequences showed an increased probability of acquiring stop codons and losing their replicative capacity. The last part of the work describes an in silico approach to V3 loop evolution based on the R5 and X4 fitness functions. Simulations enable us to mimic the sequence evolution in silico, and to monitor the course of the viral diversity and divergence as well as the mean fitness of the simulated viral population over time. First results indicated that our simulation is able to imitate the evolutionary course of the viral diversity and divergence of an HIV infection. In our simulations, the sequence evolution followed a chemically sensible course. Amino acids that differed from the favoured chemical properties were first replaced by amino acids belonging to the favourable chemical class and finally converged to the dominant amino acid at the specific sequence position. The present project was designed to prepare the ground for deeper insights into the evolutionary dynamics of the HIV V3 loop. Our work enabled us to gain broader knowledge of the properties of R5- and X4-tropic viral sequences.