190 research outputs found
A three-stage learning algorithm for deep multilayer perceptron with effective weight initialisation based on sparse auto-encoder
A three-stage learning algorithm for deep multilayer perceptron (DMLP) with effective weight initialisation based on sparse auto-encoder is proposed in this paper, which aims to overcome difficulties in training deep neural networks with limited training data in high-dimensional feature space. At the first stage, unsupervised learning is adopted using sparse auto-encoder to obtain the initial weights of the feature extraction layers of the DMLP. At the second stage, error back-propagation is used to train the DMLP by fixing the weights obtained at the first stage for its feature extraction layers. At the third stage, all the weights of the DMLP obtained at the second stage are refined by error back-propagation. Network structures and values of learning parameters are determined through cross-validation, and test datasets unseen in the cross-validation are used to evaluate the performance of the DMLP trained using the three-stage learning algorithm. Experimental results show that the proposed method is effective in combating overfitting in training deep neural networks
Effective Features and Machine Learning Methods for Document Classification
Document classification has been involved in a variety of applications, such as phishing and fraud detection, news categorisation, and information retrieval. This thesis aims to provide novel solutions to several important problems presented by document classification. First, an improved Principal Components Analysis (PCA), based on similarity and correlation criteria instead of covariance, is proposed, which aims to capture low-dimensional feature subset that facilitates improved performance in text classification. The experimental results have demonstrated the advantages and usefulness of the proposed method for text classification in high-dimensional feature space in terms of the number of features required to achieve the best classification accuracy. Second, two hybrid feature-subset selection methods are proposed based on the combination (via either union or intersection) of the results of both supervised (in one method) and unsupervised (in the other method) filter approaches prior to the use of a wrapper, leading to low-dimensional feature subset that can achieve both high classification accuracy and good interpretability, and spend less processing time than most current methods. The experimental results have demonstrated the effectiveness of the proposed methods for feature subset selection in high-dimensional feature space in terms of the number of selected features and the processing time spent to achieve the best classification accuracy. Third, a class-specific (supervised) pre-trained approach based on a sparse autoencoder is proposed for acquiring low-dimensional interesting structure of relevant features, which can be used for high-performance document classification. The experimental results have demonstrated the merit of this proposed method for document classification in high-dimensional feature space, in terms of the limited number of features required to achieve good classification accuracy. Finally, deep classifier structures associated with a stacked autoencoder (SAE) for higher-level feature extraction are investigated, aiming to overcome the difficulties experienced in training deep neural networks with limited training data in high-dimensional feature space, such as overfitting and vanishing/exploding gradients. This investigation has resulted in a three-stage learning algorithm for training deep neural networks. In comparison with support vector machines (SVMs) combined with SAE and Deep Multilayer Perceptron (DMLP) with random weight initialisation, the experimental results have shown the advantages and effectiveness of the proposed three-stage learning algorithm
Learning to process with spikes and to localise pulses
In the last few decades, deep learning with artificial neural networks (ANNs) has emerged as one of the most widely used techniques in tasks such as classification and regression, achieving competitive results and in some cases even surpassing human-level performance. Nonetheless, as ANN architectures are optimised towards empirical results and departed from their biological precursors, how exactly human brains process information using these short electrical pulses called spikes remains a mystery. Hence, in this thesis, we explore the problem of learning to process with spikes and to localise pulses.
We first consider spiking neural networks (SNNs), a type of ANN that more closely mimic biological neural networks in that neurons communicate with one another using spikes. This unique architecture allows us to look into the role of heterogeneity in learning. Since it is conjectured that the information is encoded by the timing of spikes, we are particularly interested in the heterogeneity of time constants of neurons. We then trained SNNs for classification tasks on a range of visual and auditory neuromorphic datasets, which contain streams of events (spike times) instead of the conventional frame-based data, and show that the overall performance is improved by allowing the neurons to have different time constants, especially on tasks with richer temporal structure. We also find that the learned time constants are distributed similarly to those experimentally observed in some mammalian cells. Besides, we demonstrate that learning with heterogeneity improves robustness against hyperparameter mistuning. These results suggest that heterogeneity may be more than the byproduct of noisy processes and perhaps serves a key role in learning in changing environments, yet heterogeneity has been overlooked in basic artificial models.
While neuromorphic datasets, which are often captured by neuromorphic devices that closely model the corresponding biological systems, have enabled us to explore the more biologically plausible SNNs, there still exists a gap in understanding how spike times encode information in actual biological neural networks like human brains, as such data is difficult to acquire due to the trade-off between the timing precision and the number of cells simultaneously recorded electrically. Instead, what we usually obtain is the low-rate discrete samples of trains of filtered spikes. Hence, in the second part of the thesis, we focus on a different type of problem involving pulses, that is to retrieve the precise pulse locations from these low-rate samples. We make use of the finite rate of innovation (FRI) sampling theory, which states that perfect reconstruction is possible for classes of continuous non-bandlimited signals that have a small number of free parameters. However, existing FRI methods break down under very noisy conditions due to the so-called subspace swap event. Thus, we present two novel model-based learning architectures: Deep Unfolded Projected Wirtinger Gradient Descent (Deep Unfolded PWGD) and FRI Encoder-Decoder Network (FRIED-Net). The former is based on the existing iterative denoising algorithm for subspace-based methods, while the latter models directly the relationship between the samples and the locations of the pulses using an autoencoder-like network. Using a stream of K Diracs as an example, we show that both algorithms are able to overcome the breakdown inherent in the existing subspace-based methods. Moreover, we extend our FRIED-Net framework beyond conventional FRI methods by considering when the shape is unknown. We show that the pulse shape can be learned using backpropagation. This coincides with the application of spike detection from real-world calcium imaging data, where we achieve competitive results. Finally, we explore beyond canonical FRI signals and demonstrate that FRIED-Net is able to reconstruct streams of pulses with different shapes.Open Acces
Recommended from our members
The resurgence of structure in deep neural networks
Machine learning with deep neural networks ("deep learning") allows for learning complex features directly from raw input data, completely eliminating hand-crafted, "hard-coded" feature extraction from the learning pipeline. This has lead to state-of-the-art performance being achieved across several---previously disconnected---problem domains, including computer vision, natural language processing, reinforcement learning and generative modelling. These success stories nearly universally go hand-in-hand with availability of immense quantities of labelled training examples ("big data") exhibiting simple grid-like structure (e.g. text or images), exploitable through convolutional or recurrent layers. This is due to the extremely large number of degrees-of-freedom in neural networks, leaving their generalisation ability vulnerable to effects such as overfitting.
However, there remain many domains where extensive data gathering is not always appropriate, affordable, or even feasible. Furthermore, data is generally organised in more complicated kinds of structure---which most existing approaches would simply discard. Examples of such tasks are abundant in the biomedical space; with e.g. small numbers of subjects available for any given clinical study, or relationships between proteins specified via interaction networks. I hypothesise that, if deep learning is to reach its full potential in such environments, we need to reconsider "hard-coded" approaches---integrating assumptions about inherent structure in the input data directly into our architectures and learning algorithms, through structural inductive biases. In this dissertation, I directly validate this hypothesis by developing three structure-infused neural network architectures (operating on sparse multimodal and graph-structured data), and a structure-informed learning algorithm for graph neural networks, demonstrating significant outperformance of conventional baseline models and algorithms.The work depicted in this dissertation was in part supported by funding from the European Union's Horizon 2020 research and innovation programme PROPAG-AGEING under grant agreement No 634821
Deep Representation-aligned Graph Multi-view Clustering for Limited Labeled Multi-modal Health Data
Today, many fields are characterised by having extensive quantities of data from a wide range of dissimilar sources and domains. One such field is medicine, in which data contain exhaustive combinations of spatial, temporal, linear, and relational data. Often lacking expert-assessed labels, much of this data would require analysis within the fields of unsupervised or semi-supervised learning. Thus, reasoned by the notion that higher view-counts provide more ways to recognise commonality across views, contrastive multi-view clustering may be utilised to train a model to suppress redundancy and otherwise medically irrelevant information. Yet, standard multi-view clustering approaches do not account for relational graph data. Recent developments aim to solve this by utilising various graph operations including graph-based attention. And within deep-learning graph-based multi-view clustering on a sole view-invariant affinity graph, representation alignment remains unexplored.
We introduce Deep Representation-Aligned Graph Multi-View Clustering (DRAGMVC), a novel attention-based graph multi-view clustering model. Comparing maximal performance, our model surpassed the state-of-the-art in eleven out of twelve metrics on Cora, CiteSeer, and PubMed. The model considers view alignment on a sample-level by employing contrastive loss and relational data through a novel take on graph attention embeddings in which we use a Markov chain prior to increase the receptive field of each layer. For clustering, a graph-induced DDC module is used. GraphSAINT sampling is implemented to control our mini-batch space to capitalise on our Markov prior.
Additionally, we present the MIMIC pleural effusion graph multi-modal dataset, consisting of two modalities registering 3520 chest X-ray images along with two static views registered within a one-day time frame: vital signs and lab tests. These making up the, in total, three views of the dataset. We note a significant improvement in terms of separability, view mixing, and clustering performance comparing DRAGMVC to preceding non-graph multi-view clustering models, suggesting a possible, largely unexplored use case of unsupervised graph multi-view clustering on graph-induced, multi-modal, and complex medical data
Recommended from our members
Interpreting Deep Learning for cell differentiation. Supervised and Unsupervised models viewed through the lens of information and perturbation theory.
"Predicting the future isn't magic, it's artificial intelligence" Dave Waters.
In the last decades there has been an unprecedented growth in the field of machine learning, and particularly within deep learning models. The combination of big data and computational power has nurtured the evolution of a variety of new methods to predict and interpret future scenarios. These data centric models can achieve exceptional performances on specific tasks, with their prediction boundaries continuously expanding towards new and more complex challenges.
However, the model complexity often translates into a lack of interpretability from a scientific c perspective, it is not trivial to identify the factors involved in final outcomes.
Explainability may not always be a requirement for some machine learning tasks, specially when it comes in detriment of performance power. But for some applications, such as biological discoveries or medical diagnostics, understanding the output and determining factors that influence decisions is essential.
In this thesis we develop both a supervised and unsupervised approach to map from genotype to phenotype. We emphasise the importance of interpretability and feature extraction from the models, by identifying relevant genes for cell differentiation. We then continue to explore the rules and mechanisms behind the models from a theoretical perspective. Using information theory to explain the learning process and applying
perturbation theory to transform the results into a generalisable representation.
We start by building a supervised approach to mapping cell profiles from genotype to phenotype, using single cell RNA-Seq data. We leverage non-linearities among gene expressions to identify cellular levels of differentiation. The ambiguity and even absence of labels in most biological studies instigated the development of novel unsupervised techniques, leading to a new general and biologically interpretable framework based on Variational Autoencoders.
The application and validation of the methods has proven to be successful, but questions regarding the learning process and generative nature of the results remained unanswered. I use information theory to define a new approach to interpret training and the converged solutions of our models.
The variational and generative nature of Autoencoders provides a platform to develop general models. Their results should extrapolate and allow generalisation beyond the boundaries of the observed data. To this extent, we introduce for the first time a new interpretation of the embedded generative functions through Perturbation Theory. The embedding multiplicity is addressed by transforming the distributions into a new set of generalisable functions, while characterising their energy spectrum
under a particular energy landscape.
We outline the combination of theoretical and machine learning based methods, for moving towards interpretable and generalisable models. Developing a theoretical framework to map from genotype to phenotype, we provide both supervised and unsupervised tools to operate over single cell RNA-Seq. data. We have generated a pipeline to identify relevant genes and cell types through Variational Autoencoders (VAEs),
validating reconstructed gene expressions to prove the generative performance of the embeddings. The new interpretation of the information learned and extracted by the models de fines a label independent evaluation, particularly useful for unsupervised
learning. Lastly, we introduce a novel transformation of the generative embeddings based on quantum and perturbation theory.
Our contributions can and have been extended to new datasets, according to the nature of the tasks being explored. For instance, the combination of unsupervised learning and information theory can be applied to a variety of biological or medical data. We have trained several VAE models with additional cancer and metabolic data, proving to extract meaningful representations of the data. The perturbation theory transformation of the embedding can also lead to future research on the generative potential of Variational Autoencoders through a physics perspective, combining statistical and quantum mechanics.
We believe that machine learning will only continue its fast expansion and growth through the development of more generalisable more interpretable models.
"Prediction is very difficult, especially if it's about the future" Niels Boh
Analysis of physiological signals using machine learning methods
Technological advances in data collection enable scientists to suggest novel approaches, such as Machine Learning algorithms, to process and make sense of this information. However, during this process of collection, data loss and damage can occur for reasons such as faulty device sensors or miscommunication. In the context of time-series data such as multi-channel bio-signals, there is a possibility of losing a whole channel. In such cases, existing research suggests imputing the missing parts when the majority of data is available. One way of understanding and classifying complex signals is by using deep neural networks. The hyper-parameters of such models have been optimised using the process of back propagation. Over time, improvements have been suggested to enhance this algorithm. However, an essential drawback of the back propagation can be the sensitivity to noisy data. This thesis proposes two novel approaches to address the missing data challenge and back propagation drawbacks: First, suggesting a gradient-free model in order to discover the optimal hyper-parameters of a deep neural network. The complexity of deep networks and high-dimensional optimisation parameters presents challenges to find a suitable network structure and hyper-parameter configuration. This thesis proposes the use of a minimalist swarm optimiser, Dispersive Flies Optimisation(DFO), to enable the selected model to achieve better results in comparison with the traditional back propagation algorithm in certain conditions such as limited number of training samples. The DFO algorithm offers a robust search process for finding and determining the hyper-parameter configurations. Second, imputing whole missing bio-signals within a multi-channel sample. This approach comprises two experiments, namely the two-signal and five-signal imputation models. The first experiment attempts to implement and evaluate the performance of a model mapping bio-signals from A toB and vice versa. Conceptually, this is an extension to transfer learning using CycleGenerative Adversarial Networks (CycleGANs). The second experiment attempts to suggest a mechanism imputing missing signals in instances where multiple data channels are available for each sample. The capability to map to a target signal through multiple source domains achieves a more accurate estimate for the target domain. The results of the experiments performed indicate that in certain circumstances, such as having a limited number of samples, finding the optimal hyper-parameters of a neural network using gradient-free algorithms outperforms traditional gradient-based algorithms, leading to more accurate classification results. In addition, Generative Adversarial Networks could be used to impute the missing data channels in multi-channel bio-signals, and the generated data used for further analysis and classification tasks
Deep Learning with Limited Labels for Medical Imaging
Recent advancements in deep learning-based AI technologies provide an automatic tool to revolutionise medical image computing. Training a deep learning model requires a large amount of labelled data. Acquiring labels for medical images is extremely challenging due to the high cost in terms of both money and time, especially for the pixel-wise segmentation task of volumetric medical scans. However, obtaining unlabelled medical scans is relatively easier compared to acquiring labels for those images.
This work addresses the pervasive issue of limited labels in training deep learning models for medical imaging. It begins by exploring different strategies of entropy regularisation in the joint training of labelled and unlabelled data to reduce the time and cost associated with manual labelling for medical image segmentation. Of particular interest are consistency regularisation and pseudo labelling. Specifically, this work proposes a well-calibrated semi-supervised segmentation framework that utilises consistency regularisation on different morphological feature perturbations, representing a significant step towards safer AI in medical imaging. Furthermore, it reformulates pseudo labelling in semi-supervised learning as an Expectation-Maximisation framework. Building upon this new formulation, the work explains the empirical successes of pseudo labelling and introduces a generalisation of the technique, accompanied by variational inference to learn its true posterior distribution. The applications of pseudo labelling in segmentation tasks are also presented. Lastly, this work explores unsupervised deep learning for parameter estimation of diffusion MRI signals, employing a hierarchical variational clustering framework and representation learning
- …