40 research outputs found

    Studies on dimension reduction and feature spaces

    Today's world produces and stores huge amounts of data, which calls for methods that can tackle both the growing sizes and the growing dimensionalities of data sets. Dimension reduction aims to answer the challenges posed by the latter. Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction. When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing the results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric built on the feature space of a neural network. Combining clustering with dimension reduction yields a novel way of creating space-efficient visualizations that convey both the hierarchical structure and the distances between clusters. We study the feature spaces used in a recently developed neural network architecture called the extreme learning machine. We give a novel interpretation of such neural networks and recognize the need to parameterize extreme learning machines with the variance of the network weights. This has practical implications for the use of extreme learning machines, since current practice emphasizes the role of the hidden units and ignores the variance. A current trend in deep neural network research is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost. We demonstrate that, contrary to current belief, using sparse distance matrices to create fast dimension reduction methods is feasible, provided a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with the possibility of applying modern dimension reduction methods to much larger data sets than are manageable today.
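The point about weight variance can be illustrated with a minimal extreme learning machine sketch in NumPy (not the thesis's code; the toy data, sizes, and the `weight_std` parameter are illustrative assumptions): the hidden layer is random and only the output layer is trained, so the standard deviation of the random weights is a hyperparameter in its own right, alongside the number of hidden units.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, weight_std=1.0, seed=0):
    """Fit a minimal extreme learning machine: a random hidden layer
    followed by a least-squares output layer. `weight_std` makes the
    variance of the random weights an explicit hyperparameter."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, weight_std, size=(X.shape[1], n_hidden))
    b = rng.normal(0.0, weight_std, size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only trained parameters
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: weight_std controls the effective smoothness of the
# random feature space, independently of the number of hidden units.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])
W, b, beta = elm_fit(X, y, n_hidden=50, weight_std=2.0)
err = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Sweeping `weight_std` while holding `n_hidden` fixed is the experiment the abstract suggests current practice neglects.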

    An automatic feature extraction method and its application in fault diagnosis

    The main challenge of fault diagnosis is to extract discriminative fault features, but extraction methods usually depend on manual effort and prior knowledge. It is desirable to automatically extract useful features from input data in an unsupervised way. Hence, an automatic feature extraction method is presented in this paper. The proposed method first captures fault features from the raw vibration signal by sparse filtering. Since the learned features are high-dimensional and cannot be visualized directly, t-distributed stochastic neighbor embedding (t-SNE) is then used as the dimensionality reduction tool to map the learned features into a three-dimensional feature vector. The effectiveness of the proposed method is verified using gearbox and bearing experimental data. The classification results show that the hybrid method of sparse filtering and t-SNE can extract discriminative information from the raw vibration signal and can clearly distinguish different fault types. Comparative analysis also validates that the proposed method is superior to the other methods.
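A dependency-free sketch of this kind of pipeline might look as follows (assumptions: the vibration segments are synthetic, the filter matrix is left untrained, a PCA projection stands in for t-SNE to keep the sketch NumPy-only; the sparse filtering objective follows the usual two-way normalisation formulation of Ngiam et al.):

```python
import numpy as np

def sparse_filtering_loss(W, X, eps=1e-8):
    # Soft-absolute features, then L2-normalise per feature and per
    # example; the training objective is the sum of absolute values.
    F = np.sqrt((W @ X) ** 2 + eps)
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)  # per feature
    F = F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)  # per example
    return np.abs(F).sum(), F

def project_3d(F):
    # Stand-in for t-SNE: map the learned features to 3 dimensions.
    Fc = F - F.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Fc, full_matrices=False)
    return U[:, :3].T @ Fc          # 3 x n_examples embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 500))      # hypothetical windowed vibration segments
W = rng.normal(size=(32, 64))       # 32 filters (random here, untrained)
loss, F = sparse_filtering_loss(W, X)
Z = project_3d(F)                   # one 3-D point per signal segment
```

In the paper's actual pipeline `W` would be optimized to minimise the loss and t-SNE would replace the linear projection.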

    Machine Learning Methods for Brain Image Analysis

    Understanding how the brain functions and quantifying the compound interactions between complex synaptic networks inside the brain remain some of the most challenging problems in neuroscience. The lack or abundance of data, a shortage of manpower, and the heterogeneity of data from various species all add complexity to an already perplexing problem. Vast amounts of brain data need to be processed automatically, yet with an accuracy close to manual, human-level performance. These automated methods must also generalize well to accommodate data from different species. In addition, novel approaches and techniques are becoming a necessity to reveal correlations between different data modalities in the brain at the global level. In this dissertation, I focus mainly on two problems: automatic segmentation of brain electron microscopy (EM) images and stacks, and integrative analysis of gene expression and synaptic connectivity in the brain. I propose to use deep learning algorithms for the 2D segmentation of EM images. I designed an automated pipeline with novel insights that achieved state-of-the-art performance on the segmentation of the Drosophila brain. I also propose a novel technique for 3D segmentation of EM image stacks that can be trained end-to-end with no prior knowledge of the data. This technique was evaluated in an ongoing online challenge for 3D segmentation of neurites, where it achieved accuracy close to that of a second human observer. Later, I employed ensemble learning methods to perform the first systematic integrative analysis of the genome and connectome in the mouse brain at both the regional and voxel level. I show that the connectivity signals can be predicted from the gene expression signatures with extremely high accuracy. Furthermore, I show that only a certain fraction of genes is responsible for this predictive power. Rich functional and cellular analyses of these genes are detailed to validate these findings.
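The gene-to-connectivity prediction, and the finding that only a fraction of genes carries the signal, can be sketched on synthetic data (the dissertation uses ensemble learning; plain ridge regression stands in here, and all sizes and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_genes = 300, 50
expr = rng.normal(size=(n_regions, n_genes))       # gene expression signatures
true_w = np.zeros(n_genes)
true_w[:5] = [1.5, -2.0, 1.0, 0.8, -1.2]           # only a few genes matter
conn = expr @ true_w + 0.1 * rng.normal(size=n_regions)  # connectivity signal

# Ridge regression: w = (X'X + lam*I)^(-1) X'y
lam = 1.0
w = np.linalg.solve(expr.T @ expr + lam * np.eye(n_genes), expr.T @ conn)

r2 = 1 - np.sum((conn - expr @ w) ** 2) / np.sum((conn - conn.mean()) ** 2)
top_genes = set(np.argsort(np.abs(w))[-5:])        # most predictive genes
```

Ranking coefficient magnitudes recovers the small predictive gene subset, mirroring the dissertation's observation at toy scale.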

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Large scale machine learning requires tradeoffs. Commonly this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation into certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.
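The coarse-grained split idea can be sketched as follows (a generic histogram-style split search on synthetic data, not the paper's implementation; the bin count and data are assumptions): instead of scanning every unique feature value, only quantile bin edges are evaluated, which shrinks the per-feature work and makes per-bin statistics easy to aggregate across distributed workers.

```python
import numpy as np

def approx_best_split(x, y, n_bins=16):
    """Approximate split search for squared-error regression trees:
    candidate thresholds are quantile bin edges, not all unique values."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    base = ((y - y.mean()) ** 2).sum()
    best_gain, best_thr = -np.inf, None
    for thr in np.unique(edges):
        left, right = y[x <= thr], y[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        gain = base - sse           # reduction in squared error
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y = np.where(x < 0.5, 0.0, 1.0) + 0.05 * rng.normal(size=1000)
thr, gain = approx_best_split(x, y)   # recovers the step near x = 0.5
```

With 16 bins the search evaluates at most 15 thresholds per feature regardless of data size, which is the source of the speedup.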

    Neural network based image representation for small scale object recognition

    Object recognition can be abstractly viewed as a two-stage process. The feature learning stage selects key information that can represent the input image in a compact, robust, and discriminative manner in some feature space. The classification stage then learns the rules to differentiate object classes based on the representations of their images in feature space. Consequently, if the first stage can produce a highly separable feature set, simple and cost-effective classifiers can be used to make the recognition system more applicable in practice. Features, or representations, used to be engineered manually with different assumptions about the data population to keep the complexity in a manageable range. As more practical problems are tackled, those assumptions are no longer valid, and neither are the representations built on them. More parameters and test cases have to be considered in these new challenges, which makes manual engineering too complicated. Machine learning approaches ease those difficulties by allowing the computer to learn to identify the appropriate representation automatically. As the number of parameters increases with the diversity of the data, it is always beneficial to eliminate irrelevant information from input data to reduce the complexity of learning. Chapter 3 of the thesis reports a case study in which the removal of colour leads to an improvement in recognition accuracy. Deep learning has proven to be a very strong representation learner, with new achievements arriving on a monthly basis. While the training phase of deep structures requires huge amounts of data, tremendous computation, and careful calibration, the inference phase is affordable and straightforward. Utilizing the knowledge in trained deep networks is therefore promising for efficient feature extraction in smaller systems. Many approaches have been proposed under the name of “transfer learning”, aiming to take advantage of that “deep knowledge”. However, the results achieved so far still leave room for improvement. Chapter 4 presents a new method that utilizes a trained deep convolutional structure as a feature extractor and achieves state-of-the-art accuracy on the Washington RGBD dataset. Despite these good results, the potential of transfer learning is only barely exploited. On one hand, dimensionality reduction can make the deep neural network representation even more computationally efficient and allow a wider range of use cases. Inspired by the structure of the network itself, a new random orthogonal projection method for dimensionality reduction is presented in the first half of Chapter 5. A t-SNE-mimicking neural network for low-dimensional embedding is also discussed in this part, with promising results. In another approach, feature encoding can be used to improve deep neural network features for classification applications. Thanks to their spatially organized structure, deep neural network features can be considered local image descriptors, and thus traditional feature encoding approaches such as the Fisher vector can be applied to improve them. This method combines the advantages of both discriminative and generative learning to boost feature performance in difficult scenarios, such as when data is noisy or incomplete. The problem of high dimensionality in deep neural network features is alleviated with the use of the Fisher vector based on sparse coding, where an infinite number of Gaussian components is used to model the feature space. In the second half of Chapter 5, the regularized Fisher encoding is shown to be effective in improving classification results on difficult classes. Also, low-cost incremental k-means learning is shown to be a potential dictionary learning approach that can replace the slow and computationally expensive sparse coding method.
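The random-orthogonal-projection idea can be sketched in a few lines (a generic version, not necessarily the exact method of Chapter 5; the feature dimensions are invented): orthonormalising a Gaussian matrix via QR gives mutually orthogonal projection directions, so the only geometric distortion comes from discarding dimensions, not from skewed axes.

```python
import numpy as np

def random_orthogonal_projection(X, out_dim, seed=0):
    """Project rows of X onto `out_dim` random orthonormal directions."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(X.shape[1], out_dim))
    Q, _ = np.linalg.qr(G)          # columns of Q are orthonormal
    return X @ Q

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4096))    # e.g. deep-network feature vectors
Z = random_orthogonal_projection(X, 128)

# For isotropic data, the projection keeps about out_dim/in_dim of the
# total squared norm, since it picks a random 128-dim subspace of 4096.
ratio = (Z ** 2).sum() / (X ** 2).sum()
```

Because the projection needs no training data, it is a cheap way to shrink deep features before a classifier.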

    THE SPATIAL INDUCTIVE BIAS OF DEEP LEARNING

    In the past few years, Deep Learning has become the method of choice for producing state-of-the-art results on machine learning problems involving images, text, and speech. The explosion of interest in these techniques has resulted in a large number of successful applications of deep learning, but relatively few studies exploring the nature of, and reason for, that success. This dissertation is motivated by a desire to understand and reproduce the performance characteristics of deep learning systems, particularly Convolutional Neural Networks (CNNs). One factor in the success of CNNs is that they have an inductive bias that assumes a certain type of spatial structure is present in the data. We give a formal definition of how this type of spatial structure can be characterized, along with some statistical tools for testing whether spatial structure is present in a given dataset. These tools are applied to several standard image datasets, and the results are analyzed. We demonstrate that CNNs rely heavily on the presence of such structure, and then show several ways that a similar bias can be introduced into other methods. The first is a partition-based method for training Restricted Boltzmann Machines and Deep Belief Networks, which is able to speed up convergence significantly without changing the overall representational power of the network. The second is a deep partitioned version of Principal Component Analysis, which demonstrates that a spatial bias can be useful even in a model that is non-connectionist and completely linear. The third is a variation on projective Random Forests, which shows that we can introduce a spatial bias with only minor changes to the algorithm, and no externally imposed partitioning is required. In each case, we can show that introducing a spatial bias results in improved performance on spatial data.
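The partitioned-PCA idea can be illustrated with a single-layer sketch (hypothetical sizes and random data; the dissertation's method is deeper): PCA is fit separately on each spatial partition of the image, so the model's parameters can only mix pixels within a region, which is precisely the spatial bias being injected into a linear, non-connectionist model.

```python
import numpy as np

def partitioned_pca(images, grid=2, k=4):
    """Apply PCA independently to each cell of a grid x grid spatial
    partition and concatenate the per-region codes."""
    n, h, w = images.shape
    ph, pw = h // grid, w // grid
    codes = []
    for i in range(grid):
        for j in range(grid):
            patch = images[:, i*ph:(i+1)*ph, j*pw:(j+1)*pw].reshape(n, -1)
            patch = patch - patch.mean(axis=0)
            _, _, Vt = np.linalg.svd(patch, full_matrices=False)
            codes.append(patch @ Vt[:k].T)   # top-k components per region
    return np.concatenate(codes, axis=1)

rng = np.random.default_rng(0)
images = rng.normal(size=(50, 16, 16))
Z = partitioned_pca(images)     # 50 x (4 regions * 4 components) codes
```

Stacking further layers of this operation on regrouped codes is what makes the full method "deep".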

    Pattern Recognition for Complex Heterogeneous Time-Series Data: An Analysis of Microbial Community Dynamics

    Microbial life is the most widespread and the most abundant life form on earth. Microbes exist in complex and diverse communities in environments from the deep ocean trenches to Himalayan snowfields. Microbial life is also essential for other forms of life. Scientific studies of microbial activity cover diverse communities such as the plant root microbiome, the insect gut microbiome and the human skin microbiome. In the human body alone, the number of microbial life forms exceeds the number of human body cells. Hence it is essential to understand microbial community dynamics. With the advent of 16S rRNA sequencing, we have access to a plethora of data on the microbiome, warranting a shift from in-vitro analysis to in-silico analysis. This thesis focuses on challenges in analysing microbial community dynamics through complex, heterogeneous and temporal data. Firstly, we look at the mathematical modelling of microbial community dynamics and the inference of microbial interaction networks from longitudinal sequencing data, with the aims of minimising the assumptions involved and improving the accuracy of the inferred interaction networks. Secondly, we explore the temporally dynamic nature of microbial interaction networks. We look at the fallacies of static microbial interaction networks and at approaches suitable for modelling temporally dynamic ones. Thirdly, we study multiple temporal microbial datasets from similar environments to understand the macro and micro patterns apparent in these communities, exploring the individuality and conformity of microbial communities through visualisation techniques. Finally, we explore the possibilities and challenges of representing heterogeneous microbial temporal activity in unique signatures. In summary, this work explores various aspects of complex, heterogeneous time-series data through microbial temporal abundance datasets and enhances our knowledge of these complex and diverse communities through a pattern recognition approach.
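A common baseline for interaction-network inference from longitudinal data is generalized Lotka-Volterra (gLV) regression, sketched here on a synthetic two-species community (the model, noise levels and ridge penalty are illustrative assumptions, not the thesis's method): the per-species log-growth rate is regressed on community abundances.

```python
import numpy as np

def infer_glv(abundances, dt=1.0, lam=1e-3):
    """Recover gLV growth rates mu_i and interactions a_ij by ridge
    regression of log-growth rates on abundances (time x species)."""
    X = abundances
    dlog = (np.log(X[1:]) - np.log(X[:-1])) / dt        # observed growth rates
    D = np.hstack([np.ones((len(X) - 1, 1)), X[:-1]])   # intercept + abundances
    coef = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ dlog)
    return coef[0], coef[1:].T                          # mu, interaction matrix

# Synthetic community: both species self-limit; species 1 inhibits species 0.
rng = np.random.default_rng(0)
mu_true = np.array([0.5, 0.3])
A_true = np.array([[-0.5, -0.4],
                   [0.0, -0.5]])
dt, T = 0.1, 300
x = np.empty((T, 2))
x[0] = [0.2, 0.2]
for t in range(T - 1):
    growth = mu_true + A_true @ x[t] + rng.normal(0, 0.05, 2)
    x[t + 1] = x[t] * np.exp(dt * growth)   # multiplicative update stays positive

mu_hat, A_hat = infer_glv(x, dt=dt)         # estimates of mu_true, A_true
```

The thesis's aim of minimising assumptions targets exactly the limitations of fixed parametric baselines like this one.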

    Solving Machine Learning Problems with Biological Principles

    Spiking neural networks (SNNs) have been proposed both as models of cortical computation and as candidates for solving problems in machine learning. While a growing number of recent works have improved their performance on benchmark discriminative tasks, most of them learn by surrogates of backpropagation, where biological features such as spikes are regarded more as defects than merits. In this thesis, we explore the generative abilities of SNNs with built-in biological mechanisms. When sampling from high-dimensional multimodal distributions, models based on general Markov chain Monte Carlo methods often suffer from a mixing problem: the sampler easily gets trapped in local minima. Inspired by traditional annealing or tempering approaches, we demonstrate that increasing the rate of background Poisson noise in an SNN can flatten the energy landscape and facilitate mixing of the system. In addition, we show that with synaptic short-term plasticity (STP) the SNN can achieve more efficient mixing through local modulation of active attractors, eventually outperforming traditional benchmark models. We reveal diverse sampling statistics of SNNs induced by STP and finally study its implementation in conventional machine learning methods. Our work thereby highlights important computational consequences of biological features that might otherwise appear as artifacts of evolution.
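The mixing effect can be illustrated outside of SNNs with a generic Metropolis sampler on a double-well energy (an analogy only: temperature here plays the role the thesis assigns to background Poisson noise, and the energy function and step size are made up): a flatter effective landscape lets the sampler hop between modes far more often.

```python
import numpy as np

def barrier_crossings(temperature, n_steps=20000, seed=0):
    """Metropolis sampling on the double-well E(x) = (x^2 - 1)^2;
    count sign changes, i.e. jumps between the two modes."""
    rng = np.random.default_rng(seed)
    E = lambda z: (z ** 2 - 1) ** 2
    x, crossings = -1.0, 0
    for _ in range(n_steps):
        prop = x + rng.normal(0.0, 0.5)
        accept = np.exp(min(0.0, (E(x) - E(prop)) / temperature))
        if rng.random() < accept:
            if x * prop < 0:        # crossed the barrier at x = 0
                crossings += 1
            x = prop
    return crossings

cold = barrier_crossings(temperature=0.1)   # sharp landscape: poor mixing
hot = barrier_crossings(temperature=1.0)    # flattened landscape: easy mixing
```

Raising the temperature divides the barrier by a factor, just as the abstract describes background noise flattening the SNN's energy landscape.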

    Classification of Supernovae and Stars in the Era of Big Data and Artificial Intelligence

    In recent years, artificial intelligence (AI) has been applied in many fields of research. It is particularly well suited to astronomy, in which very large datasets from sky surveys cover a wide range of observations. The upcoming Legacy Survey of Space and Time (LSST) presents unprecedented big data challenges, requiring state-of-the-art methods to produce, process and analyse information. Observations of Type Ia supernovae help constrain cosmological parameters such as the dark energy equation of state, and AI will be instrumental in the next generation of cosmological measurements due to limited spectroscopic resources. AI also has the ability to improve our astrophysical understanding by perceiving patterns in data which may not be obvious to humans. In this thesis we investigate how advanced AI methods can be used in classification tasks: to identify Type Ia supernovae for cosmology from photometry using supervised learning; by determining a low-dimensional representation of stellar spectra, and inferring astrophysical concepts through unsupervised learning. In preparation for photometric classification of transients from LSST we run tests with different training samples. Using estimates of the depth to which the 4-metre Multi-Object Spectroscopic Telescope (4MOST) Time-Domain Extragalactic Survey (TiDES) can classify transients, we simulate a magnitude-limited training sample reaching rAB = 22.5 mag. We run our simulations with the software snmachine, a photometric classification pipeline using machine learning. The machine-learning algorithms struggle to classify supernovae when the training sample is magnitude-limited as its features are not representative of the test set. In contrast, representative training samples perform very well, particularly when redshift information is included. 
Classification performance noticeably improves when we combine the magnitude-limited training sample with a simulated realistic sample of faint, high-redshift supernovae observed from larger spectroscopic facilities; the algorithms' range of average area under ROC curve (AUC) scores over 10 runs increases from 0.547-0.628 to 0.946-0.969 and purity of the classified sample reaches 95% in all runs for 2 of the 4 algorithms. By creating new, artificial light curves using the augmentation software avocado, we achieve a purity in our classified sample of 95% in all 10 runs performed for all machine-learning algorithms considered. We also reach a highest average AUC score of 0.986 with the artificial neural network algorithm. Having real faint supernovae to complement our magnitude-limited sample is a crucial requirement in optimisation of a 4MOST spectroscopic sample. However, our results are a proof of concept that augmentation is also necessary to achieve the best classification results. During our investigation into an optimised training sample, we assumed that every training object has the correct class label. Spectroscopy is a reliable method to confirm object classification and is used to define our training sample. However, it is not necessarily perfect and we therefore consider the impact of potential misclassifications of training objects. Taking the predicted error rates in spectroscopic classification from the literature, we apply contamination to a TiDES training sample using simulated LSST data. With the recurrent neural network from the software SuperNNova, we determine appropriate hyperparameters using a perfect, uncontaminated TiDES training sample and then train a model on its contaminated counterpart to study its effects on photometric classification. We find that a contaminated training sample produces very little difference in classification performance, even when increasing contamination to 5%. 
Contamination causes more objects of both Type Ia and non-Ia to be classified as Ia, increasing efficiency but decreasing purity, with changes of less than 1% on average. Similarly, we see a decrease of 0.1% in average accuracy, and no clear difference in AUC score, which varies only at the fourth significant figure. These results are promising for photometric classification. Contaminated training appears to have little impact, and propagation to cosmological measurements is expected to be minimal. In a separate study, we apply deep learning to data in the European Southern Observatory (ESO) archive using an autoencoder neural network, with the aim of improving similarity-based searches using the network's own interpretation of the data. We train the network to reconstruct stellar spectra by passing them through an information bottleneck, creating a low-dimensional representation of the data. We find that this representation includes several informative dimensions and, comparing to known astrophysical labels, see clear correlations for two key nodes: the network learns concepts of radial velocity and effective temperature, completely unsupervised. The interpretation of the other informative nodes appears ambiguous, leaving room for future investigation. The results presented in this thesis emphasise the practical capabilities of AI in an astronomical context: classification of astrophysical objects can be conducted through supervised learning using known labels, as well as through unsupervised learning in a physics-agnostic process.
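The bottleneck idea from the final study can be sketched with a linear stand-in (assumptions: synthetic "spectra" generated from two hidden factors; since a linear autoencoder's optimum coincides with the top principal subspace, the code computes that optimum in closed form rather than training a network):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for stellar spectra: 200 spectra of 50 wavelength
# bins, each driven by 2 latent factors (think temperature, velocity).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
spectra = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

# A linear autoencoder with a 2-node bottleneck converges to the top-2
# principal subspace, so compute that optimum directly via SVD.
centred = spectra - spectra.mean(axis=0)
U, sing, Vt = np.linalg.svd(centred, full_matrices=False)
code = centred @ Vt[:2].T      # 2-D bottleneck representation per spectrum
recon = code @ Vt[:2]          # decode back through the bottleneck
mse = np.mean((recon - centred) ** 2)
```

Because the data really has two underlying factors, two bottleneck nodes suffice to reconstruct it almost perfectly; in the thesis's nonlinear setting the same logic is what lets individual nodes align with physical labels.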