22,330 research outputs found

    EEG machine learning with Higuchi fractal dimension and Sample Entropy as features for successful detection of depression

    Full text link
    Reliable diagnosis of depressive disorder is essential for both optimal treatment and prevention of fatal outcomes. In this study, we aimed to elucidate the effectiveness of two non-linear measures, Higuchi Fractal Dimension (HFD) and Sample Entropy (SampEn), in detecting depressive disorders when applied on EEG. HFD and SampEn of EEG signals were used as features for seven machine learning algorithms including Multilayer Perceptron, Logistic Regression, Support Vector Machines with the linear and polynomial kernel, Decision Tree, Random Forest, and Naive Bayes classifier, discriminating EEG between healthy control subjects and patients diagnosed with depression. We confirmed earlier observations that both non-linear measures can discriminate EEG signals of patients from healthy control subjects. The results suggest that good classification is possible even with a small number of principal components. Average accuracy among classifiers ranged from 90.24% to 97.56%. Among the two measures, SampEn had better performance. Using HFD and SampEn and a variety of machine learning techniques we can accurately discriminate patients diagnosed with depression vs controls which can serve as a highly sensitive, clinically relevant marker for the diagnosis of depressive disorders.Comment: 34 pages, 4 Figures, 2 table

    Artificial Intelligence Techniques for Steam Generator Modelling

    Full text link
    This paper investigates the use of different Artificial Intelligence methods to predict the values of several continuous variables from a Steam Generator. The objective was to determine how the different artificial intelligence methods performed in making predictions on the given dataset. The artificial intelligence methods evaluated were Neural Networks, Support Vector Machines, and Adaptive Neuro-Fuzzy Inference Systems. The types of neural networks investigated were Multi-Layer Perceptions, and Radial Basis Function. Bayesian and committee techniques were applied to these neural networks. Each of the AI methods considered was simulated in Matlab. The results of the simulations showed that all the AI methods were capable of predicting the Steam Generator data reasonably accurately. However, the Adaptive Neuro-Fuzzy Inference system out performed the other methods in terms of accuracy and ease of implementation, while still achieving a fast execution time as well as a reasonable training time.Comment: 23 page

    Statistical Model Building, Machine Learning, and the Ah-Ha Moment

    Full text link
    The Committee of Presidents of Statistical Societies (COPSS) will celebrate its 50th Anniversary in 2013. As part of its celebration, COPSS intends to publish a book with contributions from the past recipients of its four awards, namely the Fisher Lecture Award, the President's Award, the Elizabeth Scott Award, and the FN David Award. The theme of the book is Past, Present and Future of Statistical Science. As a winner of the Elizabeth Scott Award, I have been invited to contribute. We were given several topics to choose from and I have chosen to focus on "Statistical Career: Your reflection on your own career, lessons and experience you have learned, and advice you would like to provide to young statisticians if sought." This article is my contribution

    Online Machine Learning in Big Data Streams

    Full text link
    The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail

    Nonnegative Restricted Boltzmann Machines for Parts-based Representations Discovery and Predictive Model Stabilization

    Full text link
    The success of any machine learning system depends critically on effective representations of data. In many cases, it is desirable that a representation scheme uncovers the parts-based, additive nature of the data. Of current representation learning schemes, restricted Boltzmann machines (RBMs) have proved to be highly effective in unsupervised settings. However, when it comes to parts-based discovery, RBMs do not usually produce satisfactory results. We enhance such capacity of RBMs by introducing nonnegativity into the model weights, resulting in a variant called nonnegative restricted Boltzmann machine (NRBM). The NRBM produces not only controllable decomposition of data into interpretable parts but also offers a way to estimate the intrinsic nonlinear dimensionality of data, and helps to stabilize linear predictive models. We demonstrate the capacity of our model on applications such as handwritten digit recognition, face recognition, document classification and patient readmission prognosis. The decomposition quality on images is comparable with or better than what produced by the nonnegative matrix factorization (NMF), and the thematic features uncovered from text are qualitatively interpretable in a similar manner to that of the latent Dirichlet allocation (LDA). The stability performance of feature selection on medical data is better than RBM and competitive with NMF. The learned features, when used for classification, are more discriminative than those discovered by both NMF and LDA and comparable with those by RBM

    When Gaussian Process Meets Big Data: A Review of Scalable GPs

    Full text link
    The vast quantity of information brought by big data as well as the evolving computer hardware encourages success stories in the machine learning community. In the meanwhile, it poses challenges for the Gaussian process (GP) regression, a well-known non-parametric and interpretable Bayesian model, which suffers from cubic complexity to data size. To improve the scalability while retaining desirable prediction quality, a variety of scalable GPs have been presented. But they have not yet been comprehensively reviewed and analyzed in order to be well understood by both academia and industry. The review of scalable GPs in the GP community is timely and important due to the explosion of data size. To this end, this paper is devoted to the review on state-of-the-art scalable GPs involving two main categories: global approximations which distillate the entire data and local approximations which divide the data for subspace learning. Particularly, for global approximations, we mainly focus on sparse approximations comprising prior approximations which modify the prior but perform exact inference, posterior approximations which retain exact prior but perform approximate inference, and structured sparse approximations which exploit specific structures in kernel matrix; for local approximations, we highlight the mixture/product of experts that conducts model averaging from multiple local experts to boost predictions. To present a complete review, recent advances for improving the scalability and capability of scalable GPs are reviewed. Finally, the extensions and open issues regarding the implementation of scalable GPs in various scenarios are reviewed and discussed to inspire novel ideas for future research avenues.Comment: 20 pages, 6 figure

    Baseline CNN structure analysis for facial expression recognition

    Full text link
    We present a baseline convolutional neural network (CNN) structure and image preprocessing methodology to improve facial expression recognition algorithm using CNN. To analyze the most efficient network structure, we investigated four network structures that are known to show good performance in facial expression recognition. Moreover, we also investigated the effect of input image preprocessing methods. Five types of data input (raw, histogram equalization, isotropic smoothing, diffusion-based normalization, difference of Gaussian) were tested, and the accuracy was compared. We trained 20 different CNN models (4 networks x 5 data input types) and verified the performance of each network with test images from five different databases. The experiment result showed that a three-layer structure consisting of a simple convolutional and a max pooling layer with histogram equalization image input was the most efficient. We describe the detailed training procedure and analyze the result of the test accuracy based on considerable observation.Comment: 6 pages, RO-MAN2016 Conferenc

    Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

    Full text link
    As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We evaluate extensively our attacks and defenses on three realistic datasets from health care, loan assessment, and real estate domains.Comment: Preprint of the work accepted for publication at the 39th IEEE Symposium on Security and Privacy, San Francisco, CA, USA, May 21-23, 201

    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    Full text link
    How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as n−βn^{-\beta} where nn is the number of training examples and β\beta an exponent that depends on both data and algorithm. In this work we measure β\beta when applying kernel methods to real datasets. For MNIST we find β≈0.4\beta\approx 0.4 and for CIFAR10 β≈0.1\beta\approx 0.1, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive analytically β\beta for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, β\beta depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than nn. Using this idea we predict relate the exponent β\beta to an exponent aa describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract aa from real data by performing kernel PCA, leading to β≈0.36\beta\approx0.36 for MNIST and β≈0.07\beta\approx0.07 for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data.Comment: We added (i) the prediction of the exponent β\beta for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks"

    A Novel Feature Extraction Method for Scene Recognition Based on Centered Convolutional Restricted Boltzmann Machines

    Full text link
    Scene recognition is an important research topic in computer vision, while feature extraction is a key step of object recognition. Although classical Restricted Boltzmann machines (RBM) can efficiently represent complicated data, it is hard to handle large images due to its complexity in computation. In this paper, a novel feature extraction method, named Centered Convolutional Restricted Boltzmann Machines (CCRBM), is proposed for scene recognition. The proposed model is an improved Convolutional Restricted Boltzmann Machines (CRBM) by introducing centered factors in its learning strategy to reduce the source of instabilities. First, the visible units of the network are redefined using centered factors. Then, the hidden units are learned with a modified energy function by utilizing a distribution function, and the visible units are reconstructed using the learned hidden units. In order to achieve better generative ability, the Centered Convolutional Deep Belief Networks (CCDBN) is trained in a greedy layer-wise way. Finally, a softmax regression is incorporated for scene recognition. Extensive experimental evaluations using natural scenes, MIT-indoor scenes, and Caltech 101 datasets show that the proposed approach performs better than other counterparts in terms of stability, generalization, and discrimination. The CCDBN model is more suitable for natural scene image recognition by virtue of convolutional property.Comment: 22 pages, 11 figure
    • …
    corecore