22,330 research outputs found
EEG machine learning with Higuchi fractal dimension and Sample Entropy as features for successful detection of depression
Reliable diagnosis of depressive disorder is essential for both optimal
treatment and prevention of fatal outcomes. In this study, we aimed to
elucidate the effectiveness of two non-linear measures, Higuchi Fractal
Dimension (HFD) and Sample Entropy (SampEn), in detecting depressive disorders
when applied to EEG. HFD and SampEn of EEG signals were used as features for
seven machine learning algorithms, including Multilayer Perceptron, Logistic
Regression, Support Vector Machines with linear and polynomial kernels,
Decision Tree, Random Forest, and the Naive Bayes classifier, to discriminate
EEG signals of healthy control subjects from those of patients diagnosed with
depression. We
confirmed earlier observations that both non-linear measures can discriminate
EEG signals of patients from healthy control subjects. The results suggest that
good classification is possible even with a small number of principal
components. Average accuracy among classifiers ranged from 90.24% to 97.56%.
Of the two measures, SampEn showed better performance. Using HFD and SampEn
with a variety of machine learning techniques, we can accurately discriminate
patients diagnosed with depression from controls, which can serve as a highly
sensitive, clinically relevant marker for the diagnosis of depressive
disorders.
Comment: 34 pages, 4 figures, 2 tables
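As a rough illustration of this kind of feature pipeline (not the authors' code), the Python sketch below computes the two non-linear measures and feeds them to one of the classifiers mentioned above; the signal length, kmax, tolerance r, synthetic signals, and placeholder labels are all assumptions made for the sake of a runnable example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1-D signal (Higuchi, 1988)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lk = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)
            # normalized curve length for this offset and scale k
            lengths.append(np.abs(np.diff(x[idx])).sum() * (n - 1)
                           / ((len(idx) - 1) * k) / k)
        lk.append(np.mean(lengths))
    # HFD is the slope of log L(k) versus log(1/k)
    return np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(lk), 1)[0]

def sample_entropy(x, m=2, r=0.2):
    """SampEn = -log(A/B), tolerance r * std(x) (Richman & Moorman, 2000)."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    def matches(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.abs(t[:, None] - t[None, :]).max(axis=2)  # Chebyshev distance
        return (d <= tol).sum() - len(t)                 # exclude self-matches
    return -np.log(matches(m + 1) / matches(m))

# Toy stand-in for EEG epochs; real work would use labeled clinical recordings.
rng = np.random.default_rng(0)
signals = [np.cumsum(rng.standard_normal(512)) for _ in range(40)]
y = rng.integers(0, 2, size=40)  # placeholder labels, NOT real diagnoses
X = np.array([[higuchi_fd(s), sample_entropy(s)] for s in signals])
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
```

On real EEG, each channel and epoch would contribute its own HFD and SampEn values, with PCA optionally applied to the resulting feature matrix before classification.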
Artificial Intelligence Techniques for Steam Generator Modelling
This paper investigates the use of different Artificial Intelligence methods
to predict the values of several continuous variables from a Steam Generator.
The objective was to determine how the different artificial intelligence
methods performed in making predictions on the given dataset. The artificial
intelligence methods evaluated were Neural Networks, Support Vector Machines,
and Adaptive Neuro-Fuzzy Inference Systems. The types of neural networks
investigated were Multi-Layer Perceptrons and Radial Basis Function networks. Bayesian
and committee techniques were applied to these neural networks. Each of the AI
methods considered was simulated in Matlab. The results of the simulations
showed that all the AI methods were capable of predicting the Steam Generator
data reasonably accurately. However, the Adaptive Neuro-Fuzzy Inference System
outperformed the other methods in terms of accuracy and ease of
implementation, while still achieving a fast execution time and a reasonable
training time.
Comment: 23 pages
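Since the paper's Matlab simulations and dataset are not reproduced here, the following is a minimal Python analogue of this kind of comparison for two of the evaluated method families (an MLP and a support vector regressor); ANFIS has no standard scikit-learn implementation, and the synthetic data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Synthetic stand-in for steam generator measurements (4 inputs, 1 output);
# the actual plant variables are not reproduced here.
X = rng.uniform(-1.0, 1.0, size=(500, 4))
y = X[:, 0] * np.sin(3.0 * X[:, 1]) + 0.5 * X[:, 2] + 0.05 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "MLP": MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0),
    "SVR": SVR(kernel="rbf", C=10.0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))
```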
Statistical Model Building, Machine Learning, and the Ah-Ha Moment
The Committee of Presidents of Statistical Societies (COPSS) will celebrate
its 50th Anniversary in 2013. As part of its celebration, COPSS intends to
publish a book with contributions from the past recipients of its four awards,
namely the Fisher Lecture Award, the President's Award, the Elizabeth Scott
Award, and the FN David Award. The theme of the book is Past, Present and
Future of Statistical Science. As a winner of the Elizabeth Scott Award, I have
been invited to contribute. We were given several topics to choose from and I
have chosen to focus on "Statistical Career: Your reflection on your own
career, lessons and experience you have learned, and advice you would like to
provide to young statisticians if sought." This article is my contribution.
Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that are (1) distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as fresh data arrives.
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
This article is intended as reference material rather than a survey. We do not
attempt to be comprehensive in describing all existing methods and solutions;
rather, we give pointers to the most important resources in the field. All of
the related sub-fields, online algorithms, online learning, and distributed
data processing, are highly active areas of current research and development,
with conceptually new research results and software components emerging at the
time of writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article differs in that we discuss recommender systems in extended detail.
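To make the data stream constraint concrete, here is a minimal sketch (not tied to any particular system discussed in the article) of online learning with scikit-learn's partial_fit, where each mini-batch is seen once and then discarded; the synthetic stream and loss="log_loss" (the name used in recent scikit-learn versions) are assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Online logistic regression: the model is updated one mini-batch at a time
# and never revisits past data, matching the data stream model above.
clf = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])

rng = np.random.default_rng(0)
for step in range(1_000):            # each iteration stands in for one stream batch
    X_batch = rng.standard_normal((32, 10))
    y_batch = (X_batch[:, 0] + 0.1 * rng.standard_normal(32) > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)
```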
Nonnegative Restricted Boltzmann Machines for Parts-based Representations Discovery and Predictive Model Stabilization
The success of any machine learning system depends critically on effective
representations of data. In many cases, it is desirable that a representation
scheme uncovers the parts-based, additive nature of the data. Of current
representation learning schemes, restricted Boltzmann machines (RBMs) have
proved to be highly effective in unsupervised settings. However, when it comes
to parts-based discovery, RBMs do not usually produce satisfactory results. We
enhance this capacity of RBMs by introducing nonnegativity into the model
weights, resulting in a variant called the nonnegative restricted Boltzmann
machine (NRBM). The NRBM not only produces controllable decompositions of data
into interpretable parts but also offers a way to estimate the intrinsic
nonlinear dimensionality of the data, and helps to stabilize linear predictive
models. We
demonstrate the capacity of our model on applications such as handwritten digit
recognition, face recognition, document classification and patient readmission
prognosis. The decomposition quality on images is comparable with or better
than that produced by nonnegative matrix factorization (NMF), and the thematic
features uncovered from text are qualitatively interpretable in a manner
similar to those of latent Dirichlet allocation (LDA). The stability of
feature selection on medical data is better than that of the RBM and
competitive with NMF. The learned features, when used for classification, are
more discriminative than those discovered by either NMF or LDA, and comparable
with those of the RBM.
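The paper's exact training scheme is not reproduced here; as one simple way to realize the nonnegativity idea, the sketch below runs contrastive divergence (CD-1) on a binary RBM and projects the weights onto W >= 0 after each update. The dimensions, learning rate, and random data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nrbm_cd1_step(W, b, c, v0, lr=0.05):
    """One CD-1 update for a binary RBM, then projection of W onto W >= 0."""
    ph0 = sigmoid(v0 @ W + c)                         # hidden probs given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + b)                       # reconstructed visibles
    ph1 = sigmoid(pv1 @ W + c)                        # hidden probs given recon
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    np.maximum(W, 0.0, out=W)  # nonnegativity constraint on the weights
    return W, b, c

# Usage on random binary "images" (28x28 flattened, 64 hidden units).
W = 0.01 * rng.random((784, 64))
b, c = np.zeros(784), np.zeros(64)
for _ in range(100):
    batch = (rng.random((32, 784)) < 0.1).astype(float)
    W, b, c = nrbm_cd1_step(W, b, c, batch)
```

With nonnegative weights, each hidden unit can only add visible activity, which is what pushes the learned factors toward additive, parts-based components.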
When Gaussian Process Meets Big Data: A Review of Scalable GPs
The vast quantity of information brought by big data, together with evolving
computer hardware, has driven success stories in the machine learning
community. At the same time, it poses challenges for Gaussian process (GP)
regression, a well-known non-parametric and interpretable Bayesian model that
suffers from cubic complexity in the data size. To improve scalability while
retaining desirable prediction quality, a variety of scalable GPs have been
presented, but they have not yet been comprehensively reviewed and analyzed so
as to be well understood by both academia and industry. A review of scalable GPs in
the GP community is timely and important due to the explosion of data size. To
this end, this paper reviews state-of-the-art scalable GPs in two main
categories: global approximations, which distill the entire dataset, and local
approximations, which divide the data for subspace learning. For global
approximations, we mainly focus on sparse approximations, comprising prior
approximations, which modify the prior but perform exact inference; posterior
approximations, which retain the exact prior but perform approximate
inference; and structured sparse approximations, which exploit specific
structures in the kernel matrix. For local approximations, we
highlight the mixture/product of experts that conducts model averaging from
multiple local experts to boost predictions. To present a complete review,
recent advances for improving the scalability and capability of scalable GPs
are reviewed. Finally, the extensions and open issues regarding the
implementation of scalable GPs in various scenarios are reviewed and discussed
to inspire novel ideas for future research avenues.
Comment: 20 pages, 6 figures
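As a concrete instance of a global sparse approximation, here is a minimal Subset-of-Regressors sketch in Python: the predictive mean is computed through m inducing inputs at O(nm^2) cost rather than the O(n^3) of exact GP regression. The kernel, lengthscale, inducing-input placement, and data are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

rng = np.random.default_rng(0)
n, m, noise = 2000, 50, 0.1
X = rng.uniform(-3, 3, (n, 1))
y = np.sin(X[:, 0]) + noise * rng.standard_normal(n)
Z = np.linspace(-3, 3, m)[:, None]        # m inducing inputs, m << n

# Subset-of-Regressors predictive mean: costs O(n m^2) instead of O(n^3).
K_zz = rbf(Z, Z) + 1e-8 * np.eye(m)       # jitter for numerical stability
K_xz = rbf(X, Z)
A = K_xz.T @ K_xz + noise ** 2 * K_zz
alpha = np.linalg.solve(A, K_xz.T @ y)

X_star = np.linspace(-3, 3, 7)[:, None]
mean_star = rbf(X_star, Z) @ alpha        # approximate GP posterior mean
```

This is a prior approximation in the taxonomy above: the model is altered so that all information flows through the inducing set, and inference on the altered model is exact.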
Baseline CNN structure analysis for facial expression recognition
We present a baseline convolutional neural network (CNN) structure and an
image preprocessing methodology to improve facial expression recognition
algorithms based on CNNs. To identify the most efficient network structure, we
investigated
four network structures that are known to show good performance in facial
expression recognition. Moreover, we also investigated the effect of input
image preprocessing methods. Five types of data input (raw, histogram
equalization, isotropic smoothing, diffusion-based normalization, difference of
Gaussian) were tested, and the accuracy was compared. We trained 20 different
CNN models (4 networks x 5 data input types) and verified the performance of
each network with test images from five different databases. The experimental
results showed that a three-layer structure consisting of simple convolutional
and max-pooling layers, with histogram-equalized image input, was the most
efficient. We describe the detailed training procedure and analyze the test
accuracy results based on extensive observation.
Comment: 6 pages, RO-MAN 2016 Conference
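For orientation, the sketch below pairs a simple histogram-equalization preprocessor with a three-stage conv + max-pool PyTorch model in the spirit of the baseline described above; the filter counts, 64x64 input size, and 7 expression classes are assumptions, not the paper's exact configuration.

```python
import numpy as np
import torch
import torch.nn as nn

def hist_equalize(img):
    """Histogram equalization for a uint8 grayscale face image (H x W)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    lut = np.round(255.0 * (cdf - cdf.min()) / (cdf.max() - cdf.min()))
    return lut.astype(np.uint8)[img]

# Three conv + max-pool stages feeding a linear classifier; each pooling
# halves the 64x64 input, leaving an 8x8 map before the linear layer.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 8 * 8, 7),
)

face = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
x = torch.from_numpy(hist_equalize(face)).float().div(255).view(1, 1, 64, 64)
logits = model(x)  # shape (1, 7)
```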
Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning
As machine learning becomes widely used for automated decisions, attackers
have strong incentives to manipulate the results and models generated by
machine learning algorithms. In this paper, we perform the first systematic
study of poisoning attacks and their countermeasures for linear regression
models. In poisoning attacks, attackers deliberately influence the training
data to manipulate the results of a predictive model. We propose a
theoretically-grounded optimization framework specifically designed for linear
regression and demonstrate its effectiveness on a range of datasets and models.
We also introduce a fast statistical attack that requires limited knowledge of
the training process. Finally, we design a new principled defense method that
is highly resilient against all poisoning attacks. We provide formal guarantees
about its convergence and an upper bound on the effect of poisoning attacks
when the defense is deployed. We extensively evaluate our attacks and defenses
on three realistic datasets from the health care, loan assessment, and real
estate domains.
Comment: Preprint of the work accepted for publication at the 39th IEEE
Symposium on Security and Privacy, San Francisco, CA, USA, May 21-23, 2018
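The authors' attack optimization and defense are not reproduced here; the toy Python sketch below shows the underlying idea: appended poisoning points skew an ordinary least-squares fit, while an iteratively trimmed fit (in the spirit of a trimmed-loss defense) largely recovers the clean model. The data and poisoning strategy are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 1))
y = 2.0 * X[:, 0] + 0.05 * rng.standard_normal(200)

# A crude poisoning attack: append points chosen to drag the slope down.
X_poisoned = np.vstack([X, np.ones((20, 1))])
y_poisoned = np.concatenate([y, -2.0 * np.ones(20)])

def trimmed_fit(X, y, n_keep, n_iter=20, seed=0):
    """Alternate between fitting and keeping the n_keep smallest residuals."""
    keep = np.random.default_rng(seed).choice(len(y), n_keep, replace=False)
    for _ in range(n_iter):
        model = LinearRegression().fit(X[keep], y[keep])
        residuals = (y - model.predict(X)) ** 2
        keep = np.argsort(residuals)[:n_keep]
    return model

print(LinearRegression().fit(X_poisoned, y_poisoned).coef_)   # skewed slope
print(trimmed_fit(X_poisoned, y_poisoned, n_keep=200).coef_)  # near 2.0 again
```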
Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is often
observed that the generalization error decreases as a power law
$\epsilon(n) \sim n^{-\beta}$, where $n$ is the number of training examples
and $\beta$ is an exponent that depends on both the data and the algorithm. In
this work we measure $\beta$ when applying kernel methods to real datasets,
finding non-trivial exponents for MNIST and for CIFAR10, for both regression
and classification tasks and for Gaussian or Laplace kernels. To rationalize
the existence of non-trivial exponents that can be independent of the specific
kernel used, we study the Teacher-Student framework for kernels. In this
scheme, a Teacher generates data according to a Gaussian random field, and a
Student learns them via kernel regression. With a simplifying assumption
(namely, that the data are sampled from a regular lattice), we derive $\beta$
analytically for translation-invariant kernels, using previous results from
the kriging literature. Provided that the Student is not too sensitive to high
frequencies, $\beta$ depends only on the smoothness and dimension of the
training data. We confirm numerically that these predictions hold when the
training points are sampled at random on a hypersphere. Overall, the test
error is found to be controlled by the magnitude of the projection of the true
function on the kernel eigenvectors whose rank is larger than $n$. Using this
idea, we relate the exponent $\beta$ to an exponent describing how the
coefficients of the true function in the eigenbasis of the kernel decay with
rank. We extract this decay exponent from real data by performing kernel PCA,
leading to predictions for MNIST and for CIFAR10 that are in good agreement
with the observations. We argue that these rather large exponents are possible
due to the small effective dimension of the data.
Comment: We added (i) the prediction of the exponent $\beta$ for real data
using kernel PCA; (ii) the generalization of our results to non-Gaussian data
from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in
Kernel Regression and Wide Neural Networks").
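A learning-curve exponent of this kind can be estimated in a few lines: fit kernel regression at increasing training set sizes and read $\beta$ off the slope of the log-log test error curve. The target function, Laplace kernel, ridge, and training sizes below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def target(X):
    """A fixed smooth 'true function' standing in for a real dataset."""
    return np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])

X_test = rng.standard_normal((2000, 2))
y_test = target(X_test)

sizes = [100, 200, 400, 800, 1600]
errors = []
for n in sizes:
    X_train = rng.standard_normal((n, 2))
    model = KernelRidge(kernel="laplacian", alpha=1e-6)
    model.fit(X_train, target(X_train))
    errors.append(mean_squared_error(y_test, model.predict(X_test)))

# beta is (minus) the slope of the log-log learning curve.
beta = -np.polyfit(np.log(sizes), np.log(errors), 1)[0]
print("estimated exponent beta:", beta)
```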
A Novel Feature Extraction Method for Scene Recognition Based on Centered Convolutional Restricted Boltzmann Machines
Scene recognition is an important research topic in computer vision, and
feature extraction is a key step of object recognition. Although classical
Restricted Boltzmann Machines (RBMs) can efficiently represent complicated
data, they struggle to handle large images due to their computational
complexity. In this
paper, a novel feature extraction method, named Centered Convolutional
Restricted Boltzmann Machines (CCRBM), is proposed for scene recognition. The
proposed model improves on the Convolutional Restricted Boltzmann Machine
(CRBM) by introducing centered factors into its learning strategy to reduce the
source of instabilities. First, the visible units of the network are redefined
using centered factors. Then, the hidden units are learned with a modified
energy function by utilizing a distribution function, and the visible units are
reconstructed using the learned hidden units. To achieve better generative
ability, a Centered Convolutional Deep Belief Network (CCDBN) is trained in a
greedy layer-wise fashion. Finally, softmax regression is
incorporated for scene recognition. Extensive experimental evaluations using
natural scenes, MIT-indoor scenes, and Caltech 101 datasets show that the
proposed approach performs better than other counterparts in terms of
stability, generalization, and discrimination. The CCDBN model is more
suitable for natural scene image recognition by virtue of its convolutional
property.
Comment: 22 pages, 11 figures
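The convolutional model itself is not reconstructed here, but the centering idea can be shown on a dense RBM: visible and hidden units enter the gradient as deviations from running offsets, which is what reduces the instability the paper targets. The update below is a minimal CD-1 sketch with illustrative sizes and rates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def centered_cd1_step(W, b, c, mu, lam, v0, lr=0.05, decay=0.01):
    """CD-1 where units enter the gradient as (v - mu) and (h - lam)."""
    ph0 = sigmoid((v0 - mu) @ W + c)        # hidden probs given data
    pv1 = sigmoid((ph0 - lam) @ W.T + b)    # mean-field reconstruction
    ph1 = sigmoid((pv1 - mu) @ W + c)
    W += lr * ((v0 - mu).T @ (ph0 - lam) - (pv1 - mu).T @ (ph1 - lam)) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    # Slowly track the unit means; removing them is the source of stability.
    mu += decay * (v0.mean(axis=0) - mu)
    lam += decay * (ph0.mean(axis=0) - lam)
    return W, b, c, mu, lam

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((64, 16))
b, c = np.zeros(64), np.zeros(16)
mu, lam = np.full(64, 0.5), np.full(16, 0.5)
for _ in range(100):
    batch = (rng.random((32, 64)) < 0.3).astype(float)
    W, b, c, mu, lam = centered_cd1_step(W, b, c, mu, lam, batch)
```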