On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
The permutation symmetry of the hidden units in multilayer perceptrons gives rise to saddle structures and plateaus in the learning dynamics of gradient-based methods. The correlation between the weight vectors of the hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but the mechanism is still unclear. In this paper, we analyze it for soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, natural gradient descent exhibits no plateaus regardless of the correlation in the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
Comment: 7 pages, 6 figures
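The on-line steepest-descent dynamics described above can be sketched as follows. This is an illustrative toy, assuming tanh hidden units in place of the erf-type activation usual in this literature; the dimensions, learning rate, and the hand-made teacher correlation are all arbitrary choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 2          # input dimension, number of hidden units
g = np.tanh           # hidden activation (stand-in for the erf-type unit)

# Teacher soft committee machine with correlated hidden-unit weight vectors.
B = rng.normal(size=(K, N))
B[1] = 0.5 * B[0] + 0.5 * B[1]            # introduce correlation between teacher units

# Student with the same architecture, random initialization.
J = rng.normal(size=(K, N)) / np.sqrt(N)

eta = 0.1 / N                              # small learning rate
errs = []
for t in range(20000):
    x = rng.normal(size=N)
    y_teacher = g(B @ x).sum()             # all hidden-to-output weights fixed to 1
    h = J @ x
    delta = g(h).sum() - y_teacher
    # Steepest gradient descent on the instantaneous squared error 0.5*delta^2.
    J -= eta * delta * (1 - g(h) ** 2)[:, None] * x[None, :]
    if t % 1000 == 0:
        errs.append(0.5 * delta ** 2)

print(len(errs))   # 20 recorded error samples
```

Plotting `errs` against `t` would show the characteristic plateau while the student units remain nearly symmetric.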
Model-based kernel sum rule: kernel Bayesian inference with probabilistic model
Kernel Bayesian inference is a principled approach to nonparametric inference in probabilistic graphical models, where probabilistic relationships between variables are learned from data in a nonparametric manner. Various algorithms of kernel Bayesian inference have been developed by combining kernelized basic probabilistic operations such as the kernel sum rule and the kernel Bayes' rule. However, the current framework is fully nonparametric, and it does not allow a user to flexibly combine nonparametric and model-based inferences. This is inefficient when there are good probabilistic models (or simulation models) available for some parts of a graphical model; this is particularly true in scientific fields where "models" are the central topic of study. Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference. By combining the Mb-KSR with the existing kernelized probabilistic rules, one can develop various algorithms for hybrid (i.e., nonparametric and model-based) inferences. As an illustrative example, we consider Bayesian filtering in a state space model, where typically there exists an accurate probabilistic model for the state transition process. We propose a novel filtering method that combines model-based inference for the state transition process and data-driven, nonparametric inference for the observation generating process. We empirically validate our approach with synthetic and real-data experiments, the latter being the problem of vision-based mobile robot localization in robotics, which illustrates the effectiveness of the proposed hybrid approach.
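The plain, fully nonparametric kernel sum rule that the Mb-KSR builds on can be sketched as follows. The Gaussian kernel, bandwidth, regularization constant, and the toy prior are all illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def gauss_gram(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two sample sets.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
n = 200
# Training pairs defining the conditional relationship P(Y|X): y = sin(x) + noise.
X = rng.uniform(-3, 3, size=(n, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(n, 1))

# A prior on X expressed as a weighted sample embedding (m points near 0).
m = 50
Xp = rng.normal(0, 0.3, size=(m, 1))
alpha = np.full(m, 1.0 / m)

# Kernel sum rule: transfer the prior weights through the empirical
# conditional embedding operator to obtain weights on the Y samples.
lam = 1e-3
Kx = gauss_gram(X, X)
Kxp = gauss_gram(X, Xp)
beta = np.linalg.solve(Kx + n * lam * np.eye(n), Kxp @ alpha)

# beta now weights the training outputs: a rough estimate of E[Y] under the prior.
est = beta @ Y.ravel()
print(round(est, 3))   # near 0, since E[sin(x)] = 0 for a symmetric prior around 0
```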
Hilbert Space Representations of Probability Distributions
Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each - this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike. A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means
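The two-sample idea above can be sketched with the (biased) empirical squared distance between mean embeddings, commonly known as the maximum mean discrepancy (MMD); the kernel choice and sample sizes here are illustrative.

```python
import numpy as np

def gram(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two sample sets.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased estimate of ||mu_P - mu_Q||^2 in the RKHS: the squared distance
    # between the empirical mean embeddings of the two samples.
    return gram(X, X, sigma).mean() - 2 * gram(X, Y, sigma).mean() + gram(Y, Y, sigma).mean()

rng = np.random.default_rng(2)
same = mmd2_biased(rng.normal(size=(300, 1)), rng.normal(size=(300, 1)))
diff = mmd2_biased(rng.normal(size=(300, 1)), rng.normal(1.0, 1, size=(300, 1)))
print(same < diff)   # samples from different distributions give a larger MMD
```

The final remark of the abstract is visible here: the test statistic is literally a difference (in norm) of feature-space means.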
Detecting Generalized Synchronization Between Chaotic Signals: A Kernel-based Approach
A unified framework for analyzing generalized synchronization in coupled chaotic systems from data is proposed. The key to the proposed approach is the use of kernel methods recently developed in the field of machine learning. Several successful applications are presented, which show the capability of the kernel-based approach for detecting generalized synchronization. It is also shown that a dynamical change in the coupling coefficient between two chaotic systems can be captured by the proposed approach.
Comment: 20 pages, 15 figures. Massively revised as a full paper; issues on the choice of parameters by cross-validation, tests by surrogate data, etc. are added, as well as additional examples and figures
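One simple way to operationalize the idea (not necessarily the paper's exact method) is to kernel-regress one signal on the other and compare residuals: a deterministic functional relation between the signals, i.e. generalized synchronization, yields a small residual. The logistic-map data, kernel bandwidth, and regularization below are illustrative.

```python
import numpy as np

def gram(a, b, sigma=0.1):
    # Gaussian kernel Gram matrix for scalar time series.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_residual(x, y, lam=1e-3):
    # Kernel ridge regression of y on x; a small residual suggests y = f(x).
    n = len(x)
    K = gram(x, x)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return np.mean((K @ alpha - y) ** 2)

# Chaotic driver signal from a logistic map.
x = np.empty(400); x[0] = 0.4
for t in range(399):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])

y_sync = np.sin(3 * x)              # deterministic function of x: synchronized
z = np.empty(400); z[0] = 0.7       # independent chaotic signal: unsynchronized
for t in range(399):
    z[t + 1] = 3.9 * z[t] * (1 - z[t])

r_sync = fit_residual(x, y_sync)
r_free = fit_residual(x, z)
print(r_sync < r_free)   # True: the synchronized pair gives a smaller residual
```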
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features.
Comment: 18 pages
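A minimal sketch of the feature-wise kernelized Lasso idea follows, with a simple projected-gradient solver standing in for the paper's optimizer; the kernels, bandwidths, regularization constant, and synthetic data are illustrative assumptions.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    # Centered, Frobenius-normalized Gaussian Gram matrix for one feature.
    d2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(len(v)) - 1.0 / len(v)
    K = H @ K @ H
    return K / np.linalg.norm(K)

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.normal(size=(n, d))
# Only features 0 and 1 matter, and both enter nonlinearly.
y = np.sin(X[:, 0]) + np.abs(X[:, 1]) + 0.1 * rng.normal(size=n)

L = centered_gram(y)
Ks = [centered_gram(X[:, j]) for j in range(d)]

# Objective: 0.5 * ||L - sum_j a_j K_j||_F^2 + lam * sum_j a_j,  a_j >= 0.
# Solved here by projected gradient descent (an illustrative choice).
a = np.zeros(d)
lam, step = 0.05, 0.5
for _ in range(500):
    R = L - sum(aj * Kj for aj, Kj in zip(a, Ks))
    grad = np.array([-(R * Kj).sum() for Kj in Ks]) + lam
    a = np.maximum(0.0, a - step * grad)

print(min(a[0], a[1]) > max(a[2:]))   # the relevant features get the largest weights
```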
Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many
statistics and machine learning applications ranging from support vector
machines to Gaussian processes and kernel embeddings of distributions.
Operators acting on such spaces are, for instance, required to embed
conditional probability distributions in order to implement the kernel Bayes
rule and build sequential data models. It was recently shown that transfer
operators such as the Perron-Frobenius or Koopman operator can also be
approximated in a similar fashion using covariance and cross-covariance
operators and that eigenfunctions of these operators can be obtained by solving
associated matrix eigenvalue problems. The goal of this paper is to provide a
solid functional analytic foundation for the eigenvalue decomposition of RKHS
operators and to extend the approach to the singular value decomposition. The
results are illustrated with simple guiding examples
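For the empirical cross-covariance operator, the singular values can be computed from Gram matrices alone: the nonzero eigenvalues of C_YX* C_YX coincide with those of (G_Y G_X) / n^2, so the singular values are their square roots. A small sketch, with the Gaussian kernel and the toy data as illustrative choices:

```python
import numpy as np

def gram(v, sigma=1.0):
    # Gaussian kernel Gram matrix for a scalar sample.
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(4)
n = 150
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)     # dependent data: nonzero singular values

H = np.eye(n) - 1.0 / n                   # centering matrix
Gx = H @ gram(x) @ H
Gy = H @ gram(y) @ H

# Singular values of the empirical cross-covariance operator C_YX via the
# eigenvalues of (Gy @ Gx) / n^2 (real and nonnegative up to numerical error).
eigs = np.linalg.eigvals(Gy @ Gx) / n ** 2
svals = np.sort(np.sqrt(np.clip(eigs.real, 0, None)))[::-1]
print(svals[0] > 0)   # leading singular value is positive for dependent data
```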
Isometric Sliced Inverse Regression for Nonlinear Manifold Learning
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, a hybrid that combines the SIR method with geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.
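ISOSIR's final step is the classical SIR eigenproblem. The sketch below shows that base algorithm with ordinary y-order slicing; ISOSIR would instead slice by K-means clusters of the geodesic distance matrix. The data, slice count, and link function are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=5):
    # Classical sliced inverse regression (SIR): slice the data along y,
    # average X within each slice, and solve the generalized eigenproblem
    # Cov(slice means) v = lambda * Cov(X) v.
    n, p = X.shape
    Xc = X - X.mean(0)
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):
        m = Xc[s].mean(0)
        M += len(s) / n * np.outer(m, m)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eig(np.linalg.solve(cov, M))
    idx = np.argsort(evals.real)[::-1]
    return evecs[:, idx].real

rng = np.random.default_rng(5)
n, p = 500, 4
X = rng.normal(size=(n, p))
b = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ b) ** 3 + 0.1 * rng.normal(size=n)

v = sir_directions(X, y)[:, 0]
v /= np.linalg.norm(v)
print(abs(v @ b) > 0.9)   # the leading SIR direction aligns with the true one
```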
Learning, Memory, and the Role of Neural Network Architecture
The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems
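The two architectures contrasted above can be caricatured as a single wide hidden layer ("parallel") versus several narrow ones ("layered"). The sketch below only illustrates the supervised function-approximation setup, not the paper's error-landscape analysis; all sizes and hyperparameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X)

def train_mlp(sizes, steps=5000, lr=0.05):
    # Tiny tanh network with a linear output, trained by full-batch gradient
    # descent on mean squared error; returns the final training MSE.
    Ws = [rng.normal(scale=1 / np.sqrt(a), size=(a, b))
          for a, b in zip(sizes[:-1], sizes[1:])]
    for _ in range(steps):
        acts = [X]
        for W in Ws[:-1]:
            acts.append(np.tanh(acts[-1] @ W))   # hidden layers
        out = acts[-1] @ Ws[-1]                  # linear output layer
        grad = (out - y) / len(X)                # gradient of the loss w.r.t. out
        for i in range(len(Ws) - 1, -1, -1):
            gW = acts[i].T @ grad
            if i > 0:
                grad = (grad @ Ws[i].T) * (1 - acts[i] ** 2)
            Ws[i] -= lr * gW
    return float(((out - y) ** 2).mean())

wide = train_mlp([1, 32, 1])        # "parallel": one wide hidden layer
deep = train_mlp([1, 8, 8, 8, 1])   # "layered": several narrow hidden layers
print(wide < 0.6 and deep < 0.6)    # both improve on simply predicting the mean
```

Tracking the training error of each architecture across repeated runs with varied initial states is the kind of statistical comparison the study describes.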
Longitudinal Evaluation of an N-Ethyl-N-Nitrosourea-Created Murine Model with Normal Pressure Hydrocephalus
Normal-pressure hydrocephalus (NPH) is a neurodegenerative disorder that usually occurs late in adult life. Clinically, the cardinal features include gait disturbances, urinary incontinence, and cognitive decline. Herein we report the characterization of a novel mouse model of NPH (designated p23-ST1), created by N-ethyl-N-nitrosourea (ENU)-induced mutagenesis. The ventricular size in the brain was measured by 3-dimensional micro-magnetic resonance imaging (3D-MRI) and was found to be enlarged. Intracranial pressure was measured and was found to fall within a normal range. A histological assessment and tracer flow study revealed that the cerebrospinal fluid (CSF) pathway of p23-ST1 mice was normal without obstruction. Motor functions were assessed using a rotarod apparatus and a CatWalk gait automatic analyzer. Mutant mice showed poor rotarod performance and gait disturbances. Cognitive function was evaluated using auditory fear-conditioned responses, with the mutant displaying both short- and long-term memory deficits. With an increase in urination frequency and volume, the mutant showed features of incontinence. Nissl substance staining and cell-type-specific markers were used to examine the brain pathology. These studies revealed concurrent glial activation and neuronal loss in the periventricular regions of mutant animals. In particular, chronically activated microglia were found in septal areas at a relatively young age, implying that microglial activation might contribute to the pathogenesis of NPH. These defects were transmitted in an autosomal dominant mode with reduced penetrance. Using a whole-genome scan employing 287 single-nucleotide polymorphic (SNP) markers and further refinement using six additional SNP markers and four microsatellite markers, the causative mutation was mapped to a 5.3-cM region on chromosome 4. Our results collectively demonstrate that the p23-ST1 mouse is a novel mouse model of human NPH.
Clinical observations suggest that dysfunctions and alterations in the brains of patients with NPH might occur much earlier than the appearance of clinical signs. p23-ST1 mice provide a unique opportunity to characterize molecular changes and the pathogenic mechanism of NPH
Efficient Learning and Feature Selection in High Dimensional Regression
We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This variational Bayesian least squares (VBLS) approach retains its simplicity as a linear model, but offers a novel statistically robust black-box approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. The iterative nature of VBLS makes it most suitable for real-time incremental learning, which is crucial especially in the application domain of robotics, brain-machine interfaces, and neural prosthetics, where real-time learning of models for control is needed. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques
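Classical (non-probabilistic) backfitting, of which VBLS is a probabilistic EM version with automatic relevance determination on each coefficient, can be sketched as coordinate-wise updates against the partial residual; the data and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 50
X = rng.normal(size=(n, d))
b_true = np.zeros(d)
b_true[:3] = [2.0, -1.0, 0.5]               # only a few relevant inputs
y = X @ b_true + 0.1 * rng.normal(size=n)

# Backfitting for a linear model: update one coefficient at a time against
# the residual of all the others; VBLS wraps this loop in an EM scheme with
# a relevance prior that drives irrelevant coefficients toward zero.
b = np.zeros(d)
for _ in range(50):
    for j in range(d):
        r = y - X @ b + X[:, j] * b[j]      # partial residual excluding feature j
        b[j] = (X[:, j] @ r) / (X[:, j] @ X[:, j])

print(np.round(b[:3], 1))                   # close to the generating coefficients
```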
- …