
    On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -

    The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation between the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in prolonged learning times, but the mechanism is still unclear. In this paper, we analyze it for soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent, regardless of the correlation, in the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
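
    The plateau phenomenon the abstract analyzes is easy to reproduce numerically. Below is a minimal simulation sketch, not the paper's statistical-mechanics calculation: a two-hidden-unit soft committee machine student trained by online steepest gradient descent on examples from a correlated two-unit teacher. The input dimension, learning rate, and correlation value are illustrative assumptions.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
N, K, eta, rho = 500, 2, 0.5, 0.5   # input dim, hidden units, learning rate, teacher correlation

def g(u):                            # the activation conventional in this literature
    return erf(u / np.sqrt(2))

def g_prime(u):
    return np.sqrt(2.0 / np.pi) * np.exp(-u ** 2 / 2)

# Two teacher weight vectors with overlap rho between them.
b1 = rng.standard_normal(N)
b2 = rho * b1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)
B = np.vstack([b1, b2])
W = 0.01 * rng.standard_normal((K, N))        # student starts near the symmetric point

X_test = rng.standard_normal((2000, N))       # fixed test set for the generalization error
y_test = g(X_test @ B.T / np.sqrt(N)).sum(axis=1)

for t in range(200001):
    x = rng.standard_normal(N)
    h = W @ x / np.sqrt(N)                    # student local fields
    err = g(h).sum() - g(B @ x / np.sqrt(N)).sum()
    # Steepest (plain) gradient step on the squared error; natural gradient
    # would precondition this step with the inverse Fisher information.
    W -= eta * err * np.outer(g_prime(h), x) / np.sqrt(N)
    if t % 40000 == 0:
        e_g = 0.5 * np.mean((g(X_test @ W.T / np.sqrt(N)).sum(axis=1) - y_test) ** 2)
        print(f"step {t:6d}   generalization error {e_g:.4f}")
```

    With higher rho the symmetric (plateau) phase should persist longer before the error drops, matching the abstract's claim about steepest descent.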

    Model-based kernel sum rule: kernel Bayesian inference with probabilistic model

    Kernel Bayesian inference is a principled approach to nonparametric inference in probabilistic graphical models, where probabilistic relationships between variables are learned from data in a nonparametric manner. Various algorithms of kernel Bayesian inference have been developed by combining kernelized basic probabilistic operations such as the kernel sum rule and kernel Bayes' rule. However, the current framework is fully nonparametric, and it does not allow a user to flexibly combine nonparametric and model-based inferences. This is inefficient when there are good probabilistic models (or simulation models) available for some parts of a graphical model; this is in particular true in scientific fields where "models" are the central topic of study. Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference. By combining the Mb-KSR with the existing kernelized probabilistic rules, one can develop various algorithms for hybrid (i.e., nonparametric and model-based) inferences. As an illustrative example, we consider Bayesian filtering in a state space model, where typically there exists an accurate probabilistic model for the state transition process. We propose a novel filtering method that combines model-based inference for the state transition process and data-driven, nonparametric inference for the observation generating process. We empirically validate our approach with synthetic and real-data experiments, the latter being the problem of vision-based mobile robot localization in robotics, which illustrates the effectiveness of the proposed hybrid approach.
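
    The core move is simple to sketch: when a conditional distribution is given by a model one can sample from, the kernel sum rule's learned conditional embedding operator is replaced by direct sampling from the model at each support point of the current embedding. Below is a minimal illustration under assumed ingredients (a Gaussian kernel and a toy linear-Gaussian transition model); it is one reading of the abstract, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, s=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

def transition_model(x, m):
    """Toy known dynamics p(x_t | x_{t-1}): a linear-Gaussian AR(1) step."""
    return 0.9 * x + 0.1 * rng.standard_normal((m,) + x.shape)

# Current (prior) embedding: weights w over support points X, mu = sum_i w_i k(., x_i).
X = rng.standard_normal((50, 1))
w = np.full(50, 1.0 / 50)

# Model-based kernel sum rule: draw m samples from the transition model at
# each support point; the predictive embedding is the weighted empirical
# embedding over all sampled points -- no conditional operator is estimated.
m = 5
S = transition_model(X, m)                 # shape (m, 50, 1)
X_pred = S.reshape(-1, 1)                  # m * 50 new support points
w_pred = np.tile(w / m, m)                 # each sample inherits weight w_i / m

# Sanity check: the embedding's mean matches the model-propagated mean.
print("embedded predictive mean:", float(w_pred @ X_pred))
print("exact predictive mean   :", float(0.9 * (w @ X)))

# The predictive embedding can be evaluated anywhere, e.g. on a grid:
grid = np.linspace(-3, 3, 5)[:, None]
print("mu_pred(grid):", rbf(grid, X_pred) @ w_pred)
```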

    Hilbert Space Representations of Probability Distributions

    Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each; this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike. A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
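
    In practice, the two-sample comparison reduces to an estimator of the distance between RKHS mean embeddings, the maximum mean discrepancy (MMD). A minimal sketch with a Gaussian kernel, where the bandwidth and toy data are illustrative assumptions:

```python
import numpy as np

def rbf(A, B, s=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

def mmd2_unbiased(X, Y, s=1.0):
    """Unbiased estimate of the squared distance between mean embeddings."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, s), rbf(Y, Y, s), rbf(X, Y, s)
    np.fill_diagonal(Kxx, 0.0)                # drop i == j terms
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (m * (m - 1))
            + Kyy.sum() / (n * (n - 1))
            - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 0.5       # shifted mean: different distribution
Z = rng.standard_normal((200, 2))             # same distribution as X
print("MMD^2(X, Y) =", mmd2_unbiased(X, Y))   # clearly positive
print("MMD^2(X, Z) =", mmd2_unbiased(X, Z))   # near zero
```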

    Detecting Generalized Synchronization Between Chaotic Signals: A Kernel-based Approach

    A unified framework for analyzing generalized synchronization in coupled chaotic systems from data is proposed. The key to the proposed approach is the use of kernel methods recently developed in the field of machine learning. Several successful applications are presented, which show the capability of the kernel-based approach for detecting generalized synchronization. It is also shown that dynamical changes in the coupling coefficient between two chaotic systems can be captured by the proposed approach.
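
    The abstract does not spell out which kernel statistic is used, so the following is a hedged sketch of one natural instantiation rather than the paper's method: measure kernel dependence (here HSIC, a swapped-in standard choice) between delay embeddings of the two signals; generalized synchronization should appear as strong dependence. The coupled logistic maps, embedding parameters, and bandwidth are illustrative assumptions.

```python
import numpy as np

def rbf(A, s=0.5):
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s ** 2))

def hsic(X, Y):
    """Biased empirical HSIC: a kernel measure of statistical dependence."""
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return np.trace(rbf(X) @ H @ rbf(Y) @ H) / (n - 1) ** 2

def delay_embed(x, dim=3, tau=2):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def logistic_pair(n, c, seed):
    """Unidirectionally coupled logistic maps; c is the coupling strength."""
    rng = np.random.default_rng(seed)
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = rng.random(), rng.random()
    f = lambda u: 3.9 * u * (1 - u)
    for t in range(n - 1):
        x[t + 1] = f(x[t])
        y[t + 1] = (1 - c) * f(y[t]) + c * f(x[t])
    return x[-500:], y[-500:]                 # discard transients

for c in (0.0, 0.6):
    x, y = logistic_pair(2000, c, seed=1)
    print(f"coupling {c:.1f}: HSIC = {hsic(delay_embed(x), delay_embed(y)):.5f}")
```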

    High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

    The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments with thousands of features.
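
    A hedged sketch of the feature-wise kernelized Lasso idea: each feature and the output get a centered, normalized Gram matrix, and a non-negative Lasso selects features whose Gram matrices align with the output's. The proximal-gradient solver, kernels, bandwidths, and lambda below are illustrative substitutes for the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)   # features 0 and 1 matter

H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix

def centered_gram(v, s=1.0):
    K = np.exp(-(v[:, None] - v[None, :]) ** 2 / (2 * s ** 2))
    Kc = H @ K @ H
    return Kc / np.linalg.norm(Kc)                  # normalize so terms are comparable

L = centered_gram(y)
A = np.stack([centered_gram(X[:, k]).ravel() for k in range(d)], axis=1)  # (n*n, d)
b = L.ravel()

# Non-negative Lasso by proximal gradient (an illustrative solver choice).
lam = 0.01
AtA, Atb = A.T @ A, A.T @ b
step = 1.0 / np.linalg.eigvalsh(AtA).max()          # 1 / Lipschitz constant of the gradient
alpha = np.zeros(d)
for _ in range(500):
    grad = AtA @ alpha - Atb
    alpha = np.maximum(0.0, alpha - step * (grad + lam))   # prox of non-negative L1
print("selected features:", np.nonzero(alpha > 1e-6)[0])
```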

    Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces

    Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes' rule and build sequential data models. It was recently shown that transfer operators such as the Perron-Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.
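
    The flavor of the reduction can be sketched numerically: the nonzero singular values of an empirical cross-covariance operator between two RKHSs can be read off a matrix eigenvalue problem in the Gram matrices. For validation, the sketch below uses an explicit finite-dimensional feature map so the operator SVD can also be computed directly; the feature map and data are illustrative assumptions, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
y = np.sin(x) + 0.1 * rng.standard_normal(n)

def phi(v):                          # explicit finite-dimensional feature map
    F = np.column_stack([v, v ** 2, v ** 3])
    return F - F.mean(axis=0)        # center, so we get covariance operators

Px, Py = phi(x), phi(y)
Kx, Ky = Px @ Px.T, Py @ Py.T        # Gram matrices of the induced kernel

# Direct route: SVD of the empirical cross-covariance matrix C_YX = Py^T Px / n.
direct = np.linalg.svd(Py.T @ Px / n, compute_uv=False)

# Gram-matrix route: since eig(AB) and eig(BA) share nonzero eigenvalues,
# the nonzero singular values of C_YX equal sqrt(eig(Kx @ Ky)) / n.
ev = np.sort(np.linalg.eigvals(Kx @ Ky).real)[::-1][:3]
gram = np.sqrt(np.clip(ev, 0.0, None)) / n

print("direct SVD     :", np.round(direct, 6))
print("via Gram route :", np.round(gram, 6))
```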

    Isometric Sliced Inverse Regression for Nonlinear Manifolds Learning

    Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, a hybrid of the SIR method and a geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.
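
    A hedged sketch of the pipeline as the abstract describes it, assuming SciPy and scikit-learn are available: (1) approximate geodesic distances on a k-nearest-neighbor graph, (2) form slices by K-means on the geodesic distance matrix, (3) run classical SIR with those slices. The parameter choices and the exact slicing input are one plausible reading of the abstract, not a faithful reimplementation.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k, n_slices = 400, 12, 10
t = 1.5 * np.pi * (1 + 2 * rng.random(n))                   # Swiss-roll parameter
X = np.column_stack([t * np.cos(t), 20 * rng.random(n), t * np.sin(t)])

# 1) isometric (geodesic) distances on a k-nearest-neighbor graph
D = cdist(X, X)
knn = np.where(D <= np.sort(D, axis=1)[:, [k]], D, np.inf)  # keep k nearest; inf = no edge
G = shortest_path(np.minimum(knn, knn.T), method="D")
assert np.isfinite(G).all(), "increase k if the k-NN graph is disconnected"

# 2) slices from K-means on the rows of the geodesic distance matrix
labels = KMeans(n_clusters=n_slices, n_init=10, random_state=0).fit_predict(G)

# 3) classical SIR with those slices: generalized eigenproblem between the
#    covariance of the slice means and the overall covariance
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / n
M = np.zeros((3, 3))
for h in range(n_slices):
    idx = labels == h
    M += idx.mean() * np.outer(Xc[idx].mean(axis=0), Xc[idx].mean(axis=0))
evals, evecs = eigh(M, Sigma)
print("leading e.d.r. directions:\n", evecs[:, ::-1][:, :2])
```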

    Learning, Memory, and the Role of Neural Network Architecture

    The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and to produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems.
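
    A hedged sketch of the curvature analysis the abstract mentions: approximate the Hessian of the training error by finite differences and compare its eigenvalue spread for a "parallel" (one wide hidden layer) versus a "layered" (two narrow hidden layers) network. The architectures, task, probe point, and sizes are illustrative assumptions, not the study's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 64)[:, None]
y = np.sin(np.pi * X).ravel()                    # target function to approximate

def make_net(widths):
    shapes = list(zip(widths[:-1], widths[1:]))
    theta = np.concatenate([0.5 * rng.standard_normal(a * b) for a, b in shapes])
    return shapes, theta

def mse(theta, shapes):
    h, i = X, 0
    for j, (a, b) in enumerate(shapes):
        W = theta[i:i + a * b].reshape(a, b); i += a * b
        h = h @ W
        if j < len(shapes) - 1:
            h = np.tanh(h)
    return np.mean((h.ravel() - y) ** 2)

def hessian_eigs(shapes, theta, eps=1e-3):
    """Hessian of the error landscape by forward finite differences."""
    p = len(theta); Hm = np.zeros((p, p))
    for a in range(p):
        for b in range(a, p):
            e1 = np.zeros(p); e1[a] = eps
            e2 = np.zeros(p); e2[b] = eps
            Hm[a, b] = Hm[b, a] = (mse(theta + e1 + e2, shapes) - mse(theta + e1, shapes)
                                   - mse(theta + e2, shapes) + mse(theta, shapes)) / eps ** 2
    return np.linalg.eigvalsh(Hm)

for name, widths in [("parallel", [1, 8, 1]), ("layered", [1, 3, 3, 1])]:
    shapes, theta = make_net(widths)
    # (The study evaluates curvature statistically over trained states; here
    # we probe a random point purely to illustrate the measurement itself.)
    ev = hessian_eigs(shapes, theta)
    print(f"{name:9s} eigenvalue range: [{ev.min():.4f}, {ev.max():.4f}]")
```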

    Longitudinal Evaluation of an N-Ethyl-N-Nitrosourea-Created Murine Model with Normal Pressure Hydrocephalus

    Normal-pressure hydrocephalus (NPH) is a neurodegenerative disorder that usually occurs late in adult life. Clinically, the cardinal features include gait disturbances, urinary incontinence, and cognitive decline. Herein we report the characterization of a novel mouse model of NPH (designated p23-ST1), created by N-ethyl-N-nitrosourea (ENU)-induced mutagenesis. The ventricular size in the brain was measured by 3-dimensional micro-magnetic resonance imaging (3D-MRI) and was found to be enlarged. Intracranial pressure was measured and was found to fall within a normal range. A histological assessment and tracer flow study revealed that the cerebrospinal fluid (CSF) pathway of p23-ST1 mice was normal without obstruction. Motor functions were assessed using a rotarod apparatus and a CatWalk gait automatic analyzer. Mutant mice showed poor rotarod performance and gait disturbances. Cognitive function was evaluated using auditory fear-conditioned responses, with the mutant displaying both short- and long-term memory deficits. With an increase in urination frequency and volume, the mutant showed features of incontinence. Nissl substance staining and cell-type-specific markers were used to examine the brain pathology. These studies revealed concurrent glial activation and neuronal loss in the periventricular regions of mutant animals. In particular, chronically activated microglia were found in septal areas at a relatively young age, implying that microglial activation might contribute to the pathogenesis of NPH. These defects were transmitted in an autosomal dominant mode with reduced penetrance. Using a whole-genome scan employing 287 single-nucleotide polymorphism (SNP) markers and further refinement using six additional SNP markers and four microsatellite markers, the causative mutation was mapped to a 5.3-cM region on chromosome 4. Our results collectively demonstrate that the p23-ST1 mouse is a novel mouse model of human NPH. Clinical observations suggest that dysfunctions and alterations in the brains of patients with NPH might occur much earlier than the appearance of clinical signs. p23-ST1 mice provide a unique opportunity to characterize molecular changes and the pathogenic mechanism of NPH.

    Efficient Learning and Feature Selection in High Dimensional Regression

    We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance determination of the input features. This variational Bayesian least squares (VBLS) approach retains its simplicity as a linear model, but offers a novel, statistically robust, black-box approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. The iterative nature of VBLS makes it most suitable for real-time incremental learning, which is crucial especially in the application domains of robotics, brain-machine interfaces, and neural prosthetics, where real-time learning of models for control is needed. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques.
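
    The abstract gives no update equations, so the following is a hedged sketch of the generic ingredient it builds on: EM-style updates for Bayesian linear regression with automatic relevance determination (ARD), where per-feature prior precisions grow large for irrelevant inputs and effectively prune them. This is standard MacKay-style ARD, not the authors' exact VBLS recursions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.0, 0.5]       # only 3 relevant features
y = X @ w_true + 0.1 * rng.standard_normal(n)

alpha = np.ones(d)            # per-feature prior precisions
beta = 1.0                    # noise precision
for _ in range(100):
    # E-step: Gaussian posterior over weights given current hyperparameters
    S = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
    mu = beta * S @ X.T @ y
    # M-step: MacKay fixed-point updates for the precisions
    gamma = 1.0 - alpha * np.diag(S)                       # effective dof per feature
    alpha = gamma / (mu ** 2 + 1e-12)
    beta = (n - gamma.sum()) / np.sum((y - X @ mu) ** 2)

print("kept features:", np.nonzero(alpha < 1e3)[0])
print("their weights:", np.round(mu[alpha < 1e3], 2))
```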