On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
The permutation symmetry of the hidden units in multilayer perceptrons gives rise to saddle structures and plateaus in the learning dynamics of gradient-based methods. The correlation between the weight vectors of the hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but the mechanism is still unclear. In this paper, we analyze it for soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, natural gradient descent exhibits no plateaus regardless of the correlation in the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
Comment: 7 pages, 6 figures
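The on-line steepest-descent dynamics described above can be sketched as follows. This is an illustrative toy, assuming tanh hidden units in place of the erf-type activation usual in this literature; the dimensions, learning rate, and the hand-made teacher correlation are all arbitrary choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 2          # input dimension, number of hidden units
g = np.tanh           # hidden activation (stand-in for the erf-type unit)

# Teacher soft committee machine with correlated hidden-unit weight vectors.
B = rng.normal(size=(K, N))
B[1] = 0.5 * B[0] + 0.5 * B[1]            # introduce correlation between teacher units

# Student with the same architecture, random initialization.
J = rng.normal(size=(K, N)) / np.sqrt(N)

eta = 0.1 / N                              # small learning rate
errs = []
for t in range(20000):
    x = rng.normal(size=N)
    y_teacher = g(B @ x).sum()             # all hidden-to-output weights fixed to 1
    h = J @ x
    delta = g(h).sum() - y_teacher
    # Steepest gradient descent on the instantaneous squared error 0.5*delta^2.
    J -= eta * delta * (1 - g(h) ** 2)[:, None] * x[None, :]
    if t % 1000 == 0:
        errs.append(0.5 * delta ** 2)

print(len(errs))   # 20 recorded error samples
```

Plotting `errs` against `t` would show the characteristic plateau while the student units remain nearly symmetric.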
Model-based kernel sum rule: kernel Bayesian inference with probabilistic model
Kernel Bayesian inference is a principled approach to nonparametric inference in probabilistic graphical models, where probabilistic relationships between variables are learned from data in a nonparametric manner. Various algorithms of kernel Bayesian inference have been developed by combining kernelized basic probabilistic operations such as the kernel sum rule and the kernel Bayes' rule. However, the current framework is fully nonparametric, and it does not allow a user to flexibly combine nonparametric and model-based inferences. This is inefficient when there are good probabilistic models (or simulation models) available for some parts of a graphical model; this is particularly true in scientific fields where "models" are the central topic of study. Our contribution in this paper is to introduce a novel approach, termed the model-based kernel sum rule (Mb-KSR), to combine a probabilistic model and kernel Bayesian inference. By combining the Mb-KSR with the existing kernelized probabilistic rules, one can develop various algorithms for hybrid (i.e., nonparametric and model-based) inferences. As an illustrative example, we consider Bayesian filtering in a state space model, where typically there exists an accurate probabilistic model for the state transition process. We propose a novel filtering method that combines model-based inference for the state transition process and data-driven, nonparametric inference for the observation generating process. We empirically validate our approach with synthetic and real-data experiments, the latter being the problem of vision-based mobile robot localization in robotics, which illustrates the effectiveness of the proposed hybrid approach.
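The plain, fully nonparametric kernel sum rule that the Mb-KSR builds on can be sketched as follows. The Gaussian kernel, bandwidth, regularization constant, and the toy prior are all illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def gauss_gram(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two sample sets.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
n = 200
# Training pairs defining the conditional relationship P(Y|X): y = sin(x) + noise.
X = rng.uniform(-3, 3, size=(n, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(n, 1))

# A prior on X expressed as a weighted sample embedding (m points near 0).
m = 50
Xp = rng.normal(0, 0.3, size=(m, 1))
alpha = np.full(m, 1.0 / m)

# Kernel sum rule: transfer the prior weights through the empirical
# conditional embedding operator to obtain weights on the Y samples.
lam = 1e-3
Kx = gauss_gram(X, X)
Kxp = gauss_gram(X, Xp)
beta = np.linalg.solve(Kx + n * lam * np.eye(n), Kxp @ alpha)

# beta now weights the training outputs: a rough estimate of E[Y] under the prior.
est = beta @ Y.ravel()
print(round(est, 3))   # near 0, since E[sin(x)] = 0 for a symmetric prior around 0
```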
Hilbert Space Representations of Probability Distributions
Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each - this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike. A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means
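The two-sample idea above can be sketched with the (biased) empirical squared distance between mean embeddings, commonly known as the maximum mean discrepancy (MMD); the kernel choice and sample sizes here are illustrative.

```python
import numpy as np

def gram(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two sample sets.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased estimate of ||mu_P - mu_Q||^2 in the RKHS: the squared distance
    # between the empirical mean embeddings of the two samples.
    return gram(X, X, sigma).mean() - 2 * gram(X, Y, sigma).mean() + gram(Y, Y, sigma).mean()

rng = np.random.default_rng(2)
same = mmd2_biased(rng.normal(size=(300, 1)), rng.normal(size=(300, 1)))
diff = mmd2_biased(rng.normal(size=(300, 1)), rng.normal(1.0, 1, size=(300, 1)))
print(same < diff)   # samples from different distributions give a larger MMD
```

The final remark of the abstract is visible here: the test statistic is literally a difference (in norm) of feature-space means.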
Detecting Generalized Synchronization Between Chaotic Signals: A Kernel-based Approach
A unified framework for analyzing generalized synchronization in coupled chaotic systems from data is proposed. The key to the proposed approach is the use of kernel methods recently developed in the field of machine learning. Several successful applications are presented, which show the capability of the kernel-based approach for detecting generalized synchronization. It is also shown that a dynamical change in the coupling coefficient between two chaotic systems can be captured by the proposed approach.
Comment: 20 pages, 15 figures. Massively revised as a full paper; issues on the choice of parameters by cross-validation, tests by surrogate data, etc. are added, as well as additional examples and figures
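One simple way to operationalize the idea (not necessarily the paper's exact method) is to kernel-regress one signal on the other and compare residuals: a deterministic functional relation between the signals, i.e. generalized synchronization, yields a small residual. The logistic-map data, kernel bandwidth, and regularization below are illustrative.

```python
import numpy as np

def gram(a, b, sigma=0.1):
    # Gaussian kernel Gram matrix for scalar time series.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_residual(x, y, lam=1e-3):
    # Kernel ridge regression of y on x; a small residual suggests y = f(x).
    n = len(x)
    K = gram(x, x)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return np.mean((K @ alpha - y) ** 2)

# Chaotic driver signal from a logistic map.
x = np.empty(400); x[0] = 0.4
for t in range(399):
    x[t + 1] = 3.9 * x[t] * (1 - x[t])

y_sync = np.sin(3 * x)              # deterministic function of x: synchronized
z = np.empty(400); z[0] = 0.7       # independent chaotic signal: unsynchronized
for t in range(399):
    z[t + 1] = 3.9 * z[t] * (1 - z[t])

r_sync = fit_residual(x, y_sync)
r_free = fit_residual(x, z)
print(r_sync < r_free)   # True: the synchronized pair gives a smaller residual
```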
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features.
Comment: 18 pages
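A minimal sketch of the feature-wise kernelized Lasso idea follows, with a simple projected-gradient solver standing in for the paper's optimizer; the kernels, bandwidths, regularization constant, and synthetic data are illustrative assumptions.

```python
import numpy as np

def centered_gram(v, sigma=1.0):
    # Centered, Frobenius-normalized Gaussian Gram matrix for one feature.
    d2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(len(v)) - 1.0 / len(v)
    K = H @ K @ H
    return K / np.linalg.norm(K)

rng = np.random.default_rng(3)
n, d = 200, 10
X = rng.normal(size=(n, d))
# Only features 0 and 1 matter, and both enter nonlinearly.
y = np.sin(X[:, 0]) + np.abs(X[:, 1]) + 0.1 * rng.normal(size=n)

L = centered_gram(y)
Ks = [centered_gram(X[:, j]) for j in range(d)]

# Objective: 0.5 * ||L - sum_j a_j K_j||_F^2 + lam * sum_j a_j,  a_j >= 0.
# Solved here by projected gradient descent (an illustrative choice).
a = np.zeros(d)
lam, step = 0.05, 0.5
for _ in range(500):
    R = L - sum(aj * Kj for aj, Kj in zip(a, Ks))
    grad = np.array([-(R * Kj).sum() for Kj in Ks]) + lam
    a = np.maximum(0.0, a - step * grad)

print(min(a[0], a[1]) > max(a[2:]))   # the relevant features get the largest weights
```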
Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many
statistics and machine learning applications ranging from support vector
machines to Gaussian processes and kernel embeddings of distributions.
Operators acting on such spaces are, for instance, required to embed
conditional probability distributions in order to implement the kernel Bayes
rule and build sequential data models. It was recently shown that transfer
operators such as the Perron-Frobenius or Koopman operator can also be
approximated in a similar fashion using covariance and cross-covariance
operators and that eigenfunctions of these operators can be obtained by solving
associated matrix eigenvalue problems. The goal of this paper is to provide a
solid functional analytic foundation for the eigenvalue decomposition of RKHS
operators and to extend the approach to the singular value decomposition. The
results are illustrated with simple guiding examples
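For the empirical cross-covariance operator, the singular values can be computed from Gram matrices alone: the nonzero eigenvalues of C_YX* C_YX coincide with those of (G_Y G_X) / n^2, so the singular values are their square roots. A small sketch, with the Gaussian kernel and the toy data as illustrative choices:

```python
import numpy as np

def gram(v, sigma=1.0):
    # Gaussian kernel Gram matrix for a scalar sample.
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(4)
n = 150
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)     # dependent data: nonzero singular values

H = np.eye(n) - 1.0 / n                   # centering matrix
Gx = H @ gram(x) @ H
Gy = H @ gram(y) @ H

# Singular values of the empirical cross-covariance operator C_YX via the
# eigenvalues of (Gy @ Gx) / n^2 (real and nonnegative up to numerical error).
eigs = np.linalg.eigvals(Gy @ Gx) / n ** 2
svals = np.sort(np.sqrt(np.clip(eigs.real, 0, None)))[::-1]
print(svals[0] > 0)   # leading singular value is positive for dependent data
```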
Isometric Sliced Inverse Regression for Nonlinear Manifold Learning
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, a hybrid that combines the SIR method with geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.
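ISOSIR's final step is the classical SIR eigenproblem. The sketch below shows that base algorithm with ordinary y-order slicing; ISOSIR would instead slice by K-means clusters of the geodesic distance matrix. The data, slice count, and link function are illustrative.

```python
import numpy as np

def sir_directions(X, y, n_slices=5):
    # Classical sliced inverse regression (SIR): slice the data along y,
    # average X within each slice, and solve the generalized eigenproblem
    # Cov(slice means) v = lambda * Cov(X) v.
    n, p = X.shape
    Xc = X - X.mean(0)
    order = np.argsort(y)
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):
        m = Xc[s].mean(0)
        M += len(s) / n * np.outer(m, m)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eig(np.linalg.solve(cov, M))
    idx = np.argsort(evals.real)[::-1]
    return evecs[:, idx].real

rng = np.random.default_rng(5)
n, p = 500, 4
X = rng.normal(size=(n, p))
b = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ b) ** 3 + 0.1 * rng.normal(size=n)

v = sir_directions(X, y)[:, 0]
v /= np.linalg.norm(v)
print(abs(v @ b) > 0.9)   # the leading SIR direction aligns with the true one
```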
Learning, Memory, and the Role of Neural Network Architecture
The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems
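The two architectures contrasted above can be caricatured as a single wide hidden layer ("parallel") versus several narrow ones ("layered"). The sketch below only illustrates the supervised function-approximation setup, not the paper's error-landscape analysis; all sizes and hyperparameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X)

def train_mlp(sizes, steps=5000, lr=0.05):
    # Tiny tanh network with a linear output, trained by full-batch gradient
    # descent on mean squared error; returns the final training MSE.
    Ws = [rng.normal(scale=1 / np.sqrt(a), size=(a, b))
          for a, b in zip(sizes[:-1], sizes[1:])]
    for _ in range(steps):
        acts = [X]
        for W in Ws[:-1]:
            acts.append(np.tanh(acts[-1] @ W))   # hidden layers
        out = acts[-1] @ Ws[-1]                  # linear output layer
        grad = (out - y) / len(X)                # gradient of the loss w.r.t. out
        for i in range(len(Ws) - 1, -1, -1):
            gW = acts[i].T @ grad
            if i > 0:
                grad = (grad @ Ws[i].T) * (1 - acts[i] ** 2)
            Ws[i] -= lr * gW
    return float(((out - y) ** 2).mean())

wide = train_mlp([1, 32, 1])        # "parallel": one wide hidden layer
deep = train_mlp([1, 8, 8, 8, 1])   # "layered": several narrow hidden layers
print(wide < 0.6 and deep < 0.6)    # both improve on simply predicting the mean
```

Tracking the training error of each architecture across repeated runs with varied initial states is the kind of statistical comparison the study describes.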
Longitudinal Evaluation of an N-Ethyl-N-Nitrosourea-Created Murine Model with Normal Pressure Hydrocephalus
Normal-pressure hydrocephalus (NPH) is a neurodegenerative disorder that usually occurs late in adult life. Clinically, the cardinal features include gait disturbances, urinary incontinence, and cognitive decline. Herein we report the characterization of a novel mouse model of NPH (designated p23-ST1), created by N-ethyl-N-nitrosourea (ENU)-induced mutagenesis. The ventricular size in the brain was measured by 3-dimensional micro-magnetic resonance imaging (3D-MRI) and was found to be enlarged. Intracranial pressure was measured and was found to fall within a normal range. A histological assessment and tracer flow study revealed that the cerebrospinal fluid (CSF) pathway of p23-ST1 mice was normal without obstruction. Motor functions were assessed using a rotarod apparatus and a CatWalk gait automatic analyzer. Mutant mice showed poor rotarod performance and gait disturbances. Cognitive function was evaluated using auditory fear-conditioned responses, with the mutant displaying both short- and long-term memory deficits. With an increase in urination frequency and volume, the mutant showed features of incontinence. Nissl substance staining and cell-type-specific markers were used to examine the brain pathology. These studies revealed concurrent glial activation and neuronal loss in the periventricular regions of mutant animals. In particular, chronically activated microglia were found in septal areas at a relatively young age, implying that microglial activation might contribute to the pathogenesis of NPH. These defects were transmitted in an autosomal dominant mode with reduced penetrance. Using a whole-genome scan employing 287 single-nucleotide polymorphic (SNP) markers and further refinement using six additional SNP markers and four microsatellite markers, the causative mutation was mapped to a 5.3-cM region on chromosome 4. Our results collectively demonstrate that the p23-ST1 mouse is a novel mouse model of human NPH.
Clinical observations suggest that dysfunctions and alterations in the brains of patients with NPH might occur much earlier than the appearance of clinical signs. p23-ST1 mice provide a unique opportunity to characterize molecular changes and the pathogenic mechanism of NPH
Efficient Learning and Feature Selection in High Dimensional Regression
We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This variational Bayesian least squares (VBLS) approach retains its simplicity as a linear model, but offers a novel statistically robust black-box approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. The iterative nature of VBLS makes it most suitable for real-time incremental learning, which is crucial especially in the application domain of robotics, brain-machine interfaces, and neural prosthetics, where real-time learning of models for control is needed. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques
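Classical (non-probabilistic) backfitting, of which VBLS is a probabilistic EM version with automatic relevance determination on each coefficient, can be sketched as coordinate-wise updates against the partial residual; the data and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 50
X = rng.normal(size=(n, d))
b_true = np.zeros(d)
b_true[:3] = [2.0, -1.0, 0.5]               # only a few relevant inputs
y = X @ b_true + 0.1 * rng.normal(size=n)

# Backfitting for a linear model: update one coefficient at a time against
# the residual of all the others; VBLS wraps this loop in an EM scheme with
# a relevance prior that drives irrelevant coefficients toward zero.
b = np.zeros(d)
for _ in range(50):
    for j in range(d):
        r = y - X @ b + X[:, j] * b[j]      # partial residual excluding feature j
        b[j] = (X[:, j] @ r) / (X[:, j] @ X[:, j])

print(np.round(b[:3], 1))                   # close to the generating coefficients
```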
- …