
    Quaternion Information Theoretic Learning Adaptive Algorithms for Nonlinear Adaptive Filtering

    Information Theoretic Learning (ITL) is gaining popularity for designing adaptive filters for non-stationary or non-Gaussian environments [1][2]. ITL cost functions such as the Minimum Error Entropy (MEE) have been applied to both linear and nonlinear adaptive filtering, with better overall performance than the typical mean squared error (MSE) and least-squares adaptive filters, especially for nonlinear systems in higher-order-statistic noise environments [3]. Quaternion-valued data processing is beneficial in applications such as robotics and image processing, particularly for performing transformations in three-dimensional space. In particular, quaternion-valued processing carries out data transformations in three- or four-dimensional space more conveniently than vector algebra [4, 5, 6, 7, 8]. Adaptive filtering in the quaternion domain operates intrinsically on the augmented statistics, in which the covariance of the quaternion input vector is taken into account naturally; as a result, it incorporates the component-wise real-valued cross-correlations, that is, the coupling among the dimensions of the quaternion input [9]. The generalized Hamilton-real (GHR) calculus for quaternion data simplifies the product and chain rules and allows us to calculate the gradient and Hessian of quaternion-based cost functions of learning algorithms efficiently [10][11]. Quaternion reproducing kernel Hilbert spaces, and their uniqueness, provide the mathematical foundation for developing quaternion-valued kernel learning algorithms [12]. The reproducing property of the feature space replaces the inner product of feature samples with a kernel evaluation.

    In this dissertation, we first propose a kernel adaptive filter for quaternion data based on the minimum error entropy cost function; the new algorithm is referred to as the quaternion kernel minimum error entropy (QKMEE) algorithm [13]. We apply the generalized Hamilton-real (GHR) calculus, which is applicable to quaternion Hilbert spaces, to evaluate the cost-function gradient and develop the QKMEE algorithm. The minimum error entropy (MEE) algorithm [3, 14, 15] minimizes Renyi's quadratic entropy of the error between the filter output and the desired response, or equivalently maximizes the error information potential. The ITL methodology improves the performance of adaptive algorithms in biased or non-Gaussian signal and noise environments compared with MSE-criterion algorithms such as the kernel least mean square (KLMS) algorithm. Second, we develop a kernel adaptive filter for quaternion data based on a normalized minimum error entropy cost function [14]. We apply the GHR calculus to evaluate the cost-function gradient and develop the quaternion kernel normalized minimum error entropy (QKNMEE) algorithm [16]. The new algorithm enhances QKMEE in that the selection of the filter update step size is independent of the input power and the kernel size. Third, we develop a kernel adaptive filter for quaternion-domain data based on an information theoretic learning cost function, which can be useful in quaternion-based kernel applications of nonlinear filtering. The new algorithm is based on the error entropy function with a fiducial point and is referred to as the quaternion kernel minimum error entropy with fiducial point (QKMEEF) algorithm [17].
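The MEE criterion above rests on a Parzen (kernel) estimate of the error's information potential. Below is a minimal real-valued sketch of that estimator in Python; the quaternion algorithms evaluate the analogous quantity on quaternion-valued errors via the GHR machinery, and the kernel width `sigma` and all function names here are illustrative, not the dissertation's code.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Gaussian Parzen kernel used in the density estimate."""
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def information_potential(errors, sigma=1.0):
    """Parzen estimate of the quadratic information potential
    V(e) = (1/N^2) * sum_i sum_j kappa_sigma(e_i - e_j).
    Minimizing Renyi's quadratic entropy H2(e) = -log V(e)
    is equivalent to maximizing V(e)."""
    e = np.asarray(errors, dtype=float)
    diffs = e[:, None] - e[None, :]   # all N^2 pairwise error differences
    return float(gaussian_kernel(diffs, sigma).mean())

# toy usage: quadratic entropy of a batch of filter errors
rng = np.random.default_rng(0)
errors = 0.3 * rng.standard_normal(200)
H2 = -np.log(information_potential(errors, sigma=0.5))
```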
In our previous work we developed a quaternion kernel adaptive filter based on minimum error entropy, referred to as the QKMEE algorithm [13]. Since entropy does not change with the mean of the distribution, the algorithm may converge to a set of optimal weights without attaining zero-mean error. Traditionally, to obtain a zero-mean output error, the output during the testing session was biased by the mean of the training-session errors. However, for non-symmetric or heavy-tailed error PDFs, estimating the error mean is problematic [18]. The minimum error entropy criterion minimizes Renyi's quadratic entropy of the error between the filter output and the desired response, or equivalently maximizes the error information potential [19]. Here, the approach is applied to quaternions. Adaptive filtering in the quaternion domain intrinsically incorporates the component-wise real-valued cross-correlations, that is, the coupling among the dimensions of the quaternion input. We apply the generalized Hamilton-real (GHR) calculus, which is applicable to Hilbert spaces, to evaluate the cost-function gradient and develop the quaternion kernel minimum error entropy algorithm with fiducial point. Simulation results show the behavior of the new algorithm (QKMEEF) when the signal is non-Gaussian, in the presence of unimodal versus bi-modal noise distributions. Simulation results also show that QKMEEF can track and predict four-dimensional non-stationary process signals, in which the components are correlated, better than the quadruple real-valued KMEEF and Quat-KLMS algorithms.

Fourth, we develop a kernel adaptive filter for quaternion data using a stochastic information gradient (SIG) cost function based on the information theoretic learning (ITL) approach. The new algorithm (QKSIG) is useful for quaternion-based kernel applications of nonlinear filtering [20]. Adaptive filtering in the quaternion domain intrinsically incorporates the component-wise real-valued cross-correlations, that is, the coupling among the dimensions of the quaternion input. We apply the generalized Hamilton-real (GHR) calculus, which is applicable to quaternion Hilbert spaces, to evaluate the cost-function gradient. The QKSIG algorithm minimizes Shannon's entropy of the error between the filter output and the desired response, and thereby minimizes the divergence between the joint densities of the input-desired and input-output pairs. The SIG technique reduces the computational complexity of the error entropy estimation, as sketched below. Here, the ITL approach with SIG is applied to quaternion adaptive filtering for three reasons. First, it reduces the computational complexity of the algorithm compared with our previous quaternion kernel minimum error entropy (QKMEE) algorithm. Second, it improves the filtering performance by considering the coupling among the dimensions of the quaternion input. Third, owing to the ITL approach, it performs better in biased or non-Gaussian signal and noise environments. We present convergence analysis and steady-state performance analysis of the new algorithm (QKSIG). Simulation results show the behavior of QKSIG in quaternion non-Gaussian signal and noise environments, compared with existing algorithms such as the quadruple real-valued kernel stochastic information gradient (KSIG) and quaternion kernel LMS (QKLMS) algorithms. Fifth, we develop a kernel adaptive filter for quaternion data based on a stochastic information gradient (SIG) cost function with a self-adjusting step size.
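For context, here is a minimal sketch of the SIG idea with real-valued errors: the instantaneous cost is the negative log of a Parzen density estimate of the current error built from only the L most recent errors, so each update costs O(L) rather than the O(N^2) of the pairwise MEE estimate. The quaternion algorithms derive the corresponding gradient with the GHR calculus; the names and array shapes below are assumptions for illustration.

```python
import numpy as np

def sig_gradient(e_n, past_errors, grad_e_n, grad_past, sigma=1.0):
    """Stochastic information gradient (SIG) estimate at time n.

    Treats -log p_hat(e_n) as the instantaneous cost, where
    p_hat(e_n) = (1/L) * sum_i kappa_sigma(e_n - e_{n-i}) uses only
    the L most recent errors. grad_e_n (shape [d]) and grad_past
    (shape [L, d]) are the gradients of the errors w.r.t. the d
    filter weights; the update is then w <- w - mu * sig_gradient(...).
    """
    d = float(e_n) - np.asarray(past_errors, dtype=float)   # shape [L]
    k = np.exp(-d**2 / (2.0 * sigma**2))                    # kernel values
    # gradient of -log p_hat(e_n) w.r.t. the weights
    num = np.sum(k[:, None] * d[:, None] * (grad_e_n[None, :] - grad_past),
                 axis=0)
    return num / (sigma**2 * np.sum(k) + 1e-12)
```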
The new algorithm (QKSIG-SAS) is based on the information theoretic learning (ITL) approach and converges faster than our previous QKSIG algorithm.
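The dissertation's specific self-adjusting step-size recursion is not reproduced here; as a hedged illustration of the general idea, a classic variable-step-size rule looks like the following, with all constants illustrative:

```python
def vss_step(mu, error, alpha=0.97, gamma=5e-4, mu_min=1e-4, mu_max=0.1):
    """One generic variable-step-size recursion (Kwong-Johnston style):
    mu(n+1) = alpha * mu(n) + gamma * |e(n)|^2, clipped to [mu_min, mu_max].
    Large errors grow the step for fast convergence; small errors shrink
    it for low steady-state misadjustment. Illustrative only -- this is
    not the specific QKSIG-SAS recursion from the dissertation."""
    mu_next = alpha * mu + gamma * abs(error) ** 2
    return min(max(mu_next, mu_min), mu_max)
```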

    Efficient Algorithms for Searching the Minimum Information Partition in Integrated Information Theory

    The ability to integrate information in the brain is considered to be an essential property for cognition and consciousness. Integrated Information Theory (IIT) hypothesizes that the amount of integrated information (Φ) in the brain is related to the level of consciousness. IIT proposes that, to quantify information integration in a system as a whole, integrated information should be measured across the partition of the system at which the information loss caused by partitioning is minimized, called the Minimum Information Partition (MIP). The computational cost of exhaustively searching for the MIP grows exponentially with system size, making it difficult to apply IIT to real neural data. It has previously been shown that if a measure of Φ satisfies a mathematical property, submodularity, the MIP can be found in polynomial time by an optimization algorithm. However, although the first version of Φ is submodular, the later versions are not. In this study, we empirically explore to what extent the algorithm can be applied to the non-submodular measures of Φ by evaluating its accuracy on simulated data and real neural data. We find that the algorithm identifies the MIP in a nearly perfect manner even for the non-submodular measures. Our results show that the algorithm allows us to measure Φ in large systems within a practical amount of time.
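To make the exponential cost concrete, here is a minimal Python sketch of the exhaustive bipartition search that the polynomial-time algorithm replaces; `phi` is a placeholder for whichever measure of Φ one adopts, not an implementation of any particular IIT version.

```python
from itertools import combinations

def exhaustive_mip(nodes, phi):
    """Exhaustive Minimum Information Partition (MIP) search over all
    bipartitions of `nodes`. `phi(part_a, part_b)` is a user-supplied
    integrated-information measure. The number of bipartitions grows
    as 2^(n-1) - 1, which is why submodularity-based polynomial-time
    algorithms (e.g. Queyranne's) matter for large systems."""
    nodes = list(nodes)
    best_val, best_cut = float("inf"), None
    # fix nodes[0] in part_a so each bipartition is visited exactly once
    for k in range(len(nodes) - 1):
        for rest in combinations(nodes[1:], k):
            part_a = {nodes[0], *rest}
            part_b = set(nodes) - part_a
            val = phi(part_a, part_b)
            if val < best_val:
                best_val, best_cut = val, (part_a, part_b)
    return best_val, best_cut

# toy usage with a dummy measure that penalizes unbalanced cuts
mip_value, mip_cut = exhaustive_mip(range(6),
                                    lambda a, b: abs(len(a) - len(b)))
```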

    Shaping the learning landscape in neural networks around wide flat minima

    Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD with the cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessians, and their generalization performance on real data.
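As a rough illustration of what "wide flat" means operationally, one can probe a trained minimizer by averaging the loss over random weight perturbations. This is only a crude proxy for the local-entropy measures used in the paper, and the radius and sample count below are illustrative choices.

```python
import numpy as np

def perturbed_loss(loss_fn, w, radius=0.05, n_samples=100, seed=0):
    """Probe the flatness of a minimizer by averaging the loss over
    Gaussian perturbations of the weight vector w (a numpy array).
    For a wide flat minimum (WFM) the average stays close to
    loss_fn(w); for a sharp minimum it blows up."""
    rng = np.random.default_rng(seed)
    base = float(loss_fn(w))
    avg = float(np.mean([loss_fn(w + radius * rng.standard_normal(w.shape))
                         for _ in range(n_samples)]))
    return base, avg
```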