94 research outputs found
Uncertain natural frequency analysis of composite plates including effect of noise – A polynomial neural network approach
Acknowledgement: SN and SS gratefully acknowledge the financial support from Lloyd’s Register Foundation Centre during this work. Peer reviewed. Postprint
Chaotic Time Series Forecasting Using Higher Order Neural Networks
This study presents a novel application and comparison of higher order neural networks (HONNs) to forecast benchmark chaotic time series. Two HONN models were implemented, namely the functional link neural network (FLNN) and the pi-sigma neural network (PSNN). These models were tested on two benchmark time series: the monthly smoothed sunspot numbers and the Mackey-Glass time-delay differential equation time series. The forecasting performance of the HONNs is compared against the performance of different models previously used in the literature, such as fuzzy and neural network models. Simulation results showed that FLNN and PSNN offer good performance compared to many previously used hybrid models.
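As a rough illustration of the Mackey-Glass benchmark mentioned in this abstract, the sketch below generates the series with a unit-step Euler discretization of dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t) and cuts it into lagged input/target pairs. The parameter values (tau = 17, beta = 0.2, gamma = 0.1, n = 10, x0 = 1.2), the constant initial history, and the helper names `mackey_glass` and `make_windows` are assumptions chosen for illustration; the abstract does not state the setup the authors used.

```python
def mackey_glass(n_samples, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2):
    """Unit-step Euler discretization of the Mackey-Glass delay equation.

    All parameter defaults are commonly used benchmark values, assumed here;
    they are not taken from the abstract.
    """
    history = [x0] * (tau + 1)  # assumed constant initial history
    for _ in range(n_samples):
        delayed = history[-(tau + 1)]  # x(t - tau)
        current = history[-1]          # x(t)
        history.append(
            current + beta * delayed / (1.0 + delayed ** power) - gamma * current
        )
    return history[tau + 1:]


def make_windows(series, n_lags):
    """Turn a series into (lagged inputs, next value) pairs for one-step forecasting."""
    return [
        (series[i:i + n_lags], series[i + n_lags])
        for i in range(len(series) - n_lags)
    ]


if __name__ == "__main__":
    series = mackey_glass(500)
    pairs = make_windows(series, n_lags=4)
    print(len(series), len(pairs))
```

Any forecasting model, including an FLNN or PSNN, could then be fitted on the `(inputs, target)` pairs.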
A Comprehensive Survey on Pi-Sigma Neural Network for Time Series Prediction
Time series prediction has received much attention because of its effect on a vast range of real-life applications. This paper presents a survey of time series applications using the Higher Order Neural Network (HONN) model. The basic motivation behind using HONN is its ability to expand the input space, which lets it solve complex problems more efficiently and gives it strong learning ability in time series forecasting. The Pi-Sigma Neural Network (PSNN) indirectly includes the capabilities of higher order networks by using product cells as the output units and fewer weights. The goal of this research is to make the reader aware of PSNN for time series prediction and to highlight some benefits and challenges of using PSNN. Possible fields of PSNN application in comparison with existing methods are presented, and future directions are explored that take advantage of the properties of error feedback and recurrent networks.
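The "product cells as the output units" idea can be sketched as a forward pass: a layer of linear summing units feeds a single product unit, whose result is squashed by an activation. The sketch below is a minimal illustration under assumptions not stated in the abstract: a single output, a logistic sigmoid activation, and the hypothetical name `pi_sigma_forward`.

```python
import math


def pi_sigma_forward(x, weights, biases):
    """Forward pass of a single-output pi-sigma unit of order K.

    weights: K rows, each with len(x) entries (one row per summing unit).
    biases:  K values (one per summing unit).
    The sigmoid activation is an assumption made for this sketch.
    """
    product = 1.0
    for w_row, b in zip(weights, biases):
        # Each summing unit is linear in the inputs.
        linear_sum = sum(w * xi for w, xi in zip(w_row, x)) + b
        # The product unit multiplies the K sums, which yields
        # K-th order interaction terms without a weight per term.
        product *= linear_sum
    return 1.0 / (1.0 + math.exp(-product))


if __name__ == "__main__":
    print(pi_sigma_forward([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With K summing units over d inputs, the unit has K * (d + 1) trainable parameters, which is the sense in which it needs fewer weights than a network with one weight per higher order term.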
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
The widely used stochastic gradient methods for minimizing nonconvex
composite objective functions require the Lipschitz smoothness of the
differentiable part. But the requirement does not hold true for problem classes
including quadratic inverse problems and training neural networks. To address
this issue, we investigate a family of stochastic Bregman proximal gradient
(SBPG) methods, which only require smooth adaptivity of the differentiable
part. SBPG replaces the upper quadratic approximation used in SGD with the
Bregman proximity measure, resulting in a better approximation model that
captures the non-Lipschitz gradients of the nonconvex objective. We formulate
the vanilla SBPG and establish its convergence properties in the nonconvex
setting without finite-sum structure. Experimental results on quadratic inverse
problems attest to the robustness of SBPG. Moreover, we propose a momentum-based
version of SBPG (MSBPG) and prove it has improved convergence properties. We
apply MSBPG to the training of deep neural networks with a polynomial kernel
function, which ensures the smooth adaptivity of the loss function.
Experimental results on representative benchmarks demonstrate the effectiveness
and robustness of MSBPG in training neural networks. Since the additional
computation cost of MSBPG compared with SGD is negligible in large-scale
optimization, MSBPG can potentially be employed as a universal open-source
optimizer in the future. Comment: 37 pages
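To make the Bregman step concrete, the sketch below performs one stochastic Bregman gradient update with no nonsmooth term, under an assumed polynomial kernel h(x) = ||x||^2 / 2 + ||x||^4 / 4. The update solves grad_h(x_new) = grad_h(x) - step * g, which for this kernel reduces to a scalar cubic in the norm of x_new. The kernel, the bisection solver, and the names `grad_h`, `mirror_inverse`, and `bregman_step` are illustrative assumptions, not the paper's implementation.

```python
import math


def grad_h(x):
    """Gradient of the assumed kernel h(x) = ||x||^2 / 2 + ||x||^4 / 4."""
    sq_norm = sum(v * v for v in x)
    return [(1.0 + sq_norm) * v for v in x]


def mirror_inverse(p):
    """Solve grad_h(x) = p for x.

    Writing t = ||x||, the equation becomes t**3 + t = ||p||, which has a
    unique nonnegative root; bisection on [0, ||p||] finds it.
    """
    c = math.sqrt(sum(v * v for v in p))
    if c == 0.0:
        return [0.0] * len(p)
    lo, hi = 0.0, c
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + mid < c:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return [v / (1.0 + t * t) for v in p]


def bregman_step(x, stoch_grad, step):
    """One Bregman gradient update: grad_h(x_new) = grad_h(x) - step * g."""
    p = [gh - step * g for gh, g in zip(grad_h(x), stoch_grad)]
    return mirror_inverse(p)


if __name__ == "__main__":
    print(bregman_step([1.0, -2.0], [3.0, 4.0], step=0.5))
```

With h(x) = ||x||^2 / 2 alone the same update reduces to the ordinary SGD step x - step * g; the quartic term shrinks the effective step where ||x|| is large.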
Recent Advances in Randomized Methods for Big Data Optimization
In this thesis, we discuss and develop randomized algorithms for big data problems. In particular, we study finite-sum optimization with newly emerged variance-reduction optimization methods (Chapter 2), explore the efficiency of second-order information applied to both convex and non-convex finite-sum objectives (Chapter 3), and employ fast first-order methods in power system problems (Chapter 4). In Chapter 2, we propose two variance-reduced gradient algorithms, mS2GD and SARAH. mS2GD incorporates a mini-batching scheme for improving the theoretical complexity and practical performance of SVRG/S2GD, aiming to minimize a strongly convex function represented as the sum of an average of a large number of smooth convex functions and a simple non-smooth convex regularizer. SARAH, short for StochAstic Recursive grAdient algoritHm, uses a stochastic recursive gradient and targets minimizing the average of a large number of smooth functions in both convex and non-convex cases. Both methods fall into the category of variance-reduction optimization and obtain a total complexity of O((n+κ)log(1/ε)) to achieve an ε-accurate solution for strongly convex objectives, while SARAH also maintains sub-linear convergence for non-convex problems. Meanwhile, SARAH has a practical variant, SARAH+, due to its linear convergence of the expected stochastic gradients in inner loops. In Chapter 3, we show that randomized batches can be applied with second-order information to improve convergence in both theory and practice, with a framework of L-BFGS as a novel approach to finite-sum optimization problems. We provide theoretical analyses for both convex and non-convex objectives. Meanwhile, we propose LBFGS-F as a variant where the Fisher information matrix is used instead of Hessian information, and prove it applicable to a distributed environment within the popular applications of least-squares and cross-entropy losses. In Chapter 4, we develop fast randomized algorithms for solving polynomial optimization problems in the application of alternating-current optimal power flows (ACOPF) in the power systems field. Traditional research on power system problems focuses on solvers using second-order methods, while no randomized algorithms have been developed. First, we propose a coordinate-descent algorithm as an online solver, applied to solving time-varying optimization problems in power systems. We bound the difference between the current approximate optimal cost generated by our algorithm and the optimal cost for a relaxation using the most recent data from above by a function of the properties of the instance and the rate of change to the instance over time. Second, we focus on a steady-state problem in power systems, and study means of switching from solving a convex relaxation to a Newton method working on a non-convex (augmented) Lagrangian of the problem.
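The stochastic recursive gradient behind SARAH can be sketched on a toy problem. In each outer loop the estimate v starts as the full gradient, and each inner step updates it as v = grad_i(w_t) - grad_i(w_{t-1}) + v. The one-dimensional least-squares objective, the step size, the loop lengths, and the use of the last inner iterate as the outer-loop output are simplifying assumptions for illustration; the name `sarah_least_squares` is hypothetical.

```python
import random


def sarah_least_squares(a, b, w0, step, inner_steps, outer_loops, seed=0):
    """SARAH-style updates for f(w) = (1/n) * sum_i 0.5 * (a[i] * w - b[i])**2.

    Sketch only: scalar parameter, last-iterate output, fixed step size.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, w):
        # Gradient of the i-th component 0.5 * (a[i] * w - b[i])**2.
        return a[i] * (a[i] * w - b[i])

    w = w0
    for _ in range(outer_loops):
        # Outer loop: start from the full gradient and take one step.
        v = sum(grad_i(i, w) for i in range(n)) / n
        w_prev, w = w, w - step * v
        for _ in range(inner_steps):
            # Inner loop: recursive estimate built from one sampled component.
            i = rng.randrange(n)
            v = grad_i(i, w) - grad_i(i, w_prev) + v
            w_prev, w = w, w - step * v
    return w


if __name__ == "__main__":
    w = sarah_least_squares([1.0, 2.0, 3.0], [2.0, 4.0, 6.0],
                            w0=0.0, step=0.05, inner_steps=2, outer_loops=40)
    print(w)  # approaches the least-squares solution w = 2
```

Unlike SVRG, which anchors every inner estimate to the snapshot gradient, the estimate here is built from the previous estimate, so it is biased within an inner loop but needs no stored gradients.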
Fundamental Results on Asynchronous Parallel Optimization Algorithms
In this thesis, we present a body of work on the performance and convergence properties of asynchronous-parallel algorithms completed over the course of my doctorate degree (Hannah, Feng, and Wotao Yin 2018; Hannah and Wotao Yin 2017b; T. Sun, Hannah, and Wotao Yin 2017; Hannah and Wotao Yin 2017a). Asynchronous algorithms eliminate the costly synchronization penalty of traditional synchronous-parallel algorithms. They do this by having computing nodes utilize the most recently available information to compute updates. However, it is not immediately clear whether the trade-off of eliminating the synchronization penalty at the cost of using outdated information is favorable. We first give a comprehensive theoretical justification of the performance advantages of asynchronous algorithms, which we summarize as "Faster Iterations, Same Quality" (Hannah and Wotao Yin 2017a). Under a well-justified model, we show that asynchronous algorithms complete "Faster Iterations". Using renewal theory, we demonstrate how network delays, heterogeneous sub-problem difficulty and computing power greatly hinder synchronous algorithms, but have no impact on their asynchronous counterparts. We next prove the first exact convergence rate results for a variety of synchronous algorithms including synchronous ARock and synchronous randomized block coordinate descent (sync-RBCD). This allows us to make a fair comparison between these algorithms and their asynchronous counterparts. Finally, we show that a variety of asynchronous algorithms have a convergence rate that essentially matches the previously derived exact rates for synchronous counterparts so long as the delays are not too large. Hence asynchronous algorithms complete faster iterations that are of the "Same Quality" as synchronous algorithms. Therefore we conclude that a wide variety of asynchronous algorithms will always outcompete their synchronous counterparts if the delays are not too large, and especially at scale. Next, we present the first asynchronous Nesterov-accelerated algorithm that attains a speedup: A2BCD (Hannah, Feng, and Wotao Yin 2018). We first prove that A2BCD attains NU_ACDM’s complexity to highest order. NU_ACDM is a state-of-the-art accelerated coordinate descent algorithm (Allen-Zhu, Qu, et al. 2016). Then we show that A2BCD and NU_ACDM both have optimal complexity. Hence, because A2BCD has faster iterations and optimal complexity, it should be the fastest coordinate descent algorithm. We verify this with numerical experiments comparing A2BCD with NU_ACDM. We find that A2BCD is up to 4-5x faster than NU_ACDM, and hence conclude that our algorithm is the fastest coordinate descent algorithm currently available. Finally, we derive a second-order ODE, which is the continuous-time limit of A2BCD. The ODE analysis motivates and clarifies our proof strategy. Lastly, we present earlier foundational work that comprises the basis of the technical innovations that made the previous results possible (Hannah and Wotao Yin 2017b). We show that ARock and its many special cases may converge even under unbounded delays (both stochastic and deterministic). These results sidestep longstanding impossibility results derived in the 1980s by making slightly stronger assumptions. They were also an early demonstration of the power of meticulous Lyapunov-function construction techniques pioneered in this body of work.
Deep Machine Learning with Spatio-Temporal Inference
Deep Machine Learning (DML) refers to methods which utilize hierarchies of more than one or two layers of computational elements to achieve learning. DML may draw upon biomimetic models, or may be simply biologically-inspired. Regardless, these architectures seek to employ hierarchical processing as a means of mimicking the ability of the human brain to process a myriad of sensory data and make meaningful decisions based on this data. In this dissertation we present a novel DML architecture which is biologically-inspired in that (1) all processing is performed hierarchically; (2) all processing units are identical; and (3) processing captures both spatial and temporal dependencies in the observations to organize and extract features suitable for supervised learning. We call this architecture Deep Spatio-Temporal Inference Network (DeSTIN). In this framework, patterns observed in pixel data at the lowest layer of the hierarchy are organized and fit to generalizations using decomposition algorithms. Subsequent spatial layers draw upon previous layers, their own temporal observations and beliefs, and the observations and beliefs of parent nodes to extract features suitable for supervised learning using standard classifiers such as feedforward neural networks. Hence, DeSTIN is viewed as an unsupervised feature extraction scheme in the sense that rather than relying on human engineering to determine features for a particular problem, DeSTIN naturally constructs features of interest by representing salient regularities in the patterns observed. Detailed discussion and analysis of the DeSTIN framework is provided, including focus on its key components of generalization through online clustering and temporal inference. We present a variety of implementation details, including static and dynamic learning formulations, and function approximation methods. Results on standardized datasets of handwritten digits as well as face and optic nerve detection are presented, illustrating the efficacy of the proposed approach.
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program in survey style as well as with all details, and information on the social program, the venue, special meetings, and more
Fast algorithms for smooth and monotone covariance matrix estimation
In this thesis the problem of interest is, within the setting of financial risk management, covariance matrix estimation from a limited number of high-dimensional independent identically distributed (i.i.d.) multivariate samples when the random variables of interest have a natural spatial indexing along a low-dimensional manifold, e.g., along a line. The sample covariance matrix estimate is fraught with peril in this context. A variety of approaches to improve the covariance estimates have been developed by exploiting knowledge of structure in the data, which, however, in general impose very strict structure. We instead exploit another formulation which assumes that the covariance matrix is smooth and monotone with respect to the spatial indexing. Originally the formulation is derived from the estimation problem within a convex-optimization framework, and the resulting semidefinite-programming problem (SDP) is solved by an interior-point method (IPM). However, solving an SDP via an IPM can become unduly computationally expensive for large covariance matrices. Motivated by this observation, this thesis develops highly efficient first-order solvers for smooth and monotone covariance matrix estimation. We propose two types of solvers for covariance matrix estimation: the first based on projected gradients, and the second based on recently developed optimal first-order methods. Given such numerical algorithms, we present a comprehensive experimental analysis. We first demonstrate the benefits of imposing smoothness and monotonicity constraints in covariance matrix estimation in a number of scenarios, involving limited, missing, and asynchronous data. We then demonstrate the potential computational benefits offered by first-order methods through a detailed comparison to solution of the problem via IPMs.