
    Linear Convergence of Variance-Reduced Stochastic Gradient without Strong Convexity

    Stochastic gradient algorithms estimate the gradient from only one or a few samples and therefore enjoy a low computational cost per iteration. They have been widely used in large-scale optimization problems. However, stochastic gradient algorithms are usually slow to converge and achieve only sub-linear convergence rates, due to the inherent variance in the gradient computation. To accelerate convergence, variance-reduced stochastic gradient algorithms, e.g., the proximal stochastic variance-reduced gradient (Prox-SVRG) algorithm, have recently been proposed for strongly convex problems. Under the strong convexity condition, these variance-reduced stochastic gradient algorithms achieve a linear convergence rate. However, many machine learning problems are convex but not strongly convex. In this paper, we introduce Prox-SVRG and its projected variant, called Variance-Reduced Projected Stochastic Gradient (VRPSG), to solve a class of non-strongly convex optimization problems widely used in machine learning. As the main technical contribution of this paper, we show that both VRPSG and Prox-SVRG achieve a linear convergence rate without strong convexity. A key ingredient in our proof is a Semi-Strongly Convex (SSC) inequality, which is the first to be rigorously proved for a class of non-strongly convex problems in both constrained and regularized settings. Moreover, the SSC inequality is independent of the algorithm and may be applied to analyze other stochastic gradient methods besides VRPSG and Prox-SVRG, so it may be of independent interest. To the best of our knowledge, this is the first work that establishes a linear convergence rate for variance-reduced stochastic gradient algorithms on both constrained and regularized problems without strong convexity.
    Comment: 18 pages
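    The Prox-SVRG scheme described above can be sketched in a few lines. The following is a minimal illustration on a Lasso-type problem; the sample loss, step size, and epoch length are assumptions chosen for the example, not the paper's settings. Each epoch computes a full gradient at a snapshot point, and each inner step corrects a single-sample gradient using the snapshot gradient before taking a proximal step, which removes the variance floor of plain stochastic gradient.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_svrg(A, b, lam, eta=0.1, epochs=30, rng=None):
    """Prox-SVRG sketch for (1/n) * sum_i 0.5*(a_i x - b_i)^2 + lam*||x||_1."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    x_tilde = np.zeros(d)
    for _ in range(epochs):
        mu = A.T @ (A @ x_tilde - b) / n          # full gradient at the snapshot
        x = x_tilde.copy()
        for _ in range(n):                        # inner loop, one-pass length
            i = rng.integers(n)
            gi = A[i] * (A[i] @ x - b[i])         # stochastic gradient at x
            gi0 = A[i] * (A[i] @ x_tilde - b[i])  # same sample at the snapshot
            v = gi - gi0 + mu                     # variance-reduced gradient
            x = soft_threshold(x - eta * v, eta * lam)
        x_tilde = x                               # new snapshot
    return x_tilde
```

    On a small synthetic Lasso instance this drives the regularized objective well below its value at the origin.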

    Second order asymptotical regularization methods for inverse problems in partial differential equations

    We develop Second Order Asymptotical Regularization (SOAR) methods for solving inverse source problems in elliptic partial differential equations with both Dirichlet and Neumann boundary data. We show convergence results for SOAR with a fixed damping parameter, as well as with a dynamic damping parameter, which is a continuous analog of Nesterov's acceleration method. Moreover, by using Morozov's discrepancy principle together with a newly developed total energy discrepancy principle, we prove that the approximate solution of SOAR weakly converges to an exact source function as the measurement noise goes to zero. A damped symplectic scheme, combined with the finite element method, is developed for the numerical implementation of SOAR, which yields a novel iterative regularization scheme for solving inverse source problems. Several numerical examples are given to show the accuracy and the acceleration effect of SOAR. A comparison with the state-of-the-art methods is also provided.
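    The damped second-order flow behind SOAR can be illustrated on a generic discretized linear ill-posed problem. The operator, damping, step size, and stopping constants below are illustrative assumptions; the paper works with elliptic PDE source problems and a finite element discretization. The sketch integrates x'' + eta*x' + A^T(Ax - y) = 0 with a semi-implicit symplectic-Euler-style step and stops via Morozov's discrepancy principle.

```python
import numpy as np

def soar(A, y, delta, eta=1.0, dt=0.1, tau=1.5, max_iter=5000):
    """Second-order asymptotic regularization sketch for A x ~ y:
    damped second-order flow, discrepancy-principle stopping
    ||A x - y|| <= tau * delta (delta = noise level)."""
    d = A.shape[1]
    x = np.zeros(d)
    v = np.zeros(d)  # velocity variable of the second-order flow
    for _ in range(max_iter):
        r = A @ x - y
        if np.linalg.norm(r) <= tau * delta:
            break                                      # discrepancy reached
        v = (v - dt * (A.T @ r)) / (1.0 + dt * eta)    # implicit damping step
        x = x + dt * v                                 # position update
    return x
```

    The stopping index plays the role of the regularization parameter: iterating past the discrepancy level would start fitting the noise.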

    A new class of accelerated regularization methods, with application to bioluminescence tomography

    In this paper we propose a new class of iterative regularization methods for solving ill-posed linear operator equations. The prototype of these iterative regularization methods is a second-order evolution equation with a linear vanishing damping term, which can be viewed not only as an extension of asymptotical regularization, but also as a continuous analog of Nesterov's acceleration scheme. New iterative regularization methods are derived from this continuous model in combination with damped symplectic numerical schemes. The regularization property, as well as convergence rates and acceleration effects under Hölder-type source conditions, are proven for both the continuous and the discretized methods. The second part of this paper is concerned with the application of the newly developed accelerated iterative regularization methods to diffusion-based bioluminescence tomography, which is modeled as an inverse source problem in elliptic partial differential equations with both Dirichlet and Neumann boundary data. A relaxed mathematical formulation is proposed so that the discrepancy principle can be applied to the iterative scheme without the use of Sobolev embedding constants. Several numerical examples, as well as a comparison with the state-of-the-art methods, are given to show the accuracy and the acceleration effect of the new methods.
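    A second-order evolution with a vanishing damping term r/t, as in the continuous analog of Nesterov's scheme mentioned above, can be sketched for a generic discretized linear ill-posed problem. The operator and all parameter choices below are illustrative assumptions, not the paper's PDE setting; the stopping index (number of steps) acts as the regularization parameter.

```python
import numpy as np

def vanishing_damping_flow(A, y, r=3.0, dt=0.1, steps=2000):
    """Sketch of x'' + (r/t) x' + A^T(A x - y) = 0, discretized with a
    damped symplectic-Euler-style step (implicit in the damping term)."""
    x = np.zeros(A.shape[1])
    v = np.zeros_like(x)
    t = dt
    for _ in range(steps):
        v = (v - dt * (A.T @ (A @ x - y))) / (1.0 + dt * r / t)
        x = x + dt * v
        t += dt
    return x
```

    The damping coefficient r/t decays over time, so early iterations are heavily damped while later ones behave increasingly like an accelerated (momentum) method.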

    Multi-Stage Multi-Task Feature Learning

    Multi-task sparse feature learning aims to improve generalization performance by exploiting the features shared among tasks. It has been successfully applied in many areas, including computer vision and biomedical informatics. Most existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal due to its looseness in approximating an ℓ0-type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel non-convex regularizer. To solve the non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature Learning (MSMTFL) algorithm; we also provide intuitive interpretations and detailed convergence and reproducibility analyses for the proposed algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL in comparison with state-of-the-art multi-task sparse feature learning algorithms.
    Comment: The short version appears in NIPS 201
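    The multi-stage idea can be illustrated in a simplified single-task form with the capped-ℓ1 penalty (the penalty choice, inner solver, and parameters below are assumptions for this sketch; MSMTFL itself penalizes row norms across tasks). Each stage solves a weighted ℓ1 problem, and coordinates whose magnitude exceeds the cap lose their penalty in the next stage, which reduces the estimation bias of the convex relaxation.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, w, step, iters=300):
    # weighted-l1 proximal gradient: min 0.5*||Ax-b||^2 + sum_j w_j*|x_j|
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - step * (A.T @ (A @ x - b)), step * w)
    return x

def multi_stage_capped_l1(A, b, lam=0.5, theta=0.1, stages=5):
    """Multi-stage convex relaxation for the capped-l1 penalty
    sum_j lam * min(|x_j|, theta): each stage solves a weighted-l1
    problem whose weights come from the previous stage's solution."""
    d = A.shape[1]
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for the smooth part
    w = np.full(d, lam)                     # stage 1 = plain Lasso
    x = np.zeros(d)
    for _ in range(stages):
        x = ista(A, b, w, step)
        w = lam * (np.abs(x) < theta)       # drop the penalty on "large" coords
    return x
```

    On noiseless sparse data the later stages debias the large coefficients, so the estimate approaches the true signal more closely than the stage-1 Lasso.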

    Performance of two-dimensional tidal turbine arrays in free surface flow

    Encouraged by recent studies on the performance of tidal turbine arrays, we extend the classical momentum actuator disc theory to include free surface effects and to allow the vertical arrangement of turbines. Most of the existing literature concerns one-dimensional arrays with a single turbine in the vertical direction, while the arrays in this work are two-dimensional (with turbines in both the vertical and lateral directions) and partially block a channel whose width is far larger than its height. The vertical mixing of the array-scale flow is assumed to take place much faster than the lateral mixing; this assumption has been verified by numerical simulations. Fixing the total turbine area and the utilized width, we compare two-dimensional arrays with traditional one-dimensional ones. The results suggest that two-dimensional arrangements of smaller turbines are preferable to one-dimensional arrays from both the power coefficient and the efficiency perspectives. When channel dynamics are considered, the power increase is partly offset, depending on the channel parameters, but the optimal arrangement is unchanged. Furthermore, we consider how to arrange a finite number of turbines in a channel, and an optimal distribution of turbines in the two directions is found. Finally, the scenario of arranging turbines in infinite flow, which is the limiting condition of small blockage, is analysed. A new maximum power coefficient of 0.869 occurs when Fr = 0.2, greatly increasing the peak power compared with existing results.
    Comment: 36 pages, 16 figures
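    As a point of reference for the classical momentum actuator disc theory that this work extends, the unbounded-flow power coefficient (no blockage, no free surface) and its Betz maximum can be computed directly; the grid-search below is purely illustrative.

```python
import numpy as np

def power_coefficient(a):
    """Classical (unbounded-flow) actuator disc: axial induction factor a
    gives thrust coefficient C_T = 4a(1-a) and power coefficient
    C_P = 4a(1-a)^2."""
    return 4.0 * a * (1.0 - a) ** 2

# scan the physically relevant range of induction factors
a = np.linspace(0.0, 0.5, 5001)
cp = power_coefficient(a)
a_opt = a[np.argmax(cp)]   # optimum near a = 1/3
cp_max = cp.max()          # Betz limit 16/27 ~ 0.593
```

    The free-surface, partially blocked configurations studied in the paper exceed this classical limit, which is why the reported peak coefficient of 0.869 at Fr = 0.2 is notable.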

    Dilemmatic Deliberations In Kierkegaard’s Fear and Trembling

    My central claim in this paper is that Kierkegaard’s Fear and Trembling is governed by the basic aim to articulate a real dilemma, and to elicit its proper recognition as such. I begin by indicating how Kierkegaard’s works are shaped in general by this aim, and what the aim involves. I then show how the dilemmatic structure of Fear and Trembling is obscured in a recent dispute between Michelle Kosch and John Lippitt regarding the basic aims and upshot of the book. Finally, I consider two critical questions: Why does Kierkegaard present his dilemmatic reasoning in the form of a “dialectical lyric”? And why does he write a book that aims only to articulate a dilemma, and not also to resolve it?

    Magnetic extraction of energy from accretion disc around a rotating black hole

    An analytical expression for the disc power is derived based on an equivalent circuit in the black hole (BH) magnetosphere, with a mapping relation between the radial coordinate of the disc and that of an unknown astrophysical load. It turns out that this disc power is comparable with two other disc powers derived in the Poynting flux and hydrodynamic regimes, respectively. In addition, the importance of the disc power relative to the Blandford-Znajek (BZ) power is discussed. It is shown that the BZ power is generally dominated by the disc power, except in some extreme cases. Furthermore, we show that the disc power derived in our model fits the jet power of M87 well.
    Comment: 7 pages, 5 figures

    Acoustic-To-Word Model Without OOV

    Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion was shown to be a natural end-to-end model directly targeting words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words to an OOV output node. Therefore, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. It also cannot easily handle hot words that emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model which predicts both words and characters at the same time. With a shared-hidden-layer structure and a modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized. Whenever the acoustic-to-word model emits an OOV token, we back off that OOV segment to the word output generated from the character-based CTC, hence solving the OOV and hot-word issues. Evaluated on a Microsoft Cortana voice assistant task, the proposed model reduces the errors introduced by the OOV output token in the acoustic-to-word model by 30%.
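    The back-off step can be illustrated with a toy function. The token name and the per-segment interface here are hypothetical; in the paper the word-level and character-level alignments are synchronized through shared hidden layers, so each word-level token has a time-aligned character-level hypothesis.

```python
def backoff_oov(word_hyp, char_hyp_per_segment):
    """Toy sketch of the hybrid-CTC back-off: wherever the word-level
    CTC emits an OOV token, substitute the word spelled out by the
    character-level CTC for the same (time-aligned) segment.
    `char_hyp_per_segment[i]` is assumed to be the character-CTC output
    aligned with the i-th word-level token (hypothetical interface)."""
    out = []
    for i, w in enumerate(word_hyp):
        out.append(char_hyp_per_segment[i] if w == "<OOV>" else w)
    return out
```

    For example, a hypothesis ["play", "<OOV>", "music"] with an aligned character-level spelling for the middle segment yields a fully worded transcript.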

    A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems

    Non-convex sparsity-inducing penalties have recently received considerable attention in sparse learning. Recent theoretical investigations have demonstrated their superiority over the convex counterparts in several sparse learning settings. However, solving the non-convex optimization problems associated with non-convex penalties remains a significant challenge. A commonly used approach is Multi-Stage (MS) convex relaxation (or DC programming), which relaxes the original non-convex problem to a sequence of convex problems. This approach is usually not very practical for large-scale problems, because its computational cost is a multiple of that of solving a single convex problem. In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the non-convex optimization problem for a large class of non-convex penalties. The GIST algorithm iteratively solves a proximal operator problem, which in turn has a closed-form solution for many commonly used penalties. At each outer iteration of the algorithm, we use a line search initialized by the Barzilai-Borwein (BB) rule, which allows an appropriate step size to be found quickly. The paper also presents a detailed convergence analysis of the GIST algorithm. The efficiency of the proposed algorithm is demonstrated by extensive experiments on large-scale data sets.
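    A compact sketch of a GIST-style loop for the capped-ℓ1 penalty follows; the problem instance, penalty choice, and parameters are assumptions for illustration. The proximal operator is evaluated in closed form by comparing the two candidate minimizers of the capped-ℓ1 subproblem, the Barzilai-Borwein rule seeds the step size for the next iteration, and a backtracking line search enforces sufficient decrease.

```python
import numpy as np

def prox_capped_l1(u, t, lam, theta):
    """Closed-form proximal operator of t*lam*min(|x|, theta), elementwise:
    evaluate both candidate minimizers and keep the cheaper one."""
    c1 = np.sign(u) * np.maximum(np.abs(u), theta)                       # |x| >= theta branch
    c2 = np.sign(u) * np.minimum(theta, np.maximum(np.abs(u) - t * lam, 0.0))
    def cost(x):
        return 0.5 * (x - u) ** 2 + t * lam * np.minimum(np.abs(x), theta)
    return np.where(cost(c1) <= cost(c2), c1, c2)

def gist(A, b, lam=0.1, theta=0.1, sigma=1e-4, iters=500):
    """GIST-style sketch for 0.5*||Ax-b||^2 + lam*sum_j min(|x_j|, theta)."""
    x = np.zeros(A.shape[1])
    g = A.T @ (A @ x - b)
    def F(z):
        return 0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.minimum(np.abs(z), theta))
    t = 1.0 / np.linalg.norm(A, 2) ** 2        # safe first step (1/L)
    for _ in range(iters):
        Fx = F(x)
        while True:                            # backtracking line search
            x_new = prox_capped_l1(x - t * g, t, lam, theta)
            if F(x_new) <= Fx - (sigma / (2 * t)) * np.sum((x_new - x) ** 2):
                break
            t *= 0.5
        g_new = A.T @ (A @ x_new - b)
        s, ydiff = x_new - x, g_new - g
        denom = s @ ydiff
        t = (s @ s) / denom if denom > 1e-12 else t   # BB step for next iteration
        x, g = x_new, g_new
    return x
```

    Because the proximal subproblem has a closed-form solution, each iteration costs little more than one gradient evaluation, which is the practical advantage over multi-stage convex relaxation.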

    Advancing Acoustic-to-Word CTC Model

    The acoustic-to-word model based on the connectionist temporal classification (CTC) criterion was shown to be a natural end-to-end (E2E) model directly targeting words as output units. However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words to an OOV output node. Hence, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. Our first attempt to improve the acoustic-to-word model is a hybrid CTC model which consults a letter-based CTC when the word-based CTC model emits OOV tokens at test time. We then propose a much better solution: training a mixed-unit CTC model which decomposes all the OOV words into sequences of frequent words and multi-letter units. Evaluated on a 3400-hour Microsoft Cortana voice assistant task, the final acoustic-to-word solution improves the baseline word-based CTC by a relative 12.09% word error rate (WER) reduction when combined with our proposed attention CTC. Such an E2E model, without using any language model (LM) or complex decoder, outperforms the traditional context-dependent phoneme CTC, which has a strong LM and decoder, by a relative 6.79%.
    Comment: Accepted at ICASSP 201
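    The mixed-unit idea, i.e. decomposing an OOV word into frequent in-vocabulary pieces with single letters as a fallback, can be illustrated with a toy greedy segmenter. This is a hypothetical scheme chosen for illustration; the paper's actual unit inventory (frequent words plus multi-letter units) is derived from the training data.

```python
def to_mixed_units(token, vocab):
    """Greedy longest-match decomposition of a token into pieces found in
    `vocab`; single characters serve as the guaranteed fallback units."""
    units, i = [], 0
    while i < len(token):
        # try the longest remaining substring first
        for j in range(len(token), i, -1):
            piece = token[i:j]
            if j == i + 1 or piece in vocab:   # single char always accepted
                units.append(piece)
                i = j
                break
    return units
```

    For instance, with "play" and "list" in the inventory, "playlist" decomposes into two frequent pieces, while a fully unknown token falls back to letters.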