
    Stochastic Approximation of Smooth and Strongly Convex Functions: Beyond the $O(1/T)$ Convergence Rate

    Stochastic approximation (SA) is a classical approach for stochastic convex optimization. Previous studies have demonstrated that the convergence rate of SA can be improved by introducing either a smoothness or a strong convexity condition. In this paper, we make use of smoothness and strong convexity simultaneously to boost the convergence rate. Let $\lambda$ be the modulus of strong convexity, $\kappa$ be the condition number, $F_*$ be the minimal risk, and $\alpha>1$ be some small constant. First, we demonstrate that, in expectation, an $O(1/[\lambda T^\alpha] + \kappa F_*/T)$ risk bound is attainable when $T = \Omega(\kappa^\alpha)$. Thus, when $F_*$ is small, the convergence rate could be faster than $O(1/[\lambda T])$ and approaches $O(1/[\lambda T^\alpha])$ in the ideal case. Second, to further benefit from a small risk, we show that, in expectation, an $O(1/2^{T/\kappa}+F_*)$ risk bound is achievable. Thus, the excess risk reduces exponentially until reaching $O(F_*)$, and if $F_*=0$, we obtain a global linear convergence. Finally, we emphasize that our proof is constructive and each risk bound is equipped with an efficient stochastic algorithm attaining that bound.
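    The abstract does not spell out the algorithms, so the following is only a minimal sketch of one standard device for exploiting smoothness and strong convexity together: epoch-based stochastic gradient descent, in which the step size is halved and the epoch length doubled after each epoch. The function name `epoch_sgd` and the oracle `grad_oracle` are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def epoch_sgd(grad_oracle, x0, L, T):
    """Epoch-based SGD sketch for an L-smooth, strongly convex objective.

    grad_oracle(x) must return an unbiased stochastic gradient at x.
    Each epoch restarts from the average of its iterates, then the step
    size is halved and the epoch length doubled (a common device for
    fast rates; not the exact algorithm analyzed in the paper).
    """
    x = np.asarray(x0, dtype=float)
    eta = 1.0 / L                     # initial step size
    epoch_len = 1
    used = 0
    while used < T:
        steps = min(epoch_len, T - used)
        avg = np.zeros_like(x)
        for _ in range(steps):
            x = x - eta * grad_oracle(x)
            avg += x
        x = avg / steps               # restart from the epoch average
        eta /= 2.0                    # shrink the step size
        epoch_len *= 2                # lengthen the next epoch
        used += steps
    return x
```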

    Optimal Margin Distribution Machine

    Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, disclosed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been proven to be more crucial. Based on this idea, we propose a new method, named Optimal margin Distribution Machine (ODM), which tries to achieve a better generalization performance by optimizing the margin distribution. We characterize the margin distribution by the first- and second-order statistics, i.e., the margin mean and variance. The proposed method is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper. (arXiv admin note: substantial text overlap with arXiv:1311.098)
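    As a rough illustration of optimizing the first- and second-order margin statistics, the sketch below runs plain gradient descent on an objective that rewards a large margin mean and penalizes the margin variance, plus an L2 regularizer. It is an illustrative surrogate with assumed names (`fit_margin_distribution`, weights `lam`, `mu`), not the actual ODM optimization problem.

```python
import numpy as np

def fit_margin_distribution(X, y, lam=1.0, mu=0.5, lr=0.01, steps=2000):
    """Gradient descent on an illustrative margin-distribution objective:

        0.5 * ||w||^2 - lam * mean(margins) + mu * var(margins),

    where margins_i = y_i * <w, x_i> and y_i is +1 or -1.
    A simplified surrogate, not the exact ODM formulation.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)                       # per-instance margins
        centered = margins - margins.mean()
        grad_mean = (y[:, None] * X).mean(axis=0)   # d mean(margins) / dw
        grad_var = 2.0 * ((centered * y)[:, None] * X).mean(axis=0)
        w -= lr * (w - lam * grad_mean + mu * grad_var)
    return w
```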

    An empirical formula for the energy eigenvalues of a particle in a one-dimensional finite-depth square well potential

    We propose an empirical formula for the calculation of the energy eigenvalues of a particle moving in a one-dimensional finite-depth square well potential, derived after some physical considerations. This formula gives a simple relation between the energy eigenvalues and the potential parameters, and can be used to estimate the energy eigenvalues in a very simple way.
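    The abstract does not reproduce the proposed formula. For reference, the exact bound-state energies that such a formula approximates solve the standard transcendental conditions for a well of depth $V_0$ and width $a$ (a textbook result in the centered-well convention, not the paper's empirical formula):

```latex
% Bound states (0 < E < V_0) of the one-dimensional finite square well
k = \frac{\sqrt{2mE}}{\hbar}, \qquad \kappa = \frac{\sqrt{2m(V_0 - E)}}{\hbar},
\qquad
\begin{cases}
k \tan\!\left(\dfrac{ka}{2}\right) = \kappa, & \text{even-parity states},\\[6pt]
k \cot\!\left(\dfrac{ka}{2}\right) = -\kappa, & \text{odd-parity states}.
\end{cases}
```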

    Adaptive Online Learning in Dynamic Environments

    In this paper, we study online convex optimization in dynamic environments, and aim to bound the dynamic regret with respect to any sequence of comparators. Existing work has shown that online gradient descent enjoys an $O(\sqrt{T}(1+P_T))$ dynamic regret, where $T$ is the number of iterations and $P_T$ is the path-length of the comparator sequence. However, this result is unsatisfactory, as there exists a large gap from the $\Omega(\sqrt{T(1+P_T)})$ lower bound established in our paper. To address this limitation, we develop a novel online method, namely adaptive learning for dynamic environment (Ader), which achieves an optimal $O(\sqrt{T(1+P_T)})$ dynamic regret. The basic idea is to maintain a set of experts, each attaining an optimal dynamic regret for a specific path-length, and to combine them with an expert-tracking algorithm. Furthermore, we propose an improved Ader based on the surrogate loss, and in this way the number of gradient evaluations per round is reduced from $O(\log T)$ to $1$. Finally, we extend Ader to the setting in which a sequence of dynamical models is available to characterize the comparators.
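    The expert-ensemble idea can be illustrated with a small sketch: a pool of online-gradient-descent experts with geometrically spaced step sizes, combined by exponential weighting. The class names and the particular step-size grid below are assumptions for illustration; the actual Ader algorithm (and its surrogate-loss variant) differs in its weighting scheme and analysis.

```python
import numpy as np

class OGDExpert:
    """Online gradient descent with a fixed step size (one expert)."""
    def __init__(self, dim, step, radius=1.0):
        self.x = np.zeros(dim)
        self.step = step
        self.radius = radius
    def predict(self):
        return self.x
    def update(self, grad):
        self.x -= self.step * grad
        norm = np.linalg.norm(self.x)
        if norm > self.radius:                 # project onto an L2 ball
            self.x *= self.radius / norm

class ExpertEnsemble:
    """Exponentially weighted combination of experts whose step sizes
    form a geometric grid, so some expert suits every path-length."""
    def __init__(self, dim, T, radius=1.0):
        n_experts = int(np.ceil(np.log2(T))) + 1
        steps = [radius / np.sqrt(T) * 2 ** k for k in range(n_experts)]
        self.experts = [OGDExpert(dim, s, radius) for s in steps]
        self.w = np.ones(n_experts) / n_experts
        self.lr = np.sqrt(8.0 * np.log(n_experts) / T)
    def predict(self):
        return sum(wi * e.predict() for wi, e in zip(self.w, self.experts))
    def update(self, grad, expert_losses):
        # expert_losses[i] is expert i's loss on the current round
        self.w *= np.exp(-self.lr * np.asarray(expert_losses))
        self.w /= self.w.sum()
        for e in self.experts:
            e.update(grad)
```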

    Super fidelity and related metric

    We report a new metric of quantum states. This metric is built from super-fidelity, which has a deep connection with the Uhlmann-Jozsa fidelity and plays an important role in quantifying entanglement. We find that the new metric possesses some interesting properties.
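    For context (standard definitions, not specific to this paper), the super-fidelity of two density matrices $\rho$ and $\sigma$ and the Uhlmann-Jozsa fidelity it upper-bounds are usually written as follows; the particular metric constructed in the paper is not reproduced here.

```latex
% Super-fidelity and the Uhlmann-Jozsa fidelity (standard definitions)
G(\rho,\sigma) = \operatorname{Tr}(\rho\sigma)
  + \sqrt{\bigl(1-\operatorname{Tr}\rho^{2}\bigr)\bigl(1-\operatorname{Tr}\sigma^{2}\bigr)},
\qquad
F(\rho,\sigma) = \Bigl(\operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\Bigr)^{2}
\le G(\rho,\sigma).
```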

    Learning with Feature Evolvable Streams

    Learning with streaming data has attracted much attention during the past few years. Though most studies consider data streams with fixed features, in real practice the features may be evolvable. For example, features of data gathered by limited-lifespan sensors will change when these sensors are substituted by new ones. In this paper, we propose a novel learning paradigm: \emph{Feature Evolvable Streaming Learning}, in which old features vanish and new features emerge. Rather than relying on only the current features, we attempt to recover the vanished features and exploit them to improve performance. Specifically, we learn two models from the recovered features and the current features, respectively. To benefit from the recovered features, we develop two ensemble methods. In the first method, we combine the predictions from the two models and theoretically show that with the assistance of old features, the performance on new features can be improved. In the second approach, we dynamically select the best single prediction and establish a better performance guarantee when the best model switches. Experiments on both synthetic and real data validate the effectiveness of our proposal.
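    A minimal sketch of the two ingredients described above, under assumed names: a least-squares map that reconstructs the vanished features from the new ones using the overlapping period, and an exponential-weights rule that combines the two models' predictions. The paper's actual recovery and weighting schemes may differ.

```python
import numpy as np

def learn_recovery_map(X_new_overlap, X_old_overlap):
    """Least-squares map M with X_old ~= X_new @ M, fitted on the period
    when both feature sets are observed (illustrative reconstruction)."""
    M, *_ = np.linalg.lstsq(X_new_overlap, X_old_overlap, rcond=None)
    return M

def combine_predictions(pred_old, pred_new, cum_loss_old, cum_loss_new, eta=0.1):
    """Weighted combination of the two models' predictions, where each
    weight decays exponentially with the model's cumulative loss
    (an exponential-weights sketch, not the paper's exact rule)."""
    w_old = np.exp(-eta * cum_loss_old)
    w_new = np.exp(-eta * cum_loss_new)
    return (w_old * pred_old + w_new * pred_new) / (w_old + w_new)
```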

    Adaptive Regret of Convex and Smooth Functions

    We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure. The goal is to achieve a small regret over every interval, so that the comparator is allowed to change over time. Different from previous works that only utilize the convexity condition, this paper further exploits smoothness to improve the adaptive regret. To this end, we develop novel adaptive algorithms for convex and smooth functions, and establish problem-dependent regret bounds over any interval. Our regret bounds are comparable to existing results in the worst case, and become much tighter when the comparator has a small loss.
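    For concreteness, the adaptive regret referred to above is commonly defined as the worst regret over all contiguous intervals; this is the standard definition, not anything specific to the paper's bounds.

```latex
% Adaptive regret over a horizon of T rounds with losses f_1, ..., f_T
\mathrm{A\text{-}Regret}(T)
  = \max_{1 \le r \le s \le T}
    \left( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=r}^{s} f_t(x) \right).
```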

    The granularity effect in amorphous InGaZnO$_4$ films prepared by the rf sputtering method

    We systematically investigated the temperature behaviors of the electrical conductivity and Hall coefficient of two series of amorphous indium gallium zinc oxide (a-IGZO) films prepared by the rf sputtering method. The two series of films are $\sim$700 nm and $\sim$25 nm thick, respectively. For each film, the conductivity increases with decreasing temperature from 300 K down to $T_{\rm max}$, the temperature at which the conductivity reaches its maximum. Below $T_{\rm max}$, the conductivity decreases with decreasing temperature. Both the conductivity and the Hall coefficient vary linearly with $\ln T$ in the low-temperature regime. These $\ln T$ behaviors cannot be explained by the traditional electron-electron interaction theory, but can be quantitatively described by the more recent electron-electron interaction theory that accounts for granularity. Combined with scanning electron microscopy images of the films, we propose that the boundaries between neighboring a-IGZO particles could make the films inhomogeneous and play an important role in the electron transport processes.

    Online Stochastic Linear Optimization under One-bit Feedback

    In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, its high computational cost makes it impractical for real-world problems. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt the online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of $O(d\sqrt{T})$, which matches the optimal result of stochastic linear bandits.
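    The parameter-estimation step can be sketched as an online Newton update on the logistic loss; the class and parameter names below (`OnlineNewtonStep`, `gamma`, `eps`) are assumptions, and the confidence-region construction and arm selection used by the paper are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineNewtonStep:
    """Online Newton step on the logistic loss for one-bit feedback.

    Only the estimation step is sketched; projection onto the feasible
    set and the optimistic arm selection are omitted for brevity.
    """
    def __init__(self, dim, gamma=0.5, eps=1.0):
        self.theta = np.zeros(dim)
        self.A = eps * np.eye(dim)      # regularized second-moment matrix
        self.gamma = gamma
    def update(self, x, y):
        """x: chosen feature vector, y in {0, 1}: one-bit feedback."""
        g = (sigmoid(x @ self.theta) - y) * x          # logistic gradient
        self.A += np.outer(g, g)
        self.theta -= (1.0 / self.gamma) * np.linalg.solve(self.A, g)
```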

    Stochastic Proximal Gradient Descent for Nuclear Norm Regularization

    In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size $m \times n$. By constructing a low-rank estimate of the gradient, we propose an iterative algorithm based on stochastic proximal gradient descent (SPGD), and take the last iterate of SPGD as the final solution. The main advantage of the proposed algorithm is that its space complexity is $O(m+n)$; in contrast, most previous algorithms have an $O(mn)$ space complexity. Theoretical analysis shows that it achieves $O(\log T/\sqrt{T})$ and $O(\log T/T)$ convergence rates for general convex functions and strongly convex functions, respectively.
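    The following is a plain SPGD sketch for a nuclear-norm-regularized objective: a stochastic gradient step followed by singular-value soft-thresholding. It stores the full $m \times n$ iterate, so it does not achieve the paper's $O(m+n)$ memory (which relies on the low-rank gradient estimate); the function names and step-size schedule are illustrative.

```python
import numpy as np

def prox_nuclear(W, tau):
    """Proximal operator of tau * ||.||_*: singular-value soft-thresholding."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def spgd(grad_oracle, shape, tau, T, eta0=0.1):
    """Stochastic proximal gradient descent for  E[f(W)] + tau * ||W||_*.

    grad_oracle(W) must return an unbiased stochastic gradient of f at W.
    The last iterate is returned, mirroring the paper's choice of output.
    """
    W = np.zeros(shape)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                  # decaying step size
        W = prox_nuclear(W - eta * grad_oracle(W), eta * tau)
    return W
```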