
    Stochastic Approximation of Smooth and Strongly Convex Functions: Beyond the $O(1/T)$ Convergence Rate

    Stochastic approximation (SA) is a classical approach for stochastic convex optimization. Previous studies have demonstrated that the convergence rate of SA can be improved by introducing either a smoothness or a strong convexity condition. In this paper, we make use of smoothness and strong convexity simultaneously to boost the convergence rate. Let $\lambda$ be the modulus of strong convexity, $\kappa$ be the condition number, $F_*$ be the minimal risk, and $\alpha>1$ be some small constant. First, we demonstrate that, in expectation, an $O(1/[\lambda T^\alpha] + \kappa F_*/T)$ risk bound is attainable when $T = \Omega(\kappa^\alpha)$. Thus, when $F_*$ is small, the convergence rate could be faster than $O(1/[\lambda T])$ and approaches $O(1/[\lambda T^\alpha])$ in the ideal case. Second, to further benefit from a small risk, we show that, in expectation, an $O(1/2^{T/\kappa}+F_*)$ risk bound is achievable. Thus, the excess risk reduces exponentially until reaching $O(F_*)$, and if $F_*=0$, we obtain a global linear convergence. Finally, we emphasize that our proof is constructive and each risk bound is equipped with an efficient stochastic algorithm attaining that bound.
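    The abstract does not spell out the algorithms, so the following is only a minimal sketch of one standard device for exploiting smoothness and strong convexity together: epoch-based stochastic gradient descent, in which the step size is halved and the epoch length doubled after each epoch. The function name `epoch_sgd` and the oracle `grad_oracle` are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def epoch_sgd(grad_oracle, x0, L, T):
    """Epoch-based SGD sketch for an L-smooth, strongly convex objective.

    grad_oracle(x) must return an unbiased stochastic gradient at x.
    Each epoch restarts from the average of its iterates, then the step
    size is halved and the epoch length doubled (a common device for
    fast rates; not the exact algorithm analyzed in the paper).
    """
    x = np.asarray(x0, dtype=float)
    eta = 1.0 / L                     # initial step size
    epoch_len = 1
    used = 0
    while used < T:
        steps = min(epoch_len, T - used)
        avg = np.zeros_like(x)
        for _ in range(steps):
            x = x - eta * grad_oracle(x)
            avg += x
        x = avg / steps               # restart from the epoch average
        eta /= 2.0                    # shrink the step size
        epoch_len *= 2                # lengthen the next epoch
        used += steps
    return x
```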

    Optimal Margin Distribution Machine

    Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, disclosed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been proven to be more crucial. Based on this idea, we propose a new method, named Optimal margin Distribution Machine (ODM), which tries to achieve a better generalization performance by optimizing the margin distribution. We characterize the margin distribution by the first- and second-order statistics, i.e., the margin mean and variance. The proposed method is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper. (arXiv admin note: substantial text overlap with arXiv:1311.098)
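    As a rough illustration of optimizing the first- and second-order margin statistics, the sketch below runs plain gradient descent on an objective that rewards a large margin mean and penalizes the margin variance, plus an L2 regularizer. It is an illustrative surrogate with assumed names (`fit_margin_distribution`, weights `lam`, `mu`), not the actual ODM optimization problem.

```python
import numpy as np

def fit_margin_distribution(X, y, lam=1.0, mu=0.5, lr=0.01, steps=2000):
    """Gradient descent on an illustrative margin-distribution objective:

        0.5 * ||w||^2 - lam * mean(margins) + mu * var(margins),

    where margins_i = y_i * <w, x_i> and y_i is +1 or -1.
    A simplified surrogate, not the exact ODM formulation.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)                       # per-instance margins
        centered = margins - margins.mean()
        grad_mean = (y[:, None] * X).mean(axis=0)   # d mean(margins) / dw
        grad_var = 2.0 * ((centered * y)[:, None] * X).mean(axis=0)
        w -= lr * (w - lam * grad_mean + mu * grad_var)
    return w
```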

    An empirical formula for the energy eigenvalues of a particle in a one-dimensional finite-depth square well potential

    We propose an empirical formula for the calculation of the energy eigenvalues of a particle moving in a one-dimensional finite-depth square well potential, derived after some physical considerations. This formula gives a simple relation between the energy eigenvalues and the potential parameters, and can be used to estimate the energy eigenvalues in a very simple way.
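    The abstract does not reproduce the proposed formula. For reference, the exact bound-state energies that such a formula approximates solve the standard transcendental conditions for a well of depth $V_0$ and width $a$ (a textbook result in the centered-well convention, not the paper's empirical formula):

```latex
% Bound states (0 < E < V_0) of the one-dimensional finite square well
k = \frac{\sqrt{2mE}}{\hbar}, \qquad \kappa = \frac{\sqrt{2m(V_0 - E)}}{\hbar},
\qquad
\begin{cases}
k \tan\!\left(\dfrac{ka}{2}\right) = \kappa, & \text{even-parity states},\\[6pt]
k \cot\!\left(\dfrac{ka}{2}\right) = -\kappa, & \text{odd-parity states}.
\end{cases}
```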

    Adaptive Online Learning in Dynamic Environments

    In this paper, we study online convex optimization in dynamic environments, and aim to bound the dynamic regret with respect to any sequence of comparators. Existing work has shown that online gradient descent enjoys an $O(\sqrt{T}(1+P_T))$ dynamic regret, where $T$ is the number of iterations and $P_T$ is the path-length of the comparator sequence. However, this result is unsatisfactory, as there exists a large gap from the $\Omega(\sqrt{T(1+P_T)})$ lower bound established in our paper. To address this limitation, we develop a novel online method, namely adaptive learning for dynamic environment (Ader), which achieves an optimal $O(\sqrt{T(1+P_T)})$ dynamic regret. The basic idea is to maintain a set of experts, each attaining an optimal dynamic regret for a specific path-length, and to combine them with an expert-tracking algorithm. Furthermore, we propose an improved Ader based on the surrogate loss, and in this way the number of gradient evaluations per round is reduced from $O(\log T)$ to $1$. Finally, we extend Ader to the setting in which a sequence of dynamical models is available to characterize the comparators.
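    The expert-ensemble idea can be illustrated with a small sketch: a pool of online-gradient-descent experts with geometrically spaced step sizes, combined by exponential weighting. The class names and the particular step-size grid below are assumptions for illustration; the actual Ader algorithm (and its surrogate-loss variant) differs in its weighting scheme and analysis.

```python
import numpy as np

class OGDExpert:
    """Online gradient descent with a fixed step size (one expert)."""
    def __init__(self, dim, step, radius=1.0):
        self.x = np.zeros(dim)
        self.step = step
        self.radius = radius
    def predict(self):
        return self.x
    def update(self, grad):
        self.x -= self.step * grad
        norm = np.linalg.norm(self.x)
        if norm > self.radius:                 # project onto an L2 ball
            self.x *= self.radius / norm

class ExpertEnsemble:
    """Exponentially weighted combination of experts whose step sizes
    form a geometric grid, so some expert suits every path-length."""
    def __init__(self, dim, T, radius=1.0):
        n_experts = int(np.ceil(np.log2(T))) + 1
        steps = [radius / np.sqrt(T) * 2 ** k for k in range(n_experts)]
        self.experts = [OGDExpert(dim, s, radius) for s in steps]
        self.w = np.ones(n_experts) / n_experts
        self.lr = np.sqrt(8.0 * np.log(n_experts) / T)
    def predict(self):
        return sum(wi * e.predict() for wi, e in zip(self.w, self.experts))
    def update(self, grad, expert_losses):
        # expert_losses[i] is expert i's loss on the current round
        self.w *= np.exp(-self.lr * np.asarray(expert_losses))
        self.w /= self.w.sum()
        for e in self.experts:
            e.update(grad)
```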

    Super fidelity and related metric

    We report a new metric of quantum states. This metric is built from super-fidelity, which has a deep connection with the Uhlmann-Jozsa fidelity and plays an important role in quantifying entanglement. We find that the new metric possesses some interesting properties.
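    For context (standard definitions, not specific to this paper), the super-fidelity of two density matrices $\rho$ and $\sigma$ and the Uhlmann-Jozsa fidelity it upper-bounds are usually written as follows; the particular metric constructed in the paper is not reproduced here.

```latex
% Super-fidelity and the Uhlmann-Jozsa fidelity (standard definitions)
G(\rho,\sigma) = \operatorname{Tr}(\rho\sigma)
  + \sqrt{\bigl(1-\operatorname{Tr}\rho^{2}\bigr)\bigl(1-\operatorname{Tr}\sigma^{2}\bigr)},
\qquad
F(\rho,\sigma) = \Bigl(\operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\Bigr)^{2}
\le G(\rho,\sigma).
```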

    Learning with Feature Evolvable Streams

    Learning with streaming data has attracted much attention during the past few years. Though most studies consider data streams with fixed features, in real practice the features may be evolvable. For example, features of data gathered by limited-lifespan sensors will change when these sensors are substituted by new ones. In this paper, we propose a novel learning paradigm: \emph{Feature Evolvable Streaming Learning}, in which old features vanish and new features emerge. Rather than relying on only the current features, we attempt to recover the vanished features and exploit them to improve performance. Specifically, we learn two models from the recovered features and the current features, respectively. To benefit from the recovered features, we develop two ensemble methods. In the first method, we combine the predictions from the two models and theoretically show that with the assistance of old features, the performance on new features can be improved. In the second approach, we dynamically select the best single prediction and establish a better performance guarantee when the best model switches. Experiments on both synthetic and real data validate the effectiveness of our proposal.
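    A minimal sketch of the two ingredients described above, under assumed names: a least-squares map that reconstructs the vanished features from the new ones using the overlapping period, and an exponential-weights rule that combines the two models' predictions. The paper's actual recovery and weighting schemes may differ.

```python
import numpy as np

def learn_recovery_map(X_new_overlap, X_old_overlap):
    """Least-squares map M with X_old ~= X_new @ M, fitted on the period
    when both feature sets are observed (illustrative reconstruction)."""
    M, *_ = np.linalg.lstsq(X_new_overlap, X_old_overlap, rcond=None)
    return M

def combine_predictions(pred_old, pred_new, cum_loss_old, cum_loss_new, eta=0.1):
    """Weighted combination of the two models' predictions, where each
    weight decays exponentially with the model's cumulative loss
    (an exponential-weights sketch, not the paper's exact rule)."""
    w_old = np.exp(-eta * cum_loss_old)
    w_new = np.exp(-eta * cum_loss_new)
    return (w_old * pred_old + w_new * pred_new) / (w_old + w_new)
```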

    Adaptive Regret of Convex and Smooth Functions

    We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure. The goal is to achieve a small regret over every interval, so that the comparator is allowed to change over time. Different from previous works that only utilize the convexity condition, this paper further exploits smoothness to improve the adaptive regret. To this end, we develop novel adaptive algorithms for convex and smooth functions, and establish problem-dependent regret bounds over any interval. Our regret bounds are comparable to existing results in the worst case, and become much tighter when the comparator has a small loss.
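    For concreteness, the adaptive regret referred to above is commonly defined as the worst regret over all contiguous intervals; this is the standard definition, not anything specific to the paper's bounds.

```latex
% Adaptive regret over a horizon of T rounds with losses f_1, ..., f_T
\mathrm{A\text{-}Regret}(T)
  = \max_{1 \le r \le s \le T}
    \left( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=r}^{s} f_t(x) \right).
```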

    The granularity effect in amorphous InGaZnO$_4$ films prepared by the rf sputtering method

    We systematically investigated the temperature behaviors of the electrical conductivity and Hall coefficient of two series of amorphous indium gallium zinc oxide (a-IGZO) films prepared by the rf sputtering method. The two series of films are $\sim$700 nm and $\sim$25 nm thick, respectively. For each film, the conductivity increases with decreasing temperature from 300 K down to $T_{\rm max}$, the temperature at which the conductivity reaches its maximum. Below $T_{\rm max}$, the conductivity decreases with decreasing temperature. Both the conductivity and the Hall coefficient vary linearly with $\ln T$ in the low-temperature regime. These $\ln T$ behaviors cannot be explained by the traditional electron-electron interaction theory, but can be quantitatively described by the more recent electron-electron interaction theory that accounts for granularity. Combined with scanning electron microscopy images of the films, we propose that the boundaries between neighboring a-IGZO particles could make the films inhomogeneous and play an important role in the electron transport processes.

    Online Stochastic Linear Optimization under One-bit Feedback

    In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, its high computational cost makes it impractical for real-world problems. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt the online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of $O(d\sqrt{T})$, which matches the optimal result of stochastic linear bandits.
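    The parameter-estimation step can be sketched as an online Newton update on the logistic loss; the class and parameter names below (`OnlineNewtonStep`, `gamma`, `eps`) are assumptions, and the confidence-region construction and arm selection used by the paper are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineNewtonStep:
    """Online Newton step on the logistic loss for one-bit feedback.

    Only the estimation step is sketched; projection onto the feasible
    set and the optimistic arm selection are omitted for brevity.
    """
    def __init__(self, dim, gamma=0.5, eps=1.0):
        self.theta = np.zeros(dim)
        self.A = eps * np.eye(dim)      # regularized second-moment matrix
        self.gamma = gamma
    def update(self, x, y):
        """x: chosen feature vector, y in {0, 1}: one-bit feedback."""
        g = (sigmoid(x @ self.theta) - y) * x          # logistic gradient
        self.A += np.outer(g, g)
        self.theta -= (1.0 / self.gamma) * np.linalg.solve(self.A, g)
```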

    Stochastic Proximal Gradient Descent for Nuclear Norm Regularization

    In this paper, we utilize stochastic optimization to reduce the space complexity of convex composite optimization with a nuclear norm regularizer, where the variable is a matrix of size $m \times n$. By constructing a low-rank estimate of the gradient, we propose an iterative algorithm based on stochastic proximal gradient descent (SPGD), and take the last iterate of SPGD as the final solution. The main advantage of the proposed algorithm is that its space complexity is $O(m+n)$; in contrast, most previous algorithms have an $O(mn)$ space complexity. Theoretical analysis shows that it achieves $O(\log T/\sqrt{T})$ and $O(\log T/T)$ convergence rates for general convex functions and strongly convex functions, respectively.
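    The following is a plain SPGD sketch for a nuclear-norm-regularized objective: a stochastic gradient step followed by singular-value soft-thresholding. It stores the full $m \times n$ iterate, so it does not achieve the paper's $O(m+n)$ memory (which relies on the low-rank gradient estimate); the function names and step-size schedule are illustrative.

```python
import numpy as np

def prox_nuclear(W, tau):
    """Proximal operator of tau * ||.||_*: singular-value soft-thresholding."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def spgd(grad_oracle, shape, tau, T, eta0=0.1):
    """Stochastic proximal gradient descent for  E[f(W)] + tau * ||W||_*.

    grad_oracle(W) must return an unbiased stochastic gradient of f at W.
    The last iterate is returned, mirroring the paper's choice of output.
    """
    W = np.zeros(shape)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                  # decaying step size
        W = prox_nuclear(W - eta * grad_oracle(W), eta * tau)
    return W
```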