    You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle

    Deep learning achieves state-of-the-art results in many tasks in computer vision and natural language processing. However, recent work has shown that deep networks can be vulnerable to adversarial perturbations, which raises serious concerns about their robustness. Adversarial training, typically formulated as a robust optimization problem, is an effective way of improving the robustness of deep networks. A major drawback of existing adversarial training algorithms is the computational overhead of generating adversarial examples, which is typically far greater than that of the network training itself. This makes the overall computational cost of adversarial training prohibitive. In this paper, we show that adversarial training can be cast as a discrete-time differential game. By analyzing the Pontryagin's Maximal Principle (PMP) of the problem, we observe that the adversary update is coupled only with the parameters of the first layer of the network. This inspires us to restrict most of the forward and back propagation to the first layer of the network during adversary updates, which reduces the number of full forward and backward propagations to only one for each group of adversary updates. We therefore refer to this algorithm as YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO can achieve comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm. Our code is available at https://github.com/a1600012888/YOPO-You-Only-Propagate-Once. Comment: Accepted as a conference paper at NeurIPS 201
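
The robust optimization problem mentioned in the abstract is commonly written as a min-max problem. The form below is a standard textbook statement rather than a formula quoted from the paper; the symbols $\theta$ (network parameters), $f_\theta$ (network), $\ell$ (loss), $\eta$ (perturbation), $\epsilon$ (perturbation budget), $\mathcal{D}$ (data distribution), and the choice of the $\ell_\infty$ norm are all assumptions made for illustration.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\left[ \max_{\|\eta\|_{\infty}\le\epsilon} \ell\bigl(f_{\theta}(x+\eta),\, y\bigr) \right]
```

The inner maximization is the adversary update whose cost the abstract describes as dominating training; the outer minimization is the usual network training step.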

    Convergence Rates for the Stationary and Non-stationary Navier-Stokes Equations over Non-Lipschitz Boundaries

    In this paper, we consider higher-order convergence rates for the 2D stationary and non-stationary Navier-Stokes equations over highly oscillating periodic bumpy John domains with $C^{2}$ regularity in some neighborhood of the boundary point $(0,0)$. For the stationary case and any $\gamma\in(0,1/2)$, using the variational equation satisfied by the solution and the correctors for the bumpy John domains obtained by Higaki, Prange and Zhuge \cite{higaki2021large,MR4619004}, after correcting the values on the inflow/outflow boundaries $(\{0\}\cup\{1\})\times(0,1)$, we obtain an $O(\varepsilon^{2-\gamma})$ approximation in $L^2$ for the velocity and an $O(\varepsilon^{2-\gamma})$ convergence rate in $L^1$ for the approximation by the so-called Navier's wall laws, which generalizes the results obtained by Jäger and Mikelić \cite{MR1813101}. Moreover, for the non-stationary case, using the energy method, we obtain an $O(\varepsilon^{2-\gamma}+\exp(-Ct))$ convergence rate for the velocity in $L_x^1$.

    Extension of First Passage Probability

    In this paper, we consider an extension of first passage probability. First, we present the first, second, third, and in general k-th passage probability of a Markov chain moving from one state to another, using step-by-step calculation and two matrix-based methods. Similarly, we compute the first passage probability of a Markov chain moving from one state to multiple states. Throughout, we cover both the case in which a state moves to a different state and the case in which it returns to itself. We also find the mean number of steps needed to go from one state to another for the first, second, third, and in general k-th passage. In addition, we find the probability generating function for the number of steps, which makes it easier to calculate passage probabilities and the mean and variance of the number of passage steps. Finally, if we extend a discrete Markov chain to the corresponding continuous Markov process with the same transition probabilities and with exponentially distributed transition times with parameter 1 between two states, we can obtain the mean time needed to go from one state to another by Laplace transforms; this mean is the same as in the discrete case. We can then calculate the variance of the time needed to go from one state to another in the same way.
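
As a rough illustration of the matrix-style calculation the abstract mentions, the sketch below computes ordinary first passage probabilities and mean first passage steps for a finite Markov chain with NumPy. It uses the standard recursion $f_{kj}^{(n)} = \sum_{m\neq j} p_{km} f_{mj}^{(n-1)}$ with $f_{kj}^{(1)} = p_{kj}$, which is a textbook identity and not necessarily one of the paper's methods. The function names, the example matrix, and the assumption that the target state is reached with probability 1 from every state are all hypothetical choices made here; the k-th passage extension is not covered.

```python
import numpy as np


def first_passage_probs(P, i, j, n_max):
    """Return [f_ij^(1), ..., f_ij^(n_max)], the probabilities that the chain
    started in state i first reaches state j at exactly step n.
    With i == j this gives first return probabilities."""
    P = np.asarray(P, dtype=float)
    # Q is P with column j zeroed: the chain may not enter j before the last step.
    Q = P.copy()
    Q[:, j] = 0.0
    v = P[:, j].copy()  # v[k] = f_kj^(1) = p_kj
    probs = []
    for _ in range(n_max):
        probs.append(v[i])
        v = Q @ v  # f_kj^(n) = sum over m != j of p_km * f_mj^(n-1)
    return probs


def mean_first_passage_steps(P, j):
    """Return the vector m with m[k] = expected number of steps to first reach
    (or return to) state j from state k. Assumes j is reached with
    probability 1 from every state, so the linear system is solvable."""
    P = np.asarray(P, dtype=float)
    Q = P.copy()
    Q[:, j] = 0.0
    # m = 1 + Q m, that is (I - Q) m = 1
    return np.linalg.solve(np.eye(len(P)) - Q, np.ones(len(P)))


# Hypothetical two-state chain used only for illustration.
P_example = [[0.5, 0.5],
             [0.2, 0.8]]
print(first_passage_probs(P_example, 0, 1, 3))  # [0.5, 0.25, 0.125]
print(mean_first_passage_steps(P_example, 1))   # approximately [2.0, 1.4]
```

In this example the first passage from state 0 to state 1 is geometric with success probability 0.5, so its mean is 2 steps, and the mean return time to state 1 is 1.4 steps, which matches the reciprocal of its stationary probability.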