
### You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle

Deep learning achieves state-of-the-art results in many computer vision and
natural language processing tasks. However, recent work has shown that deep
networks can be vulnerable to adversarial perturbations, raising serious
concerns about their robustness. Adversarial training, typically formulated as
a robust optimization problem, is an effective way of improving the robustness
of deep networks. A major drawback of existing adversarial training algorithms
is the computational overhead of generating adversarial examples, which is
typically far greater than that of training the network itself and makes the
overall cost of adversarial training prohibitive. In this paper, we show that
adversarial training can be cast as a discrete-time differential game. By
analyzing the Pontryagin's Maximal Principle (PMP) of this problem, we observe
that the adversary update is coupled only with the parameters of the first
layer of the network. This inspires us to restrict most of the forward and
backward propagation to the first layer of the network during adversary
updates, which reduces the number of full forward and backward propagations to
one per group of adversary updates. We therefore call this algorithm YOPO (You
Only Propagate Once). Numerical experiments demonstrate that YOPO achieves
comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of
the projected gradient descent (PGD) algorithm. Our code is available at
https://github.com/a1600012888/YOPO-You-Only-Propagate-Once.

Comment: Accepted as a conference paper at NeurIPS 201
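As a rough illustration of the PGD baseline that YOPO accelerates, the sketch below runs projected gradient descent on the input of a simple logistic-regression model. The model, loss, and step sizes are illustrative assumptions, not the paper's setup; YOPO's point is that each iteration of this loop normally requires a full forward and backward propagation, most of which it avoids.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Illustrative PGD attack on a logistic-regression model (NOT the
    paper's YOPO algorithm): each step takes a full gradient of the
    cross-entropy loss w.r.t. the input, ascends in the sign direction,
    and projects back into the L_inf ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        z = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-z))            # sigmoid prediction
        grad = (p - y) * w                       # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)    # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps) # project into eps-ball
    return x_adv
```

In a deep network, `grad` would require backpropagating through every layer; YOPO's observation is that, per the PMP analysis, the adversary update couples only to the first layer, so most of that propagation can be reused across a group of adversary updates.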

### Convergence Rates for the Stationary and Non-stationary Navier-Stokes Equations over Non-Lipschitz Boundaries

In this paper, we consider higher-order convergence rates for the 2D
stationary and non-stationary Navier-Stokes equations over highly oscillating
periodic bumpy John domains with $C^{2}$ regularity in a neighborhood of the
boundary point $(0,0)$. For the stationary case and any $\gamma\in (0,1/2)$,
using the variational equation satisfied by the solution and the correctors
for bumpy John domains obtained by Higaki, Prange and Zhuge
\cite{higaki2021large,MR4619004}, after correcting the values on the
inflow/outflow boundaries $(\{0\}\cup\{1\})\times(0,1)$, we obtain an
$O(\varepsilon^{2-\gamma})$ approximation in $L^2$ for the velocity and an
$O(\varepsilon^{2-\gamma})$ convergence rate in $L^1$ for the approximation by
the so-called Navier wall laws, which generalizes the results of J\"{a}ger
and Mikeli\'{c} \cite{MR1813101}. Moreover, for the non-stationary case, using
the energy method, we obtain an $O(\varepsilon^{2-\gamma}+\exp(-Ct))$
convergence rate for the velocity in $L_x^1$.

### Extension of First Passage Probability

In this paper, we consider extensions of the first passage probability. First, we present the first, second, third, and generally k-th passage probability of a Markov chain moving from one state to another, both through step-by-step calculation and through two matrix-based methods. Similarly, we compute the first passage probability of a Markov chain moving from one state to a set of states. Throughout, we account both for the case in which one state moves to a different state and for the case in which it returns to itself. We also find the mean number of steps needed to move from one state to another in a Markov chain for the first, second, third, and generally k-th passage. In addition, we derive the probability generating function for the number of steps, which makes it easier to calculate the passage probabilities and the mean and variance of the number of passage steps. Furthermore, if we extend a discrete Markov chain to a corresponding continuous Markov process with the same transition probabilities and with transition times between two states that are exponentially distributed with parameter 1, we can obtain the mean time needed to move from one state to another by Laplace transforms, which agrees with the discrete case. The variance of this time can then be calculated in the same way.
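The matrix-based step-by-step computation described above can be sketched as follows. This is a minimal illustration assuming a row-stochastic transition matrix `P`; the function name and interface are hypothetical, not from the paper. The idea is standard: zeroing out column $j$ forbids earlier visits to $j$, so $f_{ij}^{(n)} = (M^{n-1}P)_{ij}$ is the probability of first reaching $j$ from $i$ at exactly step $n$.

```python
import numpy as np

def first_passage_probs(P, i, j, n_max):
    """k-th passage probabilities f_ij^(n) for n = 1..n_max: the chance
    that a chain started in state i first enters state j at exactly
    step n.  Matrix form: M is P with column j zeroed (taboo on early
    visits to j), and f^(n) = (M^{n-1} P)[i, j]."""
    P = np.asarray(P, dtype=float)
    M = P.copy()
    M[:, j] = 0.0                  # paths may not touch j before step n
    probs = []
    A = np.eye(P.shape[0])         # accumulates M^{n-1}, starting at I
    for _ in range(n_max):
        probs.append((A @ P)[i, j])
        A = A @ M
    return probs
```

For a recurrent chain the probabilities $f_{ij}^{(n)}$ sum to 1 over $n$, and the mean first-passage step count is $\sum_n n\, f_{ij}^{(n)}$, matching the generating-function approach mentioned in the abstract.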

- …