Duality relation between coherence and path information in the presence of quantum memory
Wave-particle duality expresses a trade-off between the wave and particle
behavior of a particle passing through an interferometer. This duality can
be formulated as an inequality that upper-bounds the sum of interference
visibility and path information. However, if the particle is
entangled with a quantum memory, then the bound may decrease. Here, we find the
duality relation between coherence and path information for a particle going
through a multipath interferometer in the presence of a quantum memory,
offering an upper bound on the duality relation that is directly connected
to the amount of entanglement between the particle and the quantum memory.

Comment: 6 pages, 1 figure, comments are welcome
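The classic two-path instance of this duality bounds path predictability P and fringe visibility V by P^2 + V^2 <= 1. As a minimal numeric illustration of that baseline bound (not the paper's multipath, quantum-memory generalization), one can check it on a noisy qubit; the state and noise level below are purely illustrative:

```python
import numpy as np

def predictability_visibility(rho):
    """Path predictability P and fringe visibility V for a
    two-path (qubit) density matrix rho in the path basis."""
    P = abs(rho[0, 0] - rho[1, 1]).real
    V = 2 * abs(rho[0, 1])
    return P, V

# Pure superposition mixed with white noise: P**2 + V**2 <= 1 is
# saturated only in the pure case (p = 1) and strict otherwise.
theta, p = 0.7, 0.6
psi = np.array([np.cos(theta), np.sin(theta)])
rho = p * np.outer(psi, psi) + (1 - p) * np.eye(2) / 2

P, V = predictability_visibility(rho)  # here P**2 + V**2 = p**2
```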
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., (Li and Arora, 2020)) suggest that the use of popular
normalization schemes (including Batch Normalization) in today's deep learning
can move it far from a traditional optimization viewpoint, e.g., use of
exponentially increasing learning rates. The current paper highlights other
ways in which the behavior of normalized nets departs from traditional
viewpoints, and then initiates a formal framework for studying their
mathematics via a suitable adaptation of the conventional framework, namely
modeling the SGD-induced training trajectory via a stochastic differential
equation (SDE) with a noise term that captures gradient noise. This yields:
(a) A new 'intrinsic learning rate' parameter that is the product of the normal learning rate and
weight decay factor. Analysis of the SDE shows how the effective speed of
learning varies and equilibrates over time under the control of intrinsic LR.
(b) A challenge -- via theory and experiments -- to popular belief that good
generalization requires large learning rates at the start of training. (c) New
experiments, backed by mathematical intuition, suggesting the number of steps
to equilibrium (in function space) scales as the inverse of the intrinsic
learning rate, as opposed to the exponential time convergence bound implied by
SDE analysis. We name it the Fast Equilibrium Conjecture and suggest it holds
the key to why Batch Normalization is effective.

Comment: 25 pages, 12 figures. Accepted by the 34th Conference on Neural
Information Processing Systems (NeurIPS 2020)
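For a scale-invariant (normalized) parameter the gradient is orthogonal to the weight vector, so the weight norm under SGD with weight decay is governed by the product of learning rate and weight decay. A toy simulation under that assumption (the orthogonal unit-norm "gradient" is a stand-in, not a real network; all constants are illustrative) shows the norm equilibrating at a level set by the intrinsic LR:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_wd_step(w, grad, lr, wd):
    # SGD with weight decay: w <- w - lr * (grad + wd * w)
    return w - lr * (grad + wd * w)

lr, wd = 0.1, 5e-4
intrinsic_lr = lr * wd  # the paper's intrinsic learning rate

w = rng.normal(size=50)
for _ in range(50_000):
    g = rng.normal(size=50)
    g -= (g @ w) / (w @ w) * w      # scale invariance: grad is orthogonal to w
    g /= np.linalg.norm(g)          # fixed gradient-noise magnitude
    w = sgd_wd_step(w, g, lr, wd)

equilibrium_norm_sq = w @ w  # approaches lr / (2 * wd) = 100 here
```

With these dynamics the squared norm follows a deterministic recursion, n' = (1 - lr*wd)^2 * n + lr^2, whose fixed point depends on lr and wd only through their product once lr is small.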
Quantifying the resource content of quantum channels: An operational approach
We propose a general method to operationally quantify the resourcefulness of
quantum channels via channel discrimination, an important information
processing task. A main result is that the maximum success probability of
distinguishing a given channel from the set of free channels by free probe
states is exactly characterized by the resource generating power, i.e. the
maximum amount of resource produced by the action of the channel, given by the
trace distance to the set of free states. We apply this framework to the
resource theory of quantum coherence, as an informative example. The general
results can also be easily applied to other resource theories such as
entanglement, magic states, and asymmetry.

Comment: v2: 9 pages, new references added. v1: 8 pages, no figures
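The state-level primitive behind this kind of discrimination task is the Helstrom bound: with equal priors, the best single-measurement success probability of telling rho from sigma is (1/2)(1 + T(rho, sigma)), where T is the trace distance. A small numeric sketch for coherence, comparing the maximally coherent qubit with its dephased (diagonal, hence incoherent) part — an illustrative example, not the paper's channel-level result:

```python
import numpy as np

def trace_distance(rho, sigma):
    # T(rho, sigma) = (1/2) * ||rho - sigma||_1 for Hermitian matrices
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def success_probability(rho, sigma):
    # Helstrom bound for equal priors
    return 0.5 * (1 + trace_distance(rho, sigma))

# Maximally coherent qubit |+><+| vs. its fully dephased version
plus = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])
dephased = np.diag(np.diag(plus))
p_succ = success_probability(plus, dephased)  # = 0.75
```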
Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting
Recent studies have revealed that grammatical error correction methods in the
sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply
utilizing adversarial examples in the pre-training or post-training process can
significantly enhance the robustness of GEC models to certain types of attack
without suffering too much performance loss on clean data. In this paper, we
further conduct a thorough robustness evaluation of cutting-edge GEC methods
for four different types of adversarial attacks and propose a simple yet very
effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the
augmenting data from the GEC models themselves in the post-training process and
introducing regularization data for cycle training, our proposed method can
effectively improve the model robustness of well-trained GEC models with only a
few more training epochs as an extra cost. More concretely, further training on
the regularization data can prevent the GEC models from over-fitting on
easy-to-learn samples and thus can improve the generalization capability and
robustness towards unseen data (adversarial noise/samples). Meanwhile, the
self-augmented data can provide more high-quality pseudo pairs to improve model
performance on the original testing data. Experiments on four benchmark
datasets and seven strong models indicate that our proposed training method can
significantly enhance robustness against four types of attacks without using
purposely built adversarial examples in training. Evaluation results on clean
data further confirm that our proposed CSA method significantly improves the
performance of four baselines and yields nearly comparable results with other
state-of-the-art models. Our code is available at
https://github.com/ZetangForward/CSA-GEC
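A hypothetical skeleton of the cycle described above might look as follows; the function names, the "easy sample" criterion (input already equal to the model's prediction), and the stand-in model are all illustrative assumptions, not the released code:

```python
def cycle_self_augment(model_correct, sources, num_cycles=2):
    """Sketch of Cycle Self-Augmenting post-training: each cycle, the
    current model corrects raw sources to build pseudo training pairs,
    while pairs the model leaves unchanged are kept as regularization
    data to discourage over-fitting on easy-to-learn samples."""
    train_pairs, regularization = [], []
    for _ in range(num_cycles):
        pseudo = [(src, model_correct(src)) for src in sources]
        regularization += [(s, c) for s, c in pseudo if s == c]
        train_pairs += [(s, c) for s, c in pseudo if s != c]
        # ... fine-tune the GEC model on train_pairs + regularization ...
    return train_pairs, regularization

# Toy stand-in for a GEC model: lowercasing plays the role of "correction".
pairs, reg = cycle_self_augment(str.lower, ["Hello", "world"])
```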
The Marginal Value of Momentum for Small Learning Rate SGD
Momentum is known to accelerate the convergence of gradient descent in
strongly convex settings without stochastic gradient noise. In stochastic
optimization, such as training neural networks, folklore suggests that momentum
may help deep learning optimization by reducing the variance of the stochastic
gradient update, but previous theoretical analyses do not find momentum to
offer any provable acceleration. Theoretical results in this paper clarify the
role of momentum in stochastic settings where the learning rate is small and
gradient noise is the dominant source of instability, suggesting that SGD with
and without momentum behave similarly over both short and long time horizons.
Experiments show that momentum indeed has limited benefits for both
optimization and generalization in practical training regimes where the optimal
learning rate is not very large, including small- to medium-batch training from
scratch on ImageNet and fine-tuning language models on downstream tasks.
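The "similar behavior" claim can be illustrated on a toy noisy quadratic: heavy-ball momentum with rate lr behaves, over long horizons, like plain SGD with effective rate lr / (1 - beta), so their stationary fluctuations nearly coincide. The quadratic, noise model, and constants below are illustrative assumptions:

```python
import numpy as np

def avg_sq_iterate(lr, beta, a=1.0, sigma=1.0, steps=200_000, burn=10_000):
    """SGD with heavy-ball momentum on the quadratic a*w^2/2 with
    additive gradient noise; returns the time-averaged w^2."""
    rng = np.random.default_rng(0)
    w, v, acc = 0.0, 0.0, 0.0
    for t in range(steps):
        grad = a * w + sigma * rng.standard_normal()
        v = beta * v + grad          # heavy-ball momentum buffer
        w -= lr * v
        if t >= burn:
            acc += w * w
    return acc / (steps - burn)

beta, lr = 0.9, 1e-3
with_momentum = avg_sq_iterate(lr, beta)
plain_sgd = avg_sq_iterate(lr / (1 - beta), 0.0)  # matched effective LR
ratio = with_momentum / plain_sgd  # close to 1 in this small-LR regime
```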
Coexistence Designs of Radar and Communication Systems in a Multi-path Scenario
The focus of this study is on the spectrum sharing between multiple-input
multiple-output (MIMO) communications and co-located MIMO radar systems in
multi-path environments. The major challenge is to suppress the mutual
interference between the two systems while combining the useful multi-path
components received at each system. We tackle this challenge by jointly
designing the communication precoder, radar transmit waveform and receive
filter. Specifically, the signal-to-interference-plus-noise ratio (SINR) at the
radar receiver is maximized subject to constraints on the radar waveform,
communication rate and transmit power. The multi-path propagation complicates
the expressions of the radar SINR and communication rate, leading to a
non-convex problem. To solve it, a sub-optimal algorithm based on
alternating maximization is used to optimize the precoder, radar transmit
waveform and receive filter iteratively. Simulation results are provided to
demonstrate the effectiveness of the proposed design.
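The alternating structure can be sketched on a toy rank-one SINR model: for a fixed waveform s the SINR-optimal receive filter is MVDR-like, w proportional to R^{-1} H s, and for a fixed filter the best unit-power waveform is proportional to H^H w, so each step can only increase the SINR. The matrices H and R below are random stand-ins, not the paper's multi-path channel model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
H = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))  # target response
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = np.eye(n) + B @ B.conj().T  # interference-plus-noise covariance (PSD)

def sinr(s, w):
    return abs(w.conj() @ H @ s) ** 2 / (w.conj() @ R @ w).real

s = np.ones(n) / np.sqrt(n)  # unit-power initial waveform
vals = []
for _ in range(50):
    w = np.linalg.solve(R, H @ s)  # optimal filter for the fixed waveform
    s = H.conj().T @ w             # optimal waveform direction for the filter
    s /= np.linalg.norm(s)         # re-impose the unit-power constraint
    vals.append(sinr(s, w))
# Monotone ascent: `vals` is non-decreasing and bounded above by the
# largest eigenvalue of H^H R^{-1} H.
```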