Infusing Hierarchical Guidance into Prompt Tuning: A Parameter-Efficient Framework for Multi-level Implicit Discourse Relation Recognition
Multi-level implicit discourse relation recognition (MIDRR) aims to identify
hierarchical discourse relations among arguments. Previous methods achieve
improvements by fine-tuning pre-trained language models (PLMs). However, due to
data scarcity and the task gap, the pre-trained feature space cannot be
accurately tuned to the task-specific space, which can even aggravate the
collapse of the original space. Moreover, the need to comprehend hierarchical
semantics in MIDRR makes this conversion even harder. In this paper, we propose
a prompt-based Parameter-Efficient Multi-level IDRR (PEMI) framework to address
these problems. First, we leverage parameter-efficient prompt tuning to steer
the input arguments toward the pre-trained space, approximating the
task-specific space with few parameters. Furthermore, we propose a hierarchical
label refining (HLR) method for the prompt verbalizer that deeply integrates
hierarchical guidance into prompt tuning. Finally, our model achieves results
comparable to baselines on PDTB 2.0 and 3.0 while using only about 0.1% of
their trainable parameters, and visualizations demonstrate the effectiveness of
our HLR method.
Comment: accepted to ACL 202
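To make the hierarchical label refining idea above more concrete, here is a minimal sketch in Python. It only illustrates the general notion of propagating top-level label semantics into second-level verbalizer embeddings; the two-level hierarchy, the embedding dimension, and the interpolation weight `alpha` are illustrative assumptions, not the paper's exact HLR formulation.

```python
import numpy as np

# Hypothetical two-level label hierarchy (label names illustrative only,
# not the exact PDTB sense inventory).
HIERARCHY = {
    "Comparison":  ["Concession", "Contrast"],
    "Contingency": ["Cause", "Condition"],
    "Expansion":   ["Conjunction", "Instantiation"],
    "Temporal":    ["Asynchronous", "Synchronous"],
}

def hierarchical_label_refine(top_emb, second_emb, alpha=0.5):
    """Interpolate each second-level verbalizer embedding with its parent's
    top-level embedding, so fine-grained labels carry coarse-grained
    (hierarchical) guidance."""
    refined = {}
    for parent, children in HIERARCHY.items():
        for child in children:
            refined[child] = alpha * second_emb[child] + (1.0 - alpha) * top_emb[parent]
    return refined

# Toy verbalizer embeddings; in an actual prompt-tuning setup these would be
# rows of the PLM's word-embedding matrix for the chosen label words.
dim = 8
rng = np.random.default_rng(0)
top_emb = {p: rng.normal(size=dim) for p in HIERARCHY}
second_emb = {c: rng.normal(size=dim) for cs in HIERARCHY.values() for c in cs}

refined = hierarchical_label_refine(top_emb, second_emb)
print(sorted(refined))  # second-level labels now encode top-level guidance
```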
Fast Adversarial Training with Smooth Convergence
Fast adversarial training (FAT) is beneficial for improving the adversarial
robustness of neural networks. However, previous FAT work has encountered a
significant issue known as catastrophic overfitting when dealing with large
perturbation budgets, i.e., the adversarial robustness of models declines to
near zero during training.
To address this, we analyze the training process of prior FAT work and observe
that catastrophic overfitting is accompanied by the appearance of loss
convergence outliers.
We therefore argue that a moderately smooth loss convergence process yields a
stable FAT process that resolves catastrophic overfitting.
To obtain a smooth loss convergence process, we propose a novel oscillatory
constraint (dubbed ConvergeSmooth) that limits the loss difference between
adjacent epochs. The convergence stride of ConvergeSmooth is introduced to
balance convergence and smoothing. We also design a weight centralization
strategy that introduces no additional hyperparameters beyond the loss balance
coefficient.
Our proposed methods are attack-agnostic and can thus improve the training
stability of various FAT techniques.
Extensive experiments on popular datasets show that the proposed methods
efficiently avoid catastrophic overfitting and outperform all previous FAT
methods. Code is available at https://github.com/FAT-CS/ConvergeSmooth.
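As a rough illustration of the oscillatory constraint described above, the following sketch penalizes batch losses that deviate from the previous epoch's mean loss by more than a convergence stride. The penalty form, the `stride` and `gamma` values, and the training-loop scaffolding are assumptions for illustration; consult the linked repository for the authors' implementation.

```python
import torch

def convergesmooth_loss(batch_loss, prev_epoch_loss, stride, gamma=0.5):
    """Sketch of an oscillatory constraint in the spirit of ConvergeSmooth:
    when the current loss deviates from the previous epoch's mean loss by
    more than the convergence stride, add a penalty pulling it back toward
    the previous value. The exact form is an assumption, not the paper's."""
    if prev_epoch_loss is None:                    # first epoch: unconstrained
        return batch_loss
    deviation = batch_loss - prev_epoch_loss.detach()
    if deviation.abs() > stride:                   # constrain convergence outliers
        return batch_loss + gamma * (deviation.abs() - stride)
    return batch_loss

# Usage inside a FAT epoch loop (schematic; adv_loss is a placeholder for an
# FGSM/PGD-style adversarial training loss):
# prev = None
# for epoch in range(num_epochs):
#     raw_losses = []
#     for x, y in loader:
#         raw = adv_loss(model, x, y)
#         raw_losses.append(raw.detach())
#         loss = convergesmooth_loss(raw, prev, stride=0.2)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
#     prev = torch.stack(raw_losses).mean()
```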
Catastrophic Overfitting: A Potential Blessing in Disguise
Fast Adversarial Training (FAT) has gained increasing attention within the
research community owing to its efficacy in improving adversarial robustness.
Particularly noteworthy is the challenge posed by catastrophic overfitting (CO)
in this field. Although existing FAT approaches have made strides in mitigating
CO, their gains in adversarial robustness come with a non-negligible decline
in classification accuracy on clean samples. To tackle this issue, we first
employ the feature activation differences between clean and adversarial
examples to analyze the underlying causes of CO. Intriguingly, our findings
reveal that CO can be attributed to the feature coverage induced by a few
specific pathways. By intentionally manipulating feature activation differences
in these pathways with well-designed regularization terms, we can effectively
mitigate and induce CO, providing further evidence for this observation.
Notably, models trained stably with these terms exhibit superior performance
compared to prior FAT work. On this basis, we harness CO to achieve "attack
obfuscation", aiming to bolster model performance. Consequently, the models
suffering from CO can attain optimal classification accuracy on both clean and
adversarial data when adding random noise to inputs during evaluation. We also
validate their robustness against transferred adversarial examples and the
necessity of inducing CO to improve robustness. Hence, CO may not be a problem
that has to be solved.
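The evaluation trick described above, adding random noise to the inputs of a model that has undergone CO, can be sketched as follows. The noise scale `sigma`, the averaging over several noisy copies, and the clamping to [0, 1] are assumptions; the paper's exact evaluation protocol may differ.

```python
import torch

@torch.no_grad()
def predict_with_input_noise(model, x, sigma=8 / 255, n_samples=4):
    """Evaluate a CO-affected model on randomly perturbed inputs.
    sigma (noise scale) and n_samples (noisy copies averaged) are
    illustrative choices, not values from the paper."""
    model.eval()
    logits_sum = 0.0
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
        logits_sum = logits_sum + model(noisy)
    return (logits_sum / n_samples).argmax(dim=1)
```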
Partitioning of Kinetic Energy in the Arctic Ocean's Beaufort Gyre
Kinetic energy (KE) in the Arctic Ocean's Beaufort Gyre is dominated by the mesoscale eddy field that plays a central role in the transport of freshwater, heat, and biogeochemical tracers. Understanding Beaufort Gyre KE variability sheds light on how this freshwater reservoir responds to wind forcing and sea ice and ocean changes. The evolution and fate of mesoscale eddies relate to energy pathways in the ocean (e.g., the exchange of energy between barotropic and baroclinic modes). Mooring measurements of horizontal velocities in the Beaufort Gyre are analyzed to partition KE into barotropic and baroclinic modes and explore their evolution. We find that a significant fraction of water column KE is in the barotropic and the first two baroclinic modes. We explain this energy partitioning by quantifying the energy transfer coefficients between the vertical modes using the quasi‐geostrophic potential vorticity conservation equations with a specific background stratification observed in the Beaufort Gyre. We find that the quasi‐geostrophic vertical mode interactions uphold the persistence of KE in the first two baroclinic modes, consistent with observations. Our results explain the specific role of halocline structure on KE evolution in the gyre and suggest depressed transfer to the barotropic mode. This limits the capacity for frictional dissipation at the sea floor and suggests that energy dissipation via sea ice‐ocean drag may be prominent.
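As a schematic of the mode partitioning described above, the sketch below projects a single velocity profile onto a barotropic mode and two baroclinic modes and reports the per-mode kinetic energy. It uses flat-bottom, constant-stratification cosine modes and a synthetic profile purely for illustration; the study derives its modes from the observed Beaufort Gyre stratification and applies the projection to mooring time series.

```python
import numpy as np

# Schematic partition of a velocity profile into vertical modes.
H = 3000.0                                  # assumed water depth (m)
z = np.linspace(0.0, H, 300)                # depth grid
dz = z[1] - z[0]
n_modes = 3                                 # mode 0 = barotropic, 1-2 = baroclinic

# F_0(z) = 1 (barotropic); F_n(z) = cos(n*pi*z/H) for constant stratification
F = np.stack([np.cos(n * np.pi * z / H) for n in range(n_modes)], axis=1)

# Synthetic "mooring" profile dominated by the first two baroclinic modes
rng = np.random.default_rng(1)
u = 0.01 * F[:, 0] + 0.05 * F[:, 1] + 0.03 * F[:, 2] + 0.005 * rng.normal(size=z.size)

# Least-squares projection onto the modes, then depth-averaged KE per mode
coeffs, *_ = np.linalg.lstsq(F, u, rcond=None)
ke = 0.5 * coeffs**2 * (F**2).sum(axis=0) * dz / H    # m^2 s^-2
for n, k in enumerate(ke):
    print(f"mode {n}: KE = {k:.2e} m^2/s^2")
```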
Mitigating Shortcuts in Language Models with Soft Label Encoding
Recent research has shown that large language models rely on spurious
correlations in the data for natural language understanding (NLU) tasks. In
this work, we aim to answer the following research question: Can we reduce
spurious correlations by modifying the ground truth labels of the training
data? Specifically, we propose a simple yet effective debiasing framework,
named Soft Label Encoding (SoftLE). We first train a teacher model with hard
labels to determine each sample's degree of reliance on shortcuts. We then add
one dummy class to encode the shortcut degree, which is used to smooth other
dimensions in the ground truth label to generate soft labels. This new ground
truth label is used to train a more robust student model. Extensive experiments
on two NLU benchmark tasks demonstrate that SoftLE significantly improves
out-of-distribution generalization while maintaining satisfactory
in-distribution accuracy.
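A minimal sketch of the soft-label construction described above is given below. It treats the teacher's confidence on the gold class as the shortcut degree and moves that mass into the dummy class; the exact mapping from teacher outputs to shortcut degree, and how the remaining mass is distributed, are assumptions rather than the paper's precise recipe.

```python
import numpy as np

def soft_label_encode(teacher_probs, hard_labels, n_classes):
    """Build soft labels with one extra dummy class: the teacher's confidence
    on the gold class stands in for the sample's shortcut degree and is moved
    into the dummy dimension, leaving the rest on the original gold class."""
    soft = np.zeros((len(hard_labels), n_classes + 1))
    for i, y in enumerate(hard_labels):
        shortcut_degree = teacher_probs[i, y]   # high confidence ~ shortcut-prone
        soft[i, n_classes] = shortcut_degree    # dummy "shortcut" class
        soft[i, y] = 1.0 - shortcut_degree
    return soft

# Toy usage: a 3-class NLU task with one easy (likely shortcut-driven) sample
# and one hard sample, both labeled class 0.
teacher_probs = np.array([[0.98, 0.01, 0.01],
                          [0.40, 0.35, 0.25]])
print(soft_label_encode(teacher_probs, hard_labels=[0, 0], n_classes=3))
```

The resulting soft labels would then supervise the student model in place of the original one-hot targets.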
Sparse Recovery over Graph Incidence Matrices
Classical results in sparse recovery guarantee the exact reconstruction of
k-sparse signals under assumptions on the dictionary that are either too
strong or NP-hard to check. Moreover, such results may be pessimistic in
practice since they are based on a worst-case analysis. In this paper, we
consider the sparse recovery of signals defined over a graph, for which the
dictionary takes the form of an incidence matrix. We derive necessary and
sufficient conditions for sparse recovery, which depend on properties of the
cycles of the graph that can be checked in polynomial time. We also derive
support-dependent conditions for sparse recovery that depend only on the
intersection of the cycles of the graph with the support of the signal.
Finally, we exploit sparsity properties on the measurements and the structure
of incidence matrices to propose a specialized sub-graph-based recovery
algorithm that outperforms the standard ℓ1-minimization approach.
Comment: Accepted to the 57th IEEE Conference on Decision and Control
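For context on the baseline mentioned above, the sketch below builds the incidence matrix of a small directed graph and recovers a sparse edge signal with standard ℓ1-minimization cast as a linear program. It is not the paper's specialized sub-graph-based algorithm, and the toy 4-node cycle graph is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def incidence_matrix(n_nodes, edges):
    """Oriented node-edge incidence matrix of a directed graph."""
    A = np.zeros((n_nodes, len(edges)))
    for j, (u, v) in enumerate(edges):
        A[u, j], A[v, j] = 1.0, -1.0
    return A

def l1_recover(A, b):
    """min ||x||_1 s.t. Ax = b, via the usual LP split x = x_plus - x_minus."""
    m, n = A.shape
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# Toy example: a 4-node directed cycle and a 1-sparse edge signal
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = incidence_matrix(4, edges)
x_true = np.array([0.0, 2.0, 0.0, 0.0])
b = A @ x_true
print(np.round(l1_recover(A, b), 3))        # recovers x_true for this graph
```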