142 research outputs found

    Infusing Hierarchical Guidance into Prompt Tuning: A Parameter-Efficient Framework for Multi-level Implicit Discourse Relation Recognition

    Multi-level implicit discourse relation recognition (MIDRR) aims at identifying hierarchical discourse relations among arguments. Previous methods achieve the promotion through fine-tuning PLMs. However, due to the data scarcity and the task gap, the pre-trained feature space cannot be accurately tuned to the task-specific space, which even aggravates the collapse of the vanilla space. Besides, the comprehension of hierarchical semantics for MIDRR makes the conversion much harder. In this paper, we propose a prompt-based Parameter-Efficient Multi-level IDRR (PEMI) framework to solve the above problems. First, we leverage parameter-efficient prompt tuning to drive the inputted arguments to match the pre-trained space and realize the approximation with few parameters. Furthermore, we propose a hierarchical label refining (HLR) method for the prompt verbalizer to deeply integrate hierarchical guidance into the prompt tuning. Finally, our model achieves comparable results on PDTB 2.0 and 3.0 using about 0.1% trainable parameters compared with baselines and the visualization demonstrates the effectiveness of our HLR method. Comment: accepted to ACL 202

    Fast Adversarial Training with Smooth Convergence

    Fast adversarial training (FAT) is beneficial for improving the adversarial robustness of neural networks. However, previous FAT work has encountered a significant issue known as catastrophic overfitting when dealing with large perturbation budgets, i.e., the adversarial robustness of models declines to near zero during training. To address this, we analyze the training process of prior FAT work and observe that catastrophic overfitting is accompanied by the appearance of loss convergence outliers. Therefore, we argue a moderately smooth loss convergence process will be a stable FAT process that solves catastrophic overfitting. To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) to limit the loss difference between adjacent epochs. The convergence stride of ConvergeSmooth is introduced to balance convergence and smoothing. Likewise, we design weight centralization without introducing additional hyperparameters other than the loss balance coefficient. Our proposed methods are attack-agnostic and thus can improve the training stability of various FAT techniques. Extensive experiments on popular datasets show that the proposed methods efficiently avoid catastrophic overfitting and outperform all previous FAT methods. Code is available at https://github.com/FAT-CS/ConvergeSmooth

    Catastrophic Overfitting: A Potential Blessing in Disguise

    Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness. Particularly noteworthy is the challenge posed by catastrophic overfitting (CO) in this field. Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classification accuracy on clean samples. To tackle this issue, we initially employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO. Intriguingly, our findings reveal that CO can be attributed to the feature coverage induced by a few specific pathways. By intentionally manipulating feature activation differences in these pathways with well-designed regularization terms, we can effectively mitigate and induce CO, providing further evidence for this observation. Notably, models trained stably with these terms exhibit superior performance compared to prior FAT work. On this basis, we harness CO to achieve 'attack obfuscation', aiming to bolster model performance. Consequently, the models suffering from CO can attain optimal classification accuracy on both clean and adversarial data when adding random noise to inputs during evaluation. We also validate their robustness against transferred adversarial examples and the necessity of inducing CO to improve robustness. Hence, CO may not be a problem that has to be solved.

    Partitioning of Kinetic Energy in the Arctic Ocean's Beaufort Gyre

    Kinetic energy (KE) in the Arctic Ocean's Beaufort Gyre is dominated by the mesoscale eddy field that plays a central role in the transport of freshwater, heat, and biogeochemical tracers. Understanding Beaufort Gyre KE variability sheds light on how this freshwater reservoir responds to wind forcing and sea ice and ocean changes. The evolution and fate of mesoscale eddies relate to energy pathways in the ocean (e.g., the exchange of energy between barotropic and baroclinic modes). Mooring measurements of horizontal velocities in the Beaufort Gyre are analyzed to partition KE into barotropic and baroclinic modes and explore their evolution. We find that a significant fraction of water column KE is in the barotropic and the first two baroclinic modes. We explain this energy partitioning by quantifying the energy transfer coefficients between the vertical modes using the quasi‐geostrophic potential vorticity conservation equations with a specific background stratification observed in the Beaufort Gyre. We find that the quasi‐geostrophic vertical mode interactions uphold the persistence of KE in the first two baroclinic modes, consistent with observations. Our results explain the specific role of halocline structure on KE evolution in the gyre and suggest depressed transfer to the barotropic mode. This limits the capacity for frictional dissipation at the sea floor and suggests that energy dissipation via sea ice‐ocean drag may be prominent

    Mitigating Shortcuts in Language Models with Soft Label Encoding

    Recent research has shown that large language models rely on spurious correlations in the data for natural language understanding (NLU) tasks. In this work, we aim to answer the following research question: Can we reduce spurious correlations by modifying the ground truth labels of the training data? Specifically, we propose a simple yet effective debiasing framework, named Soft Label Encoding (SoftLE). We first train a teacher model with hard labels to determine each sample's degree of relying on shortcuts. We then add one dummy class to encode the shortcut degree, which is used to smooth other dimensions in the ground truth label to generate soft labels. This new ground truth label is used to train a more robust student model. Extensive experiments on two NLU benchmark tasks demonstrate that SoftLE significantly improves out-of-distribution generalization while maintaining satisfactory in-distribution accuracy
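    The soft-label construction described in the abstract above can be sketched as follows. The abstract does not give the exact formula, so this is only one plausible reading: the shortcut degree is assumed to lie in [0, 1], it is placed in the extra dummy dimension, and the same amount is removed from the ground-truth class. The function name and signature are hypothetical.

```python
def soft_label_encode(label, shortcut_degree, num_classes):
    """Build a (num_classes + 1)-dimensional soft label.

    Assumption (not stated in the abstract): the shortcut degree s in [0, 1]
    is moved from the true class into one extra dummy class, so the result
    still sums to 1.
    """
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    s = min(max(float(shortcut_degree), 0.0), 1.0)  # clamp to [0, 1]
    soft = [0.0] * (num_classes + 1)
    soft[label] = 1.0 - s      # smoothed ground-truth dimension
    soft[num_classes] = s      # dummy class encodes the shortcut degree
    return soft


# A sample the teacher flags as partly shortcut-driven (s = 0.25).
example = soft_label_encode(label=2, shortcut_degree=0.25, num_classes=3)
```

    Under this reading, a sample with no measured shortcut reliance keeps its original one-hot label, padded with a zero in the dummy dimension.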

    Partitioning of kinetic energy in the Arctic Ocean's Beaufort Gyre

    Author Posting. © American Geophysical Union, 2018. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Journal of Geophysical Research: Oceans 123 (2018): 4806-4819, doi:10.1029/2018JC014037. Kinetic energy (KE) in the Arctic Ocean's Beaufort Gyre is dominated by the mesoscale eddy field that plays a central role in the transport of freshwater, heat, and biogeochemical tracers. Understanding Beaufort Gyre KE variability sheds light on how this freshwater reservoir responds to wind forcing and sea ice and ocean changes. The evolution and fate of mesoscale eddies relate to energy pathways in the ocean (e.g., the exchange of energy between barotropic and baroclinic modes). Mooring measurements of horizontal velocities in the Beaufort Gyre are analyzed to partition KE into barotropic and baroclinic modes and explore their evolution. We find that a significant fraction of water column KE is in the barotropic and the first two baroclinic modes. We explain this energy partitioning by quantifying the energy transfer coefficients between the vertical modes using the quasi‐geostrophic potential vorticity conservation equations with a specific background stratification observed in the Beaufort Gyre. We find that the quasi‐geostrophic vertical mode interactions uphold the persistence of KE in the first two baroclinic modes, consistent with observations. Our results explain the specific role of halocline structure on KE evolution in the gyre and suggest depressed transfer to the barotropic mode. This limits the capacity for frictional dissipation at the sea floor and suggests that energy dissipation via sea ice‐ocean drag may be prominent. National Science Foundation Division of Polar Programs Grant Number: 1107623 2019-01-1

    Sparse Recovery over Graph Incidence Matrices

    Classical results in sparse recovery guarantee the exact reconstruction of $s$-sparse signals under assumptions on the dictionary that are either too strong or NP-hard to check. Moreover, such results may be pessimistic in practice since they are based on a worst-case analysis. In this paper, we consider the sparse recovery of signals defined over a graph, for which the dictionary takes the form of an incidence matrix. We derive necessary and sufficient conditions for sparse recovery, which depend on properties of the cycles of the graph that can be checked in polynomial time. We also derive support-dependent conditions for sparse recovery that depend only on the intersection of the cycles of the graph with the support of the signal. Finally, we exploit sparsity properties on the measurements and the structure of incidence matrices to propose a specialized sub-graph-based recovery algorithm that outperforms the standard $\ell_1$-minimization approach. Comment: Accepted to 57th IEEE Conference on Decision and Contro
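    To illustrate why cycles matter for the standard $\ell_1$-minimization baseline mentioned above, here is a minimal sketch on a toy graph. It is not the paper's algorithm. It assumes a connected graph with exactly one cycle, so that every signal with the same measurements differs from a given one by a multiple of the signed cycle vector; the $\ell_1$ problem then reduces to a one-dimensional search. The triangle graph, edge orientations, and function names are all hypothetical choices for the example.

```python
# Directed triangle: edges e0 = 0->1, e1 = 1->2, e2 = 0->2.
# Rows are nodes, columns are edges (-1 at the tail, +1 at the head).
INCIDENCE = [
    [-1, 0, -1],
    [1, -1, 0],
    [0, 1, 1],
]
# Signed cycle vector: traversing e0, e1 forward and e2 backward.
CYCLE = [1, 1, -1]


def measure(x):
    """Return the node measurements b = A x for an edge signal x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in INCIDENCE]


def l1_min_single_cycle(x_feasible, cycle):
    """Minimize ||x_feasible + t * cycle||_1 over real t.

    The objective is convex and piecewise linear in t, so a minimizer
    lies at one of the breakpoints t = -x_i / c_i.
    """
    def l1(t):
        return sum(abs(x + t * c) for x, c in zip(x_feasible, cycle))

    candidates = [-x / c for x, c in zip(x_feasible, cycle) if c != 0]
    best_t = min(candidates, key=l1)
    return [x + best_t * c for x, c in zip(x_feasible, cycle)]


# A 1-sparse signal on edge e2, and a denser signal with the same measurements.
x_true = [0, 0, 1]
x_other = [1, 1, 0]
recovered = l1_min_single_cycle(x_other, CYCLE)
```

    In this toy case the 1-sparse signal on `e2` is recovered, while a signal supported on the other two edges of the triangle would not be, since the single-edge signal has the same measurements and a smaller $\ell_1$ norm.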