106 research outputs found
On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
Overfitting negatively impacts the generalization ability of deep neural
networks (DNNs) in both natural and adversarial training. Existing methods
struggle to consistently address different types of overfitting, typically
designing strategies that focus separately on either natural or adversarial
patterns. In this work, we adopt a unified perspective by solely focusing on
natural patterns to explore different types of overfitting. Specifically, we
examine the memorization effect in DNNs and reveal a shared behaviour termed
over-memorization, which impairs their generalization capacity. This behaviour
manifests as DNNs suddenly becoming high-confidence in predicting certain
training patterns and retaining a persistent memory for them. Furthermore, when
DNNs over-memorize an adversarial pattern, they tend to simultaneously exhibit
high-confidence prediction for the corresponding natural pattern. These
findings motivate us to holistically mitigate different types of overfitting by
hindering the DNNs from over-memorization natural patterns. To this end, we
propose a general framework, Distraction Over-Memorization (DOM), which
explicitly prevents over-memorization by either removing or augmenting the
high-confidence natural patterns. Extensive experiments demonstrate the
effectiveness of our proposed method in mitigating overfitting across various
training paradigms
Strength-Adaptive Adversarial Training
Adversarial training (AT) is proved to reliably improve network's robustness
against adversarial data. However, current AT with a pre-specified perturbation
budget has limitations in learning a robust network. Firstly, applying a
pre-specified perturbation budget on networks of various model capacities will
yield divergent degree of robustness disparity between natural and robust
accuracies, which deviates from robust network's desideratum. Secondly, the
attack strength of adversarial training data constrained by the pre-specified
perturbation budget fails to upgrade as the growth of network robustness, which
leads to robust overfitting and further degrades the adversarial robustness. To
overcome these limitations, we propose \emph{Strength-Adaptive Adversarial
Training} (SAAT). Specifically, the adversary employs an adversarial loss
constraint to generate adversarial training data. Under this constraint, the
perturbation budget will be adaptively adjusted according to the training state
of adversarial data, which can effectively avoid robust overfitting. Besides,
SAAT explicitly constrains the attack strength of training data through the
adversarial loss, which manipulates model capacity scheduling during training,
and thereby can flexibly control the degree of robustness disparity and adjust
the tradeoff between natural accuracy and robustness. Extensive experiments
show that our proposal boosts the robustness of adversarial training
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by leveraging human natural
languages instead of domain-specific languages. Specifically, we first perform
an in-depth investigation into LLMs' limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Furthermore, drawing inspiration from the above insights, we develop a
framework called GPT4AIGChip, which features an automated demo-augmented
prompt-generation pipeline utilizing in-context learning to guide LLMs towards
creating high-quality AI accelerator design. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.Comment: Accepted by ICCAD 202
- …