DOF: Accelerating High-order Differential Operators with Forward Propagation
Solving partial differential equations (PDEs) efficiently is essential for
analyzing complex physical systems. Recent advancements in leveraging deep
learning for solving PDEs have shown significant promise. However, machine
learning methods, such as Physics-Informed Neural Networks (PINN), face
challenges in handling high-order derivatives of neural network-parameterized
functions. Inspired by Forward Laplacian, a recent method of accelerating
Laplacian computation, we propose an efficient computational framework,
Differential Operator with Forward-propagation (DOF), for calculating general
second-order differential operators without losing any precision. We provide
rigorous proof of the advantages of our method over existing methods,
demonstrating a twofold improvement in efficiency and reduced memory
consumption on any architecture. Empirical results illustrate that our method
surpasses traditional automatic differentiation (AutoDiff) techniques,
achieving a 2x improvement on the MLP structure and nearly a 20x improvement
on MLPs with Jacobian sparsity.
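To illustrate the forward-propagation idea behind such methods, here is a minimal numpy sketch (not the authors' DOF implementation) that propagates the value, gradient, and Laplacian of a small tanh MLP jointly in a single forward pass, so the Laplacian is obtained without nested reverse-mode autodiff:

```python
import numpy as np

def forward_laplacian(x, weights, biases):
    """Propagate (value, Jacobian, Laplacian) of an MLP through each layer
    in one forward pass. Illustrative sketch of the forward-Laplacian idea."""
    d = x.shape[0]
    h = x                  # current values, shape (m,)
    J = np.eye(d)          # Jacobian w.r.t. the input x, shape (m, d)
    L = np.zeros(d)        # Laplacian of each component, shape (m,)
    for i, (W, b) in enumerate(zip(weights, biases)):
        # linear layer: value, Jacobian, and Laplacian all transform linearly
        h, J, L = W @ h + b, W @ J, W @ L
        if i < len(weights) - 1:           # tanh on hidden layers only
            s1 = 1.0 - np.tanh(h) ** 2     # sigma'(h)
            s2 = -2.0 * np.tanh(h) * s1    # sigma''(h)
            # chain rule for an elementwise nonlinearity:
            # Lap[sigma(h)] = sigma'(h) * Lap[h] + sigma''(h) * sum_k J[:,k]^2
            L = s1 * L + s2 * (J ** 2).sum(axis=1)
            J = s1[:, None] * J
            h = np.tanh(h)
    return h, J, L
```

The key point is that the second-order information rides along with the forward pass: each layer needs only its local first and second derivatives, rather than a full backward pass per input dimension.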
Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective
Recent studies have discovered that Chain-of-Thought prompting (CoT) can
dramatically improve the performance of Large Language Models (LLMs),
particularly when dealing with complex tasks involving mathematics or
reasoning. Despite the enormous empirical success, the underlying mechanisms
behind CoT and how it unlocks the potential of LLMs remain elusive. In this
paper, we take a first step towards theoretically answering these questions.
Specifically, we examine the capacity of LLMs with CoT in solving fundamental
mathematical and decision-making problems. We start by giving an impossibility
result showing that any bounded-depth Transformer cannot directly output
correct answers for basic arithmetic/equation tasks unless the model size grows
super-polynomially with respect to the input length. In contrast, we then prove
by construction that autoregressive Transformers of a constant size suffice to
solve both tasks by generating CoT derivations using a commonly-used math
language format. Moreover, we show LLMs with CoT are capable of solving a
general class of decision-making problems known as Dynamic Programming, thus
justifying its power in tackling complex real-world tasks. Finally, extensive
experiments on four tasks show that, while Transformers always fail to predict
the answers directly, they can consistently learn to generate correct solutions
step-by-step given sufficient CoT demonstrations.
Comment: 33 pages
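As an illustration of the step-by-step derivation format that such constructions rely on, the following sketch (a hypothetical generator of target strings, not the paper's Transformer construction) reduces a fully parenthesized arithmetic expression one innermost operation per step, yielding the chain a CoT-trained model would emit:

```python
import re

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def cot_derivation(expr):
    """For a fully parenthesized integer expression, emit the step-by-step
    reduction, one binary operation per step."""
    # a parenthesized operation with no nested parentheses is innermost
    innermost = re.compile(r"\((-?\d+)([+\-*])(-?\d+)\)")
    steps = [expr]
    while (m := innermost.search(expr)):
        val = OPS[m.group(2)](int(m.group(1)), int(m.group(3)))
        expr = expr[:m.start()] + str(val) + expr[m.end():]
        steps.append(expr)
    return steps
```

For example, `cot_derivation("((2+3)*(4-1))")` produces the chain `((2+3)*(4-1))`, `(5*(4-1))`, `(5*3)`, `15` — each intermediate string is short and locally computable, which is exactly what makes the task tractable for a constant-size autoregressive model.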
Towards diluted magnetism in TaAs
Magnetism in Weyl semimetals is desired to investigate the interaction
between the magnetic moments and Weyl fermions, e.g. to explore anomalous
quantum Hall phenomena. Here we demonstrate that proton irradiation is an
effective tool to induce ferromagnetism in the Weyl semimetal TaAs. The
intrinsic magnetism is observed with a transition temperature above room
temperature. The magnetic moments from d states are found to be localized
around Ta atoms. Further, first-principles calculations indicate that the d
states localized on the nearest-neighbor Ta atoms of As vacancy sites are
responsible for the observed magnetic moments and the long-ranged magnetic
order. These results demonstrate the feasibility of inducing ferromagnetism in
Weyl semimetals, which may facilitate applications of this material in
spintronics.
Comment: 20 pages, 6 figures
MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection
Visual anomaly detection plays a crucial role in not only manufacturing
inspection to find defects of products during manufacturing processes, but also
maintenance inspection to keep equipment in optimum working condition
particularly outdoors. Due to the scarcity of the defective samples,
unsupervised anomaly detection has attracted great attention in recent years.
However, existing datasets for unsupervised anomaly detection are biased
towards manufacturing inspection, not considering maintenance inspection, which
is usually conducted in uncontrolled outdoor environments with varying camera
viewpoints, messy backgrounds, and degradation of object surfaces after
long-term operation. We focus on outdoor maintenance inspection and contribute a
comprehensive Maintenance Inspection Anomaly Detection (MIAD) dataset which
contains more than 100K high-resolution color images in various outdoor
industrial scenarios. This dataset is generated by a 3D graphics software and
covers both surface and logical anomalies with pixel-precise ground truth.
Extensive evaluations of representative algorithms for unsupervised anomaly
detection are conducted, and we expect MIAD and the corresponding experimental
results to inspire the research community in outdoor unsupervised anomaly
detection tasks and to spawn worthwhile related future work.
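As a concrete example of the kind of unsupervised baseline such evaluations involve, here is a minimal PCA reconstruction-error detector (a generic sketch, not one of the algorithms benchmarked on MIAD): fit a low-rank subspace on anomaly-free samples, then score test samples by their distance to that subspace.

```python
import numpy as np

def fit_pca(X_normal, k):
    """Fit a k-component PCA on anomaly-free samples (rows of X_normal)."""
    mu = X_normal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_normal - mu, full_matrices=False)
    return mu, Vt[:k]                      # mean and top-k principal directions

def anomaly_score(X, mu, components):
    """Reconstruction error: anomalies lie off the normal-data subspace."""
    Z = (X - mu) @ components.T            # project onto the subspace
    X_hat = Z @ components + mu            # reconstruct from the projection
    return np.linalg.norm(X - X_hat, axis=1)
```

Because only normal samples are needed for fitting, this matches the unsupervised setting where defective samples are scarce; higher scores flag likely anomalies.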
Learning from Future: A Novel Self-Training Framework for Semantic Segmentation
Self-training has shown great potential in semi-supervised learning. Its core
idea is to use the model learned on labeled data to generate pseudo-labels for
unlabeled samples, and in turn teach itself. To obtain valid supervision,
recent attempts typically employ a momentum teacher for pseudo-label prediction,
yet they suffer from the confirmation bias issue, where incorrect predictions
may provide wrong supervision signals that accumulate during training.
The primary cause of this drawback is that the prevailing self-training
framework guides the current state with past knowledge only, because the
teacher is updated solely from past students. To alleviate this problem,
we propose a novel self-training strategy, which allows the model to learn from
the future. Concretely, at each training step, we first virtually optimize the
student (i.e., caching the gradients without applying them to the model
weights), then update the teacher with the virtual future student, and finally
ask the teacher to produce pseudo-labels for the current student as the
guidance. In this way, we manage to improve the quality of pseudo-labels and
thus boost the performance. We also develop two variants of our
future-self-training (FST) framework through peeping at the future both deeply
(FST-D) and widely (FST-W). Taking the tasks of unsupervised domain adaptive
semantic segmentation and semi-supervised semantic segmentation as the
instances, we experimentally demonstrate the effectiveness and superiority of
our approach under a wide range of settings. Code will be made publicly
available.
Comment: Accepted to NeurIPS 202
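The three-stage update described above can be sketched on a toy linear-regression model (a schematic illustration under simplified assumptions, not the authors' segmentation pipeline):

```python
import numpy as np

def fst_step(student, teacher, x_lab, y_lab, x_unl, lr=0.05, momentum=0.9):
    """One future-self-training step for a linear model y = w @ x.
    Schematic sketch of the three stages: virtual student update,
    teacher EMA with the future student, teacher-generated pseudo-label."""
    # 1) virtually optimize the student on labeled data: compute the
    #    gradient but do not commit it to the weights yet
    grad = 2.0 * (student @ x_lab - y_lab) * x_lab       # squared-error grad
    virtual_student = student - lr * grad
    # 2) update the teacher with the *future* (virtual) student via EMA
    teacher = momentum * teacher + (1.0 - momentum) * virtual_student
    # 3) the updated teacher pseudo-labels the unlabeled sample, and the
    #    *current* student is then trained with both signals
    pseudo = teacher @ x_unl
    grad_unl = 2.0 * (student @ x_unl - pseudo) * x_unl
    student = student - lr * (grad + grad_unl)
    return student, teacher
```

The difference from standard momentum-teacher self-training is the ordering: the teacher already contains the (virtual) next student when it produces pseudo-labels, so the guidance comes from the future rather than only from the past.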