Dataset Distillation: A Comprehensive Review
Recent success of deep learning is largely attributed to the sheer amount of
data used for training deep neural networks. Despite the unprecedented success,
the massive data, unfortunately, significantly increases the burden on storage
and transmission and further gives rise to a cumbersome model training process.
Besides, relying on the raw data for training per se yields concerns
about privacy and copyright. To alleviate these shortcomings, dataset
distillation (DD), also known as dataset condensation (DC), was introduced and
has recently attracted much research attention in the community. Given an
original dataset, DD aims to derive a much smaller dataset containing synthetic
samples, based on which the trained models yield performance comparable with
those trained on the original dataset. In this paper, we give a comprehensive
review and summary of recent advances in DD and its application. We first
introduce the task formally and propose an overall algorithmic framework
followed by all existing DD methods. Next, we provide a systematic taxonomy of
current methodologies in this area, and discuss their theoretical
interconnections. We also present current challenges in DD through extensive
experiments and envision possible directions for future work.
Comment: 23 pages, 168 references, 8 figures, under review
Overcoming Catastrophic Forgetting in Graph Neural Networks
Catastrophic forgetting refers to the tendency of a neural network to
"forget" previously learned knowledge upon learning new tasks. Prior methods
have been focused on overcoming this problem on convolutional neural networks
(CNNs), where the input samples like images lie in a grid domain, but have
largely overlooked graph neural networks (GNNs) that handle non-grid data. In
this paper, we propose a novel scheme dedicated to overcoming the catastrophic
forgetting problem and hence strengthening continual learning in GNNs. At the
heart of our approach is a generic module, termed topology-aware weight
preserving (TWP), applicable to arbitrary forms of GNNs in a plug-and-play
fashion. Unlike the mainstream CNN-based continual learning methods that
rely solely on slowing down the updates of parameters important to the
downstream task, TWP explicitly explores the local structures of the input
graph, and attempts to stabilize the parameters playing pivotal roles in the
topological aggregation. We evaluate TWP on different GNN backbones over
several datasets, and demonstrate that it yields performance superior to the
state of the art. Code is publicly available at
https://github.com/hhliu79/TWP.
Comment: Accepted by AAAI 202
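The idea of slowing down updates to important parameters can be illustrated with a generic importance-weighted quadratic penalty. This is a minimal sketch, not the TWP module itself: the function names, the hand-picked importance scores, and the learning rate are all assumptions made for illustration, and the abstract does not say how TWP derives its topology-aware importance.

```python
def weight_preserving_penalty(params, old_params, importance, strength=1.0):
    """Quadratic penalty that grows when important parameters drift."""
    return strength * sum(
        imp * (p - q) ** 2 for p, q, imp in zip(params, old_params, importance)
    )


def penalized_step(params, grads, old_params, importance, lr=0.01, strength=1.0):
    """One gradient step on (new-task loss + penalty).

    `grads` are the new-task gradients; the penalty contributes
    2 * strength * importance * (p - old_p) to each parameter's gradient.
    """
    return [
        p - lr * (g + 2.0 * strength * imp * (p - q))
        for p, g, q, imp in zip(params, grads, old_params, importance)
    ]


# Hypothetical values: parameter 0 is important to the old task, parameter 1 is not.
old_params = [1.0, 1.0]
params = [1.5, 1.5]          # both have drifted while learning the new task
importance = [10.0, 0.0]     # assumed importance scores
grads = [-1.0, -1.0]         # the new task pushes both parameters upward

penalty = weight_preserving_penalty(params, old_params, importance)
updated = penalized_step(params, grads, old_params, importance)
```

In this toy setting the important parameter is pulled back toward its old value, while the unimportant one keeps following the new-task gradient.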
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
One key challenge of exemplar-guided image generation lies in establishing
fine-grained correspondences between input and guided images. Prior approaches,
despite the promising results, have relied on either estimating dense attention
to compute per-point matching, which is limited to only coarse scales due to
the quadratic memory cost, or fixing the number of correspondences to achieve
linear complexity, which lacks flexibility. In this paper, we propose a dynamic
sparse attention based Transformer model, termed Dynamic Sparse Transformer
(DynaST), to achieve fine-level matching with favorable efficiency. The heart
of our approach is a novel dynamic-attention unit, dedicated to covering the
variation in the optimal number of tokens one position should focus on.
Specifically, DynaST leverages the multi-layer nature of Transformer structure,
and performs the dynamic attention scheme in a cascaded manner to refine
matching results and synthesize visually-pleasing outputs. In addition, we
introduce a unified training objective for DynaST, making it a versatile
reference-based image translation framework for both supervised and
unsupervised scenarios. Extensive experiments on three applications,
pose-guided person image generation, edge-based face synthesis, and undistorted
image style transfer, demonstrate that DynaST achieves superior performance in
local details, outperforming the state of the art while reducing the
computational cost significantly. Our code is available at
https://github.com/Huage001/DynaST
Comment: ECCV 202
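To make concrete what it means for each position to attend to a varying number of tokens, the following sketch keeps only the keys whose score lies within a margin of the query's best score and normalizes over the survivors. The thresholding rule, the `margin` parameter, and the function name are assumptions chosen for illustration; the abstract does not specify how DynaST's dynamic-attention unit selects tokens.

```python
import math


def dynamic_sparse_attention(query, keys, margin=1.5):
    """Return {key_index: weight} over a query-dependent subset of keys.

    Keys scoring below (best score - margin) are dropped, so the number of
    attended tokens varies from one query to another.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    best = max(scores)
    kept = [i for i, s in enumerate(scores) if s >= best - margin]
    # Softmax restricted to the kept keys (shifted by `best` for stability).
    exps = {i: math.exp(scores[i] - best) for i in kept}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


keys = [(1.0, 0.0), (0.75, 0.25), (0.0, 1.0), (-1.0, 0.0)]
peaked = dynamic_sparse_attention((4.0, 0.0), keys)  # sharply peaked scores
flat = dynamic_sparse_attention((1.0, 1.0), keys)    # flatter scores
```

With these toy inputs the peaked query keeps two keys and the flatter query keeps three, so memory scales with the number of surviving matches rather than with all query-key pairs.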
An Optimized Method for Terrain Reconstruction Based on Descent Images
An optimization method is proposed to perform high-accuracy terrain reconstruction of the landing area of Chang'e III. First, feature matching is conducted using geometric model constraints. Then, the initial terrain is obtained and the initial normal vector of each point is solved on the basis of the initial terrain. By changing the vector around the initial normal vector in small steps, a set of new vectors is obtained. By combining these vectors with the directions of the light and the camera, equations are set up on the basis of a surface reflection model. Then, a series of gray values is derived by solving the equations. The new optimized vector is recorded when the obtained gray value is closest to that of the corresponding pixel. Finally, the optimized terrain is obtained after iteration of the vector field. Experiments were conducted using laboratory images and descent images of Chang'e III. The results showed that the proposed method performed better than the classical feature matching method. It can provide a reference for terrain reconstruction of the landing area in subsequent moon exploration missions.
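The per-point normal refinement described above can be sketched as a small grid search. This is an illustrative sketch under stated assumptions: it uses a Lambertian reflection model, which depends only on the light direction and ignores the camera direction, whereas the abstract does not name the reflection model used. The step size, search radius, and all function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def lambert_gray(normal, light, albedo=1.0):
    """Predicted gray value under a Lambertian model (an assumption)."""
    return albedo * max(0.0, sum(n * l for n, l in zip(normal, light)))


def candidate_normals(normal, step=0.02, radius=5):
    """Perturb the x and y components of the normal in small steps."""
    nx, ny, nz = normal
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            yield normalize((nx + i * step, ny + j * step, nz))


def refine_normal(initial_normal, light, observed_gray):
    """Keep the candidate whose predicted gray is closest to the pixel."""
    return min(
        candidate_normals(normalize(initial_normal)),
        key=lambda n: abs(lambert_gray(n, light) - observed_gray),
    )


# Synthetic example: the observed gray comes from a slightly tilted surface.
light = normalize((0.3, 0.2, 1.0))
true_normal = normalize((0.06, -0.04, 1.0))
observed = lambert_gray(true_normal, light)

initial = normalize((0.0, 0.0, 1.0))
refined = refine_normal(initial, light, observed)
initial_error = abs(lambert_gray(initial, light) - observed)
refined_error = abs(lambert_gray(refined, light) - observed)
```

A single gray value does not determine a normal uniquely, which is presumably why the described method starts from the feature-matching terrain and iterates over the whole vector field rather than solving each pixel in isolation.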
Mutual-modality Adversarial Attack with Semantic Perturbation
Adversarial attacks constitute a notable threat to machine learning systems,
given their potential to induce erroneous predictions and classifications.
However, within real-world contexts, the essential specifics of the deployed
model are frequently treated as a black box, consequently mitigating the
vulnerability to such attacks. Thus, enhancing the transferability of the
adversarial samples has become a crucial area of research, which heavily relies
on selecting appropriate surrogate models. To address this challenge, we
propose a novel approach that generates adversarial attacks in a
mutual-modality optimization scheme. Our approach is accomplished by leveraging
the pre-trained CLIP model. Firstly, we conduct a visual attack on the clean
image that causes semantic perturbations on the aligned embedding space with
the other textual modality. Then, we apply the corresponding defense on the
textual modality by updating the prompts, which forces the re-matching on the
perturbed embedding space. Finally, to enhance the attack transferability, we
utilize the iterative training strategy on the visual attack and the textual
defense, where the two processes optimize from each other. We evaluate our
approach on several benchmark datasets and demonstrate that our mutual-modal
attack strategy can effectively produce highly transferable attacks, which are
stable regardless of the target networks. Our approach outperforms
state-of-the-art attack methods and can be readily deployed as a plug-and-play
solution.
Comment: Accepted by AAAI202
Optimal Sensor Allocation with Multiple Linear Dispersion Processes
This paper considers the optimal sensor allocation for estimating the
emission rates of multiple sources in a two-dimensional spatial domain.
Locations of potential emission sources are known (e.g., factory stacks), and
the number of sources is much greater than the number of sensors that can be
deployed, giving rise to the optimal sensor allocation problem. In particular,
we consider linear dispersion forward models, and the optimal sensor allocation
is formulated as a bilevel optimization problem. The outer problem determines
the optimal sensor locations by minimizing the overall Mean Squared Error of
the estimated emission rates over various wind conditions, while the inner
problem solves an inverse problem that estimates the emission rates. Two
algorithms, including the repeated Sample Average Approximation and the
Stochastic Gradient Descent based bilevel approximation, are investigated in
solving the sensor allocation problem. Convergence analysis is performed to
obtain the performance guarantee, and numerical examples are presented to
illustrate the proposed approach.
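One generic way to write such a bilevel problem is sketched below. Every symbol is an assumption introduced for illustration and none is taken from the paper: $s$ denotes the sensor locations, $w$ the wind condition, $\theta$ the true emission rates, $A(s, w)$ the linear dispersion operator mapping emission rates to sensor readings, $y$ the noisy readings, and $\lambda R(\cdot)$ an optional regularizer.

```latex
\begin{aligned}
\min_{s \in \mathcal{S}} \quad
  & \mathbb{E}_{w}\!\left[ \bigl\| \hat{\theta}(s, w) - \theta \bigr\|_2^2 \right]
  && \text{(outer: sensor locations, MSE over wind conditions)} \\
\text{s.t.} \quad
  & \hat{\theta}(s, w) \in \arg\min_{\vartheta}
    \bigl\| y - A(s, w)\,\vartheta \bigr\|_2^2 + \lambda R(\vartheta)
  && \text{(inner: inverse problem for the emission rates)}
\end{aligned}
```

Under this reading, a Sample Average Approximation would replace the expectation over $w$ with an average over sampled wind conditions, while a stochastic-gradient approach would update $s$ using one sampled condition at a time.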
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
We propose MM-Vet, an evaluation benchmark that examines large multimodal
models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various
intriguing abilities, such as solving math problems written on the blackboard,
reasoning about events and celebrities in news images, and explaining visual
jokes. Rapid model advancements pose challenges to evaluation benchmark
development. Problems include: (1) How to systematically structure and evaluate
the complicated multimodal tasks; (2) How to design evaluation metrics that
work well across question and answer types; and (3) How to give model insights
beyond a simple performance ranking. To this end, we present MM-Vet, designed
based on the insight that the intriguing ability to solve complicated tasks is
often achieved by a generalist model being able to integrate different core
vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and
examines the 16 integrations of interest derived from the capability
combination. For evaluation metrics, we propose an LLM-based evaluator for
open-ended outputs. The evaluator enables the evaluation across different
question types and answer styles, resulting in a unified scoring metric. We
evaluate representative LMMs on MM-Vet, providing insights into the
capabilities of different LMM system paradigms and models. Code and data are
available at https://github.com/yuweihao/MM-Vet.
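The way a unified per-sample score can be broken down by capability and by capability integration is sketched below. It assumes the evaluator returns one score in [0, 1] per sample and that each sample is tagged with the capabilities it requires; the capability labels, the score range, and the function name are placeholders for illustration, not taken from the abstract.

```python
from collections import defaultdict


def aggregate_scores(samples):
    """Average evaluator scores overall, per capability, and per integration.

    `samples` is a list of (capabilities, score) pairs, where `capabilities`
    is the set of capabilities a question requires.
    """
    per_capability = defaultdict(list)
    per_integration = defaultdict(list)
    for capabilities, score in samples:
        # An "integration" is the exact combination of capabilities required.
        per_integration[frozenset(capabilities)].append(score)
        for capability in capabilities:
            per_capability[capability].append(score)

    def mean(values):
        return sum(values) / len(values)

    total = mean([score for _, score in samples])
    by_capability = {c: mean(v) for c, v in per_capability.items()}
    by_integration = {k: mean(v) for k, v in per_integration.items()}
    return total, by_capability, by_integration


# Hypothetical samples with placeholder capability labels.
samples = [
    ({"ocr", "math"}, 1.0),
    ({"rec"}, 0.5),
    ({"rec", "ocr"}, 0.0),
]
total, by_capability, by_integration = aggregate_scores(samples)
```

Reporting the per-capability and per-integration averages next to the total is one way to give insight beyond a single performance ranking.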