
    Dataset Distillation: A Comprehensive Review

    The recent success of deep learning is largely attributed to the sheer amount of data used to train deep neural networks. Despite this unprecedented success, the massive data unfortunately impose a significant burden on storage and transmission, and make the model training process cumbersome. Besides, relying on the raw data for training per se raises concerns about privacy and copyright. To alleviate these shortcomings, dataset distillation (DD), also known as dataset condensation (DC), was introduced and has recently attracted much research attention in the community. Given an original dataset, DD aims to derive a much smaller dataset of synthetic samples, such that models trained on it perform comparably to models trained on the original dataset. In this paper, we give a comprehensive review and summary of recent advances in DD and its applications. We first introduce the task formally and propose an overall algorithmic framework that all existing DD methods follow. Next, we provide a systematic taxonomy of current methodologies in this area and discuss their theoretical interconnections. We also present current challenges in DD through extensive experiments and envision possible directions for future work.
    Comment: 23 pages, 168 references, 8 figures, under review.
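
    As a concrete illustration of the DD objective, the sketch below learns a small synthetic set by matching the training gradients it induces on a randomly initialized linear classifier against those induced by the real data. This is a minimal gradient-matching sketch of the generic DD loop under toy assumptions (linear model, fixed balanced labels, illustrative hyperparameters), not the method of any single paper.

        import torch
        import torch.nn.functional as F

        def distill(real_x, real_y, n_syn=10, steps=500, lr=0.1):
            # Learn n_syn synthetic samples whose training gradient on a random
            # linear model matches the gradient produced by the real data.
            d, c = real_x.shape[1], int(real_y.max()) + 1
            syn_x = torch.randn(n_syn, d, requires_grad=True)  # learnable synthetic inputs
            syn_y = torch.arange(n_syn) % c                    # fixed, balanced labels
            opt = torch.optim.SGD([syn_x], lr=lr)
            for _ in range(steps):
                w = torch.randn(d, c, requires_grad=True)      # fresh random model each step
                g_real = torch.autograd.grad(F.cross_entropy(real_x @ w, real_y), w)[0]
                g_syn = torch.autograd.grad(F.cross_entropy(syn_x @ w, syn_y), w,
                                            create_graph=True)[0]
                loss = F.mse_loss(g_syn, g_real.detach())      # gradient-matching objective
                opt.zero_grad(); loss.backward(); opt.step()
            return syn_x.detach(), syn_y

        # e.g. syn_x, syn_y = distill(torch.randn(256, 32), torch.randint(0, 4, (256,)))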

    Overcoming Catastrophic Forgetting in Graph Neural Networks

    Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks. Prior methods have focused on overcoming this problem in convolutional neural networks (CNNs), where input samples such as images lie on a grid, but have largely overlooked graph neural networks (GNNs), which handle non-grid data. In this paper, we propose a novel scheme dedicated to overcoming the catastrophic forgetting problem and hence strengthening continual learning in GNNs. At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP), applicable to arbitrary forms of GNNs in a plug-and-play fashion. Unlike mainstream CNN-based continual learning methods, which rely solely on slowing down the updates of parameters important to the downstream task, TWP explicitly explores the local structures of the input graph and attempts to stabilize the parameters that play pivotal roles in the topological aggregation. We evaluate TWP on different GNN backbones over several datasets and demonstrate that it yields performance superior to the state of the art. Code is publicly available at https://github.com/hhliu79/TWP.
    Comment: Accepted by AAAI 2021.
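
    To make the "stabilize pivotal parameters" idea concrete, here is a loose EWC-style sketch: parameter importance combines the gradient of the old task's loss with the gradient of a scalar topology score (e.g., a mean attention coefficient over the graph's edges), and a quadratic penalty anchors important parameters while learning new tasks. The topology score and the weighting below are stand-in assumptions, not TWP's exact formulation.

        import torch

        def parameter_importance(model, task_loss, topo_score, lam=0.5):
            # Importance per parameter: gradient magnitude of the old task loss
            # plus a topology term from a scalar summarizing graph aggregation.
            g_task = torch.autograd.grad(task_loss, model.parameters(), retain_graph=True)
            g_topo = torch.autograd.grad(topo_score, model.parameters(), allow_unused=True)
            return [gt.abs() + lam * go.abs() if go is not None else gt.abs()
                    for gt, go in zip(g_task, g_topo)]

        def twp_penalty(model, old_params, importance, beta=100.0):
            # Quadratic penalty added to the new task's loss: parameters deemed
            # important for the old task (and its topology) are anchored in place.
            return beta * sum((w * (p - p0).pow(2)).sum()
                              for p, p0, w in zip(model.parameters(), old_params, importance))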

    DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

    One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between the input and guided images. Prior approaches, despite promising results, have relied either on estimating dense attention to compute per-point matching, which is limited to coarse scales due to its quadratic memory cost, or on fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a dynamic sparse attention based Transformer model, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. At the heart of our approach is a novel dynamic-attention unit dedicated to covering the variation in the optimal number of tokens each position should attend to. Specifically, DynaST leverages the multi-layer nature of the Transformer and performs the dynamic attention scheme in a cascaded manner to refine matching results and synthesize visually pleasing outputs. In addition, we introduce a unified training objective for DynaST, making it a versatile reference-based image translation framework for both supervised and unsupervised scenarios. Extensive experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details, outperforming the state of the art while significantly reducing the computational cost. Our code is available at https://github.com/Huage001/DynaST.
    Comment: ECCV 2022.
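
    The dynamic-attention unit can be pictured as attention in which each query keeps a different, data-dependent number of keys. The sketch below thresholds the softmax weights per query and renormalizes; the fixed threshold is an illustrative simplification of DynaST's learned, cascaded scheme.

        import torch

        def dynamic_sparse_attention(q, k, v, tau=0.05):
            # q, k, v: [batch, tokens, dim]; each query attends to a variable
            # number of tokens, chosen by thresholding its attention weights.
            attn = (q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5).softmax(dim=-1)
            mask = attn >= tau
            # Always keep each query's strongest key so no row is empty.
            mask.scatter_(-1, attn.argmax(dim=-1, keepdim=True), True)
            attn = attn * mask
            attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
            return attn @ v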

    An Optimized Method for Terrain Reconstruction Based on Descent Images

    An optimization method is proposed to perform high-accuracy terrain reconstruction of the landing area of Chang'e III. First, feature matching is conducted using geometric model constraints, the initial terrain is obtained, and the initial normal vector of each point is solved on the basis of this terrain. By perturbing the initial normal vector in small steps, a set of new vectors is obtained. Combining these vectors with the directions of the light and the camera, functions are set up on the basis of a surface reflectance model, and a series of gray values is derived by solving the resulting equations. The optimized vector is the one whose predicted gray value is closest to the corresponding pixel. Finally, the optimized terrain is obtained after iterating over the vector field. Experiments were conducted using laboratory images and the descent images of Chang'e III. The results showed that the proposed method outperforms the classical feature matching method and can serve as a reference for terrain reconstruction of landing areas in subsequent lunar exploration missions.
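
    A toy version of the normal-refinement step reads as follows: candidate normals are generated by small perturbations around the current estimate, a reflectance model predicts a gray value for each, and the candidate closest to the observed pixel wins. The Lambertian model, step size, and axis-aligned perturbations are illustrative assumptions.

        import numpy as np

        def refine_normal(n0, light, pixel_gray, albedo=1.0, step=0.05, iters=3):
            # n0, light: 3-vectors; pixel_gray: observed intensity in [0, 1].
            n = n0 / np.linalg.norm(n0)
            for _ in range(iters):
                # Perturb the current normal in small steps along each axis.
                cands = [n] + [n + s * d for d in np.eye(3) for s in (step, -step)]
                cands = [c / np.linalg.norm(c) for c in cands]
                # Lambertian prediction: albedo * max(0, n . light)
                preds = [albedo * max(0.0, float(c @ light)) for c in cands]
                n = cands[int(np.argmin([abs(p - pixel_gray) for p in preds]))]
            return n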

    Mutual-modality Adversarial Attack with Semantic Perturbation

    Adversarial attacks constitute a notable threat to machine learning systems, given their potential to induce erroneous predictions and classifications. However, in real-world contexts, the essential specifics of the deployed model are frequently treated as a black box, which mitigates the vulnerability to such attacks. Enhancing the transferability of adversarial samples has therefore become a crucial area of research, one that relies heavily on selecting appropriate surrogate models. To address this challenge, we propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme, built on the pre-trained CLIP model. First, we conduct a visual attack on the clean image that causes semantic perturbations in the embedding space aligned with the other, textual modality. Then, we apply a corresponding defense on the textual modality by updating the prompts, which forces a re-matching in the perturbed embedding space. Finally, to enhance attack transferability, we train the visual attack and the textual defense iteratively, with the two processes optimizing against each other. We evaluate our approach on several benchmark datasets and demonstrate that our mutual-modality attack strategy effectively produces highly transferable attacks that remain stable across target networks. Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
    Comment: Accepted by AAAI 2024.
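
    Schematically, the alternating optimization can be sketched as a PGD-style visual attack that lowers image-text similarity in the shared embedding space, interleaved with a textual update that restores it. For simplicity, the prompt update is modeled here as a learnable text embedding, and enc_img stands in for any CLIP-style image encoder; both are assumptions for illustration.

        import torch
        import torch.nn.functional as F

        def mutual_attack(img, txt_emb, enc_img, eps=8/255, alpha=2/255, steps=10, txt_lr=0.01):
            delta = torch.zeros_like(img, requires_grad=True)  # bounded image perturbation
            t = txt_emb.clone().requires_grad_(True)           # updatable text embedding
            for _ in range(steps):
                sim = F.cosine_similarity(enc_img(img + delta), t, dim=-1).mean()
                g_img, g_txt = torch.autograd.grad(sim, [delta, t])
                with torch.no_grad():
                    delta -= alpha * g_img.sign()              # attack: decrease similarity
                    delta.clamp_(-eps, eps)
                    t += txt_lr * g_txt                        # defense: restore similarity
            return (img + delta).detach(), t.detach()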

    Optimal Sensor Allocation with Multiple Linear Dispersion Processes

    This paper considers the optimal sensor allocation for estimating the emission rates of multiple sources in a two-dimensional spatial domain. The locations of potential emission sources are known (e.g., factory stacks), and the number of sources is much greater than the number of sensors that can be deployed, giving rise to the optimal sensor allocation problem. In particular, we consider linear dispersion forward models and formulate the optimal sensor allocation as a bilevel optimization problem. The outer problem determines the optimal sensor locations by minimizing the overall mean squared error of the estimated emission rates over various wind conditions, while the inner problem solves an inverse problem that estimates the emission rates. Two algorithms, a repeated Sample Average Approximation and a Stochastic Gradient Descent based bilevel approximation, are investigated for solving the sensor allocation problem. Convergence analysis is performed to obtain performance guarantees, and numerical examples are presented to illustrate the proposed approach.
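
    Under these definitions the bilevel structure is easy to state in code: the inner problem is a least-squares estimate of the emission rates from the rows of the forward matrix that the chosen sensors observe, and the outer objective averages the resulting MSE over sampled wind conditions. The brute-force subset search and the noise level below are illustrative assumptions standing in for the paper's SAA/SGD-based algorithms.

        import itertools
        import numpy as np

        def outer_mse(A_samples, true_q, sensors, noise=0.01, seed=0):
            # A_samples: one [n_locations x n_sources] dispersion matrix per
            # sampled wind condition; sensors: tuple of chosen location indices.
            rng = np.random.default_rng(seed)
            mses = []
            for A in A_samples:
                As = A[list(sensors)]                          # rows seen by this sensor set
                y = As @ true_q + noise * rng.standard_normal(len(sensors))
                q_hat, *_ = np.linalg.lstsq(As, y, rcond=None) # inner inverse problem
                mses.append(np.mean((q_hat - true_q) ** 2))
            return float(np.mean(mses))

        def best_allocation(A_samples, true_q, n_sensors):
            # Outer problem: pick the sensor subset with the lowest average MSE.
            n_loc = A_samples[0].shape[0]
            return min(itertools.combinations(range(n_loc), n_sensors),
                       key=lambda s: outer_mse(A_samples, true_q, s))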

    MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

    We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on a blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancement poses challenges for evaluation benchmark development, including: (1) how to systematically structure and evaluate complicated multimodal tasks; (2) how to design evaluation metrics that work well across question and answer types; and (3) how to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model integrating different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from their combinations. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs, which enables evaluation across different question types and answer styles and yields a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.
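
    At its core, the LLM-based evaluator reduces to prompting a judge model with the question, the ground truth, and the model's answer, then averaging the returned scores into one metric. The prompt wording and the llm callable below are placeholders, not MM-Vet's actual prompt or API.

        PROMPT = (
            "Compare the prediction with the ground truth and output only a "
            "correctness score between 0.0 and 1.0.\n"
            "Question: {q}\nGround truth: {gt}\nPrediction: {pred}\nScore:"
        )

        def mm_vet_style_score(samples, llm):
            # samples: iterable of (question, ground_truth, prediction) triples;
            # llm: any callable mapping a prompt string to a completion string.
            scores = [float(llm(PROMPT.format(q=q, gt=gt, pred=p))) for q, gt, p in samples]
            return sum(scores) / len(scores)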