34 research outputs found

    Addressing Negative Transfer in Diffusion Models

    Full text link
    Diffusion-based generative models have achieved remarkable success in various domains. They train a model on denoising tasks that encompass different noise levels simultaneously, a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, in which conflicts between tasks degrade the performance of certain tasks. In this paper, we analyze diffusion training from an MTL standpoint and present two key observations: (O1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (O2) negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. To achieve this, we propose leveraging existing MTL methods, but the huge number of denoising tasks makes computing the necessary per-task loss or gradient expensive. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on (O2), we employ interval clustering to enforce temporal proximity among denoising tasks within clusters. We show that interval clustering can be solved with dynamic programming, using signal-to-noise ratio, timestep, and task affinity as clustering objectives. Our approach thus addresses negative transfer in diffusion models while allowing efficient computation of MTL methods. We validate the proposed clustering and its integration with MTL methods through various experiments, demonstrating improved sample quality of diffusion models. Comment: 22 pages, 12 figures, under review.
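The interval clustering described above, which partitions timesteps into contiguous clusters, admits a classic dynamic-programming solution. Below is a minimal sketch under simplifying assumptions: the paper's SNR-, timestep-, and affinity-based objectives are replaced by a generic precomputed cost matrix, and the function name `interval_cluster` is illustrative, not the authors'.

```python
import numpy as np

def interval_cluster(costs, k):
    """Partition timesteps 0..T-1 into k contiguous intervals minimizing the
    summed within-interval cost, via dynamic programming.
    costs[i][j] = cost of grouping timesteps i..j into one cluster
    (a hypothetical stand-in for an SNR- or affinity-based objective)."""
    T = len(costs)
    INF = float("inf")
    # dp[j][i]: best cost of splitting the first i timesteps into j intervals
    dp = np.full((k + 1, T + 1), INF)
    cut = np.zeros((k + 1, T + 1), dtype=int)
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, T + 1):
            for m in range(j - 1, i):
                c = dp[j - 1][m] + costs[m][i - 1]
                if c < dp[j][i]:
                    dp[j][i] = c
                    cut[j][i] = m
    # backtrack the interval boundaries
    bounds, i = [], T
    for j in range(k, 0, -1):
        m = cut[j][i]
        bounds.append((m, i - 1))  # this interval covers timesteps m..i-1
        i = m
    return list(reversed(bounds)), float(dp[k][T])
```

With a within-interval variance cost, this reduces to optimal 1-D segmentation; the DP runs in O(k·T²) time, which is why clustering into a small number of intervals keeps per-cluster MTL computation tractable.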

    Multi-Architecture Multi-Expert Diffusion Models

    Full text link
    Diffusion models have achieved impressive results in generating diverse and realistic data by employing multi-step denoising processes. However, the need to accommodate significant variations in input noise at each time-step has led diffusion models to require a large number of parameters for their denoisers. We have observed that diffusion models effectively act as filters for different frequency ranges at each time-step. While some previous works have introduced multi-expert strategies, assigning denoisers to different noise intervals, they overlook the importance of specialized operations for high and low frequencies. For instance, self-attention operations are effective at handling low-frequency components (low-pass filters), while convolutions excel at capturing high-frequency features (high-pass filters). In other words, existing diffusion models employ denoisers with the same architecture, without considering the optimal operations for each time-step noise. To address this limitation, we propose a novel approach called Multi-architecturE Multi-Expert (MEME), which consists of multiple experts with specialized architectures tailored to the operations required at each time-step interval. Through extensive experiments, we demonstrate that MEME outperforms large competitors in terms of both generation performance and computational efficiency.
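The routing step in a multi-expert denoiser, dispatching each timestep to the expert owning its noise interval, can be sketched as follows. This is an assumption-laden illustration: the class name, interval boundaries, and the toy "experts" are all hypothetical, standing in for the attention-heavy and convolution-heavy architectures the abstract describes.

```python
import bisect

class MultiExpertDenoiser:
    """Hypothetical sketch: route each denoising call to the expert whose
    time-step interval contains t. Real experts would be neural networks
    with architectures specialized per noise interval."""

    def __init__(self, boundaries, experts):
        # boundaries: sorted inclusive upper bounds of each expert's interval
        assert len(boundaries) == len(experts)
        self.boundaries = boundaries
        self.experts = experts

    def __call__(self, x, t):
        idx = bisect.bisect_left(self.boundaries, t)  # first interval whose bound >= t
        return self.experts[idx](x, t)
```

At inference only one expert runs per step, so total parameters grow with the number of experts while per-step compute stays that of a single denoiser.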

    Towards Practical Plug-and-Play Diffusion Models

    Full text link
    Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to control the generation process in a plug-and-play manner for various tasks without fine-tuning the diffusion model. However, directly using publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. The existing practice is therefore to fine-tune the guidance models on labeled data corrupted with noise. In this paper, we argue that this practice has limitations in two aspects: (1) handling inputs with widely varying noise levels is too hard for a single model; (2) collecting labeled datasets hinders scaling up to various tasks. To tackle these limitations, we propose a novel strategy that leverages multiple experts, where each expert is specialized in a particular noise range and guides the reverse process at its corresponding timesteps. However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We conduct exhaustive ImageNet class-conditional generation experiments to show that our method can successfully guide diffusion with few trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner.

    READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization

    No full text
    Code summarization is the process of automatically generating brief and informative summaries of source code to aid software comprehension and maintenance. In this paper, we propose a novel model called READSUM, REtrieval-augmented ADaptive transformer for source code SUMmarization, that combines abstractive and extractive approaches. Our proposed model generates code summaries in an abstractive manner, taking into account both the structural and sequential information of the input code, while also utilizing an extractive approach that leverages a retrieved summary of similar code to increase the frequency of important keywords. To effectively blend the original code and the retrieved similar code at the embedding-layer stage, we obtain the augmented representation of the original code and the retrieved code through multi-head self-attention. In addition, we develop a self-attention network that adaptively learns the structural and sequential information for the representations in the encoder stage. Furthermore, we design a fusion network to capture the relation between the original code and the retrieved summary at the decoder stage. The fusion network effectively guides summary generation based on the retrieved summary. Finally, READSUM extracts important keywords using an extractive approach and generates high-quality summaries using an abstractive approach that considers both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through various experiments and an ablation study. Additionally, we perform a human evaluation to assess the quality of the generated summaries.
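The retrieval step, finding a similar code snippet whose summary is then fused into generation, is commonly implemented as nearest-neighbor search over code embeddings. A minimal sketch, assuming plain pre-computed vectors in place of a learned code encoder (the function name and cosine-similarity choice are illustrative, not necessarily READSUM's):

```python
import numpy as np

def retrieve_most_similar(query_vec, corpus_vecs):
    """Return the index of the corpus entry whose embedding is closest to the
    query code's embedding by cosine similarity, plus that similarity.
    Embeddings are assumed to be nonzero float vectors of equal dimension."""
    q = query_vec / np.linalg.norm(query_vec)
    C = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = C @ q                       # cosine similarity against every entry
    best = int(np.argmax(sims))
    return best, float(sims[best])
```

The summary paired with the returned index would then feed the decoder-stage fusion network described above.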

    A Low-Jitter 8-GHz RO-Based ADPLL With PVT-Robust Replica-Based Analog Closed Loop for Supply Noise Compensation

    No full text
    This article presents a ring oscillator (RO)-based all-digital phase-locked loop (ADPLL) that is implemented with a high-gain analog closed loop for supply noise compensation (ACSC). The ACSC not only allows high-frequency oscillation of the RO but also is robust over process, voltage, and temperature (PVT) variations thanks to its replica-based configuration. Moreover, a comprehensive analysis of the noise contribution of the ACSC is conducted for the ADPLL to retain its low-jitter output. Implemented in 40-nm CMOS technology, the ADPLL, with a 1.1-V supply, achieves an rms jitter of 289 fs at 8 GHz without any injected supply noise. Under a 20-mV rms white supply noise, the ADPLL gives rms jitters of 8.7 and 0.63 ps at 8 GHz when the ACSC is disabled and enabled, respectively. The overall power consumption and the area of the presented ADPLL are 9.48 mW and 0.055 mm², respectively.

    Development of integrated prompt gamma imaging and positron emission tomography system for in vivo 3-D dose verification: a Monte Carlo study

    No full text
    An accurate knowledge of in vivo proton dose distribution is key to fully utilizing the potential advantages of proton therapy. Two representative indirect methods for in vivo range verification are available: prompt gamma (PG) imaging and positron emission tomography (PET). This study proposes a PG-PET system that combines the advantages of these two methods and presents detector geometry and background reduction techniques optimized for the PG-PET system. The characteristics of the secondary radiations emitted from a water phantom irradiated with a 150 MeV proton beam were analysed using Geant4.10.00, and the 2-D PG distributions were obtained and assessed for different detector geometries. In addition, the energy window (EW), depth-of-interaction (DOI), and time-of-flight (TOF) techniques are proposed as background reduction techniques. To evaluate the performance of the PG-PET system, the 3-D dose distribution in the water phantom caused by two proton beams of energies 80 MeV and 100 MeV was verified using 16 optimal detectors. The thickness of the parallel-hole tungsten collimator of pitch 8 mm and width 7 mm was determined as 200 mm, and that of the GAGG scintillator was determined as 30 mm, by an optimization study. Further, 3–7 MeV and 2–7 MeV were obtained as the optimal EWs when the DOI technique and both the DOI and TOF techniques were applied for data processing, respectively; the detector performances were improved by about 38% and 167%, respectively, compared with applying only the 3–5 MeV EW. In this study, we confirmed that the PG distribution can be obtained by simply combining the 2-D parallel-hole collimator and the PET detector module. In the future, we will develop an accurate 3-D dose evaluation technique using deep learning algorithms based on the image sets of dose, PG, and PET distributions for various proton energies.
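The background-reduction cuts described above amount to filtering detected events by an energy window and, optionally, a time-of-flight window. A minimal sketch under assumed event fields (`energy_MeV`, `tof_ns` are hypothetical names; a real Geant4 post-processing chain would also apply the DOI cut):

```python
def select_events(events, e_lo, e_hi, tof_max=None):
    """Keep only events inside the prompt-gamma energy window [e_lo, e_hi]
    (MeV); if tof_max is given, also require time-of-flight <= tof_max (ns)
    to reject late-arriving neutron-induced background."""
    kept = []
    for e in events:
        if not (e_lo <= e["energy_MeV"] <= e_hi):
            continue
        if tof_max is not None and e["tof_ns"] > tof_max:
            continue
        kept.append(e)
    return kept
```

Widening the window from 3–5 MeV to 2–7 MeV, as the optimization study found, trades some background acceptance for substantially more prompt-gamma statistics once the DOI and TOF cuts suppress the added background.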

    Investigation on Quality Prediction Algorithm in Ultrasonic Metal Welding for Multilayered Cu for Battery Cells

    No full text
    This study focuses on common batteries used in electric vehicles, which are composed of cells grouped into modules and stacked to form a battery pack. Ultrasonic metal welding (UMW) is employed to bond these cells, and ensuring reliable weld quality by inspection is crucial for maintaining the performance and stability of batteries. Currently, the quality of UMW in battery cells is assessed through sample-unit-based T-peel tests and visual inspection, methods that suffer from reduced productivity and increased costs due to their destructive nature. This research introduces an algorithm designed to predict weld quality by analyzing welding process signals of the UMW process, specifically the bonding between 8-μm-thick Cu foil and 0.2-mm-thick nickel-plated copper strip materials used in battery cell manufacturing. To achieve quality prediction, current and voltage signals from the welder, as well as the displacement signal of the welded part, are used. A UMW system was constructed, incorporating a current sensor, a voltage sensor, and a linear variable displacement transducer (LVDT) for measuring these signals. The study further derives welding energy from the current and voltage signals, analyzes changes in the behavior of the sonotrode during welding using the LVDT sensor, and employs data analysis to derive feature variables. These variables are then used in a machine learning-based classification model. Ultimately, the study develops and evaluates a support vector machine (SVM)-based algorithm for real-time UMW quality determination. The algorithm achieves a high classification accuracy of 98%, showcasing its effectiveness in predicting the quality of ultrasonic metal welding in the battery manufacturing process.
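The feature-derivation step, computing welding energy from the current and voltage signals and sonotrode-motion statistics from the LVDT signal, can be sketched as below. The feature names and the exact feature set are assumptions for illustration; the paper's actual variables feeding the SVM are not reproduced here.

```python
import numpy as np

def welding_features(current, voltage, displacement, dt):
    """Derive simple per-weld features from uniformly sampled signals:
    current (A), voltage (V), and LVDT displacement, sampled every dt seconds.
    Welding energy is the time integral of instantaneous power."""
    power = current * voltage                                    # instantaneous power (W)
    energy = float(np.sum(0.5 * (power[1:] + power[:-1])) * dt)  # trapezoidal integral (J)
    collapse = float(displacement[-1] - displacement[0])         # net sonotrode indentation
    peak_rate = float(np.max(np.abs(np.diff(displacement))) / dt)  # fastest displacement change
    return {"energy_J": energy, "collapse": collapse, "peak_rate": peak_rate}
```

Feature vectors like these would then be standardized and passed to an SVM classifier for the good/bad weld decision.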