34 research outputs found
Addressing Negative Transfer in Diffusion Models
Diffusion-based generative models have achieved remarkable success in various
domains. Diffusion training optimizes a model on denoising tasks that span
different noise levels simultaneously, a form of multi-task learning (MTL).
However, analyzing and improving diffusion models from an MTL perspective
remains under-explored. In particular, MTL can sometimes lead to the well-known
phenomenon of negative transfer, in which conflicts between tasks degrade the
performance of certain tasks. In this paper, we
aim to analyze diffusion training from an MTL standpoint, presenting two key
observations: (1) the task affinity between denoising tasks diminishes as the
gap between noise levels widens, and (2) negative transfer can arise even in
the context of diffusion training. Building upon
these observations, our objective is to enhance diffusion training by
mitigating negative transfer. To achieve this, we propose leveraging existing
MTL methods, but the huge number of denoising tasks makes computing the
necessary per-task losses or gradients prohibitively expensive.
To address this challenge, we propose clustering the denoising tasks into small
task clusters and applying MTL methods to them. Specifically, based on
observation (1), we employ interval clustering to enforce temporal proximity
among denoising tasks within clusters. We show that interval clustering can be
solved with dynamic programming, and we use signal-to-noise ratio, timestep,
and task affinity as clustering objectives. Through this, our approach
addresses the issue of negative transfer in diffusion models by allowing for
efficient computation of MTL methods. We validate the proposed clustering and
its integration with MTL methods through various experiments, demonstrating
improved sample quality of diffusion models.
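The clustering step lends itself to a compact illustration. Below is a minimal sketch of interval clustering solved with dynamic programming, assuming a within-interval variance cost over a per-timestep statistic such as log-SNR; the function name and cost choice are illustrative, not the paper's implementation.

```python
import numpy as np

def interval_cluster(values, k):
    """Partition a 1-D sequence into k contiguous intervals minimizing
    total within-interval variance, via dynamic programming.

    values: per-timestep statistic (e.g., log-SNR at each noise level).
    Returns a list of (start, end) index pairs, end exclusive.
    """
    n = len(values)
    prefix = np.concatenate([[0.0], np.cumsum(values)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(np.square(values))])

    def cost(i, j):
        # Sum of squared deviations from the mean for values[i:j].
        s = prefix[j] - prefix[i]
        s2 = prefix_sq[j] - prefix_sq[i]
        return s2 - s * s / (j - i)

    # dp[c, j]: best cost of splitting values[:j] into c intervals.
    dp = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = dp[c - 1, i] + cost(i, j)
                if cand < dp[c, j]:
                    dp[c, j], split[c, j] = cand, i

    # Backtrack the interval boundaries.
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = split[c, j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]

# Example: cluster 200 timesteps into 5 intervals by a synthetic log-SNR curve.
log_snr = np.linspace(8.0, -8.0, 200)
print(interval_cluster(log_snr, 5))
```

Because clusters must be contiguous in time, the optimal partition is found exactly in O(k·n²); per-task losses or gradients then need to be computed only once per cluster rather than once per timestep.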
Multi-Architecture Multi-Expert Diffusion Models
Diffusion models have achieved impressive results in generating diverse and
realistic data by employing multi-step denoising processes. However, the need
for accommodating significant variations in input noise at each time-step has
led to diffusion models requiring a large number of parameters for their
denoisers. We have observed that diffusion models effectively act as filters
for different frequency ranges at each denoising time-step. While some previous
works have introduced multi-expert strategies, assigning denoisers to different
noise intervals, they overlook the importance of specialized operations for
high and low frequencies. For instance, self-attention operations are effective
at handling low-frequency components (low-pass filters), while convolutions
excel at capturing high-frequency features (high-pass filters). In other words,
existing diffusion models employ denoisers with the same architecture, without
considering the operations best suited to the noise at each time-step. To address this
limitation, we propose a novel approach called Multi-architecturE Multi-Expert
(MEME), which consists of multiple experts with specialized architectures
tailored to the operations required at each time-step interval. Through
extensive experiments, we demonstrate that MEME outperforms large competitors
in terms of both generation performance and computational efficiency.
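To make the multi-architecture idea concrete, the sketch below routes each timestep interval to an expert with a different architecture: an attention-heavy expert for high-noise steps (global mixing, low-frequency structure) and a convolution-heavy expert for low-noise steps (local kernels, high-frequency detail). Module names, the interval boundary, and the toy expert designs are assumptions for illustration, not the MEME architecture.

```python
import torch
import torch.nn as nn

class ConvExpert(nn.Module):
    """Convolution-heavy denoiser: local kernels suit high-frequency detail."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )
    def forward(self, x, t):  # t ignored in this sketch; real denoisers condition on it
        return self.net(x)

class AttnExpert(nn.Module):
    """Attention-heavy denoiser: global mixing suits low-frequency structure."""
    def __init__(self, ch=64, patch=8):
        super().__init__()
        self.embed = nn.Conv2d(3, ch, patch, stride=patch)
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.unembed = nn.ConvTranspose2d(ch, 3, patch, stride=patch)
    def forward(self, x, t):
        h = self.embed(x)                   # (B, C, H/p, W/p)
        B, C, H, W = h.shape
        seq = h.flatten(2).transpose(1, 2)  # (B, H*W, C)
        seq, _ = self.attn(seq, seq, seq)
        return self.unembed(seq.transpose(1, 2).reshape(B, C, H, W))

class MultiArchDenoiser(nn.Module):
    """Dispatches each timestep to the expert owning its noise interval."""
    def __init__(self, boundary=500):
        super().__init__()
        self.boundary = boundary
        self.experts = nn.ModuleList([ConvExpert(), AttnExpert()])
    def forward(self, x, t):
        # Large t = high noise -> attention expert; small t -> conv expert.
        return self.experts[int(t >= self.boundary)](x, t)

model = MultiArchDenoiser()
x = torch.randn(2, 3, 32, 32)
print(model(x, t=800).shape)  # routed to the attention expert
```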
Towards Practical Plug-and-Play Diffusion Models
Diffusion-based generative models have achieved remarkable success in image
generation. Their guidance formulation allows an external model to control the
generation process in a plug-and-play manner for various tasks, without
fine-tuning the diffusion model. However, directly using publicly available
off-the-shelf models for guidance fails due to their poor performance on noisy
inputs. To cope with this, the existing practice is to fine-tune the guidance
models on labeled data corrupted with noise. In this paper, we argue that this
practice has two limitations: (1) handling inputs with widely varying noise
levels is too hard for a single model; (2) collecting labeled datasets hinders
scaling up to various tasks. To tackle these limitations, we
propose a novel strategy that leverages multiple experts where each expert is
specialized in a particular noise range and guides the reverse process at its
corresponding timesteps. However, since maintaining multiple full networks and
collecting labeled data for each are impractical, we present a practical guidance framework
termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient
fine-tuning and data-free knowledge transfer. We exhaustively conduct ImageNet
class conditional generation experiments to show that our method can
successfully guide diffusion with small trainable parameters and no labeled
data. Finally, we show that image classifiers, depth estimators, and semantic
segmentation models can guide publicly available GLIDE through our framework in
a plug-and-play manner.
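The timestep-routed guidance can be sketched with standard classifier guidance: the expert assigned to the current noise interval supplies the gradient of the class log-probability with respect to the noisy input. The helper, stand-in classifiers, and boundaries below are illustrative; the full PPAP framework additionally relies on parameter-efficient fine-tuning and data-free knowledge transfer, which this sketch omits.

```python
import torch
import torch.nn as nn

def expert_guidance_grad(x_t, t, y, experts, boundaries, scale=1.0):
    """Classifier-guidance gradient from the expert assigned to timestep t.

    experts: list of classifiers, each specialized for one noise interval.
    boundaries: ascending timestep thresholds separating the intervals.
    Returns scale * grad_x log p(y | x_t) from the matching expert.
    """
    idx = sum(int(t >= b) for b in boundaries)
    classifier = experts[idx]
    x = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    selected = log_probs[torch.arange(x.shape[0]), y].sum()
    (grad,) = torch.autograd.grad(selected, x)
    return scale * grad

# Tiny stand-in experts: one linear classifier per noise interval.
experts = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(3)]
boundaries = [333, 666]  # timestep thresholds between the three intervals
x_t = torch.randn(4, 3, 32, 32)
y = torch.tensor([1, 3, 5, 7])
g = expert_guidance_grad(x_t, t=700, y=y, experts=experts, boundaries=boundaries)
print(g.shape)  # (4, 3, 32, 32) -- would shift the reverse-step mean
```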
READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization
Code summarization is the process of automatically generating brief and informative summaries of source code to aid software comprehension and maintenance. In this paper, we propose a novel model called READSUM (REtrieval-augmented ADaptive transformer for source code SUMmarization) that combines abstractive and extractive approaches. Our model generates code summaries abstractively, taking into account both the structural and sequential information of the input code, while also using an extractive approach that leverages a retrieved summary of similar code to increase the frequency of important keywords. To effectively blend the original code and the retrieved similar code at the embedding-layer stage, we obtain an augmented representation of both through multi-head self-attention. In addition, we develop a self-attention network that adaptively learns structural and sequential information for the representations at the encoder stage. Furthermore, we design a fusion network that captures the relation between the original code and the retrieved summary at the decoder stage; the fusion network effectively guides summary generation based on the retrieved summary. Finally, READSUM extracts important keywords via its extractive component and generates high-quality summaries via its abstractive component, considering both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through various experiments and an ablation study, and we additionally perform a human evaluation to assess the quality of the generated summaries.
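The embedding-stage blending can be illustrated with a small sketch: multi-head self-attention over the concatenated original and retrieved sequences, keeping the retrieval-aware code positions. Dimensions and module names are illustrative assumptions, not READSUM's actual configuration.

```python
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    """Sketch of blending source-code token embeddings with a retrieved
    similar code's embeddings via multi-head self-attention."""
    def __init__(self, d_model=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, code_emb, retrieved_emb):
        # Concatenate both sequences, let self-attention mix them, then
        # keep only the (now retrieval-aware) original code positions.
        mixed = torch.cat([code_emb, retrieved_emb], dim=1)
        out, _ = self.self_attn(mixed, mixed, mixed)
        out = self.norm(out + mixed)
        return out[:, : code_emb.shape[1]]

fusion = RetrievalFusion()
code = torch.randn(2, 50, 256)        # original code token embeddings
retrieved = torch.randn(2, 40, 256)   # retrieved similar-code embeddings
print(fusion(code, retrieved).shape)  # torch.Size([2, 50, 256])
```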
A Low-Jitter 8-GHz RO-Based ADPLL With PVT-Robust Replica-Based Analog Closed Loop for Supply Noise Compensation
This article presents a ring oscillator (RO)-based all-digital phase-locked loop (ADPLL) implemented with a high-gain analog closed loop for supply noise compensation (ACSC). The ACSC not only allows high-frequency oscillation of the RO but is also robust over process, voltage, and temperature (PVT) variations thanks to its replica-based configuration. Moreover, a comprehensive analysis of the noise contribution of the ACSC is conducted so that the ADPLL retains its low-jitter output. Implemented in 40-nm CMOS technology, the ADPLL, with a 1.1-V supply, achieves an rms jitter of 289 fs at 8 GHz without any injected supply noise. Under a 20-mV white supply noise, the ADPLL gives rms jitters of 8.7 and 0.63 ps at 8 GHz when the ACSC is disabled and enabled, respectively. The overall power consumption and area of the presented ADPLL are 9.48 mW and 0.055 mm², respectively.
Development of integrated prompt gamma imaging and positron emission tomography system for in vivo 3-D dose verification: a Monte Carlo study
An accurate knowledge of in vivo proton dose distribution is key to fully utilizing the potential advantages of proton therapy. Two representative indirect methods for in vivo range verification are available: prompt gamma (PG) imaging and positron emission tomography (PET). This study proposes a PG-PET system that combines the advantages of these two methods and presents detector geometry and background reduction techniques optimized for it. The characteristics of the secondary radiations emitted from a water phantom irradiated by a 150 MeV proton beam were analysed using Geant4.10.00, and 2-D PG distributions were obtained and assessed for different detector geometries. In addition, energy window (EW), depth-of-interaction (DOI), and time-of-flight (TOF) techniques are proposed for background reduction. To evaluate the performance of the PG-PET system, the 3-D dose distribution in the water phantom produced by two proton beams of energies 80 MeV and 100 MeV was verified using 16 optimal detectors. An optimization study determined the thickness of the parallel-hole tungsten collimator (pitch 8 mm, width 7 mm) as 200 mm and that of the GAGG scintillator as 30 mm. Further, optimal EWs of 3-7 MeV and 2-7 MeV were obtained when the DOI technique alone and the DOI and TOF techniques together were applied, respectively; the detector performances improved by about 38% and 167%, respectively, compared with applying only a 3-5 MeV EW. In this study, we confirmed that the PG distribution can be obtained by simply combining a 2-D parallel-hole collimator with the PET detector module. In the future, we will develop an accurate 3-D dose evaluation technique using deep learning algorithms based on image sets of dose, PG, and PET distributions for various proton energies.
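The background reduction amounts to selecting events inside the reported windows. Below is a minimal sketch assuming per-event energy and depth-of-interaction fields; the field names, DOI cut value, and synthetic data are assumptions, and the study's TOF cut is omitted.

```python
import numpy as np

def reduce_background(events, e_window=(3.0, 7.0), doi_max_mm=15.0):
    """Keep hits inside the optimal energy window (3-7 MeV with DOI, per
    the abstract) and within a depth-of-interaction cut (value assumed)."""
    energy, doi = events["energy_mev"], events["doi_mm"]
    keep = (energy >= e_window[0]) & (energy <= e_window[1]) & (doi <= doi_max_mm)
    return events[keep]

# Synthetic stand-in events, not simulation output.
rng = np.random.default_rng(1)
events = np.zeros(10000, dtype=[("energy_mev", "f4"), ("doi_mm", "f4")])
events["energy_mev"] = rng.exponential(2.5, 10000)
events["doi_mm"] = rng.uniform(0.0, 30.0, 10000)
print(len(reduce_background(events)), "events kept of", len(events))
```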
Investigation on Quality Prediction Algorithm in Ultrasonic Metal Welding for Multilayered Cu for Battery Cells
This study focuses on the batteries commonly used in electric vehicles, which are composed of cells grouped into modules and stacked to form a battery pack. Ultrasonic metal welding (UMW) is employed to bond these cells, and ensuring reliable weld quality through inspection is crucial for maintaining the performance and stability of batteries. Currently, the quality of UMW in battery cells is assessed through sample-unit T-peel tests and visual inspection, methods whose destructive nature reduces productivity and increases costs. This research introduces an algorithm designed to predict weld quality by analyzing UMW process signals, specifically for the bonding between 8-μm-thick Cu foil and 0.2-mm-thick nickel-plated copper strip materials used in battery cell manufacturing. To achieve quality prediction, the current and voltage signals from the welder, as well as the displacement signal of the welded part, are used. A UMW system was constructed, incorporating a current sensor, a voltage sensor, and a linear variable displacement transducer (LVDT) for measuring these signals. The study further derives the welding energy from the current and voltage signals, analyzes changes in the behavior of the sonotrode during welding using the LVDT sensor, and employs data analysis to derive feature variables. These variables are then used in a machine-learning-based classification model. Ultimately, the study develops and evaluates a support vector machine (SVM)-based algorithm for real-time UMW quality determination. The algorithm achieves a high classification accuracy of 98%, demonstrating its effectiveness in predicting the quality of ultrasonic metal welding in the battery manufacturing process.
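A minimal sketch of the signals-to-features-to-SVM pipeline described above, using scikit-learn. The feature definitions, synthetic signals, and toy labeling rule are assumptions for illustration, not the study's actual feature variables or data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(current, voltage, displacement, dt=1e-5):
    """Illustrative feature variables from one weld's signals: welding
    energy (time integral of electrical power), peak power, and
    sonotrode displacement statistics."""
    power = current * voltage
    return np.array([
        np.sum(power) * dt,                        # welding energy
        power.max(),                               # peak power
        displacement.max() - displacement.min(),   # indentation range
        displacement.std(),                        # vibration spread
    ])

# Synthetic stand-in signals for 200 welds (1000 samples each); real labels
# would come from destructive T-peel tests.
rng = np.random.default_rng(0)
X = np.stack([
    extract_features(rng.normal(10, 1, 1000),
                     rng.normal(50, 5, 1000),
                     rng.normal(0.1, 0.02, 1000))
    for _ in range(200)
])
y = (X[:, 0] > np.median(X[:, 0])).astype(int)  # 1 = good weld (toy rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```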