130 research outputs found
Dilated Deep Residual Network for Image Denoising
Variations of deep neural networks such as convolutional neural network (CNN)
have been successfully applied to image denoising. The goal is to automatically
learn a mapping from a noisy image to a clean image given training data
consisting of pairs of noisy and clean images. Most existing CNN models for
image denoising have many layers. In such cases, the models involve a large
amount of parameters and are computationally expensive to train. In this paper,
we develop a dilated residual CNN for Gaussian image denoising. Compared with
the recently proposed residual denoiser, our method can achieve comparable
performance with less computational cost. Specifically, we enlarge the
receptive field by adopting dilated convolution in the residual network, with
the dilation factor set to a fixed value. We use appropriate zero padding so
that the output has the same dimensions as the input. It has been shown that
enlarging the receptive field can boost CNN performance in image
classification, and we further demonstrate that it also leads to competitive
performance on the denoising problem. Moreover, we present a formula for
calculating the receptive field size when dilated convolution is incorporated,
so that the change in receptive field can be interpreted mathematically. To
validate the efficacy of our approach, we conduct extensive experiments on both
grayscale and color image denoising with specific or randomized noise levels.
Both the quantitative measurements and the visual denoising results are
promising compared with state-of-the-art baselines.
Comment: camera ready, 8 pages, accepted to IEEE ICTAI 201
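As a concrete illustration of the receptive-field argument, the following minimal sketch (not the authors' code; the layer count and dilation factors are made-up values for illustration) computes how the receptive field of stacked stride-1 dilated convolutions grows, and the zero padding that keeps the output size equal to the input size.

```python
# Minimal sketch, assuming stride-1 convolutions. A standard relation (the paper's
# exact formula is not reproduced here) is RF_l = RF_{l-1} + (k - 1) * d_l.

def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Hypothetical example: seven 3x3 layers with dilation 2 in the middle layers.
kernels = [3] * 7
dilations = [1, 2, 2, 2, 2, 2, 1]
print(receptive_field(kernels, dilations))  # 25

# "Same" zero padding for a dilated convolution: pad = d * (k - 1) // 2
k, d = 3, 2
print(d * (k - 1) // 2)  # 2
```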
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Augmenting large language models (LLMs) with external tools has emerged as a
promising approach to solving complex problems. However, traditional methods,
which finetune LLMs with tool demonstration data, can be both costly and
restricted to a predefined set of tools. The recent in-context learning paradigm
alleviates these issues, but the limited context length allows for only a few
demonstrations, leading to a suboptimal understanding of the tools. Moreover,
when there are numerous tools to choose from, in-context learning can fail to
work entirely. In this paper, we propose an alternative approach, ToolkenGPT,
which combines the benefits of both sides. Our approach represents each tool as
a token (a "toolken") and learns an embedding for it, enabling tool calls in the
same way as generating a regular word token. Once a toolken is triggered, the
LLM is prompted to complete arguments for the tool to execute. ToolkenGPT
offers the flexibility to plug in an arbitrary number of tools by expanding the
set of toolkens on the fly. In addition, it improves tool use by allowing
extensive demonstration data for learning the toolken embeddings. In diverse
domains, including numerical reasoning, knowledge-based question answering, and
embodied plan generation, our approach effectively augments LLMs with tools and
substantially outperforms a range of recent baselines. ToolkenGPT demonstrates
a promising ability to use relevant tools from a large tool set in complex
scenarios.
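The mechanism described above, tool calls emitted in the same way as ordinary next-token predictions, can be pictured with a small sketch. The class and shapes below are illustrative assumptions rather than the paper's implementation; they only show a frozen word-token head being extended with trainable toolken embeddings.

```python
import torch
import torch.nn as nn

class ToolkenHead(nn.Module):
    """Sketch: extend a frozen LM head over V word tokens with T trainable toolkens."""

    def __init__(self, frozen_lm_head: nn.Linear, num_tools: int):
        super().__init__()
        self.word_head = frozen_lm_head
        for p in self.word_head.parameters():
            p.requires_grad = False                       # the LM stays frozen
        hidden_dim = frozen_lm_head.in_features
        self.toolken_emb = nn.Parameter(torch.randn(num_tools, hidden_dim) * 0.02)

    def forward(self, hidden):                            # hidden: (batch, hidden_dim)
        word_logits = self.word_head(hidden)              # (batch, V)
        tool_logits = hidden @ self.toolken_emb.T         # (batch, T)
        return torch.cat([word_logits, tool_logits], -1)  # (batch, V + T)

# If decoding picks one of the last T positions, a toolken has been "triggered",
# and the LLM can then be prompted to fill in the tool's arguments.
```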
Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding
Contrastive learning, especially self-supervised contrastive learning (SSCL),
has achieved great success in extracting powerful features from unlabeled data.
In this work, we contribute to the theoretical understanding of SSCL and
uncover its connection to the classic data visualization method, stochastic
neighbor embedding (SNE), whose goal is to preserve pairwise distances. From
the perspective of preserving neighboring information, SSCL can be viewed as a
special case of SNE with the input space pairwise similarities specified by
data augmentation. The established correspondence facilitates deeper
theoretical understanding of learned features of SSCL, as well as
methodological guidelines for practical improvement. Specifically, through the
lens of SNE, we provide novel analysis on domain-agnostic augmentations,
implicit bias and robustness of learned features. To illustrate the practical
advantage, we demonstrate that the modifications from SNE to t-SNE can also
be adopted in the SSCL setting, achieving significant improvements in both
in-distribution and out-of-distribution generalization.
Comment: Accepted by ICLR 202
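The SNE reading sketched above can be written down compactly. The formulation below uses common SNE/contrastive notation as an illustrative summary, not the paper's exact objective: the target neighbor distribution P comes from data augmentation (two views of the same image are neighbors), and the feature-space distribution Q is built from learned-feature similarities.

```latex
% Illustrative SNE-style objective: match an augmentation-defined neighbor
% distribution P with a feature-space similarity distribution Q.
\min_{f} \; \mathrm{KL}\!\left(P \,\|\, Q\right), \qquad
Q_{ij} \;=\; \frac{\exp\!\big(\mathrm{sim}(f(x_i), f(x_j)) / \tau\big)}
                  {\sum_{k \neq i} \exp\!\big(\mathrm{sim}(f(x_i), f(x_k)) / \tau\big)}
```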
Valuing Nature in Business-A Case Study of Chemical Manufacturing and Forest Products Industries
Over the past several decades, there has been an increased realization of the extent to which the means of production in human society depend on and impact increasingly fragile natural systems. Working with our client, The Nature Conservancy, we researched trends in ecosystem valuation within the chemical manufacturing and forest product industries, discerning ways to identify and evaluate future ecosystem investment opportunities. This research resulted in a framework that businesses could use to identify future ecosystem service opportunities and then score the opportunities’ business values using a multi-criteria analysis approach.
We identified potential ecosystem service opportunities by overlaying classifications of business risk on major operational subsectors within the industries, populating the resulting table with key ecosystem impacts and opportunities. Through the application of this process, we identified three hypothetical ecosystem service projects applicable to both the chemical manufacturing and forest product industries and used them to test our scoring framework. The identified projects were constructed wetlands for wastewater treatment, coastal habitat protection for storm surge protection, and forest carbon sequestration. We ranked the business value of each project using five criteria important to businesses: financial value, reputational benefits, environmental risk reduction, political and regulatory enabling conditions, and level of knowledge and activity in the field. According to our research, businesses emphasize financial benefits most highly when evaluating potential investments, so we weighted financial values most heavily in our ranking scheme. Our analysis indicated that a forest carbon sequestration project had the highest potential business value relative to the other project types due to its higher expected financial benefits. The constructed wetland project, which also had a relatively high expected financial benefit, followed second. Finally, the coastal habitat protection project had the lowest relative business value due to high costs, a low level of scientific knowledge, and weak regulatory support.
The identification and ranking methodologies are designed to be flexible, allowing them to be adapted to varying business objectives. The weights on the five valuation criteria can be adjusted to reflect a business’s concerns. This scoring methodology is useful for businesses because few tools exist to enable comparative analysis of business ecosystem service investments. We believe this tool provides a useful approach to determining the value that nature and ecosystem services provide to a wide range of businesses, and we recommend its application outside the chemical manufacturing and forest products industries for further refinement.
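As a rough illustration of the scoring framework, the sketch below implements a weighted multi-criteria score over the five criteria named above. The numeric weights and example scores are invented assumptions (the report states only that financial value is weighted most heavily).

```python
# Hedged sketch of the multi-criteria scoring framework; weights are assumptions.
WEIGHTS = {
    "financial_value": 0.40,               # weighted most heavily
    "reputational_benefits": 0.15,
    "environmental_risk_reduction": 0.15,
    "regulatory_enabling_conditions": 0.15,
    "knowledge_and_activity": 0.15,
}

def business_value(scores: dict) -> float:
    """Weighted sum of criterion scores, e.g. each criterion scored 1-5."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical project scored 1-5 on each criterion.
example = {
    "financial_value": 5,
    "reputational_benefits": 4,
    "environmental_risk_reduction": 4,
    "regulatory_enabling_conditions": 4,
    "knowledge_and_activity": 3,
}
print(business_value(example))  # 4.25
```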
Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
Due to the ease of training, ability to scale, and high sample quality,
diffusion models (DMs) have become the preferred option for generative
modeling, with numerous pre-trained models available for a wide variety of
datasets. Containing intricate information about data distributions,
pre-trained DMs are valuable assets for downstream applications. In this work,
we consider learning from pre-trained DMs and transferring their knowledge to
other generative models in a data-free fashion. Specifically, we propose a
general framework called Diff-Instruct to instruct the training of arbitrary
generative models as long as the generated samples are differentiable with
respect to the model parameters. Our proposed Diff-Instruct is built on a
rigorous mathematical foundation where the instruction process directly
corresponds to minimizing a novel divergence we call Integral Kullback-Leibler
(IKL) divergence. IKL is tailored for DMs by calculating the integral of the KL
divergence along a diffusion process, which we show to be more robust in
comparing distributions with misaligned supports. We also reveal non-trivial
connections of our method to existing works such as DreamFusion, and generative
adversarial training. To demonstrate the effectiveness and universality of
Diff-Instruct, we consider two scenarios: distilling pre-trained diffusion
models and refining existing GAN models. The experiments on distilling
pre-trained diffusion models show that Diff-Instruct results in
state-of-the-art single-step diffusion-based models. The experiments on
refining GAN models show that Diff-Instruct can consistently improve the
pre-trained generators of GAN models across various settings.
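The Integral Kullback-Leibler divergence mentioned above can be summarized in one formula. The notation below is an illustrative sketch consistent with the abstract rather than the paper's exact definition: the KL divergence between the two distributions, both diffused to time t, is integrated along the forward diffusion process with a positive weighting w(t).

```latex
% Sketch of an integral KL divergence along a forward diffusion process.
% q_t and p_t denote the data and model distributions diffused to time t.
\mathcal{D}_{\mathrm{IKL}}(q \,\|\, p)
  \;=\; \int_{0}^{T} w(t)\, \mathrm{D}_{\mathrm{KL}}\!\left(q_{t} \,\|\, p_{t}\right) \mathrm{d}t
```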
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
In-Context Learning (ICL) has been a powerful emergent property of large
language models that has attracted increasing attention in recent years. In
contrast to regular gradient-based learning, ICL is highly interpretable and
does not require parameter updates. In this paper, we show that, for linearized
transformer networks, ICL can be made explicit and permanent through the
inclusion of bias terms. We mathematically demonstrate the equivalence between
a model with ICL demonstration prompts and the same model with the additional
bias terms. Our algorithm, ICLCA, performs this exact conversion inexpensively,
whereas existing methods are only approximate and require expensive parameter updates.
We demonstrate the efficacy of our approach through experiments that show the
exact incorporation of ICL tokens into a linear transformer. We further suggest
how our method can be adapted to achieve cheap approximate conversion of ICL
tokens, even in regular transformer networks that are not linearized. Our
experiments on GPT-2 show that, even though the conversion is only approximate,
the model still gains valuable context from the included bias terms.
Comment: Accepted to ICML 202
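To see why linearized attention makes such a conversion possible, a rough sketch in standard linear-attention notation follows; this is an illustrative argument consistent with the abstract, not the paper's exact construction.

```latex
% With a feature map \phi, unnormalized linearized attention over keys/values
% (k_i, v_i) splits into a demonstration (context) part C and a current-input part X.
\begin{aligned}
\mathrm{Attn}(q)
  &= \phi(q)^{\top} \sum_{i \in C \cup X} \phi(k_i)\, v_i^{\top} \\
  &= \phi(q)^{\top} \Big( \underbrace{\textstyle\sum_{i \in C} \phi(k_i)\, v_i^{\top}}_{\text{fixed matrix from the prompt}}
     \;+\; \textstyle\sum_{j \in X} \phi(k_j)\, v_j^{\top} \Big)
\end{aligned}
% The context contribution is a constant matrix, so it can be stored inside the
% layer as additional bias-like parameters instead of being re-supplied as a prompt.
```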
Training Energy-Based Models with Diffusion Contrastive Divergences
Energy-Based Models (EBMs) have been widely used for generative modeling.
Contrastive Divergence (CD), a prevailing training objective for EBMs, requires
sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which
leads to an irreconcilable trade-off between the computational burden and the
validity of the CD. Running MCMC until convergence is computationally
intensive, while short-run MCMC brings in an extra non-negligible
parameter gradient term that is difficult to handle. In this paper, we provide
a general interpretation of CD, viewing it as a special instance of our
proposed Diffusion Contrastive Divergence (DCD) family. By replacing the
Langevin dynamics used in CD with other EBM-parameter-free diffusion processes,
we propose a more efficient divergence. We show that the proposed DCDs are both
more computationally efficient than CD and free of the problematic
non-negligible gradient term. We conduct extensive experiments, including
synthetic data modeling and high-dimensional image denoising and generation, to
show the advantages of the proposed DCDs. On the synthetic data learning and
image denoising experiments, our proposed DCD outperforms CD by a large margin.
In the image generation experiments, the proposed DCD is capable of training an
energy-based model on the CelebA dataset whose generated samples are comparable
to those of existing EBMs.
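For background on the trade-off described above, the standard contrastive-divergence-style gradient for an EBM is the difference of two expectations, written here in common textbook notation rather than the paper's; when the model samples come from a short-run chain whose distribution still depends on the parameters, an extra gradient term appears.

```latex
% CD-style gradient for an EBM with energy E_\theta: data term minus model-sample term.
\nabla_{\theta} \mathcal{L}(\theta)
  \;\approx\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\nabla_{\theta} E_{\theta}(x)\right]
  \;-\; \mathbb{E}_{x \sim q_{\theta}^{(k)}}\!\left[\nabla_{\theta} E_{\theta}(x)\right]
% With a short-run (k-step) MCMC sampler, q_\theta^{(k)} itself depends on \theta,
% which is the source of the extra non-negligible gradient term noted above.
```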