xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
In recent years, deep neural networks (DNNs) have achieved unprecedented
performance in many low-level vision tasks. However, state-of-the-art results
are typically achieved by very deep networks, which can reach tens of layers
with tens of millions of parameters. To make DNNs implementable on platforms
with limited resources, it is necessary to weaken the tradeoff between
performance and efficiency. In this paper, we propose a new activation unit,
which is particularly suitable for image restoration problems. In contrast to
the widespread per-pixel activation units, like ReLUs and sigmoids, our unit
implements a learnable nonlinear function with spatial connections. This
enables the net to capture much more complex features, thus requiring a
significantly smaller number of layers in order to reach the same performance.
We illustrate the effectiveness of our units through experiments with
state-of-the-art nets for denoising, de-raining, and super resolution, which
are already considered to be very small. With our approach, we are able to
further reduce these models by nearly 50% without incurring any degradation in
performance.
Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 2018
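The unit the abstract describes replaces a per-pixel nonlinearity with a multiplicative gate computed by a small spatial subnetwork, so the activation of each pixel depends on its neighborhood. A minimal PyTorch sketch of that idea, assuming a depthwise convolution and a Gaussian-shaped gate (the layer choices here are illustrative, not the authors' released code):

```python
# Minimal sketch of an xUnit-style spatial activation.
# Gate shape and layer choices are assumptions, not the official code.
import torch
import torch.nn as nn

class XUnit(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        # A depthwise conv gives each channel a learnable spatial
        # neighborhood, unlike per-pixel ReLU/sigmoid units.
        self.gate = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gaussian gate in (0, 1], applied multiplicatively.
        return x * torch.exp(-self.gate(x) ** 2)

# Usage: drop in where a ReLU would normally follow a conv layer.
x = torch.randn(1, 64, 32, 32)
print(XUnit(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```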
Discovering Variable Binding Circuitry with Desiderata
Recent work has shown that computation in language models may be
human-understandable, with successful efforts to localize and intervene on both
single-unit features and input-output circuits. Here, we introduce an approach
which extends causal mediation experiments to automatically identify model
components responsible for performing a specific subtask by solely specifying a
set of desiderata, or causal attributes of the model components executing that
subtask. As a proof of concept, we apply our method to automatically discover
shared variable binding circuitry in LLaMA-13B,
which retrieves variable values for multiple arithmetic tasks. Our method
successfully localizes variable binding to only 9 attention heads (of the
model's 1.6k) and one MLP in the final token's residual stream.
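The causal-mediation operation underlying such localization can be sketched in a few lines: cache a component's activation from one run, splice it into another run, and score each component by how far the patched output moves toward the source behavior. A toy PyTorch sketch with a two-layer stand-in model (the model, inputs, and scoring are illustrative assumptions, not the paper's pipeline):

```python
# Toy sketch of desiderata-style causal localization via activation
# patching. Model, inputs, and scoring are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
base, source = torch.randn(1, 4), torch.randn(1, 4)
cached = {}

def output_with_patch(patch_layer: int) -> torch.Tensor:
    # 1) Cache every layer's activation on the "source" run.
    saves = [m.register_forward_hook(
                 lambda mod, inp, out, i=i: cached.__setitem__(i, out))
             for i, m in enumerate(model)]
    model(source)
    for h in saves:
        h.remove()
    # 2) Rerun on "base", splicing in the cached activation at one layer;
    #    returning a value from a forward hook replaces that layer's output.
    swap = model[patch_layer].register_forward_hook(
        lambda mod, inp, out: cached[patch_layer])
    out = model(base)
    swap.remove()
    return out

# A component satisfying the desideratum moves the output toward the
# source run's output when patched in.
target = model(source)
for layer in range(len(model)):
    dist = (output_with_patch(layer) - target).abs().item()
    print(f"layer {layer}: |patched(base) - f(source)| = {dist:.4f}")
```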
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Fine-tuning on generalized tasks such as instruction following, code
generation, and mathematics has been shown to enhance language models'
performance on a range of tasks. Nevertheless, explanations of how such
fine-tuning influences the internal computations in these models remain
elusive. We study how fine-tuning affects the internal mechanisms implemented
in language models. As a case study, we explore the property of entity
tracking, a crucial facet of language comprehension, where models fine-tuned on
mathematics have substantial performance gains. We identify the mechanism that
enables entity tracking and show that (i) the original model and its
fine-tuned versions implement entity tracking with primarily the same circuit;
in fact, the original model's entity tracking circuit, evaluated on the
fine-tuned versions, performs better than the full original model; (ii) the
circuits of all the models implement roughly the same functionality: entity
tracking is performed by tracking the position of the correct entity in both
the original model and its fine-tuned versions; and (iii) the performance boost
in the fine-tuned models is primarily attributable to their improved ability to
handle the augmented positional information. To uncover these findings, we
employ Path Patching,
DCM, which automatically detects model components responsible for specific
semantics, and CMAP, a new approach for patching activations across models to
reveal improved mechanisms. Our findings suggest that fine-tuning enhances,
rather than fundamentally alters, the mechanistic operation of the model.
Comment: ICLR 2024. 26 pages, 13 figures. Code and data at
https://finetuning.baulab.info
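CMAP's cross-model patching can be illustrated with a toy pair of models: cache an activation from the fine-tuned model and splice it into the base model's forward pass. If the base output then moves toward the fine-tuned output, the patched component carries the improvement. A minimal PyTorch sketch, with a random weight perturbation standing in for fine-tuning (all names and sizes are assumptions):

```python
# Toy sketch of CMAP-style cross-model activation patching.
# Models, the "fine-tuning" perturbation, and inputs are assumptions.
import copy
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
finetuned = copy.deepcopy(base)
with torch.no_grad():  # stand-in for actual fine-tuning
    finetuned[0].weight.add_(0.1 * torch.randn_like(finetuned[0].weight))

x = torch.randn(1, 4)

# Cache one component's activation from the fine-tuned model.
cache = {}
h = finetuned[0].register_forward_hook(
    lambda m, i, o: cache.__setitem__("act", o))
finetuned(x)
h.remove()

# Splice it into the base model's forward pass; a forward hook that
# returns a value replaces the layer's output.
h = base[0].register_forward_hook(lambda m, i, o: cache["act"])
patched = base(x)
h.remove()

# Here only layer 0 differs, so the patched base output matches the
# fine-tuned output exactly.
print(base(x).item(), patched.item(), finetuned(x).item())
```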
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Labeling neural network submodules with human-legible descriptions is useful
for many downstream tasks: such descriptions can surface failures, guide
interventions, and perhaps even explain important model behaviors. To date,
most mechanistic descriptions of trained networks have involved small models,
narrowly delimited phenomena, and large amounts of human labor. Labeling all
human-interpretable sub-computations in models of increasing size and
complexity will almost certainly require tools that can generate and validate
descriptions automatically. Recently, techniques that use learned models
in-the-loop for labeling have begun to gain traction, but methods for
evaluating their efficacy are limited and ad-hoc. How should we validate and
compare open-ended labeling tools? This paper introduces FIND (Function
INterpretation and Description), a benchmark suite for evaluating the building
blocks of automated interpretability methods. FIND contains functions that
resemble components of trained neural networks, and accompanying descriptions
of the kind we seek to generate. The functions span textual and numeric
domains, and involve a range of real-world complexities. We evaluate methods
that use pretrained language models (LMs) to produce descriptions of function
behavior in natural language and code. Additionally, we introduce a new
interactive method in which an Automated Interpretability Agent (AIA) generates
function descriptions. We find that an AIA, built from an LM with black-box
access to functions, can infer function structure, acting as a scientist by
forming hypotheses, proposing experiments, and updating descriptions in light
of new data. However, AIA descriptions tend to capture global function behavior
and miss local details. These results suggest that FIND will be useful for
evaluating more sophisticated interpretability methods before they are applied
to real-world models.
Comment: 28 pages, 10 figures
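The hypothesize-experiment-update loop the AIA performs against a black-box function can be sketched very simply: query the function on chosen inputs, test candidate descriptions against the observations, and keep the ones the experiments cannot falsify. The toy target function and hypothesis family below are illustrative assumptions, not items from the FIND suite:

```python
# Toy sketch of black-box probing by an automated interpretability
# agent. The hidden function and hypothesis family are assumptions.
import random

def black_box(x: float) -> float:     # hidden function under study
    return max(0.0, 2.0 * x)          # secretly a ReLU with slope 2

def consistent(slope: float, trials: int = 100) -> bool:
    """Test the description 'ReLU with this slope' on random probes."""
    for _ in range(trials):
        x = random.uniform(-10.0, 10.0)
        if abs(black_box(x) - max(0.0, slope * x)) > 1e-9:
            return False
    return True

# The agent sweeps candidate descriptions and keeps survivors.
for slope in (0.5, 1.0, 2.0, 3.0):
    print(f"'ReLU with slope {slope}': {consistent(slope)}")
```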
Dual Attention GANs for Semantic Image Synthesis
In this paper, we focus on the semantic image synthesis task that aims at
transferring semantic label maps to photo-realistic images. Existing methods
lack effective semantic constraints to preserve the semantic information and
ignore the structural correlations in both spatial and channel dimensions,
leading to unsatisfactory blurry and artifact-prone results. To address these
limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize
photo-realistic and semantically-consistent images with fine details from the
input layouts without imposing extra training overhead or modifying the network
architectures of existing methods. We also propose two novel modules, i.e.,
position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention
Module (CAM), to capture semantic structure attention in spatial and channel
dimensions, respectively. Specifically, SAM selectively correlates the pixels
at each position by a spatial attention map, leading to pixels with the same
semantic label being related to each other regardless of their spatial
distances. Meanwhile, CAM selectively emphasizes the scale-wise features at
each channel by a channel attention map, which integrates associated features
among all channel maps regardless of their scales. We finally sum the outputs
of SAM and CAM to further improve feature representation. Extensive experiments
on four challenging datasets show that DAGAN achieves remarkably better results
than state-of-the-art methods, while using fewer model parameters. The source
code and trained models are available at https://github.com/Ha0Tang/DAGAN.
Comment: Accepted to ACM MM 2020, camera ready (9 pages) + supplementary (10 pages)
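The two modules read like the position-attention and channel-attention blocks familiar from self-attention segmentation networks: SAM relates every pixel to every other pixel via an HW-by-HW affinity map, CAM relates channels via a C-by-C map, and their outputs are summed. A minimal PyTorch sketch of that reading, with projection sizes as assumptions rather than the released DAGAN code:

```python
# Minimal sketch of summed spatial (SAM) and channel (CAM) attention.
# Projection sizes are assumptions, not the released DAGAN code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)       # (b, hw, c/8)
        k = self.k(x).flatten(2)                       # (b, c/8, hw)
        attn = F.softmax(q @ k, dim=-1)                # pixel-to-pixel map
        v = self.v(x).flatten(2)                       # (b, c, hw)
        return (v @ attn.transpose(1, 2)).view(b, c, h, w)

class ChannelAttention(nn.Module):
    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                                # (b, c, hw)
        attn = F.softmax(f @ f.transpose(1, 2), dim=-1) # channel map
        return (attn @ f).view(b, c, h, w)

x = torch.randn(1, 64, 16, 16)
out = SpatialAttention(64)(x) + ChannelAttention()(x)  # summed outputs
print(out.shape)  # torch.Size([1, 64, 16, 16])
```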
Automation in Interior Space Planning: Utilizing Conditional Generative Adversarial Network Models to Create Furniture Layouts
In interior space planning, the furnishing stage usually entails manual iterative processes, including meeting design objectives, incorporating professional input, and optimizing design performance. Machine learning has the potential to automate and improve interior design processes while maintaining creativity and quality. The aim of this study was to develop a furnishing method that leverages machine learning as a means for enhancing design processes. A secondary aim was to develop a set of evaluation metrics for assessing the quality of the results generated from such methods, enabling comparisons between the performance of different models. To achieve these aims, floor plans were tagged and assembled into a comprehensive dataset that was then employed for training and evaluating three conditional generative adversarial network models (pix2pix, BicycleGAN, and SPADE) to generate furniture layouts within given room boundaries. Post-processing methods for improving the generated results were also developed. Finally, evaluation criteria that combine measures of architectural design with standard computer vision parameters were devised. Visual architectural analyses of the results confirm that the generated rooms adhere to accepted architectural standards. The numerical results indicate that BicycleGAN outperformed the two other models. Moreover, the overall results demonstrate a machine-learning workflow that can be used to augment existing interior design processes.
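A combined evaluation of the kind the study describes, pairing a standard computer-vision score with an architectural plausibility check on the generated layout masks, might look something like the following sketch; the specific metric definitions and label encoding are assumptions for illustration:

```python
# Illustrative sketch of a combined layout metric: per-class IoU
# between generated and reference furniture masks, plus the fraction
# of furniture pixels falling inside the room boundary. The metric
# definitions and label encoding are assumptions.
import numpy as np

def iou(pred: np.ndarray, ref: np.ndarray, cls: int) -> float:
    """Standard intersection-over-union for one furniture class."""
    p, r = pred == cls, ref == cls
    union = np.logical_or(p, r).sum()
    return np.logical_and(p, r).sum() / union if union else 1.0

def inside_room(pred: np.ndarray, room_mask: np.ndarray) -> float:
    """Architectural check: furniture should stay within the room."""
    furniture = pred > 0                    # assume 0 = empty floor
    return (furniture & room_mask).sum() / max(furniture.sum(), 1)

pred = np.random.randint(0, 3, (64, 64))    # stand-in generated layout
ref = np.random.randint(0, 3, (64, 64))     # stand-in ground truth
room = np.ones((64, 64), dtype=bool)        # stand-in room boundary
print(iou(pred, ref, cls=1), inside_room(pred, room))
```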