xUnit: Learning a Spatial Activation Function for Efficient Image Restoration
In recent years, deep neural networks (DNNs) have achieved unprecedented
performance in many low-level vision tasks. However, state-of-the-art results
are typically achieved by very deep networks, which can reach tens of layers
with tens of millions of parameters. To make DNNs implementable on platforms
with limited resources, it is necessary to weaken the tradeoff between
performance and efficiency. In this paper, we propose a new activation unit,
which is particularly suitable for image restoration problems. In contrast to
the widespread per-pixel activation units, like ReLUs and sigmoids, our unit
implements a learnable nonlinear function with spatial connections. This
enables the network to capture much richer features, so that significantly
fewer layers are needed to reach the same performance.
We illustrate the effectiveness of our units through experiments with
state-of-the-art nets for denoising, de-raining, and super resolution, which
are already considered to be very small. With our approach, we are able to
further reduce these models by nearly 50% without incurring any degradation in
performance.
Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 2018
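To make the idea concrete, below is a minimal PyTorch-style sketch of a spatial activation layer in this spirit: instead of a fixed per-pixel nonlinearity, each pixel is gated by a weight computed from its spatial neighborhood. The branch structure (batch norm, ReLU, depthwise convolution, Gaussian-shaped gate) follows one published xUnit variant; the kernel size and layer arrangement here are illustrative assumptions, not necessarily the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SpatialActivation(nn.Module):
    """Spatial, learnable activation in the spirit of xUnit: each pixel
    of x is multiplied by a gate computed from its neighborhood.
    Branch layout and kernel size are assumptions, not the exact
    paper configuration."""

    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # Depthwise convolution: this is where the spatial
            # connections enter the nonlinearity.
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = self.branch(x)
        gate = torch.exp(-d * d)  # per-pixel weight in (0, 1]
        return x * gate           # element-wise spatial gating
```

Swapping such a unit in for a plain ReLU makes each activation more expressive, which is how a network can match a deeper per-pixel baseline with roughly half the layers.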
Discovering Variable Binding Circuitry with Desiderata
Recent work has shown that computation in language models may be
human-understandable, with successful efforts to localize and intervene on both
single-unit features and input-output circuits. Here, we introduce an approach
which extends causal mediation experiments to automatically identify model
components responsible for performing a specific subtask by solely specifying a
set of "desiderata", or causal attributes of the model components
executing that subtask. As a proof of concept, we apply our method to
automatically discover shared variable binding circuitry in LLaMA-13B,
which retrieves variable values for multiple arithmetic tasks. Our method
successfully localizes variable binding to only 9 attention heads (of the 1.6k)
and one MLP in the final token's residual stream.
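As a rough illustration of the underlying causal-mediation primitive, the sketch below patches one component's activation from a "clean" run into a "corrupted" run and scores how much the correct answer recovers. The `model` and `component` objects and the HuggingFace-style `.logits` output are placeholders assumed for illustration; the paper's contribution is the automated, desiderata-driven search built on top of many such interventions.

```python
import torch

def patch_and_score(model, component, clean_inputs, corrupt_inputs, target_id):
    """Single causal-mediation intervention (a sketch, not the paper's
    full automated search). `model` is assumed to be a HuggingFace-style
    LM and `component` one of its submodules, e.g. an attention output
    projection or an MLP; both names are placeholders."""
    cache = {}

    def record(module, inputs, output):
        cache["clean"] = output          # stash the clean-run activation

    def splice(module, inputs, output):
        return cache["clean"]            # returning a value overrides output

    with torch.no_grad():
        handle = component.register_forward_hook(record)
        model(clean_inputs)              # 1) record activation on clean prompt
        handle.remove()

        base = model(corrupt_inputs).logits        # 2) corrupted baseline
        handle = component.register_forward_hook(splice)
        patched = model(corrupt_inputs).logits     # 3) corrupted + clean splice
        handle.remove()

    # How much does splicing this component restore the correct answer's logit?
    return (patched[0, -1, target_id] - base[0, -1, target_id]).item()
```

Components whose patching consistently restores the correct variable value are exactly the candidates a desiderata-based search would select.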
Dual Attention GANs for Semantic Image Synthesis
In this paper, we focus on the semantic image synthesis task that aims at
transferring semantic label maps to photo-realistic images. Existing methods
lack effective semantic constraints to preserve the semantic information and
ignore the structural correlations in both spatial and channel dimensions,
leading to blurry, artifact-prone results. To address these
limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize
photo-realistic and semantically-consistent images with fine details from the
input layouts without imposing extra training overhead or modifying the network
architectures of existing methods. We also propose two novel modules, i.e., a
position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention
Module (CAM), to capture semantic structure attention in spatial and channel
dimensions, respectively. Specifically, SAM selectively correlates the pixels
at each position by a spatial attention map, leading to pixels with the same
semantic label being related to each other regardless of their spatial
distances. Meanwhile, CAM selectively emphasizes the scale-wise features at
each channel by a channel attention map, which integrates associated features
among all channel maps regardless of their scales. We finally sum the outputs
of SAM and CAM to further improve feature representation. Extensive experiments
on four challenging datasets show that DAGAN achieves remarkably better results
than state-of-the-art methods, while using fewer model parameters. The source
code and trained models are available at https://github.com/Ha0Tang/DAGAN.
Comment: Accepted to ACM MM 2020, camera ready (9 pages) + supplementary (10 pages).
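For orientation, here is a compact PyTorch sketch of the two attention patterns the abstract describes: position-wise attention relating pixels across arbitrary spatial distances, and channel-wise attention correlating whole feature maps. It follows the classic dual-attention pattern; DAGAN's exact formulation (including its scale-wise weighting) may differ, and the projection sizes are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Position-wise attention in the spirit of SAM: every pixel attends
    to every other pixel, so same-label regions can share features
    regardless of spatial distance. Projection sizes are assumptions."""
    def __init__(self, c: int):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)       # B x HW x C'
        k = self.k(x).flatten(2)                       # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)            # B x HW x HW
        v = self.v(x).flatten(2)                       # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return out + x                                 # residual connection

class ChannelAttention(nn.Module):
    """Channel attention in the spirit of CAM: each channel map is
    re-weighted by its correlation with all other channel maps."""
    def forward(self, x):
        b, c, h, w = x.shape
        f = x.flatten(2)                               # B x C x HW
        attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)  # B x C x C
        return (attn @ f).view(b, c, h, w) + x
```

Per the abstract, the outputs of the two modules are then summed to form the fused feature representation.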
Automation in Interior Space Planning: Utilizing Conditional Generative Adversarial Network Models to Create Furniture Layouts
In interior space planning, the furnishing stage usually entails manual iterative processes, including meeting design objectives, incorporating professional input, and optimizing design performance. Machine learning has the potential to automate and improve interior design processes while maintaining creativity and quality. The aim of this study was to develop a furnishing method that leverages machine learning to enhance design processes. A secondary aim was to develop a set of evaluation metrics for assessing the quality of the results generated from such methods, enabling comparisons between the performance of different models. To achieve these aims, floor plans were tagged and assembled into a comprehensive dataset that was then employed for training and evaluating three conditional generative adversarial network models (pix2pix, BicycleGAN, and SPADE) to generate furniture layouts within given room boundaries. Post-processing methods for improving the generated results were also developed. Finally, evaluation criteria that combine measures of architectural design with standard computer vision parameters were devised. Visual architectural analyses of the results confirm that the generated rooms adhere to accepted architectural standards. The numerical results indicate that BicycleGAN outperformed the other two models. Moreover, the overall results demonstrate a machine-learning workflow that can be used to augment existing interior design processes.
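Since the combined evaluation criteria are only described at a high level here, the sketch below is a hypothetical example of how a standard computer-vision measure (per-class IoU between generated and ground-truth furniture masks) can be paired with a simple architectural check (furniture kept inside the room boundary). The function name, the label-map encoding, and both metrics are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def layout_scores(pred: np.ndarray, truth: np.ndarray, room_mask: np.ndarray):
    """Hypothetical layout evaluation: mean per-class IoU of furniture
    masks plus a room-boundary containment check. `pred` and `truth`
    are HxW integer label maps where label 0 means empty floor;
    `room_mask` marks pixels inside the room boundary."""
    classes = np.union1d(np.unique(pred), np.unique(truth))
    ious = []
    for cls in classes[classes != 0]:
        inter = np.logical_and(pred == cls, truth == cls).sum()
        union = np.logical_or(pred == cls, truth == cls).sum()
        if union:
            ious.append(inter / union)
    mean_iou = float(np.mean(ious)) if ious else 0.0

    # Architectural check: fraction of generated furniture pixels that
    # actually lie inside the room boundary.
    furniture = pred != 0
    containment = (furniture & room_mask.astype(bool)).sum() / max(furniture.sum(), 1)
    return mean_iou, float(containment)
```

Scores of this kind make the outputs of different conditional GANs (e.g., pix2pix vs. BicycleGAN vs. SPADE) directly comparable on the same test rooms.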