Augmented Equivariant Attention Networks for Microscopy Image Reconstruction
Acquiring high-quality or high-resolution electron microscopy (EM) and
fluorescence microscopy (FM) images is time-consuming and expensive. Image
acquisition can even be invasive and may damage subtle structures in the
samples during the long or intense exposures often required to reach
high quality or high resolution in the first place. Advances in deep learning
enable us to perform image-to-image transformation tasks for various types of
microscopy image reconstruction, computationally producing high-quality images
from the physically acquired low-quality ones. When training image-to-image
transformation models on pairs of experimentally acquired microscopy images,
prior models suffer from performance loss due to their inability to capture
inter-image dependencies and common features shared among images. Existing
methods that take advantage of shared features in image classification tasks
cannot be properly applied to image reconstruction tasks because they fail to
preserve the equivariance property under spatial permutations, which is
essential in image-to-image transformation. To address these limitations, we
propose the augmented equivariant attention networks (AEANets) with better
capability to capture inter-image dependencies, while preserving the
equivariance property. The proposed AEANets capture inter-image dependencies
and shared features via two augmentations of the attention mechanism: shared
references and batch-aware attention during training. We
theoretically derive the equivariance property of the proposed augmented
attention model and experimentally demonstrate its consistent superiority in
both quantitative and visual results over the baseline methods.
Comment: 11 pages, 8 figures
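The equivariance property the abstract refers to already holds for plain dot-product self-attention over spatial positions: permuting the input positions permutes the output the same way. A minimal NumPy sketch (a generic illustration, not the AEANets model; the function name and shapes are ours) checks this numerically:

```python
import numpy as np

def self_attention(x):
    """Plain dot-product self-attention over flattened spatial positions.

    x has shape (positions, channels); each output position is a
    softmax-weighted combination of all input positions."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x

x = np.random.randn(16, 4)           # 16 spatial positions, 4 channels
perm = np.random.permutation(16)
# Permutation equivariance: attending after permuting the positions
# equals permuting the attention output.
assert np.allclose(self_attention(x[perm]), self_attention(x)[perm])
```

Preserving exactly this property while adding cross-image (batch-level) terms is the design constraint the proposed augmentations must satisfy.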
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
With the emergence of a spectrum of high-end mobile devices, many
applications that formerly required desktop-level computation capability are
being transferred to these devices. However, executing the inference of Deep
Neural Networks (DNNs) is still challenging considering high computation and
storage demands, specifically, if real-time performance with high accuracy is
needed. Weight pruning of DNNs has been proposed, but existing schemes
represent two extremes of the design space: non-structured pruning is
fine-grained and accurate but not hardware-friendly; structured pruning is
coarse-grained and hardware-efficient but incurs higher accuracy loss. In this
paper, we introduce a new dimension, fine-grained pruning patterns inside
coarse-grained structures, revealing a previously unknown point in the design
space. With the
higher accuracy enabled by fine-grained pruning patterns, the unique insight is
to use the compiler to regain and guarantee high hardware efficiency. In other
words, our method achieves the best of both worlds, and is desirable across
theory/algorithm, compiler, and hardware levels. The proposed PatDNN is an
end-to-end framework that efficiently executes DNNs on mobile devices with the help
of a novel model compression technique (pattern-based pruning based on extended
ADMM solution framework) and a set of thorough architecture-aware compiler- and
code generation-based optimizations (filter kernel reordering, compressed
weight storage, register load redundancy elimination, and parameter
auto-tuning). Evaluation results demonstrate that PatDNN outperforms three
state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba
Mobile Neural Network, with speedups of up to 44.5x, 11.4x, and 7.1x, respectively,
with no accuracy compromise. Real-time inference of representative large-scale
DNNs (e.g., VGG-16, ResNet-50) can be achieved on mobile devices.
Comment: To be published in the Proceedings of the Twenty-Fifth International
Conference on Architectural Support for Programming Languages and Operating
Systems (ASPLOS 20)
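Pattern-based pruning can be illustrated by selecting, for each 3x3 convolution kernel, the mask from a small pattern library that preserves the most weight magnitude. The sketch below is illustrative only: the four masks are hypothetical examples, not PatDNN's actual pattern library, and the greedy magnitude criterion is an assumption:

```python
import numpy as np

# Hypothetical candidate set of 3x3 masks, each keeping 4 weights.
# Pattern-style pruning draws per-kernel masks from a small library
# so the compiler can exploit their regularity.
PATTERNS = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]]),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]]),
    np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]]),
]

def pattern_prune(kernel):
    """Zero a 3x3 kernel down to its best-matching pattern.

    The pattern retaining the largest total weight magnitude wins,
    greedily minimizing the accuracy impact per kernel."""
    scores = [np.abs(kernel * m).sum() for m in PATTERNS]
    best = PATTERNS[int(np.argmax(scores))]
    return kernel * best

weights = np.random.randn(8, 3, 3, 3)   # (out_ch, in_ch, 3, 3) conv weights
pruned = np.array([[pattern_prune(k) for k in filt] for filt in weights])
```

Because every surviving kernel follows one of a handful of known shapes, a compiler can reorder filters and generate code specialized to each pattern, which is how the fine-grained sparsity stays hardware-efficient.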
Tuning the Performance of a Computational Persistent Homology Package
In recent years, persistent homology has become an attractive method for data analysis. It captures topological features, such as connected components, holes, and voids, from point cloud data and summarizes the way in which these features appear and disappear along a filtration sequence. In this project, we focus on improving the performance of Eirene, a computational package for persistent homology. Eirene is a 5000-line open-source software library implemented in the dynamic programming language Julia. We use the Julia profiling tools to identify performance bottlenecks and develop novel methods to address them, including the parallelization of some time-consuming functions on multicore/manycore hardware. Empirical results show that performance can be greatly improved.
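The 0-dimensional part of the summary described above (births and deaths of connected components along the filtration) can be sketched with a Kruskal-style union-find over pairwise distances. This is a generic illustration of the idea, not Eirene's implementation, and the function name is ours:

```python
import numpy as np

def zero_dim_persistence(points, max_scale):
    """0-dimensional persistence pairs (birth, death) for a
    Vietoris-Rips filtration of a point cloud.

    All components are born at scale 0; when two components merge at
    edge length d, one of them dies, producing the pair (0, d)."""
    n = len(points)
    # All pairwise edges sorted by length: the filtration order.
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    pairs = []
    for d, i, j in edges:
        if d > max_scale:
            break
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge merges two components
            parent[rj] = ri
            pairs.append((0.0, d))
    return pairs
```

For the four corners of a unit square, three merges happen at scale 1, so the routine reports three finite pairs (0, 1.0); profiling a loop like this one is exactly where tools such as Julia's profiler would point at the distance computation and the union-find as candidate bottlenecks.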
Continuous-Flow Matrix Transposition Using Memories
In this paper, we analyze how to compute the matrix transposition in continuous flow using a memory or a group of memories. The proposed approach studies this problem under specific conditions, such as square and non-square matrices, the use of limited-access memories, and the use of several memories in parallel. Contrary to previous approaches, which are based on specific cases or examples, the proposed approach derives the fundamental theory behind matrix transposition in a continuous flow. This makes it possible to obtain exact equations for the read and write addresses of the memories and the other control signals in the circuits. Furthermore, the cases that involve non-square matrices, which have not been studied in detail in the literature, are analyzed in depth in this paper. Experimental results show that the proposed approach can transpose 8192 × 8192 matrices of 32-bit data received serially at a rate of 200 megasamples per second, doubling the throughput of previous approaches.
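For square matrices, the classic single-memory continuous-flow scheme writes each incoming sample into the address just freed by the sample read out in the same cycle; because the transpose permutation is its own inverse, the addressing simply alternates between row-major and transposed order on successive frames. The simulation below sketches that idea (our own simplified model, not the paper's exact address equations):

```python
import numpy as np

def continuous_flow_transpose(frames, n):
    """Simulate continuous-flow transposition of square n x n matrices
    with a single n*n-word memory.

    Each cycle reads one sample of the previous frame out of an address
    and immediately reuses that address for the incoming sample, so one
    memory sustains the full throughput."""
    mem = np.zeros(n * n, dtype=frames[0].dtype)
    out = []
    for k, frame in enumerate(frames):
        samples = frame.flatten()          # matrix arrives row by row
        read = np.empty(n * n, dtype=frame.dtype)
        for t in range(n * n):
            if k % 2 == 0:
                addr = t                    # row-major addressing
            else:
                addr = (t % n) * n + t // n  # transposed addressing
            read[t] = mem[addr]             # read the previous frame...
            mem[addr] = samples[t]          # ...then reuse its slot
        if k > 0:                           # frame 0 reads only zeros
            out.append(read.reshape(n, n))
    return out
```

Since the permutation applied on odd frames undoes itself on even frames, every output frame is the transpose of the previous input frame with no second memory and no stall cycles; non-square matrices break the self-inverse property, which is why they need the more general address equations derived in the paper.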