Efficient Attention: Attention with Linear Complexities
Dot-product attention has wide applications in computer vision and natural
language processing. However, its memory and computational costs grow
quadratically with the input size. Such growth prohibits its application on
high-resolution inputs. To remedy this drawback, this paper proposes a novel
efficient attention mechanism equivalent to dot-product attention but with
substantially lower memory and computational costs. Its resource efficiency
allows more widespread and flexible integration of attention modules into a
network, which leads to better accuracies. Empirical evaluations demonstrated
these advantages. Efficient attention modules brought
significant performance boosts to object detectors and instance segmenters on
MS-COCO 2017. Further, the resource efficiency democratizes attention to
complex models, where high costs prohibit the use of dot-product attention. As
an exemplar, a model with efficient attention achieved state-of-the-art
accuracies for stereo depth estimation on the Scene Flow dataset. Code is
available at https://github.com/cmsflash/efficient-attention.
Comment: To appear at WACV 202
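The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way the quadratic cost can be avoided: computing K^T V before multiplying by Q, so that no n-by-n attention map is ever built. The scaling normalization (division by n), the function names, and the list-of-rows matrix layout are assumptions made for illustration and are not taken from the authors' code. With a softmax normalization the two orderings would generally no longer match exactly.

```python
# Illustrative sketch only: compares two orderings of the same matrix products.
# Scaling normalization (divide by n) is an assumption of this sketch.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def scale(m, factor):
    """Multiply every entry of a matrix by a scalar."""
    return [[x * factor for x in row] for row in m]

def dot_product_attention(q, k, v):
    """Compute (Q K^T / n) V. The intermediate map is n x n."""
    n = len(q)
    attention_map = scale(matmul(q, transpose(k)), 1.0 / n)  # n x n
    return matmul(attention_map, v)

def linear_attention(q, k, v):
    """Compute Q (K^T V / n). The intermediate context is d_k x d_v."""
    n = len(q)
    context = scale(matmul(transpose(k), v), 1.0 / n)  # d_k x d_v
    return matmul(q, context)

if __name__ == "__main__":
    n, d_k, d_v = 6, 2, 3  # toy sizes chosen for the example
    q = [[0.1 * (i + j) for j in range(d_k)] for i in range(n)]
    k = [[0.2 * (i - j) for j in range(d_k)] for i in range(n)]
    v = [[1.0 + i * j for j in range(d_v)] for i in range(n)]
    a = dot_product_attention(q, k, v)
    b = linear_attention(q, k, v)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Because matrix multiplication is associative, both functions return the same n-by-d_v result up to floating-point rounding. Only the size of the intermediate differs: n x n in the first, d_k x d_v in the second.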
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have wide-ranging application prospects. However, their reliance on unstructured image and video data hurts performance: they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate method for estimating 3D human body pose and shape semantic parameters from 2-dimensional (2D) input is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded in a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection
Efforts to automate the reconstruction of neural circuits from 3D electron
microscopic (EM) brain images are critical for the field of connectomics. An
important computation for reconstruction is the detection of neuronal
boundaries. Images acquired by serial section EM, a leading 3D EM technique,
are highly anisotropic, with inferior quality along the third dimension. For
such images, the 2D max-pooling convolutional network has set the standard for
performance at boundary detection. Here we achieve a substantial gain in
accuracy through three innovations. First, following the trend towards deeper networks
for object recognition, we use a much deeper network than previously employed
for boundary detection. Second, we incorporate 3D as well as 2D filters, to
enable computations that use 3D context. Finally, we adopt a recursively
trained architecture in which a first network generates a preliminary boundary
map that is provided as input along with the original image to a second network
that generates a final boundary map. Backpropagation training is accelerated by
ZNN, a new implementation of 3D convolutional networks that uses multicore CPU
parallelism for speed. Our hybrid 2D-3D architecture could be more generally
applicable to other types of anisotropic 3D images, including video, and our
recursive framework to any image labeling problem.
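The two-stage data flow described in this abstract can be sketched as follows. This shows inference only, not training, and both "networks" are hypothetical placeholder functions (a crude horizontal-difference edge signal and a per-pixel average), standing in for the deep 2D-3D convolutional networks the abstract refers to. All names here are assumptions made for illustration.

```python
# Illustrative sketch only: the functions below are hypothetical stand-ins,
# not the convolutional networks or the ZNN implementation from the paper.

def first_network(image):
    """Placeholder stage 1: preliminary boundary map from the image alone.

    Uses the absolute difference with the left neighbor as a crude edge signal.
    """
    return [[0.0 if j == 0 else abs(row[j] - row[j - 1]) for j in range(len(row))]
            for row in image]

def stack_channels(image, boundary_map):
    """Pair each pixel with its preliminary boundary value (two channels)."""
    return [[(px, bp) for px, bp in zip(img_row, map_row)]
            for img_row, map_row in zip(image, boundary_map)]

def second_network(stacked):
    """Placeholder stage 2: final boundary map from the two-channel input."""
    return [[0.5 * (px + bp) for px, bp in row] for row in stacked]

def recursive_boundary_detection(image):
    """Run stage 1, then feed its output plus the original image to stage 2."""
    preliminary = first_network(image)
    return second_network(stack_channels(image, preliminary))

if __name__ == "__main__":
    image = [[0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0]]
    print(recursive_boundary_detection(image))
```

The point of the sketch is the wiring: the second stage sees both the original image and the first stage's output, so it can refine the preliminary map using context that the first stage has already summarized.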