GAFlow: Incorporating Gaussian Attention into Optical Flow
Optical flow, or the estimation of motion fields from image sequences, is one
of the fundamental problems in computer vision. Unlike most pixel-wise tasks
that aim at consistent representations of the same category, optical flow
raises extra demands for local discrimination and smoothness, which have not
been fully explored by existing approaches. In this paper, we introduce
Gaussian Attention (GA) into optical flow models to accentuate local
properties during representation learning and enforce the motion affinity
during matching. Specifically, we introduce a novel Gaussian-Constrained Layer
(GCL) which can be easily plugged into existing Transformer blocks to highlight
the local neighborhood that contains fine-grained structural information.
Moreover, for reliable motion analysis, we provide a new Gaussian-Guided
Attention Module (GGAM), which not only inherits properties of the Gaussian
distribution to naturally attend to the neighboring fields of each point but
also learns to emphasize contextually related regions during matching. Our
fully-equipped model, the Gaussian Attention Flow
network (GAFlow), naturally incorporates a series of novel Gaussian-based
modules into the conventional optical flow framework. Extensive experiments
on standard optical flow datasets consistently
demonstrate the exceptional performance of the proposed approach in terms of
both generalization ability evaluation and online benchmark testing. Code is
available at https://github.com/LA30/GAFlow. Comment: to appear in ICCV 2023.
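As a rough illustration of the Gaussian-attention idea described above (a minimal sketch, not the paper's exact GCL/GGAM formulation; the function name, shapes, and sigma value are illustrative), a Gaussian spatial bias can be subtracted from the attention logits so each query concentrates on its local neighborhood:

```python
import numpy as np

def gaussian_biased_attention(q, k, v, coords, sigma=2.0):
    """Scaled dot-product attention with a Gaussian spatial bias.

    q, k, v: (N, d) token features; coords: (N, 2) pixel coordinates.
    Each query's logits are penalized by squared distance to the key,
    so attention mass concentrates on the local neighborhood.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                       # (N, N) similarity
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    logits = logits - dist2 / (2.0 * sigma ** 2)        # Gaussian bias
    logits -= logits.max(axis=-1, keepdims=True)        # stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Tokens on a 4x4 grid: nearby positions receive more attention mass.
rng = np.random.default_rng(0)
ys, xs = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(float)
feats = rng.standard_normal((16, 8))
out = gaussian_biased_attention(feats, feats, feats, coords)
print(out.shape)  # (16, 8)
```

The bias term is what distinguishes this from plain attention: with a small sigma, distant keys are effectively masked out, which matches the paper's goal of local discrimination during matching.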
A CBAM attention-based U-net segmentation method for sheet metal engineering drawings
In the manufacturing process of heavy industrial equipment, the specific unit
in the welding diagram is first manually redrawn and then the corresponding
sheet metal parts are cut, which is inefficient. To this end, this paper
proposes a U-net-based method for the segmentation and extraction of specific
units in welding engineering drawings. This method enables the cutting device
to automatically segment specific graphic units according to visual information
and automatically cut out sheet metal parts of corresponding shapes according
to the segmentation results. This process is more efficient than traditional
human-assisted cutting. Two weaknesses of the U-net network degrade
segmentation performance: first, its attention to global semantic features is
weak; second, there is a large dimensional gap between shallow encoder
features and deep decoder features. Based on
the CBAM (Convolutional Block Attention Module) attention mechanism, this paper
proposes a U-net jump structure model with an attention mechanism to improve
the network's global semantic feature extraction ability. In addition, a U-net
attention model with dual-pooling convolution fusion is designed: the deep
encoder's max-pooling + convolution features and the shallow encoder's
average-pooling + convolution features are fused vertically to reduce the
dimensional gap between the shallow encoder and the deep decoder. The
dual-pool convolutional attention jump structure replaces the traditional U-net
jump structure, which can effectively improve the specific unit segmentation
performance on welding engineering drawings. Using VGG16 as the backbone
network, experiments verify that our model achieves an IoU of 84.72%, mAP of
86.84%, and accuracy of 99.42% on the welding engineering drawing
segmentation task.
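To make the CBAM mechanism concrete, here is a minimal numpy sketch of its two stages: channel attention from average- and max-pooled descriptors through a shared MLP, then spatial attention from channel-wise average and max maps. The weights are random stand-ins for learned parameters, and the spatial stage uses a simple 1x1 mix where the published module uses a 7x7 convolution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, w1, w2, w_sp):
    """Minimal CBAM-style attention over a feature map x of shape (C, H, W).

    Channel attention: a shared 2-layer MLP (w1, w2) applied to the
    average- and max-pooled channel descriptors, summed, then sigmoid.
    Spatial attention: here a 1x1 mix (w_sp) of the channel-wise average
    and max maps; real CBAM uses a 7x7 convolution for this step.
    """
    avg_c = x.mean(axis=(1, 2))                        # (C,) avg descriptor
    max_c = x.max(axis=(1, 2))                         # (C,) max descriptor
    mc = sigmoid(w2 @ np.maximum(w1 @ avg_c, 0) +
                 w2 @ np.maximum(w1 @ max_c, 0))       # (C,) channel weights
    x = x * mc[:, None, None]
    avg_s = x.mean(axis=0)                             # (H, W)
    max_s = x.max(axis=0)                              # (H, W)
    ms = sigmoid(w_sp[0] * avg_s + w_sp[1] * max_s)    # (H, W) spatial weights
    return x * ms[None, :, :]

rng = np.random.default_rng(1)
C = 8
x = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // 2, C)) * 0.1            # channel-reduction MLP
w2 = rng.standard_normal((C, C // 2)) * 0.1
w_sp = np.array([0.5, 0.5])
y = cbam(x, w1, w2, w_sp)
print(y.shape)  # (8, 5, 5)
```

Placed on the skip connections, such a module re-weights which channels and spatial locations the decoder receives, which is how the paper improves global semantic feature extraction.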
Deep Attention Networks for Images and Graphs
Deep learning has achieved great success in various machine learning areas, such as computer vision, natural language processing, and graph representation learning. While numerous deep neural networks (DNNs) have been proposed, the set of fundamental building blocks of DNNs remains small, including fully-connected layers, convolutions, and recurrent units. Recently, the attention mechanism has shown promise as a new kind of fundamental building block. Deep attention networks (DANs), i.e., DNNs that use the attention mechanism as a fundamental building block, have revolutionized the area of natural language processing. However, developing DANs for computer vision and graph representation learning applications is still challenging. Due to intrinsic differences in data and applications, directly migrating DANs from textual data to images and graphs is usually either infeasible or ineffective. In this dissertation, we address this challenge by analyzing the functionality of the attention mechanism and exploring scenarios where DANs can push the limits of current DNNs. We propose several effective DANs for images and graphs.
For images, we build DANs for a variety of image-to-image transformation applications by proposing powerful attention-based building blocks. First, we start the exploration by studying a common problem in dilated convolutions, which naturally results in the use of the attention mechanism. Dilated convolutions, a variant of convolutions, have been widely applied in deep convolutional neural networks (DCNNs) for image segmentation. However, dilated convolutions suffer from gridding artifacts, which hamper performance. We propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions, and generalize them by defining separable and shared (SS) operators. Then we connect the SS operators with the attention mechanism and propose the SS output layer, which is able to smooth the entire DCNN by only replacing the output layer, improving performance significantly. Second, we notice an interesting fact from the first study: as the attention mechanism allows the SS output layer to have a receptive field of any size, the best performance is achieved when using a global receptive field. This fact motivates us to think of the attention mechanism as a global operator, as opposed to local operators like convolutions. With this insight, we propose the non-local U-Nets, which are equipped with flexible attention-based global aggregation blocks, for biomedical image segmentation. In particular, we are the first to enable the attention mechanism for down-sampling and up-sampling processes. Finally, we go beyond biomedical image segmentation and extend the non-local U-Nets to global voxel transformer networks (GVTNets), which serve as a powerful open-source tool for 3D image-to-image transformation tasks.
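The idea of attention-based down-sampling with a global receptive field can be sketched as follows (a minimal numpy illustration under assumed shapes, not the dissertation's actual block): queries come from a pooled, half-resolution map, while keys and values come from every input position, so each output location aggregates globally.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_downsample(x, wq, wk, wv):
    """Global-aggregation down-sampling sketch for an (H, W, d) feature map.

    Queries come from a 2x2 average-pooled map (half resolution); keys and
    values come from every input position, so each output location has a
    global receptive field. wq/wk/wv are random stand-ins for the learned
    projections of the actual block.
    """
    H, W, d = x.shape
    pooled = x.reshape(H // 2, 2, W // 2, 2, d).mean(axis=(1, 3))
    q = pooled.reshape(-1, d) @ wq          # (H/2 * W/2, d) queries
    k = x.reshape(-1, d) @ wk               # (H * W, d) keys
    v = x.reshape(-1, d) @ wv               # (H * W, d) values
    out = softmax(q @ k.T / np.sqrt(d)) @ v
    return out.reshape(H // 2, W // 2, d)

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.2 for _ in range(3))
y = attention_downsample(x, wq, wk, wv)
print(y.shape)  # (4, 4, 16)
```

A strided convolution at the same position would see only a small window; here every output cell can draw on the whole input map, which is the "global operator" view described above.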
In addition to leveraging the non-local property of the attention mechanism under the supervised learning setting, we also investigate the generalization ability of the attention mechanism under the transfer learning setting. We perform thorough experiments on a wide range of real-world image-to-image transformation tasks, whose results clearly demonstrate the effectiveness and efficiency of our proposed DANs.
For graphs, we develop DANs for both graph and node classification applications. First, we focus on graph pooling, which is necessary for graph neural networks (GNNs) to perform graph classification tasks. In particular, we point out that second-order pooling naturally satisfies the requirements of graph pooling but encounters practical problems. To overcome these problems, we propose attentional second-order pooling. Specifically, we bridge second-order pooling with the attention mechanism and design an attention-based pooling method that can be flexibly used as either global or hierarchical graph pooling. Second, on node classification tasks, we address the problem that most GNNs lack the ability to perform effective non-local aggregation, which greatly limits their performance on disassortative graphs; it even leads GNNs to perform worse than simple multi-layer perceptrons on some disassortative graphs. To address this problem, we propose a simple yet effective non-local aggregation framework with efficient attention-guided sorting for GNNs, based on which we develop non-local GNNs. Experimental results on various graph and node classification benchmark datasets show that our DANs improve performance significantly and consistently.
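One plausible instantiation of attention-weighted second-order pooling (a hedged sketch, not necessarily the dissertation's exact formulation; the projection w_att is a hypothetical stand-in for learned parameters) weights each node's outer-product contribution by an attention score, yielding a fixed-size graph representation regardless of graph size:

```python
import numpy as np

def attentional_second_order_pool(x, w_att):
    """Attention-weighted second-order pooling for one graph.

    x: (N, d) node features. Attention scores, one per node, weight each
    node's outer-product contribution; the result is a (d, d) second-order
    statistic, flattened, whose size is independent of the node count N.
    """
    scores = x @ w_att                      # (N,) one score per node
    scores = scores - scores.max()          # stable softmax
    a = np.exp(scores)
    a /= a.sum()                            # attention over nodes
    pooled = (a[:, None] * x).T @ x         # (d, d) second-order statistic
    return pooled.ravel()                   # flatten for a classifier head

rng = np.random.default_rng(3)
d = 6
g_small = rng.standard_normal((5, d))       # a 5-node graph
g_large = rng.standard_normal((12, d))      # a 12-node graph
w_att = rng.standard_normal(d)
print(attentional_second_order_pool(g_small, w_att).shape)  # (36,)
print(attentional_second_order_pool(g_large, w_att).shape)  # (36,)
```

This illustrates why second-order pooling suits graph classification: graphs of different sizes map to the same fixed-dimensional representation, and the attention weights let the pooling focus on informative nodes.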
Augmented Equivariant Attention Networks for Microscopy Image Reconstruction
It is time-consuming and expensive to take high-quality or high-resolution
electron microscopy (EM) and fluorescence microscopy (FM) images. Taking these
images can even be invasive to samples and may damage certain subtleties in
the samples after the long or intense exposures often necessary for achieving
high quality or high resolution in the first place. Advances in deep learning
enable us to perform image-to-image transformation tasks for various types of
microscopy image reconstruction, computationally producing high-quality images
from the physically acquired low-quality ones. When training image-to-image
transformation models on pairs of experimentally acquired microscopy images,
prior models suffer from performance loss due to their inability to capture
inter-image dependencies and common features shared among images. Existing
methods that take advantage of shared features in image classification tasks
cannot be properly applied to image reconstruction tasks because they fail to
preserve the equivariance property under spatial permutations, something
essential in image-to-image transformation. To address these limitations, we
propose the augmented equivariant attention networks (AEANets) with better
capability to capture inter-image dependencies, while preserving the
equivariance property. The proposed AEANets capture inter-image dependencies
and shared features via two augmentations of the attention mechanism: shared
references and batch-aware attention during training. We
theoretically derive the equivariance property of the proposed augmented
attention model and experimentally demonstrate its consistent superiority in
both quantitative and visual results over the baseline methods. Comment: 11 pages, 8 figures.
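The shared-reference idea and the equivariance claim can be illustrated with a toy sketch (an assumption-laden simplification, not the AEANets architecture): each spatial position queries a reference bank pooled from other images, and because every position is processed independently against the same bank, permuting input positions permutes the output identically.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reference_attention(x, refs):
    """Attention of one image's positions against a shared reference bank.

    x: (N, d) flattened positions of one image; refs: (M, d) features
    pooled from other images in the batch (the 'shared references').
    Each position queries the bank independently, so the operation is
    equivariant under spatial permutations of the input.
    """
    d = x.shape[-1]
    return softmax(x @ refs.T / np.sqrt(d)) @ refs

rng = np.random.default_rng(4)
x = rng.standard_normal((10, 8))            # one image, 10 positions
refs = rng.standard_normal((4, 8))          # bank shared across the batch
y = reference_attention(x, refs)
perm = rng.permutation(10)
y_perm = reference_attention(x[perm], refs)
print(np.allclose(y[perm], y_perm))  # True: permutation equivariance holds
```

The check at the end is the property the abstract emphasizes: shuffling input positions shuffles outputs the same way, which is what classification-style feature sharing fails to preserve.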
AI for Healthcare: Diagnosis, Clinical-Trial Matching, and Patient Recruitment
Medical diagnosis is the most critical component in the treatment of a patient. But diagnosis is often a complicated process, since myriad diseases share the same symptoms. If a patient is diagnosed with a disease in its end stage, potential new treatments (clinical trials) are sometimes the last option available. However, matching a patient to the correct clinical trial requires advanced medical knowledge on behalf of the patient. In this study, we try to address the following problems and close the technical gaps: (i) Diagnosis: advances in neural network approaches and the availability of massive labeled datasets have sparked renewed interest in automated diagnosis. We explore novel techniques to identify pathology in chest radiographs using a labeled radiograph dataset that is substantially large for the domain of medical diagnosis. (ii) Clinical-Trial Matching: given the difficulty of perusing the jargon in standard clinical-trial texts, we try to complement the process by using machine learning and information retrieval methods to fetch similar health records, showing the entities responsible for the match. We implement an efficient visual tool (TextMed) to aid our algorithm and make it easier for users to utilize the power of machine learning. Our tool helps in searching through a database of criteria and records and fetches information relevant to the query.
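To ground the information-retrieval side of clinical-trial matching, here is a toy TF-IDF cosine-similarity ranker (a generic IR sketch, not the study's actual algorithm or TextMed; the criteria strings are invented examples, not data from the study):

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents against a query by TF-IDF cosine similarity.

    Returns document indices sorted from best to worst match.
    """
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))  # document freq
    n = len(docs)

    def vec(tokens):
        tf = Counter(tokens)
        # Smoothed inverse document frequency weighting
        return {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(query.lower().split())
    scores = [cosine(qv, vec(doc)) for doc in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

# Invented eligibility-criteria snippets for illustration only.
criteria = [
    "adults with stage iv lung cancer prior chemotherapy",
    "pediatric asthma inhaled corticosteroid trial",
    "lung cancer immunotherapy egfr mutation",
]
ranking = tfidf_rank("stage iv lung cancer chemotherapy", criteria)
print(ranking[0])  # 0: the first criteria snippet matches best
```

Real systems layer medical-entity extraction and learned models on top of this kind of lexical retrieval, but the core fetch-similar-records step works along these lines.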