On the optimization and generalization of overparameterized implicit neural networks
Implicit neural networks have become increasingly attractive in the machine
learning community since they can achieve competitive performance while using
far fewer computational resources. Recently, a line of theoretical works
established the global convergence of first-order methods such as gradient
descent when the implicit networks are over-parameterized. However, as they
train all layers
together, their analyses are equivalent to only studying the evolution of the
output layer. It is unclear how the implicit layer contributes to the training.
Thus, in this paper, we restrict ourselves to only training the implicit layer.
We show that global convergence is guaranteed, even if only the implicit layer
is trained. On the other hand, the theoretical understanding of when and how
the training performance of an implicit neural network can be generalized to
unseen data is still under-explored. Although this problem has been studied in
standard feed-forward networks, the case of implicit neural networks is still
intriguing since implicit networks theoretically have infinitely many layers.
Therefore, this paper investigates the generalization error for implicit neural
networks. Specifically, we study the generalization of an implicit network
activated by the ReLU function over random initialization. We provide a
generalization bound that is initialization sensitive. As a result, we show
that gradient flow with proper random initialization can train a sufficiently
over-parameterized implicit network to achieve arbitrarily small
generalization error.
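The fixed-point view of an implicit layer described above can be sketched in a few lines. This is a minimal illustration, not the paper's construction: the weight matrices `W` and `U` are hypothetical, and convergence of the iteration assumes `W` is a contraction (spectral norm below 1), a common well-posedness condition for implicit models.

```python
import numpy as np

def implicit_layer(x, W, U, tol=1e-6, max_iter=200):
    """Solve the fixed-point equation z = relu(W z + U x) by iteration.

    W and U are illustrative weights; the iteration converges when the
    map is contractive, which the rescaling below enforces.
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.maximum(W @ z + U @ x, 0.0)   # ReLU activation
        if np.linalg.norm(z_next - z) < tol:      # reached a fixed point
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W *= 0.5 / np.linalg.norm(W, 2)   # rescale so the map is contractive
U = rng.normal(size=(8, 4))
x = rng.normal(size=4)
z = implicit_layer(x, W, U)
```

Because the same equation is iterated to convergence, the layer behaves like a network with infinitely many tied-weight layers, which is the source of the analysis difficulty the abstract points to.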
Graph Neural Networks: A Feature and Structure Learning Approach
Deep neural networks (DNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In convolutional neural networks (CNNs), for example, the trainable local filters enable the automatic extraction of high-level features. Computation with filters requires a fixed number of ordered units in the receptive fields. In generic graphs, however, the neighboring units of a node are neither fixed in number nor ordered, hindering the application of deep learning operations such as convolution, attention, pooling, and unpooling. To address these limitations, we propose several deep learning methods for graph data in this dissertation.
Graph deep learning methods can be categorized into graph feature learning and graph structure learning. In the category of graph feature learning, we propose to learn graph features via learnable graph convolution operations, graph attention operations, and line graph structures. For learnable graph convolution, we propose the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking, transforming graph data into 1-D grid-like structures and thereby enabling the use of regular convolutional operations on generic graphs. For graph attention, we propose the novel hard graph attention operator (hGAO) and channel-wise graph attention operator (cGAO). hGAO uses a hard attention mechanism, attending only to important nodes; compared to GAO, this improves performance and saves computational cost. To further reduce the requirements on computational resources, we propose cGAO, which performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Besides using original graph structures, we investigate feature learning on auxiliary graph structures such as line graphs. We propose a weighted line graph that corrects biases in line graphs by assigning normalized weights to edges. Based on our weighted line graphs, we develop a weighted line graph convolution layer that takes advantage of line graph structures for better feature learning. In particular, it performs message passing operations on both the original graph and its corresponding weighted line graph.
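The per-channel value ranking behind LGCL can be sketched as follows. This is a simplified NumPy illustration of the selection step only; the zero-padding policy and grid layout are our own simplifying assumptions, not the dissertation's exact implementation.

```python
import numpy as np

def lgcl_select(X, adj_list, k):
    """Turn each node's neighbourhood into a fixed (k+1, d) grid.

    Row 0 holds the node's own features; rows 1..k hold, independently
    for each feature channel, the k largest neighbour values
    (zero-padded when a node has fewer than k neighbours). The fixed
    grid can then be fed to a regular 1-D convolution.
    """
    n, d = X.shape
    out = np.zeros((n, k + 1, d))
    for v in range(n):
        out[v, 0] = X[v]
        nbrs = X[adj_list[v]] if adj_list[v] else np.zeros((0, d))
        if nbrs.shape[0] < k:                       # zero-pad small degrees
            nbrs = np.vstack([nbrs, np.zeros((k - nbrs.shape[0], d))])
        out[v, 1:] = -np.sort(-nbrs, axis=0)[:k]    # per-channel top-k
    return out

X = np.array([[1., 2.], [3., 0.], [2., 5.]])
adj = [[1, 2], [0], [0]]          # a tiny 3-node example graph
grid = lgcl_select(X, adj, k=2)
```

Note that the ranking is per channel, so a row of the grid may mix values from different neighbours; this is what converts an unordered neighbourhood into an ordered, fixed-size input.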
To address efficiency issues in line graph neural networks, we propose to use an incidence matrix to accurately compute the adjacency matrix of the weighted line graph, leading to dramatic reductions in computational resource usage.
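The incidence-matrix route mentioned above rests on a classical identity: for an unweighted graph with node-edge incidence matrix B, the line graph's adjacency is A_L = BᵀB − 2I. The sketch below shows this unweighted case; the dissertation's weighted variant additionally rescales entries, which is omitted here.

```python
import numpy as np

def line_graph_adjacency(edges, n):
    """Adjacency matrix of the line graph via the incidence matrix.

    B is the n x m node-edge incidence matrix; two edges are adjacent
    in the line graph iff they share an endpoint, which BᵀB counts
    (its diagonal counts each edge's two endpoints, removed by -2I).
    """
    m = len(edges)
    B = np.zeros((n, m))
    for j, (u, v) in enumerate(edges):
        B[u, j] = 1.0
        B[v, j] = 1.0
    return B.T @ B - 2.0 * np.eye(m)

# The line graph of a triangle is again a triangle.
A_L = line_graph_adjacency([(0, 1), (1, 2), (0, 2)], n=3)
```

Computing BᵀB is a sparse matrix product, which is why this route avoids materializing the line graph edge by edge.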
In the category of graph structure learning, we propose several deep learning methods to learn new graph structures. Since images are special cases of graphs whose nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied to many image pixel-wise prediction tasks, similar methods are lacking for graph data, because pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. However, gPool uses global ranking to sample important nodes and thus cannot incorporate graph topology information when computing ranking scores. To address this issue, we propose the topology-aware pooling (TAP) layer, which uses attention operators to generate a ranking score for each node by attending it to its neighboring nodes. The ranking scores are generated locally while the selection is performed globally, which enables the pooling operation to consider topology information. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph to its original structure using the position information of nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graphs, known as graph U-Nets.
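The gPool selection step described above can be sketched as follows. This is an illustrative reconstruction from the description, not the original code; in particular, gating the kept features by tanh of the score is an assumption matching the common graph U-Nets formulation.

```python
import numpy as np

def gpool(X, A, p, k):
    """Top-k node selection on a trainable projection vector (a sketch).

    Each node's score is the scalar projection of its features onto p;
    the k highest-scoring nodes are kept, their features gated by
    tanh(score), and the induced subgraph adjacency is returned.
    """
    y = X @ p / np.linalg.norm(p)              # scalar projection scores
    idx = np.argsort(-y)[:k]                   # indices of the top-k nodes
    X_new = X[idx] * np.tanh(y[idx])[:, None]  # gate features by score
    A_new = A[np.ix_(idx, idx)]                # adjacency of kept nodes
    return X_new, A_new, idx

X = np.array([[1., 0.], [0., 2.], [3., 0.]])
A = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])
p = np.array([1., 0.])                         # stands in for a learned vector
X_new, A_new, idx = gpool(X, A, p, k=2)
```

The tanh gate keeps the projection vector p in the backward pass: without it, the hard top-k selection alone would give p no gradient signal.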
Our experimental results on node classification and graph classification tasks, using both real and simulated data, demonstrate the effectiveness and efficiency of our methods.
MHITNet: a minimize network with a hierarchical context-attentional filter for segmenting medical ct images
In the field of medical CT image processing, convolutional neural networks
(CNNs) have been the dominant technique. Encoder-decoder CNNs exploit
locality for efficiency, but they cannot properly model interactions between
distant pixels. Recent research indicates that self-attention or transformer
layers can be stacked to efficiently learn long-range dependencies. By
constructing and processing image patches as embeddings, transformers have
been applied to computer vision tasks. However, transformer-based
architectures lack global semantic information interaction and require a
large-scale training dataset, making them challenging to train with small
data samples. To address these challenges, we present a hierarchical
context-attention transformer network (MHITNet) that combines multi-scale,
transformer, and hierarchical context extraction modules in skip connections.
The multi-scale module captures deeper CT semantic information, enabling the
transformers to more effectively encode feature maps of tokenized image
patches from various CNN stages as input attention sequences. The
hierarchical context attention module augments global information and
reweights pixels to capture semantic context. Extensive trials on three
datasets show that the proposed MHITNet outperforms current best practices.
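The long-range dependency learning that the abstract attributes to stacked transformer layers comes from scaled dot-product self-attention over patch tokens, which can be sketched generically. The projection matrices below are illustrative placeholders, not MHITNet's actual weights.

```python
import numpy as np

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over patch tokens.

    Every output token is a softmax-weighted mix of all value vectors,
    which is how distant patches interact within a single layer.
    """
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over keys
    return w @ V

rng = np.random.default_rng(1)
tokens = rng.normal(size=(6, 16))     # 6 patch embeddings of dimension 16
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
```

Because the score matrix is all-pairs, cost grows quadratically in the number of patches, which is one reason such architectures need large training sets and careful design at small data scales.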
- …