On the optimization and generalization of overparameterized implicit neural networks
Implicit neural networks have become increasingly attractive in the machine
learning community since they can achieve competitive performance but use far
fewer computational resources. Recently, a line of theoretical works established
the global convergence of first-order methods such as gradient descent if the
implicit networks are over-parameterized. However, as they train all layers
together, their analyses are equivalent to only studying the evolution of the
output layer. It is unclear how the implicit layer contributes to the training.
Thus, in this paper, we restrict ourselves to only training the implicit layer.
We show that global convergence is guaranteed, even if only the implicit layer
is trained. On the other hand, the theoretical understanding of when and how
the training performance of an implicit neural network can be generalized to
unseen data is still under-explored. Although this problem has been studied in
standard feed-forward networks, the case of implicit neural networks is still
intriguing since implicit networks theoretically have infinitely many layers.
Therefore, this paper investigates the generalization error for implicit neural
networks. Specifically, we study the generalization of an implicit network
activated by the ReLU function over random initialization. We provide a
generalization bound that is initialization sensitive. As a result, we show
that gradient flow with proper random initialization can train a sufficiently
over-parameterized implicit network to achieve arbitrarily small generalization
errors.
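As a rough illustration of what "implicit layer" means in the abstract above, the sketch below computes the output of a ReLU implicit layer as the fixed point of z = ReLU(Az + Bx). The abstract does not state the equation, so this particular form, the names `A`, `B`, and `implicit_layer`, and the scaling of `A` are all assumptions chosen for illustration, not the paper's construction.

```python
import numpy as np

def implicit_layer(A, B, x, tol=1e-10, max_iter=1000):
    """Hypothetical helper: solve z = relu(A z + B x) by fixed-point iteration."""
    z = np.zeros(A.shape[0])
    for _ in range(max_iter):
        z_next = np.maximum(0.0, A @ z + B @ x)
        # Stop once successive iterates agree to within the tolerance.
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
m, d = 64, 8  # hidden width and input dimension (illustrative values)
A = rng.standard_normal((m, m))
# Assumption: rescale A to spectral norm 0.5. ReLU is 1-Lipschitz, so the
# update map is then a contraction and the iteration converges to a unique z.
A *= 0.5 / np.linalg.norm(A, 2)
B = rng.standard_normal((m, d)) / np.sqrt(d)
x = rng.standard_normal(d)

z_star = implicit_layer(A, B, x)
residual = np.linalg.norm(z_star - np.maximum(0.0, A @ z_star + B @ x))
```

The fixed point plays the role of the output of an infinitely deep weight-tied network, which is why the abstract describes implicit networks as theoretically having infinitely many layers.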
Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment
Deep learning models are trained with certain assumptions about the data
during the development stage and then used for prediction in the deployment
stage. It is important to reason about the trustworthiness of the model's
predictions with unseen data during deployment. Existing methods for specifying
and verifying traditional software are insufficient for this task, as they
cannot handle the complexity of DNN model architecture and expected outcomes.
In this work, we propose a novel technique that uses rules derived from neural
network computations to infer data preconditions for a DNN model to determine
the trustworthiness of its predictions. Our approach, DeepInfer, involves
introducing a novel abstraction for a trained DNN model that enables weakest
precondition reasoning using Dijkstra's Predicate Transformer Semantics. By
deriving rules over the inductive type of neural network abstract
representation, we can overcome the matrix dimensionality issues that arise
from the backward non-linear computation from the output layer to the input
layer. We utilize the weakest precondition computation using rules of each kind
of activation function to compute layer-wise precondition from the given
postcondition on the final output of a deep neural network. We extensively
evaluated DeepInfer on 29 real-world DNN models using four different datasets
collected from five different sources and demonstrated the utility,
effectiveness, and performance improvement over closely related work. DeepInfer
efficiently detects correct and incorrect predictions of high-accuracy models
with high recall (0.98) and high F-1 score (0.84) and significantly
improves on the prior technique, SelfChecker. The average runtime overhead of
DeepInfer is low, 0.22 sec for all unseen datasets. We also compared runtime
overhead using the same hardware settings and found that DeepInfer is 3.27
times faster than SelfChecker.
Comment: Accepted for publication at the 46th International Conference on
Software Engineering (ICSE 2024)
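To make the weakest precondition reasoning in the DeepInfer abstract more concrete, here is a small worked example using Dijkstra's assignment rule. It is only a sketch: the symbols W, b, z, and c are assumptions introduced here, inequalities on vectors are read elementwise, and the abstract does not give DeepInfer's actual rules for each activation function.

```latex
\begin{align*}
wp(y := e,\; Q) &\equiv Q[e/y]
  && \text{(assignment rule)} \\
wp(y := Wx + b,\; y \ge c) &\equiv Wx + b \ge c
  && \text{(dense layer)} \\
wp(y := \max(0, z),\; y \ge c) &\equiv
  \begin{cases}
    z \ge c & \text{if } c > 0 \\
    \text{true} & \text{if } c \le 0
  \end{cases}
  && \text{(scalar ReLU)}
\end{align*}
```

Applying such rules backward, layer by layer, turns a postcondition on the network output into a condition on the input data.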
Graph Neural Networks: A Feature and Structure Learning Approach
Deep neural networks (DNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In convolutional neural networks (CNNs), for example, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, the number of neighboring units is neither fixed nor are they ordered in generic graphs, thereby hindering the applications of deep learning operations such as convolution, attention, pooling, and unpooling. To address these limitations, we propose several deep learning methods on graph data in this dissertation.
Graph deep learning methods can be categorized into graph feature learning and graph structure learning. In the category of graph feature learning, we propose to learn graph features via learnable graph convolution operations, graph attention operations, and line graph structures. In learnable graph convolution operations, we propose the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolutional operations on generic graphs. In graph attention operations, we propose a novel hard graph attention operator (hGAO) and a channel-wise graph attention operator (cGAO). hGAO uses a hard attention mechanism that attends only to important nodes; compared to the graph attention operator (GAO), this improves performance and saves computational cost. To further reduce the requirements on computational resources, we propose cGAO, which performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Besides using original graph structures, we investigate feature learning on auxiliary graph structures such as line graphs. Using line graph structures, we propose a weighted line graph that corrects biases in line graphs by assigning normalized weights to edges. Based on our weighted line graphs, we develop a weighted line graph convolution layer that takes advantage of line graph structures for better feature learning. In particular, it performs message passing operations on both the original graph and its corresponding weighted line graph.
To address efficiency issues in line graph neural networks, we propose to use an incidence matrix to accurately compute the adjacency matrix of the weighted line graph, leading to dramatic reductions in computational resource usage.
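The idea of obtaining a line graph's adjacency matrix from an incidence matrix can be sketched with the standard unweighted identity A_L = BᵀB − 2I, which holds for simple undirected graphs without self-loops. The function names below are hypothetical, and the normalized edge weights of the weighted line graph proposed in the dissertation are not reproduced here because the abstract does not specify them.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Node-by-edge incidence matrix B of a simple undirected graph."""
    B = np.zeros((num_nodes, len(edges)), dtype=int)
    for j, (u, v) in enumerate(edges):
        B[u, j] = 1
        B[v, j] = 1
    return B

def line_graph_adjacency(B):
    """Unweighted line graph adjacency: two edges are adjacent iff they share a node.

    (B.T @ B)[i, j] counts the nodes shared by edges i and j; the diagonal is 2
    because every edge touches its own two endpoints, so subtract 2I.
    """
    num_edges = B.shape[1]
    return B.T @ B - 2 * np.eye(num_edges, dtype=int)

# Path graph 0 - 1 - 2: its two edges share node 1, so they are adjacent.
B_path = incidence_matrix(3, [(0, 1), (1, 2)])
A_line_path = line_graph_adjacency(B_path)

# Triangle: every pair of edges shares a node, so the line graph is a triangle.
B_tri = incidence_matrix(3, [(0, 1), (1, 2), (0, 2)])
A_line_tri = line_graph_adjacency(B_tri)
```

A single matrix product replaces an explicit pairwise comparison of edges, which is one plausible reading of the efficiency gain the abstract describes.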
In the category of graph structure learning, we propose several deep learning methods to learn new graph structures. Given that images are special cases of graphs whose nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied to many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is because pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. However, gPool uses global ranking methods to sample some of the important nodes, and so cannot incorporate graph topology information when computing ranking scores. To address this issue, we propose the topology-aware pooling (TAP) layer, which uses attention operators to generate a ranking score for each node by attending each node to its neighboring nodes. The ranking scores are generated locally while the selection is performed globally, which enables the pooling operation to consider topology information. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph to its original structure using the position information of the nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graphs, known as graph U-Nets.
Our experimental results on node classification and graph classification tasks using both real and simulated data demonstrate the effectiveness and efficiency of our methods.
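The gPool and gUnpool operations described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function names, the sigmoid gate applied to the selected scores, and the choice to keep selected nodes in their original order are not given in the abstract and are chosen here for illustration; the projection vector `p` would be trained in practice but is fixed below.

```python
import numpy as np

def gpool(X, A, p, k):
    """Keep the k nodes with the largest scalar projection onto p.

    X: (N, C) node features, A: (N, N) adjacency, p: (C,) projection vector.
    """
    scores = X @ p / np.linalg.norm(p)            # scalar projection per node
    top = np.argsort(-scores, kind="stable")[:k]  # indices of the k largest scores
    idx = np.sort(top)                            # assumption: keep original node order
    gate = 1.0 / (1.0 + np.exp(-scores[idx]))     # assumption: sigmoid gating of features
    X_pooled = X[idx] * gate[:, None]
    A_pooled = A[np.ix_(idx, idx)]                # induced subgraph on selected nodes
    return X_pooled, A_pooled, idx

def gunpool(X_pooled, idx, num_nodes):
    """Place pooled features back at their recorded positions; other rows are zero."""
    X_up = np.zeros((num_nodes, X_pooled.shape[1]), dtype=X_pooled.dtype)
    X_up[idx] = X_pooled
    return X_up

# Toy graph: 4 nodes on a path, 2 features per node.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
p = np.array([1.0, 0.0])

X_pooled, A_pooled, idx = gpool(X, A, p, k=2)
X_restored = gunpool(X_pooled, idx, num_nodes=X.shape[0])
```

Because the ranking uses only the projection scores, this sketch ignores graph topology when choosing nodes, which is the limitation the abstract says the TAP layer is meant to address.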