MOLE: MOdular Learning FramEwork via Mutual Information Maximization
This paper introduces an asynchronous, local learning framework for
neural networks, named the Modular Learning Framework (MOLE). The framework
modularizes a neural network by layers, defines a mutual-information training
objective for each module, and trains the modules sequentially by mutual
information maximization. MOLE thus turns training into local optimization
with gradients isolated across modules, a scheme that is more biologically
plausible than backpropagation (BP). We run experiments on vector-, grid-,
and graph-type data. In particular, the framework can solve both graph- and
node-level tasks on graph-type data. MOLE is therefore experimentally shown
to be applicable across different types of data.
Comment: accepted by ICML
Calibrated Adversarial Training
Adversarial training increases the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge in producing adversarial examples is to inject enough perturbation to flip the model's output while not severely changing the example's semantic content. Excessive change in the semantic content can also change the true label of the example, and adding such examples to the training set has adverse effects. In this paper, we present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error. We provide a theoretical analysis of the calibrated robust error and derive an upper bound for it. Our empirical results show superior performance of Calibrated Adversarial Training on a number of public datasets.
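The tension the abstract describes — a perturbation must be large enough to flip the model's output, yet small enough to leave the semantic content intact — is easy to see on a toy model. The sketch below is not the paper's calibrated robust error (which is not specified here); it only shows, with a hand-picked logistic model and a one-step FGSM attack, how the flip happens only once the perturbation budget grows large. All weights and values are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# A fixed logistic "model" (weights chosen by hand for illustration).
w, b = np.array([2.0, -1.0, 0.5]), 0.1

def fgsm(x, y, eps):
    """One-step L-inf attack: move each input coordinate by eps along
    the sign of the loss gradient. For logistic loss, dL/dx = (p - y) * w."""
    p = sigmoid(x @ w + b)
    return x + eps * np.sign((p - y) * w)

x = np.array([0.5, -0.2, 0.3])   # clean example, predicted positive
y = 1.0

# Larger budgets flip the prediction but also move the input further
# from its original content (max-coordinate drift equals eps).
flipped = {eps: bool(sigmoid(fgsm(x, y, eps) @ w + b) < 0.5)
           for eps in (0.05, 0.2, 0.6)}
```

Here the small budgets (0.05, 0.2) leave the prediction unchanged, while the large one (0.6) flips it — at the cost of a correspondingly large change to the input, which is exactly the regime where the true label may drift.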
Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase
Code revert prediction, a specialized form of software defect detection, aims
to forecast or predict the likelihood of code changes being reverted or rolled
back in software development. This task is very important in practice because
by identifying code changes that are more prone to being reverted, developers
and project managers can proactively take measures to prevent issues, improve
code quality, and optimize development processes. However, compared to code
defect detection, code revert prediction has been rarely studied in previous
research. Additionally, many previous methods for code defect detection relied
on independent features but ignored relationships between code scripts.
Moreover, new challenges are introduced due to constraints in an industry
setting such as company regulation, limited features and large-scale codebase.
To overcome these limitations, this paper presents a systematic empirical
study of code revert prediction that integrates the code import graph with
code features. Different strategies for addressing anomalies and data
imbalance are implemented, including graph neural networks with imbalanced
classification and with anomaly detection. We conduct experiments on
real-world code commit data within J.P. Morgan Chase, which is extremely
imbalanced, to make a comprehensive comparison of these approaches to the
code revert prediction problem.
Comment: SDD'23: the 1st International Workshop on Software Defect Dataset
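The extreme imbalance mentioned above (reverts are rare among commits) is typically handled by reweighting the loss toward the minority class. The sketch below is not the paper's GNN pipeline; it demonstrates only the imbalance-handling idea, using a plain logistic regression on synthetic "commit" features with a per-sample positive-class weight. All names and the 2% revert rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic, extremely imbalanced "commit" data: ~2% positives (reverts).
n = 2000
y = (rng.random(n) < 0.02).astype(float)
x = rng.normal(size=(n, 4)) + 1.5 * y[:, None]   # reverts shifted slightly

def train_logreg(x, y, pos_weight, lr=0.1, steps=300):
    """Gradient-descent logistic regression with a weighted loss:
    positive (revert) samples contribute pos_weight times as much."""
    w, b = np.zeros(x.shape[1]), 0.0
    sw = np.where(y == 1, pos_weight, 1.0)        # per-sample loss weights
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        g = sw * (p - y)                          # weighted logistic gradient
        w -= lr * (x.T @ g) / n
        b -= lr * g.mean()
    return w, b

def recall(w, b):
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    pred = p > 0.5
    return (pred & (y == 1)).sum() / max((y == 1).sum(), 1)

r_plain = recall(*train_logreg(x, y, pos_weight=1.0))
r_weighted = recall(*train_logreg(
    x, y, pos_weight=(y == 0).sum() / (y == 1).sum()))
```

With the unweighted loss the classifier is dominated by the majority class and misses reverts; weighting positives by the inverse class ratio recovers much higher recall on the rare class, which is the effect the imbalance-aware strategies in the study aim for.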
Bridging the Performance Gap between FGSM and PGD Adversarial Training
Deep learning achieves state-of-the-art performance in many tasks but is
vulnerable to adversarial examples. Among existing defense techniques,
adversarial training with the projected gradient descent attack (adv.PGD)
is considered one of the most effective ways to achieve moderate adversarial
robustness. However, adv.PGD requires long training time, since the projected
gradient descent attack (PGD) takes multiple iterations to generate
perturbations. On the other hand, adversarial training with the fast gradient
sign method (adv.FGSM) takes much less training time, since the fast gradient
sign method (FGSM) generates perturbations in a single step, but it fails to
increase adversarial robustness. In this work, we extend adv.FGSM to achieve
the adversarial robustness of adv.PGD. We demonstrate that large curvature
along the FGSM-perturbed direction leads to a large gap in adversarial
robustness between adv.FGSM and adv.PGD, and we therefore propose combining
adv.FGSM with a curvature regularization term (adv.FGSMR) to bridge the
performance gap. Experiments show that adv.FGSMR trains more efficiently than
adv.PGD. Moreover, it achieves comparable adversarial robustness on the MNIST
dataset under white-box attack, and on the CIFAR-10 dataset it outperforms
adv.PGD under white-box attack and effectively defends against transferable
adversarial attacks.
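The one-step vs. multi-step distinction at the heart of this abstract can be made concrete: PGD is essentially iterated FGSM with a projection back into the L-infinity ball. The sketch below is not the paper's adv.FGSMR (the curvature regularizer is not reproduced); it only shows both attacks on a tiny fixed two-layer network, with the input gradient derived by hand. Network weights and the example point are illustrative.

```python
import numpy as np

# A tiny fixed two-layer network: logit(x) = w2 . tanh(W1 @ x)
W1 = np.array([[1.0, -2.0], [0.5, 1.5]])
w2 = np.array([1.0, -1.0])

def logit(x):
    return w2 @ np.tanh(W1 @ x)

def input_grad(x, y):
    """d BCE(sigmoid(logit), y) / dx, derived by hand for this net:
    dL/dlogit = p - y, dlogit/dx = W1^T (w2 * (1 - tanh^2))."""
    p = 1.0 / (1.0 + np.exp(-logit(x)))
    h = np.tanh(W1 @ x)
    return (p - y) * (W1.T @ (w2 * (1.0 - h ** 2)))

def fgsm(x, y, eps):
    # One signed-gradient step of size eps.
    return x + eps * np.sign(input_grad(x, y))

def pgd(x0, y, eps, alpha=0.05, steps=20):
    # Iterated FGSM-style steps, projected back into the eps-ball.
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(input_grad(x, y))
        x = np.clip(x, x0 - eps, x0 + eps)
    return x

x0, y, eps = np.array([0.3, -0.1]), 1.0, 0.3

def bce(x):
    p = 1.0 / (1.0 + np.exp(-logit(x)))
    return -np.log(p) if y == 1 else -np.log(1 - p)

loss_fgsm = bce(fgsm(x0, y, eps))
loss_pgd = bce(pgd(x0, y, eps))
```

Because PGD re-evaluates the gradient at every step, it can track curvature of the loss surface that the single FGSM step ignores; the abstract's observation is that when that curvature along the FGSM direction is small, the two attacks (and the two training schemes) behave similarly.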
Exceptional spatio-temporal behavior mining through Bayesian non-parametric modeling
Collective social media provides a vast amount of geo-tagged social posts, which contain various records on spatio-temporal behavior. Modeling spatio-temporal behavior on collective social media is an important task for applications like tourism recommendation, location prediction and urban planning. Properly accomplishing this task requires a model that allows for diverse behavioral patterns on each of the three aspects: spatial location, time, and text. In this paper, we address the following question: how to find representative subgroups of social posts, for which the spatio-temporal behavioral patterns are substantially different from the behavioral patterns in the whole dataset? Selection and evaluation are the two challenging problems for finding the exceptional subgroups. To address these problems, we propose BNPM: a Bayesian non-parametric model, to model spatio-temporal behavior and infer the exceptionality of social posts in subgroups. By training BNPM on a large amount of randomly sampled subgroups, we can get the global distribution of behavioral patterns. For each given subgroup of social posts, its posterior distribution can be inferred by BNPM. By comparing the posterior distribution with the global distribution, we can quantify the exceptionality of each given subgroup. The exceptionality scores are used to guide the search process within the exceptional model mining framework to automatically discover the exceptional subgroups. Various experiments are conducted to evaluate the effectiveness and efficiency of our method. On four real-world datasets our method discovers subgroups coinciding with events, subgroups distinguishing professionals from tourists, and subgroups whose consistent exceptionality can only be truly appreciated by combining exceptional spatio-temporal and exceptional textual behavior.
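The abstract's central move — score a subgroup by how far its behavioral distribution sits from the global one — can be shown in miniature. The sketch below is a deliberate simplification of BNPM: it replaces the Bayesian non-parametric posterior with raw categorical proportions over locations and uses KL divergence as the exceptionality score. The counts and location setup are invented for illustration.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two count vectors, normalized to proportions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Global "behavior": posts spread over 5 locations.
global_counts = np.array([400, 300, 150, 100, 50], float)

# Subgroup A behaves like the population; subgroup B concentrates
# on one otherwise rare location (e.g. posts during a local event).
subgroup_a = np.array([40, 28, 16, 11, 5], float)
subgroup_b = np.array([5, 4, 3, 2, 86], float)

score_a = kl(subgroup_a, global_counts)   # near zero: unexceptional
score_b = kl(subgroup_b, global_counts)   # large: exceptional subgroup
```

In the paper, the same comparison is made between a subgroup's inferred posterior and the global distribution learned from random subgroups, with the resulting scores steering the exceptional-model-mining search.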
Deep functional factor models: forecasting high-dimensional functional time series via Bayesian nonparametric factorization
This paper introduces the Deep Functional Factor Model (DF2M), a Bayesian nonparametric model designed for the analysis of high-dimensional functional time series. DF2M is built upon the Indian Buffet Process and the multi-task Gaussian Process, incorporating a deep kernel function that captures non-Markovian and nonlinear temporal dynamics. Unlike many black-box deep learning models, DF2M offers an explainable approach to utilizing neural networks by constructing a factor model and integrating deep neural networks within the kernel function. Additionally, we develop a computationally efficient variational inference algorithm to infer DF2M. Empirical results from four real-world datasets demonstrate that DF2M provides better explainability and superior predictive accuracy compared to conventional deep learning models for high-dimensional functional time series.
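The "deep kernel" the abstract refers to is, in the usual sense, a standard GP kernel composed with a neural feature map. The sketch below is not DF2M (no Indian Buffet Process, no variational inference); it only builds such a composed kernel with a fixed random feature map and checks the properties a valid GP kernel must have. The network weights are illustrative stand-ins for a learned embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed random feature map standing in for a learned network phi.
W = rng.normal(size=(6, 3))

def phi(x):
    return np.tanh(x @ W.T)             # "deep" embedding of the inputs

def deep_kernel(xs, lengthscale=1.0):
    """k(x, x') = RBF(phi(x), phi(x')): a deep kernel in the sense of
    composing a neural feature map with a standard GP kernel."""
    z = phi(xs)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

xs = rng.normal(size=(8, 3))
K = deep_kernel(xs)                     # 8x8 Gram matrix over the inputs
```

Because the RBF kernel is positive semi-definite for any embedding, the composed matrix `K` is a valid GP covariance: symmetric, unit diagonal, and with non-negative eigenvalues, while the nonlinearity of `phi` lets it express the non-Markovian dynamics the abstract mentions.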
Hop-Count Based Self-Supervised Anomaly Detection on Attributed Networks
Recent years have witnessed an upsurge of interest in anomaly detection on
attributed networks, due to its importance in both research and practice.
Although various approaches have been proposed to solve this problem, two
major limitations exist: (1) unsupervised approaches usually work much less
effectively due to the lack of a supervisory signal, and (2) existing anomaly
detection methods use only local contextual information, e.g., one- or
two-hop neighborhoods, to detect anomalous nodes, ignoring global contextual
information. Since anomalous nodes differ from normal nodes in structure and
attributes, it is intuitive that, if we remove the edges connecting anomalous
and normal nodes, the distance between anomalous nodes and their neighbors
should be larger than that between normal nodes and their neighbors. Hop
counts based on both global and local contextual information can therefore
serve as indicators of anomaly. Motivated by this intuition, we propose a
hop-count-based model (HCM) that detects anomalies by modeling both local and
global contextual information. To make better use of hop counts for anomaly
identification, we use hop count prediction as a self-supervised task and
design two anomaly scores based on the HCM model's hop count predictions. In
addition, we employ Bayesian learning to train the HCM model, capturing
uncertainty in the learned parameters and avoiding overfitting. Extensive
experiments on real-world attributed networks demonstrate that our proposed
model is effective for anomaly detection.
Comment: accepted by ECML 2022. 18 pages
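The self-supervised labels the abstract relies on — hop counts between node pairs — are just BFS shortest-path distances on the graph. The sketch below shows how such labels can be computed for a small hand-made graph; the HCM model that learns to predict them, and the two anomaly scores, are not reproduced here. The graph and node numbering are illustrative.

```python
from collections import deque

def hop_counts(adj, source):
    """BFS shortest-path hop counts from `source`; absent keys are
    unreachable nodes."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# A small undirected graph: node 5 hangs off the main cluster.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

d = hop_counts(adj, 0)
# Self-supervised labels: ((source, node), hop count). Anomalous nodes
# tend to sit farther, in hops, from their contextual neighborhood.
pairs = [((0, n), d.get(n, -1)) for n in sorted(adj)]
```

Training a model to predict these hop counts requires no human labels, which is exactly what makes the task self-supervised; nodes whose predicted hop counts disagree with their observed neighborhood then become anomaly candidates.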