167 research outputs found

    MOLE: MOdular Learning FramEwork via Mutual Information Maximization

    This paper introduces an asynchronous and local learning framework for neural networks, named the Modular Learning Framework (MOLE). The framework modularizes neural networks by layers, defines the training objective of each module via mutual information, and trains the modules sequentially by mutual information maximization. MOLE turns training into local optimization with gradients isolated across modules, a scheme that is more biologically plausible than backpropagation (BP). We run experiments on vector-, grid- and graph-type data. In particular, the framework is capable of solving both graph- and node-level tasks for graph-type data. Therefore, MOLE has been experimentally shown to be universally applicable to different types of data. Comment: accepted by icml ll

    Calibrated Adversarial Training

    Adversarial training is an approach to increasing the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge in producing adversarial examples is to include enough perturbation to flip the model's output without severely changing the example's semantic content. Excessive change in the semantic content could also change the true label of the example, and adding such examples to the training set has adverse effects. In this paper, we present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error. We provide a theoretical analysis of the calibrated robust error and derive an upper bound for it. Our empirical results show superior performance of Calibrated Adversarial Training on a number of public datasets.

    Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase

    Code revert prediction, a specialized form of software defect detection, aims to predict the likelihood of code changes being reverted or rolled back in software development. This task is important in practice because, by identifying code changes that are more prone to being reverted, developers and project managers can proactively take measures to prevent issues, improve code quality, and optimize development processes. However, compared to code defect detection, code revert prediction has rarely been studied in previous research. Additionally, many previous methods for code defect detection relied on independent features but ignored relationships between code scripts. Moreover, an industry setting introduces new challenges through constraints such as company regulation, limited features and a large-scale codebase. To overcome these limitations, this paper presents a systematic empirical study of code revert prediction that integrates the code import graph with code features. Different strategies to address anomalies and data imbalance have been implemented, including graph neural networks with imbalance classification and anomaly detection. We conduct the experiments on real-world code commit data within J.P. Morgan Chase, which is extremely imbalanced, in order to make a comprehensive comparison of these different approaches to the code revert prediction problem. Comment: SDD'23: the 1st International Workshop on Software Defect Dataset

    Bridging the Performance Gap between FGSM and PGD Adversarial Training

    Deep learning achieves state-of-the-art performance in many tasks but remains vulnerable to adversarial examples. Among existing defense techniques, adversarial training with the projected gradient descent attack (adv.PGD) is considered one of the most effective ways to achieve moderate adversarial robustness. However, adv.PGD requires too much training time, since the projected gradient descent attack (PGD) takes multiple iterations to generate perturbations. On the other hand, adversarial training with the fast gradient sign method (adv.FGSM) takes much less training time, since the fast gradient sign method (FGSM) takes one step to generate perturbations, but adv.FGSM fails to increase adversarial robustness. In this work, we extend adv.FGSM so that it achieves the adversarial robustness of adv.PGD. We demonstrate that the large curvature along the FGSM-perturbed direction leads to a large difference in adversarial robustness between adv.FGSM and adv.PGD, and therefore propose combining adv.FGSM with a curvature regularization (adv.FGSMR) in order to bridge the performance gap between adv.FGSM and adv.PGD. The experiments show that adv.FGSMR has higher training efficiency than adv.PGD. In addition, it achieves comparable adversarial robustness on the MNIST dataset under white-box attack, and on the CIFAR-10 dataset it achieves better performance than adv.PGD under white-box attack and effectively defends against transferable adversarial attacks.
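The one-step versus multi-step distinction in the abstract above can be illustrated with a minimal sketch. It assumes a toy logistic-regression model whose input gradient is available in closed form; the function names, the step size `alpha`, and the L-infinity budget `eps` are illustrative choices, not taken from the paper, and the curvature regularization of adv.FGSMR is not reproduced here.

```python
import math


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad_x(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x (label y in {0, 1})."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """FGSM: a single step of size eps along the sign of the input gradient."""
    g = loss_grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps):
    """PGD: several steps of size alpha, each followed by a projection
    back onto the L-infinity ball of radius eps around the clean input x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical toy model and input, chosen only for illustration.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.5], 1
x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_pgd = pgd(w, b, x, y, eps=0.1, alpha=0.04, steps=5)
```

PGD evaluates the gradient once per step, which is where the extra training cost mentioned in the abstract comes from. For this linear toy model both attacks end at the same corner of the perturbation ball; on a nonlinear network they generally differ.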

    Exceptional spatio-temporal behavior mining through Bayesian non-parametric modeling

    Collective social media provides a vast amount of geo-tagged social posts, which contain various records of spatio-temporal behavior. Modeling spatio-temporal behavior on collective social media is an important task for applications like tourism recommendation, location prediction and urban planning. Properly accomplishing this task requires a model that allows for diverse behavioral patterns on each of the three aspects: spatial location, time, and text. In this paper, we address the following question: how can we find representative subgroups of social posts for which the spatio-temporal behavioral patterns are substantially different from the behavioral patterns in the whole dataset? Selection and evaluation are the two challenging problems in finding the exceptional subgroups. To address these problems, we propose BNPM, a Bayesian non-parametric model, to model spatio-temporal behavior and infer the exceptionality of social posts in subgroups. By training BNPM on a large number of randomly sampled subgroups, we obtain the global distribution of behavioral patterns. For each given subgroup of social posts, its posterior distribution can be inferred by BNPM. By comparing the posterior distribution with the global distribution, we can quantify the exceptionality of each given subgroup. The exceptionality scores are used to guide the search process within the exceptional model mining framework to automatically discover the exceptional subgroups. Various experiments are conducted to evaluate the effectiveness and efficiency of our method. On four real-world datasets, our method discovers subgroups coinciding with events, subgroups distinguishing professionals from tourists, and subgroups whose consistent exceptionality can only be truly appreciated by combining exceptional spatio-temporal and exceptional textual behavior.

    Deep functional factor models: forecasting high-dimensional functional time series via Bayesian nonparametric factorization

    This paper introduces the Deep Functional Factor Model (DF2M), a Bayesian nonparametric model designed for the analysis of high-dimensional functional time series. DF2M is built upon the Indian Buffet Process and the multi-task Gaussian Process, incorporating a deep kernel function that captures non-Markovian and nonlinear temporal dynamics. Unlike many black-box deep learning models, DF2M offers an explainable approach to utilizing neural networks by constructing a factor model and integrating deep neural networks within the kernel function. Additionally, we develop a computationally efficient variational inference algorithm to infer DF2M. Empirical results from four real-world datasets demonstrate that DF2M provides better explainability and superior predictive accuracy compared to conventional deep learning models for high-dimensional functional time series.

    Hop-Count Based Self-Supervised Anomaly Detection on Attributed Networks

    Recent years have witnessed an upsurge of interest in the problem of anomaly detection on attributed networks due to its importance in both research and practice. Although various approaches have been proposed to solve this problem, two major limitations exist: (1) unsupervised approaches usually work much less efficiently due to the lack of a supervisory signal, and (2) existing anomaly detection methods only use local contextual information to detect anomalous nodes, e.g., one- or two-hop information, but ignore the global contextual information. Since anomalous nodes differ from normal nodes in structures and attributes, it is intuitive that the distance between anomalous nodes and their neighbors should be larger than that between normal nodes and their neighbors if we remove the edges connecting anomalous and normal nodes. Thus, hop counts based on both global and local contextual information can serve as indicators of anomaly. Motivated by this intuition, we propose a hop-count based model (HCM) to detect anomalies by modeling both local and global contextual information. To make better use of hop counts for anomaly identification, we propose to use hop-count prediction as a self-supervised task. We design two anomaly scores based on the hop-count prediction of the HCM model to identify anomalies. In addition, we employ Bayesian learning to train the HCM model, capturing uncertainty in the learned parameters and avoiding overfitting. Extensive experiments on real-world attributed networks demonstrate that our proposed model is effective in anomaly detection. Comment: ECML2022 Accepted. 18 page
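As a rough illustration of the quantity the abstract above builds on, the sketch below computes hop counts (shortest-path lengths in edges) from one node to every reachable node with a breadth-first search. The adjacency-dictionary representation and the name `hop_counts` are assumptions made for this example; the HCM model itself, its anomaly scores, and its Bayesian training are not reproduced.

```python
from collections import deque


def hop_counts(adj, source):
    """Breadth-first search returning {node: hop count from source}.

    adj maps each node to an iterable of its neighbours (unweighted graph).
    Nodes that cannot be reached from source are absent from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical undirected toy graph: 2 - 0 - 1 - 3 - 4
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
hops_from_0 = hop_counts(adj, 0)
```

Hop counts of this kind could serve as the targets of a self-supervised prediction task, since they are derived from the graph structure alone and need no anomaly labels.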