
    Graph Convolutional Networks for Traffic Forecasting with Missing Values

    Traffic forecasting has attracted widespread attention recently. In reality, traffic data usually contains missing values due to sensor or communication errors. The spatio-temporal nature of traffic data brings additional challenges for handling such missing values, for which classic techniques (e.g., data imputation) are limited: 1) along the temporal axis, values can be missing randomly or consecutively; 2) along the spatial axis, values can be missing on a single sensor or on multiple sensors simultaneously. Recent models powered by Graph Neural Networks have achieved satisfying performance on traffic forecasting tasks. However, few of them are applicable to such a complex missing-value context. To this end, we propose GCN-M, a Graph Convolutional Network model able to handle complex missing values in the spatio-temporal context. In particular, we jointly model the missing-value processing and traffic forecasting tasks, considering both local spatio-temporal features and global historical patterns in an attention-based memory network. We also propose a dynamic graph learning module based on the learned local-global features. Experimental results on real-life datasets show the reliability of our proposed method.
    Comment: To appear in Data Mining and Knowledge Discovery (DMKD), Springer
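    A minimal, hedged sketch of the masking idea the abstract alludes to (not the paper's GCN-M architecture): one graph-convolution step in which missing sensor readings are excluded from the neighborhood aggregation, and the normalization counts only observed neighbors. All names, shapes, and the toy graph below are illustrative assumptions.

```python
import numpy as np

# Illustrative only: a single mask-aware graph-convolution step, not GCN-M.
def masked_graph_conv(x, adj, mask, weight):
    """x: (num_nodes, in_dim) sensor readings.
    adj: (num_nodes, num_nodes) adjacency with self-loops.
    mask: (num_nodes, 1) 1.0 if observed, 0.0 if missing.
    weight: (in_dim, out_dim) projection (fixed here for the sketch)."""
    x_obs = x * mask                        # zero out missing readings
    agg = adj @ x_obs                       # aggregate observed neighbors
    norm = adj @ mask                       # number of observed neighbors per node
    agg = agg / np.clip(norm, 1e-6, None)   # average only over observed values
    return np.maximum(agg @ weight, 0.0)    # ReLU activation

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
mask = np.array([[1.0], [0.0], [1.0], [1.0]])  # sensor 1 has a missing reading
adj = np.ones((4, 4))                          # toy fully connected graph
w = rng.normal(size=(3, 2))
print(masked_graph_conv(x, adj, mask, w))
```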

    DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection

    Graph Anomaly Detection (GAD) has recently become a hot research topic due to its practical and theoretical value. Since GAD emphasizes applications and the rarity of anomalous samples, enriching the variety of its datasets is fundamental work. Thus, this paper presents DGraph, a real-world dynamic graph in the finance domain. DGraph overcomes many limitations of current GAD datasets. It contains about 3M nodes, 4M dynamic edges, and 1M ground-truth nodes. We provide a comprehensive observation of DGraph, revealing that anomalous nodes and normal nodes generally have different structures, neighbor distributions, and temporal dynamics. Moreover, it suggests that unlabeled nodes are also essential for detecting fraudsters. Furthermore, we conduct extensive experiments on DGraph. Our observations and experiments demonstrate that DGraph can propel GAD research forward and enable in-depth exploration of anomalous nodes.
    Comment: 9 pages

    Data Imputation Using Differential Dependency and Fuzzy Multi-Objective Linear Programming

    Missing or incomplete data is a serious problem when collecting and analyzing data for forecasting, estimation, and decision making. Since data quality is so important to machine learning and its results, in most cases imputing missing data is much more appropriate than ignoring it. Missing-data imputation is often based on the equality, similarity, or distance of neighbors, and researchers define neighbors' equality or similarity in different ways, each with its own advantages and limitations. Instead of equality, some researchers use inequalities together with a few relationship or similarity rules. In this thesis, after recalling some basic imputation methods, we discuss data imputation based on differential dependencies (DDs). DDs are conditional rules stating that if the values of a pair of tuples are close on some attribute, then their values on another attribute must also be close. Following these rules, several candidate rows are created for each incomplete row and placed in that row's candidate set. From each set, one row is then selected such that the selections are mutually compatible; these selections are made by an integer linear programming (ILP) model. In this thesis, we first propose an algorithm to generate DDs. Then, to improve on previous approaches and increase the imputation percentage, we suggest a fuzzy relaxation that allows a small violation of the DDs. Finally, we propose a multi-objective fuzzy linear program that achieves a higher imputation percentage while also decreasing the total violation. A variety of datasets from Kaggle are used to support our approach.
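    A toy, hedged illustration of the candidate-generation step described above, not the thesis' full pipeline (which then selects one candidate per incomplete row with a fuzzy multi-objective ILP). The differential dependency used here is hypothetical: if two rows are within eps_a on attribute A, then an observed value of B from one row is a plausible candidate for the other row's missing B.

```python
# Illustrative only: collecting imputation candidates from one hypothetical DD.
def dd_candidates(rows, missing_idx, eps_a=1.0):
    """rows: list of (a, b) pairs where b may be None (missing).
    Returns the candidate values of B for the row at missing_idx."""
    a_ref, _ = rows[missing_idx]
    candidates = []
    for i, (a, b) in enumerate(rows):
        if i == missing_idx or b is None:
            continue
        if abs(a - a_ref) <= eps_a:   # DD premise: the rows are close on A
            candidates.append(b)      # so this b is a plausible value for B
    return candidates

rows = [(1.0, 10.0), (1.2, None), (1.1, 10.5), (5.0, 42.0)]
print(dd_candidates(rows, missing_idx=1))  # -> [10.0, 10.5]
```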

    Local Embeddings for Relational Data Integration

    Deep learning based techniques have recently been used with promising results for data integration problems. Some methods directly use pre-trained embeddings that were trained on a large corpus such as Wikipedia. However, these may not always be an appropriate choice for enterprise datasets with custom vocabulary. Other methods adapt techniques from natural language processing to obtain embeddings for the enterprise's relational data. However, this approach blindly treats a tuple as a sentence, losing a large amount of the contextual information present in the tuple. We propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational databases. We make four major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we propose how to derive sentences from such a graph that effectively "describe" the similarity across elements (tokens, attributes, rows) in the two datasets. The embeddings are learned based on such sentences. Third, we propose effective optimizations to improve the quality of the learned embeddings and the performance of integration tasks. Finally, we propose a diverse collection of criteria to evaluate relational embeddings and perform an extensive set of experiments validating them against multiple baseline methods. Our experiments show that our framework, EmbDI, produces meaningful results for data integration tasks such as schema matching and entity resolution, in both supervised and unsupervised settings.
    Comment: Accepted to SIGMOD 2020 as Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks. Code can be found at https://gitlab.eurecom.fr/cappuzzo/embd
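    A hedged sketch of the general idea behind the abstract, not the EmbDI implementation: build a small heterogeneous graph linking row ids, attribute names, and cell tokens, then turn random walks over it into "sentences" from which token embeddings can be learned with any skip-gram trainer (e.g., gensim's Word2Vec in a real pipeline). The toy table and node-naming scheme are assumptions.

```python
import random
from collections import defaultdict

# Illustrative only: a tiny tuple/attribute/token graph and random-walk sentences.
table = [
    {"name": "acme corp", "city": "paris"},
    {"name": "acme corporation", "city": "paris"},
]

graph = defaultdict(set)
for rid, row in enumerate(table):
    row_node = f"row_{rid}"
    for attr, value in row.items():
        attr_node = f"attr_{attr}"
        for token in value.split():
            graph[row_node].add(token); graph[token].add(row_node)
            graph[attr_node].add(token); graph[token].add(attr_node)

def random_walks(graph, walk_len=6, walks_per_node=2, seed=0):
    """Uniform random walks; each walk becomes one 'sentence' of node names."""
    rng = random.Random(seed)
    sentences = []
    for start in list(graph):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = rng.choice(sorted(graph[node]))
                walk.append(node)
            sentences.append(walk)
    return sentences

for sentence in random_walks(graph)[:3]:
    print(" ".join(sentence))
```

    Tokens that co-occur in many walks (here "acme" with both row nodes) end up near each other in the learned embedding space, which is what schema matching and entity resolution exploit.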

    Quantifying Semantic Similarity Across Languages
