466 research outputs found

    Deep Learning of Causal Structures in High Dimensions

    Recent years have seen rapid progress at the intersection of causality and machine learning. Motivated by scientific applications involving high-dimensional data, in particular in biomedicine, we propose a deep neural architecture for learning causal relationships between variables from a combination of empirical data and prior causal knowledge. We combine convolutional and graph neural networks within a causal risk framework to provide a flexible and scalable approach. Empirical results include linear and nonlinear simulations (where the underlying causal structures are known and can be compared against directly), as well as a real biological example in which the models are applied to high-dimensional molecular data and their output is compared against entirely unseen validation experiments. These results demonstrate the feasibility of using deep learning approaches to learn causal networks in large-scale problems spanning thousands of variables.

    xFraud: Explainable Fraud Transaction Detection

    At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework composed mainly of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it uses a heterogeneous graph neural network to learn expressive representations from the informative, heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful and human-understandable explanations from graphs to facilitate further processes in the business unit. In our experiments with xFraud on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud outperforms various baseline models on many evaluation metrics while remaining scalable in distributed settings. In addition, we show through both quantitative and qualitative evaluations that the xFraud explainer can generate reasonable explanations that significantly assist business analysis. Comment: This is the extended version of a full paper to appear in PVLDB 15 (3) (VLDB 2022).

    Applying an Improved Method Based on ARIMA Model to Predict the Short-Term Electricity Consumption Transmitted by the Internet of Things (IoT)

    The rapid development of the Internet of Things (IoT) has brought a data explosion and a new set of challenges. It has become urgent to construct a more robust and precise model to predict the electricity consumption data collected from the IoT. Accurate forecasting of electricity consumption is crucial for energy resource planning and could lead to a remarkable reduction in building electricity consumption. This paper focuses on electricity consumption forecasting for an office building with a small-scale dataset of 117 daily electricity consumption values, of which 89 are used as the training dataset and the remaining 28 as the testing dataset. The hybrid model ARIMA (autoregressive integrated moving average)-SVR (support vector regression) is proposed to predict the electricity consumption over prediction horizons ranging from 1 day to 28 days. Model performance is assessed by three evaluation indicators: the mean squared error (MSE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). The proposed ARIMA-SVR model is compared with four other models: ARIMA, ARIMA-GBR (gradient boosting regression), LSTM (long short-term memory), and GRU (gated recurrent unit). The experimental results show that the ARIMA-SVR model has lower prediction errors when the prediction horizon is within 20 days, and that the ARIMA model is better when the prediction horizon is between 20 and 28 days. The proposed ARIMA-SVR method offers higher flexibility and is a strong choice for electricity consumption prediction, yielding more accurate results.
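
    The three evaluation indicators and the 89/28 chronological split named in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definitions of MSE, RMSE, and MAPE; the function names, the `chronological_split` helper, and the placeholder series are assumptions made for illustration and are not taken from the paper.

```python
import math

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (division by the actual value).
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def chronological_split(series, n_train):
    """Hypothetical helper: first n_train points for training, rest for testing."""
    return series[:n_train], series[n_train:]

# Placeholder series of 117 daily values (NOT the paper's data).
series = [100.0 + (day % 7) for day in range(117)]
train, test = chronological_split(series, 89)
print(len(train), len(test))
```

    Splitting by position rather than at random keeps the test days strictly after the training days, which is the usual choice when forecasting a time series.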

    Estimating networks of sustainable development goals

    An increasing number of researchers and practitioners advocate for a systemic understanding of the Sustainable Development Goals (SDGs) through interdependency networks. Ironically, this community seems to have neglected the burgeoning network-estimation literature. We provide an introduction to the most suitable estimation methods for SDG networks. Building a dataset with 87 development indicators in four countries over 20 years, we perform a comparative study of these methods. We find important differences in the estimated network structures as well as in synergies and trade-offs between SDGs. Finally, we provide some guidelines on the potential and limitations of estimating SDG networks for policy advice.

    A history and theory of textual event detection and recognition


    Integration of multi-scale protein interactions for biomedical data analysis

    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms, as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cell state, and have motivated the joint mining of multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scales that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data. We show that our approach outperforms the existing methods and provides biomedical insights and relevant hypotheses for specific cancer types.