
    LSTM Networks for Data-Aware Remaining Time Prediction of Business Process Instances

    Predicting the completion time of business process instances would be a very helpful aid when managing processes under service-level-agreement constraints. The ability to know in advance the trend of running process instances would allow business managers to react in time, in order to prevent delays or undesirable situations. However, making such accurate forecasts is not easy: many factors may influence the time required to complete a process instance. In this paper, we propose an approach based on deep recurrent neural networks (specifically LSTMs) that is able to exploit arbitrary information associated with single events, in order to produce as accurate a prediction as possible of the completion time of running instances. Experiments on real-world datasets confirm the quality of our proposal.
    Comment: Article accepted for publication in 2017 IEEE Symposium on Deep Learning (IEEE DL'17) @ SSC
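
    The sketch below illustrates, in broad strokes, the kind of model the abstract describes: an LSTM that consumes a prefix of events, each carrying an activity id plus additional numeric attributes, and regresses the remaining time. It is a minimal PyTorch sketch, not the authors' implementation; all layer sizes, feature layouts and names are assumptions.

```python
# Minimal sketch (PyTorch): LSTM regression of remaining time from event prefixes.
# Hyper-parameters, feature layout and names are illustrative assumptions.
import torch
import torch.nn as nn

class RemainingTimeLSTM(nn.Module):
    def __init__(self, n_activities, n_extra_features, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(n_activities, 16)            # activity id -> dense vector
        self.lstm = nn.LSTM(16 + n_extra_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)                   # predicted remaining time

    def forward(self, activity_ids, extra_features):
        # activity_ids: (batch, seq_len) ints; extra_features: (batch, seq_len, n_extra_features)
        x = torch.cat([self.embed(activity_ids), extra_features], dim=-1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)             # prediction at the last observed event

model = RemainingTimeLSTM(n_activities=20, n_extra_features=3)
ids = torch.randint(0, 20, (8, 12))                             # 8 toy prefixes of 12 events each
feats = torch.randn(8, 12, 3)
loss = nn.functional.mse_loss(model(ids, feats), torch.rand(8))
loss.backward()
```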

    A tree-based kernel for graphs with continuous attributes

    The availability of graph data with node attributes that can be either discrete or real-valued is constantly increasing. While existing kernel methods are effective techniques for dealing with graphs having discrete node labels, their adaptation to non-discrete or continuous node attributes has been limited, mainly because of computational issues. Recently, a few kernels especially tailored for this domain, and that trade predictive performance for computational efficiency, have been proposed. In this paper, we propose a graph kernel for complex and continuous node attributes, whose features are tree structures extracted from specific graph visits. The kernel manages to keep the same complexity as state-of-the-art kernels while implicitly using a larger feature space. We further present an approximate variant of the kernel which reduces its complexity significantly. Experimental results obtained on six real-world datasets show that the kernel is the best-performing one on most of them. Moreover, in most cases the approximate version reaches classification accuracy comparable to current state-of-the-art kernels while greatly shortening the running times.
    Comment: This work has been submitted to the IEEE Transactions on Neural Networks and Learning Systems for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
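
    For readers unfamiliar with kernels on attributed graphs, the toy sketch below shows the simplest possible instance: an all-pairs RBF node-matching kernel over real-valued node attributes. The paper's kernel is considerably richer (its features are tree structures extracted from graph visits); this is only meant to fix ideas, and every name in it is an assumption.

```python
# Toy baseline-style kernel between graphs with continuous node attributes:
# sum of RBF similarities over all node pairs. Not the tree-based kernel of the paper.
import numpy as np

def node_matching_kernel(attrs_g1, attrs_g2, gamma=1.0):
    """attrs_gX: (n_nodes, d) arrays of real-valued node attributes."""
    d2 = ((attrs_g1[:, None, :] - attrs_g2[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * d2).sum()

g1 = np.random.randn(5, 3)    # toy graph with 5 nodes, 3-dimensional attributes
g2 = np.random.randn(7, 3)
print(node_matching_kernel(g1, g2))
```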

    An Empirical Study on Budget-Aware Online Kernel Algorithms for Streams of Graphs

    Kernel methods are considered an effective technique for on-line learning. Many approaches have been developed for compactly representing the dual solution of a kernel method when the problem imposes memory constraints. However, no work in the literature is specifically tailored to streams of graphs. Motivated by the fact that the feature-space representation of many state-of-the-art graph kernels is relatively small, and thus explicitly computable, we study whether executing kernel algorithms in the feature space can be more effective than the classical dual approach. We study three different algorithms and various strategies for managing the budget. Efficiency and efficacy of the proposed approaches are experimentally assessed on relatively large graph streams exhibiting concept drift. It turns out that, when strict memory budget constraints have to be enforced, working in feature space, given the current state of the art on graph kernels, is more than a viable alternative to dual approaches, both in terms of speed and classification performance.
    Comment: Author's version of the manuscript, to appear in Neurocomputing (Elsevier).
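
    The contrast drawn in the abstract is between keeping a budget of support examples (dual) and updating an explicit, primal weight vector when the kernel's feature map is small enough to compute. The sketch below shows one hedged reading of the primal side: a mistake-driven perceptron over explicit sparse feature vectors with a hard cap on the number of stored weights. The budget policy (dropping the smallest-magnitude weight) is just one possible strategy, not necessarily one of the three studied in the paper.

```python
# Sketch of a budgeted primal (feature-space) perceptron for a graph stream.
# Each graph is assumed to be pre-mapped to a sparse dict {feature_id: count}
# by some explicit graph-kernel feature extractor; B bounds the stored weights.
def budgeted_perceptron(stream, B=1000):
    w = {}                                              # sparse weight vector
    for features, y in stream:                          # y in {-1, +1}
        score = sum(w.get(f, 0.0) * v for f, v in features.items())
        if y * score <= 0:                              # mistake-driven update
            for f, v in features.items():
                w[f] = w.get(f, 0.0) + y * v
            while len(w) > B:                           # enforce the memory budget
                del w[min(w, key=lambda f: abs(w[f]))]  # drop the smallest-magnitude weight
    return w

toy_stream = [({1: 2.0, 7: 1.0}, +1), ({3: 1.0, 7: 2.0}, -1), ({1: 1.0}, +1)]
print(budgeted_perceptron(toy_stream, B=2))
```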

    A Systematic Assessment of Deep Learning Models for Molecule Generation

    In recent years the scientific community has devoted much effort to the development of deep learning models for the generation of new molecules with desirable properties (i.e., drugs). This has produced many proposals in the literature. However, a systematic comparison among the different VAE methods is still missing. For this reason, we propose an extensive testbed for the evaluation of generative models for drug discovery, and we present the results obtained by many of the models proposed in the literature.
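
    Testbeds of this kind typically score generated SMILES strings with metrics such as validity, uniqueness and novelty; the hedged sketch below computes these three with RDKit. It is only an illustration of the usual shape of such an evaluation, not the paper's actual protocol or metric set.

```python
# Hedged sketch of common molecule-generation metrics (validity, uniqueness,
# novelty) over SMILES strings, using RDKit. Not the paper's exact testbed.
from rdkit import Chem

def evaluate(generated_smiles, training_smiles):
    canon = []
    for s in generated_smiles:
        mol = Chem.MolFromSmiles(s)                   # None if the SMILES is invalid
        if mol is not None:
            canon.append(Chem.MolToSmiles(mol))       # canonical form for fair comparison
    validity = len(canon) / max(len(generated_smiles), 1)
    unique = set(canon)
    uniqueness = len(unique) / max(len(canon), 1)
    train_canon = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in training_smiles}
    novelty = len(unique - train_canon) / max(len(unique), 1)
    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}

print(evaluate(["CCO", "c1ccccc1", "not_a_molecule"], ["CCO"]))
```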

    Conditional Constrained Graph Variational Autoencoders for Molecule Design

    In recent years, deep generative models for graphs have been used to generate new molecules. These models have produced good results, leading to several proposals in the literature. However, these models may have trouble learning some of the complex laws governing the chemical world. In this work, we explore the use of the histogram of atom valences to drive the generation of molecules in such models. We present the Conditional Constrained Graph Variational Autoencoder (CCGVAE), a model that implements this key idea in a state-of-the-art model and shows improved results on several evaluation metrics on two commonly adopted datasets for molecule generation.
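
    The conditioning signal mentioned in the abstract, a histogram of atom valences, is easy to compute per molecule; the sketch below does so with RDKit. How CCGVAE actually injects this vector into the encoder/decoder is not shown here, and the bin layout is an assumption.

```python
# Sketch: normalized histogram of atom valences for a molecule (RDKit).
# A vector like this could condition a graph VAE; bin layout is an assumption.
from rdkit import Chem
import numpy as np

def valence_histogram(smiles, max_valence=6):
    mol = Chem.MolFromSmiles(smiles)
    hist = np.zeros(max_valence + 1)
    for atom in mol.GetAtoms():
        hist[min(atom.GetTotalValence(), max_valence)] += 1
    return hist / hist.sum()

print(valence_histogram("CC(=O)Oc1ccccc1C(=O)O"))     # aspirin
```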

    On Filter Size in Graph Convolutional Networks

    Recently, many researchers have been focusing on the definition of neural networks for graphs. The basic component of many of these approaches remains the graph convolution idea proposed almost a decade ago. In this paper, we extend this basic component, following an intuition derived from the well-known convolutional filters over multi-dimensional tensors. In particular, we derive a simple, efficient and effective way to introduce a hyper-parameter on graph convolutions that influences the filter size, i.e., its receptive field over the considered graph. We show with experimental results on real-world graph datasets that the proposed graph convolutional filter improves the predictive performance of Deep Graph Convolutional Networks.
    Comment: arXiv admin note: text overlap with arXiv:1811.0693
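
    A minimal way to picture a "filter size" on graph convolutions is a layer that aggregates up to k hops of neighbourhood, one weight matrix per hop, as in the hedged PyTorch sketch below. This mirrors the receptive-field idea described in the abstract but is not the exact operator defined in the paper.

```python
# Sketch (PyTorch): k-hop graph convolution where k acts as the filter size.
# Not the paper's exact operator; normalization and weights are illustrative.
import torch
import torch.nn as nn

class KHopGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, k=2):
        super().__init__()
        self.weights = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False) for _ in range(k + 1)])

    def forward(self, X, A_norm):
        # X: (n_nodes, in_dim) node features; A_norm: normalized adjacency matrix.
        out, H = self.weights[0](X), X
        for hop_weight in self.weights[1:]:
            H = A_norm @ H                               # move one hop further out
            out = out + hop_weight(H)
        return torch.relu(out)

A = torch.eye(4) * 0.5                                   # toy normalized adjacency
X = torch.randn(4, 8)
print(KHopGraphConv(8, 16, k=2)(X, A).shape)             # torch.Size([4, 16])
```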

    Scuba: Scalable kernel-based gene prioritization

    Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. Results: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new, efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Conclusions: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful for prioritizing candidate genes, particularly when their number is large or when input data are highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba
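
    To give the general shape of kernel-based gene prioritization, the heavily simplified sketch below combines several source-specific kernel matrices with fixed weights and ranks candidate genes by their average combined similarity to known disease genes. Scuba instead learns the combination through a semi-supervised, margin-distribution-based formulation; nothing below should be read as its algorithm.

```python
# Heavily simplified sketch of kernel-based gene prioritization with fixed
# kernel weights. Scuba learns the weights instead; this only frames the task.
import numpy as np

def prioritize(kernels, weights, seed_idx):
    K = sum(w * Kk for w, Kk in zip(weights, kernels))   # combined kernel matrix
    scores = K[:, seed_idx].mean(axis=1)                 # similarity to known disease genes
    return np.argsort(-scores)                           # genes ranked by decreasing score

rng = np.random.default_rng(0)
K1, K2 = rng.random((50, 50)), rng.random((50, 50))
K1, K2 = (K1 + K1.T) / 2, (K2 + K2.T) / 2                # symmetrize the toy kernels
print(prioritize([K1, K2], weights=[0.6, 0.4], seed_idx=[0, 3, 7])[:10])
```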

    A machine-learning based bio-psycho-social model for the prediction of non-obstructive and obstructive coronary artery disease

    Background: Mechanisms of myocardial ischemia in obstructive and non-obstructive coronary artery disease (CAD), and the interplay between clinical, functional, biological and psycho-social features, are still far from being fully elucidated. Objectives: To develop a machine-learning (ML) model for the supervised prediction of obstructive versus non-obstructive CAD. Methods: From the EVA study, we analysed adults hospitalized for ischemic heart disease (IHD) undergoing conventional coronary angiography (CCA). Non-obstructive CAD was defined by a stenosis < 50% in one or more vessels. Baseline clinical and psycho-socio-cultural characteristics were used to compute a Rockwood and Mitnitski frailty index and a gender score according to the GENESIS-PRAXY methodology. Serum concentrations of inflammatory cytokines were measured with a multiplex flow cytometry assay. Through an XGBoost classifier combined with an explainable artificial intelligence tool (SHAP), we identified the most influential features in discriminating obstructive versus non-obstructive CAD. Results: Among the overall EVA cohort (n = 509), 311 individuals (mean age 67 ± 11 years, 38% females; 67% obstructive CAD) with complete data were analysed. The ML-based model (83% accuracy and 87% precision) showed that while obstructive CAD was associated with a higher frailty index, older age and a cytokine signature characterized by IL-1β, IL-12p70 and IL-33, non-obstructive CAD was associated with a higher gender score (i.e., social characteristics traditionally ascribed to women) and with a cytokine signature characterized by IL-18, IL-8 and IL-23. Conclusions: Integrating clinical, biological and psycho-social features, we have optimized a sex- and gender-unbiased model that discriminates obstructive from non-obstructive CAD. Further mechanistic studies will shed light on the biological plausibility of these associations. Clinical trial registration: NCT02737982
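
    The modelling pipeline described above (an XGBoost classifier explained with SHAP) has a standard implementation pattern, sketched below on synthetic data. Feature names, hyper-parameters and labels are placeholders, not the EVA study's variables or results.

```python
# Sketch: XGBoost classifier + SHAP explanations on synthetic placeholder data.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.normal(67, 11, 300),
    "frailty_index": rng.random(300),
    "gender_score": rng.random(300),
    "IL_1beta": rng.random(300),
    "IL_18": rng.random(300),
})
y = rng.integers(0, 2, 300)                        # 1 = obstructive CAD (synthetic labels)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)              # per-feature contributions to each prediction
shap_values = explainer.shap_values(X)
print(np.abs(shap_values).mean(axis=0))            # mean |SHAP| as global feature importance
```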

    Serum Albumin Is Inversely Associated With Portal Vein Thrombosis in Cirrhosis

    We analyzed whether serum albumin is independently associated with portal vein thrombosis (PVT) in liver cirrhosis (LC) and whether a biologic plausibility exists. This study was divided into three parts. In part 1 (retrospective analysis), 753 consecutive patients with LC, screened for PVT by ultrasound, were retrospectively analyzed. In part 2, 112 patients with LC and 56 matched controls were entered in the cross-sectional study. In part 3, 5 patients with cirrhosis were entered in the in vivo study and 4 healthy subjects (HSs) were entered in the in vitro study, to explore whether albumin may affect platelet activation by modulating oxidative stress. In the 753 patients with LC, the prevalence of PVT was 16.7%; logistic analysis showed that only age (odds ratio [OR], 1.024; P = 0.012) and serum albumin (OR, -0.422; P = 0.0001) significantly predicted PVT. Analyzing the 112 patients with LC and controls, soluble cluster of differentiation 40 ligand (sCD40L; P = 0.0238), soluble Nox2-derived peptide (sNox2-dp; P < 0.0001), and urinary excretion of isoprostanes (P = 0.0078) were higher in patients with LC. In LC, albumin was correlated with sCD40L (Spearman's rank correlation coefficient [rs], -0.33; P < 0.001), sNox2-dp (rs, -0.57; P < 0.0001), and urinary excretion of isoprostanes (rs, -0.48; P < 0.0001). The in vivo study showed a progressive decrease in platelet aggregation, sNox2-dp, and urinary 8-iso-prostaglandin F2α-III formation 2 hours and 3 days after albumin infusion. Finally, platelet aggregation, sNox2-dp, and isoprostane formation significantly decreased in platelets from HSs incubated with scalar concentrations of albumin. Conclusion: Low serum albumin in LC is associated with PVT, suggesting that albumin could be a modulator of the hemostatic system through interference with mechanisms regulating platelet activation.
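
    Part 1 of the study reports a logistic model of PVT on age and serum albumin; the sketch below shows what fitting such a model looks like on synthetic placeholder data with statsmodels. The coefficients and variables are invented for illustration and carry no clinical meaning.

```python
# Sketch: logistic regression of PVT on age and serum albumin (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 753
age = rng.normal(60, 10, n)
albumin = rng.normal(3.5, 0.6, n)                        # g/dL
logit = -6 + 0.03 * age - 0.4 * albumin                  # arbitrary synthetic effects
pvt = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([age, albumin]))
fit = sm.Logit(pvt, X).fit(disp=0)
print(np.exp(fit.params))                                # odds ratios: intercept, age, albumin
```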