LSTM Networks for Data-Aware Remaining Time Prediction of Business Process Instances
Predicting the completion time of business process instances would be a very
helpful aid when managing processes under service level agreement constraints.
The ability to know in advance the trend of running process instances would
allow business managers to react in time, in order to prevent delays or
undesirable situations. However, making such accurate forecasts is not easy:
many factors may influence the required time to complete a process instance. In
this paper, we propose an approach based on deep Recurrent Neural Networks
(specifically LSTMs) that is able to exploit arbitrary information associated
with individual events, in order to produce an as-accurate-as-possible prediction of
the completion time of running instances. Experiments on real-world datasets
confirm the quality of our proposal.

Comment: Article accepted for publication in 2017 IEEE Symposium on Deep
Learning (IEEE DL'17) @ SSC
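As a rough illustration of the idea in this abstract, the sketch below runs a single LSTM cell over the feature vectors of the events observed so far in a running instance and maps the final hidden state to a scalar remaining-time estimate. All dimensions, weights, and the single-layer readout are illustrative, not the paper's actual architecture or training setup.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the event feature vector, (h, c) the recurrent state."""
    z = W @ x + U @ h + b               # stacked gate pre-activations
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))        # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def predict_remaining_time(events, W, U, b, w_out, b_out):
    """Run the LSTM over the observed event prefix and map the last
    hidden state to a scalar remaining-time estimate."""
    H = b.size // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in events:
        h, c = lstm_step(x, h, c, W, U, b)
    return float(w_out @ h + b_out)

# illustrative random weights and a 3-event prefix with 5 features per event
rng = np.random.default_rng(0)
D, H = 5, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
w_out, b_out = rng.normal(size=H), 0.0
prefix = rng.normal(size=(3, D))
y_hat = predict_remaining_time(prefix, W, U, b, w_out, b_out)
```

In practice the event features would encode activity labels, timestamps, and any other data attached to single events, which is exactly the "arbitrary information" the approach exploits.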
A tree-based kernel for graphs with continuous attributes
The availability of graph data with node attributes that can be either
discrete or real-valued is constantly increasing. While existing kernel methods
are effective techniques for dealing with graphs having discrete node labels,
their adaptation to non-discrete or continuous node attributes has been
limited, mainly due to computational issues. Recently, a few kernels especially
tailored for this domain, and that trade predictive performance for
computational efficiency, have been proposed. In this paper, we propose a graph
kernel for complex and continuous node attributes, whose features are tree
structures extracted from specific graph visits. The kernel keeps the
same complexity as state-of-the-art kernels while implicitly using a larger
feature space. We further present an approximated variant of the kernel which
reduces its complexity significantly. Experimental results obtained on six
real-world datasets show that the kernel is the best performing one on most of
them. Moreover, in most cases the approximated version reaches comparable
performances to current state-of-the-art kernels in terms of classification
accuracy while greatly shortening the running times.

Comment: This work has been submitted to the IEEE Transactions on Neural
Networks and Learning Systems for possible publication. Copyright may be
transferred without notice, after which this version may no longer be
accessible.
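To make the flavor of such a kernel concrete, here is a much-simplified sketch: each node contributes a depth-1 tree (the node plus its neighbors), and two trees are compared by multiplying Gaussian similarities between their continuous attributes. The real kernel uses deeper tree structures from graph visits and a more careful matching; everything below is illustrative.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian similarity between two continuous attribute vectors."""
    d = np.asarray(a) - np.asarray(b)
    return float(np.exp(-gamma * d @ d))

def depth1_trees(adj, attrs):
    """One depth-1 tree per node: (root attribute, neighbor attributes
    sorted by norm, so the comparison is order-independent)."""
    trees = []
    for v, nbrs in enumerate(adj):
        leaves = sorted((attrs[u] for u in nbrs), key=np.linalg.norm)
        trees.append((attrs[v], leaves))
    return trees

def tree_kernel(g1, g2, gamma=1.0):
    """Sum of tree-vs-tree similarities over all node pairs; each pair
    multiplies the root similarity by the aligned leaf similarities."""
    k = 0.0
    for r1, l1 in depth1_trees(*g1):
        for r2, l2 in depth1_trees(*g2):
            s = rbf(r1, r2, gamma)
            for a, b in zip(l1, l2):
                s *= rbf(a, b, gamma)
            k += s
    return k

# two tiny graphs: adjacency lists + one continuous attribute per node
gA = ([[1], [0, 2], [1]], [np.array([0.1]), np.array([0.9]), np.array([0.2])])
gB = ([[1], [0]], [np.array([0.1]), np.array([0.8])])
k_ab = tree_kernel(gA, gB)
```

Because attribute comparisons go through a smooth similarity rather than exact label matching, the construction degrades gracefully on real-valued attributes, which is the limitation of discrete-label kernels the abstract points at.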
An Empirical Study on Budget-Aware Online Kernel Algorithms for Streams of Graphs
Kernel methods are considered an effective technique for on-line learning.
Many approaches have been developed for compactly representing the dual
solution of a kernel method when the problem imposes memory constraints.
However, no work in the literature is specifically tailored to streams of graphs.
Motivated by the fact that the size of the feature space representation of many
state-of-the-art graph kernels is relatively small and thus it is explicitly
computable, we study whether executing kernel algorithms in the feature space
can be more effective than the classical dual approach. We study three
different algorithms and various strategies for managing the budget. Efficiency
and efficacy of the proposed approaches are experimentally assessed on
relatively large graph streams exhibiting concept drift. It turns out that,
when strict memory budget constraints have to be enforced, working in feature
space, given the current state of the art on graph kernels, is more than a
viable alternative to dual approaches, both in terms of speed and
classification performance.

Comment: Author's version of the manuscript, to appear in Neurocomputing
(Elsevier).
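The dual-versus-feature-space contrast the abstract studies can be sketched with two perceptrons: one keeps a bounded buffer of support vectors (discarding the oldest is just one of several budget-management strategies such a study would compare), the other keeps a single weight vector in an explicit feature space. The linear kernel and random stream below are illustrative stand-ins for graph kernels and graph streams.

```python
import numpy as np
from collections import deque

class BudgetDualPerceptron:
    """Kernel perceptron in the dual: the model is a buffer of
    (support vector, label) pairs capped at `budget` entries; when the
    buffer is full, the oldest pair is discarded."""
    def __init__(self, kernel, budget):
        self.kernel = kernel
        self.sv = deque(maxlen=budget)       # bounded support-vector buffer
    def predict(self, x):
        score = sum(y * self.kernel(s, x) for s, y in self.sv)
        return 1.0 if score >= 0 else -1.0
    def update(self, x, y):
        if self.predict(x) != y:             # mistake-driven update
            self.sv.append((x, y))

class FeaturePerceptron:
    """The same mistake-driven algorithm run directly in an explicit
    feature space: memory is one weight vector, independent of how many
    stream examples (or mistakes) have been seen."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
    def predict(self, phi):
        return 1.0 if self.w @ phi >= 0 else -1.0
    def update(self, phi, y):
        if self.predict(phi) != y:
            self.w += y * phi

# stream of linearly separable examples; a graph-stream experiment would
# instead feed a graph kernel's explicit feature map
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
dual = BudgetDualPerceptron(kernel=lambda a, b: float(a @ b), budget=50)
primal = FeaturePerceptron(dim=3)
for _ in range(500):
    x = rng.normal(size=3)
    y = 1.0 if w_true @ x >= 0 else -1.0
    dual.update(x, y)
    primal.update(x, y)
```

The point of the comparison is visible in the memory footprint: the dual learner stores up to `budget` full examples, while the primal learner's cost is fixed by the feature-space dimension, which the abstract notes is small for many state-of-the-art graph kernels.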
A Systematic Assessment of Deep Learning Models for Molecule Generation
In recent years the scientific community has devoted much effort to the
development of deep learning models for the generation of new molecules with
desirable properties (e.g., drugs). This has produced many proposals in the
literature. However, a systematic comparison among the different VAE methods is
still missing. For this reason, we propose an extensive testbed for the
evaluation of generative models for drug discovery, and we present the results
obtained by many of the models proposed in the literature.
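A testbed of this kind typically scores generators on a few standard set-based metrics. The sketch below computes three of the usual ones; the `is_valid` checker is a pluggable placeholder (a real testbed would parse SMILES with a chemistry toolkit), and the metric definitions here are the common ones, not necessarily the paper's exact protocol.

```python
def evaluate_generation(generated, training_set, is_valid):
    """Common metrics for molecule generators:
    validity   - fraction of generated strings accepted by `is_valid`,
    uniqueness - fraction of valid molecules that are distinct,
    novelty    - fraction of unique molecules not seen in training."""
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training_set)
    n = len(generated)
    return {
        "validity": len(valid) / n if n else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

# toy run with a trivial validity check standing in for a real parser
gen = ["CCO", "CCO", "C1CC1", "not-a-molecule"]
train = ["CCO"]
metrics = evaluate_generation(gen, train, is_valid=lambda m: "-" not in m)
```

Conditioning uniqueness on validity and novelty on uniqueness (rather than on the raw sample count) is what makes the three numbers comparable across models with different validity rates.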
Conditional Constrained Graph Variational Autoencoders for Molecule Design
In recent years, deep generative models for graphs have been used to generate
new molecules. These models have produced good results, leading to several
proposals in the literature. However, these models may have trouble learning
some of the complex laws governing the chemical world. In this work, we explore
the usage of the histogram of atom valences to drive the generation of
molecules in such models. We present Conditional Constrained Graph Variational
Autoencoder (CCGVAE), a model that implements this key idea in a
state-of-the-art model and shows improved results on several evaluation
metrics on two commonly adopted datasets for molecule generation.
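The conditioning signal the abstract describes, a histogram of atom valences, is easy to compute from a molecular graph. The sketch below derives it from a symmetric bond-order matrix; hydrogens are omitted and the representation is illustrative, not CCGVAE's actual input encoding.

```python
import numpy as np

def valence_histogram(bond_orders, max_valence=4):
    """Normalized histogram of atom valences for a molecule given as a
    symmetric bond-order matrix (entry [i, j] = order of the bond
    between heavy atoms i and j, 0 if absent). A vector like this can
    condition a graph VAE's decoder to respect chemical constraints."""
    B = np.asarray(bond_orders)
    valences = B.sum(axis=1).astype(int)        # each atom's total bond order
    hist = np.bincount(valences, minlength=max_valence + 1)
    return hist / hist.sum()

# ethanol-like heavy-atom skeleton: C-C single bond, C-O single bond
bonds = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
h = valence_histogram(bonds)
```

Feeding such a histogram to the decoder (e.g., concatenated to the latent code) gives the model an explicit summary of one "law of the chemical world" that is otherwise hard to learn from samples alone.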
On Filter Size in Graph Convolutional Networks
Recently, many researchers have been focusing on the definition of neural
networks for graphs. The basic component for many of these approaches remains
the graph convolution idea proposed almost a decade ago. In this paper, we
extend this basic component, following an intuition derived from the well-known
convolutional filters over multi-dimensional tensors. In particular, we derive
a simple, efficient and effective way to introduce a hyper-parameter on graph
convolutions that influences the filter size, i.e. its receptive field over the
considered graph. We show with experimental results on real-world graph
datasets that the proposed graph convolutional filter improves the predictive
performance of Deep Graph Convolutional Networks.

Comment: arXiv admin note: text overlap with arXiv:1811.0693
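One simple way to realize a filter-size hyper-parameter r on a graph convolution, sketched below, mixes node features with neighborhoods up to r hops away through powers of the adjacency matrix, one weight matrix per hop. This illustrates the receptive-field idea only; it is not necessarily the paper's exact parameterization.

```python
import numpy as np

def graph_conv(A, X, Ws):
    """Graph convolution with receptive field r = len(Ws) - 1:
    H = tanh(sum_k A^k X W_k), so each extra weight matrix lets a node
    see one hop further into the graph."""
    A = np.asarray(A, dtype=float)
    H = np.zeros((np.asarray(X).shape[0], Ws[0].shape[1]))
    Ak = np.eye(A.shape[0])                 # A^0 = identity (the node itself)
    for W in Ws:
        H += Ak @ X @ W
        Ak = Ak @ A                         # next hop: A^(k+1)
    return np.tanh(H)

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # 3-node path graph
X = np.eye(3)                               # one-hot node features
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.5, size=(3, 4)) for _ in range(3)]  # r = 2
H = graph_conv(A, X, Ws)
```

With r = 0 the layer reduces to a node-wise transformation, and growing r widens the filter over the graph, which is the hyper-parameter the abstract introduces.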
Scuba: Scalable kernel-based gene prioritization
Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. Results: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to efficiently deal both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Conclusions: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba
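The core multiple-kernel-learning pipeline can be sketched in two steps: combine one kernel matrix per data source into a single similarity, then rank candidate genes by similarity to the known disease (seed) genes. In Scuba the combination weights come from margin-distribution optimization; here they are simply given, and all matrices are toy data.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination of per-source kernel matrices into one
    integrated similarity over genes."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(K) for wi, K in zip(w, kernels))

def rank_candidates(K, seed_idx):
    """Score every gene by its mean combined-kernel similarity to the
    seed genes, then return candidates ranked by that score."""
    scores = np.asarray(K)[:, seed_idx].mean(axis=1)
    order = np.argsort(-scores)
    seeds = set(seed_idx)
    return [int(i) for i in order if i not in seeds]

# toy setting: two data sources over 4 genes, gene 0 is the only seed
K1 = np.eye(4)
K2 = [[1.0, 0.9, 0.1, 0.1],
      [0.9, 1.0, 0.1, 0.1],
      [0.1, 0.1, 1.0, 0.2],
      [0.1, 0.1, 0.2, 1.0]]
K = combine_kernels([K1, K2], weights=[0.3, 0.7])
ranking = rank_candidates(K, seed_idx=[0])
```

Because each data source only has to deliver a kernel matrix, heterogeneous inputs (expression, interactions, annotations) plug into the same ranking step, which is the integration property the abstract emphasizes.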
A machine-learning based bio-psycho-social model for the prediction of non-obstructive and obstructive coronary artery disease
Background: Mechanisms of myocardial ischemia in obstructive and non-obstructive coronary artery disease (CAD), and the interplay between clinical, functional, biological and psycho-social features, are still far from being fully elucidated. Objectives: To develop a machine-learning (ML) model for the supervised prediction of obstructive versus non-obstructive CAD. Methods: From the EVA study, we analysed adults hospitalized for ischemic heart disease (IHD) undergoing conventional coronary angiography (CCA). Non-obstructive CAD was defined by a stenosis < 50% in one or more vessels. Baseline clinical and psycho-socio-cultural characteristics were used to compute a Rockwood and Mitnitski frailty index and a gender score according to the GENESIS-PRAXY methodology. Serum concentrations of inflammatory cytokines were measured with a multiplex flow cytometry assay. Through an XGBoost classifier combined with an explainable artificial intelligence tool (SHAP), we identified the most influential features in discriminating obstructive versus non-obstructive CAD. Results: Among the overall EVA cohort (n = 509), 311 individuals (mean age 67 ± 11 years, 38% females; 67% obstructive CAD) with complete data were analysed. The ML-based model (83% accuracy and 87% precision) showed that while obstructive CAD was associated with a higher frailty index, older age and a cytokine signature characterized by IL-1β, IL-12p70 and IL-33, non-obstructive CAD was associated with a higher gender score (i.e., social characteristics traditionally ascribed to women) and with a cytokine signature characterized by IL-18, IL-8 and IL-23. Conclusions: Integrating clinical, biological, and psycho-social features, we have optimized a sex- and gender-unbiased model that discriminates obstructive and non-obstructive CAD. Further mechanistic studies will shed light on the biological plausibility of these associations. Clinical trial registration: NCT02737982
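One input feature the abstract names, the Rockwood-Mitnitski frailty index, is defined as the proportion of considered health deficits that are present. A minimal sketch follows; the deficit item names are illustrative, not the EVA study's actual item list.

```python
def frailty_index(deficits):
    """Rockwood-Mitnitski-style frailty index: the mean of the deficit
    codes, each coded in [0, 1] (0 = absent, 1 = fully present), i.e.
    the proportion of considered deficits that are present."""
    if not deficits:
        raise ValueError("at least one deficit item is required")
    vals = list(deficits.values())
    if any(not 0 <= v <= 1 for v in vals):
        raise ValueError("each deficit must be coded in [0, 1]")
    return sum(vals) / len(vals)

# hypothetical patient record with five deficit items
patient = {"hypertension": 1, "diabetes": 0, "low_grip_strength": 0.5,
           "polypharmacy": 1, "depressive_symptoms": 0}
fi = frailty_index(patient)
```

In studies of this kind the index is typically built from several dozen items, so the resulting value is a fine-grained score rather than a categorical label, which is what lets a classifier use it as a continuous feature.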
Serum Albumin Is Inversely Associated With Portal Vein Thrombosis in Cirrhosis
We analyzed whether serum albumin is independently associated with portal vein thrombosis (PVT) in liver cirrhosis (LC) and if a biologic plausibility exists. This study was divided into three parts. In part 1 (retrospective analysis), 753 consecutive patients with LC with ultrasound-detected PVT were retrospectively analyzed. In part 2, 112 patients with LC and 56 matched controls were entered in the cross-sectional study. In part 3, 5 patients with cirrhosis were entered in the in vivo study and 4 healthy subjects (HSs) were entered in the in vitro study to explore if albumin may affect platelet activation by modulating oxidative stress. In the 753 patients with LC, the prevalence of PVT was 16.7%; logistic analysis showed that only age (odds ratio [OR], 1.024; P = 0.012) and serum albumin (OR, -0.422; P = 0.0001) significantly predicted patients with PVT. Analyzing the 112 patients with LC and controls, soluble cluster of differentiation 40 ligand (sCD40L; P = 0.0238), soluble Nox2-derived peptide (sNox2-dp; P < 0.0001), and urinary excretion of isoprostanes (P = 0.0078) were higher in patients with LC. In LC, albumin was correlated with sCD40L (Spearman's rank correlation coefficient [r(s)], -0.33; P < 0.001), sNox2-dp (r(s), -0.57; P < 0.0001), and urinary excretion of isoprostanes (r(s), -0.48; P < 0.0001) levels. The in vivo study showed a progressive decrease in platelet aggregation, sNox2-dp, and urinary 8-iso prostaglandin F2 alpha-III formation 2 hours and 3 days after albumin infusion. Finally, platelet aggregation, sNox2-dp, and isoprostane formation significantly decreased in platelets from HSs incubated with scalar concentrations of albumin. Conclusion: Low serum albumin in LC is associated with PVT, suggesting that albumin could be a modulator of the hemostatic system through interference with mechanisms regulating platelet activation.
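The associations in this abstract are reported as Spearman's rank correlation coefficients (r(s)). For clarity, here is a minimal self-contained implementation of that statistic, the Pearson correlation of the ranks with average ranks for ties; the data below is illustrative, not the study's.

```python
import numpy as np

def spearman(x, y):
    """Spearman's rank correlation r_s: Pearson correlation computed on
    the ranks of the two samples, with tied values given average ranks."""
    def ranks(v):
        v = np.asarray(v, dtype=float)
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):        # average ranks over tied values
            mask = v == val
            r[mask] = r[mask].mean()
        return r
    rx, ry = ranks(x), ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# a perfectly monotone decreasing relationship gives r_s = -1,
# the direction of the reported albumin vs sNox2-dp association
r = spearman([1, 2, 3, 4], [10, 8, 5, 1])
```

Because the statistic depends only on ranks, it captures monotone (not just linear) association, which suits skewed laboratory measurements like cytokine or isoprostane levels.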