Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments
A characteristic of existing predictive process monitoring techniques is to
first construct a predictive model based on past process executions, and then
use it to predict the future of new ongoing cases, without the possibility of
updating it with new cases when they complete their execution. This can make
predictive process monitoring too rigid to deal with the variability of
processes working in real environments that continuously evolve and/or exhibit
new variant behaviors over time. As a solution to this problem, we propose the
use of algorithms that allow the incremental construction of the predictive
model. These incremental learning algorithms update the model whenever new
cases become available so that the predictive model evolves over time to fit
the current circumstances. The algorithms have been implemented using different
case encoding strategies and evaluated on a number of real and synthetic
datasets. The results provide first evidence of the potential of incremental
learning strategies for predictive process monitoring in real environments, and
of the impact of different case encoding strategies in this setting.
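As a minimal sketch of the incremental idea (not the algorithms evaluated in the paper), the snippet below assumes a frequency encoding of case prefixes and swaps in a multinomial Naive Bayes outcome predictor whose counts are updated each time a case completes, so no retraining on the full history is needed. The names `frequency_encoding`, `IncrementalNaiveBayes`, the activities, and the outcome labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def frequency_encoding(prefix):
    """Hypothetical frequency encoding: map a case prefix to activity counts."""
    return Counter(prefix)


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes updated one completed case at a time (illustrative choice)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing
        self.class_counts = Counter()               # completed cases per outcome label
        self.feature_counts = defaultdict(Counter)  # activity counts per label
        self.total_features = Counter()             # total activity count per label
        self.vocab = set()

    def learn_one(self, features, label):
        # Incremental update: only the counters touched by this case change.
        self.class_counts[label] += 1
        for f, c in features.items():
            self.feature_counts[label][f] += c
            self.total_features[label] += c
            self.vocab.add(f)

    def predict_one(self, features):
        if not self.class_counts:
            return None  # nothing learned yet
        n = sum(self.class_counts.values())
        v = len(self.vocab) + 1  # one extra slot for activities never seen so far
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n)
            denom = self.total_features[label] + self.alpha * v
            for f, c in features.items():
                score += c * math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


model = IncrementalNaiveBayes()
completed_cases = [
    (["register", "check", "approve"], "on_time"),
    (["register", "check", "reject"], "late"),
    (["register", "approve"], "on_time"),
]
for trace, outcome in completed_cases:
    model.learn_one(frequency_encoding(trace), outcome)  # update as each case completes

print(model.predict_one(frequency_encoding(["register", "approve"])))  # → on_time
```

Other encodings (for example, index-based encodings that keep activity positions) would only change `frequency_encoding`; the update loop stays the same.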
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
Evaluation of recommender systems in streaming environments
Evaluation of recommender systems is typically done with finite datasets.
This means that conventional evaluation methodologies are only applicable in
offline experiments, where data and models are stationary. However, in real-world
systems, user feedback is continuously generated, at unpredictable rates.
Given this setting, one important issue is how to evaluate algorithms in such a
streaming data environment. In this paper we propose a prequential evaluation
protocol for recommender systems, suitable for streaming data environments, but
also applicable in stationary settings. Using this protocol we are able to
monitor the evolution of algorithms' accuracy over time. Furthermore, we are
able to perform reliable comparative assessments of algorithms by computing
significance tests over a sliding window. We argue that besides being suitable
for streaming data, prequential evaluation allows the detection of phenomena
that would otherwise remain unnoticed in the evaluation of both offline and
online recommender systems.
Comment: Workshop on 'Recommender Systems Evaluation: Dimensions and Design'
(REDD 2014), held in conjunction with RecSys 2014. October 10, 2014, Silicon
Valley, United States.
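The test-then-train loop behind prequential evaluation can be sketched as follows, under stated assumptions: each incoming (user, item) event is first used to check whether the current model's top-n list for that user already contains the item, and only afterwards is the model updated with it. A sliding window of recent hits gives an accuracy curve over time. `PopularityRecommender`, `prequential_hit_rate`, and the window and list sizes are hypothetical; the paper's own metrics and significance tests are not reproduced here.

```python
from collections import Counter, defaultdict, deque


class PopularityRecommender:
    """Hypothetical baseline: recommend the most popular items the user has not interacted with."""

    def __init__(self):
        self.item_counts = Counter()
        self.seen = defaultdict(set)

    def recommend(self, user, n):
        # most_common() orders equal counts by first occurrence, so output is deterministic.
        ranked = [item for item, _ in self.item_counts.most_common()
                  if item not in self.seen[user]]
        return ranked[:n]

    def update(self, user, item):
        self.item_counts[item] += 1
        self.seen[user].add(item)


def prequential_hit_rate(stream, model, n=2, window=3):
    """Test-then-train over a stream of (user, item) events.

    Each event first tests the current model (is `item` in the user's top-n
    list?) and only then updates it. Returns, after every event, the hit
    rate over the last `window` events.
    """
    recent = deque(maxlen=window)
    curve = []
    for user, item in stream:
        hit = item in model.recommend(user, n)  # 1) test on the not-yet-seen event
        model.update(user, item)                # 2) then learn from it
        recent.append(1 if hit else 0)
        curve.append(sum(recent) / len(recent))
    return curve


stream = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "b")]
curve = prequential_hit_rate(stream, PopularityRecommender())
print(curve)
```

Comparing two algorithms in this style would mean running both on the same stream and contrasting their per-window hits, which is where a windowed significance test would apply.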
Logical analysis of data as a tool for the analysis of probabilistic discrete choice behavior
Probabilistic Discrete Choice Models (PDCM) have been extensively used to interpret the behavior of heterogeneous decision makers that face discrete alternatives. The classification approach of Logical Analysis of Data (LAD) uses discrete optimization to generate patterns, which are logic formulas characterizing the different classes. Patterns can be seen as rules explaining the phenomenon under analysis. In this work we discuss how LAD can be used as the first phase of the specification of PDCM. Since in this task the number of patterns generated may be extremely large, and many of them may be nearly equivalent, additional processing is necessary to obtain practically meaningful information. Hence, we propose computationally viable techniques to obtain small sets of patterns that constitute meaningful representations of the phenomenon and allow the discovery of significant associations between subsets of explanatory variables and the output. We consider the complex socio-economic problem of analyzing Internet usage in Italy, using real data gathered by the Italian National Institute of Statistics.
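To make the notion of a pattern concrete, here is a small sketch assuming already-binarized features: a pattern is a conjunction of literals (feature index, required value); it is "pure" if it covers no negative observation, and it is kept only if no shorter kept pattern is a subset of it, which is one simple way to limit nearly redundant patterns. This brute-force enumeration is an illustrative substitute for LAD's discrete-optimization pattern generation, not the paper's technique, and the feature meanings and data are hypothetical.

```python
from itertools import combinations, product


def covers(pattern, obs):
    """True if binary observation `obs` satisfies every literal (index, value) of `pattern`."""
    return all(obs[i] == v for i, v in pattern)


def prime_pure_patterns(positives, negatives, max_degree=2, min_coverage=1):
    """Brute-force enumeration of pure patterns with up to `max_degree` literals.

    A pattern is kept if it covers no negative observation, covers at least
    `min_coverage` positive ones, and has no shorter kept pattern as a subset.
    """
    n_features = len(positives[0])
    found = []
    for degree in range(1, max_degree + 1):
        for idxs in combinations(range(n_features), degree):
            for values in product((0, 1), repeat=degree):
                pattern = tuple(zip(idxs, values))
                if any(set(p) < set(pattern) for p, _ in found):
                    continue  # subsumed by a shorter pattern already kept
                if any(covers(pattern, o) for o in negatives):
                    continue  # not pure: it also describes the negative class
                coverage = sum(covers(pattern, o) for o in positives)
                if coverage >= min_coverage:
                    found.append((pattern, coverage))
    return found


# Hypothetical binarized features: 0 = broadband_at_home, 1 = under_35, 2 = urban
positives = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]  # e.g. Internet users
negatives = [(0, 1, 0), (0, 0, 1)]             # e.g. non-users
for pattern, cov in prime_pure_patterns(positives, negatives):
    print(pattern, cov)
```

Raising `min_coverage` is a further, equally simple filter: with `min_coverage=2`, only the single-literal pattern on feature 0 survives in this toy data.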
- …