Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments
A characteristic of existing predictive process monitoring techniques is to
first construct a predictive model based on past process executions, and then
use it to predict the future of new ongoing cases, without the possibility of
updating it with new cases when they complete their execution. This can make
predictive process monitoring too rigid to deal with the variability of
processes working in real environments that continuously evolve and/or exhibit
new variant behaviors over time. As a solution to this problem, we propose the
use of algorithms that allow the incremental construction of the predictive
model. These incremental learning algorithms update the model whenever new
cases become available so that the predictive model evolves over time to fit
the current circumstances. The algorithms have been implemented using different
case encoding strategies and evaluated on a number of real and synthetic
datasets. The results provide first evidence of the potential of incremental
learning strategies for predictive process monitoring in real environments, and
of the impact of different case encoding strategies in this setting.
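A minimal sketch of the incremental idea follows. It assumes a frequency (bag-of-activities) case encoding and an incremental multinomial naive Bayes classifier. Both are illustrative choices and are not the algorithms or encodings evaluated in the paper. Each completed case updates the model's counts immediately, so the model can absorb new variant behavior without being retrained from scratch. The names `encode_frequency` and `IncrementalNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def encode_frequency(prefix):
    """Frequency encoding (assumed): count how often each activity occurs in the prefix."""
    return Counter(prefix)


class IncrementalNaiveBayes:
    """Illustrative multinomial naive Bayes, updated one completed case at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                            # Laplace smoothing constant
        self.class_counts = Counter()                 # training prefixes seen per outcome
        self.feature_counts = defaultdict(Counter)    # outcome -> activity -> count
        self.vocabulary = set()                       # every activity seen so far

    def update(self, trace, outcome):
        """Learn from a case that has just completed: each of its prefixes is
        labeled with the case's final outcome."""
        for k in range(1, len(trace) + 1):
            features = encode_frequency(trace[:k])
            self.class_counts[outcome] += 1
            self.feature_counts[outcome].update(features)
            self.vocabulary.update(features)

    def predict(self, prefix):
        """Return the most likely outcome for the prefix of a running case."""
        features = encode_frequency(prefix)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for outcome, n in self.class_counts.items():
            score = math.log(n / total)               # log prior of the outcome
            denom = sum(self.feature_counts[outcome].values()) + self.alpha * len(self.vocabulary)
            for activity, count in features.items():
                num = self.feature_counts[outcome][activity] + self.alpha
                score += count * math.log(num / denom)
            if score > best_score:
                best, best_score = outcome, score
        return best


if __name__ == "__main__":
    model = IncrementalNaiveBayes()
    model.update(["register", "check", "approve"], "accepted")
    model.update(["register", "check", "reject"], "rejected")
    print(model.predict(["register", "check", "approve"]))  # prints accepted
    # A later case introduces a new variant; the model absorbs it with one update.
    model.update(["register", "escalate", "approve"], "accepted")
```

Because `update` only adds counts, its cost per case is proportional to the number of prefixes of that case. This is the property that makes the model cheap to keep current as cases complete.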
PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach
The problem of evaluating the performance of soccer players is attracting the
interest of many companies and the scientific community, thanks to the
availability of massive data capturing all the events generated during a match
(e.g., tackles, passes, shots). Unfortunately, there is no consolidated
and widely accepted metric for measuring performance quality in all of its
facets. In this paper, we design and implement PlayeRank, a data-driven
framework that offers a principled multi-dimensional and role-aware evaluation
of the performance of soccer players. We build our framework by exploiting a
massive dataset of soccer logs consisting of millions of match events
pertaining to four seasons of 18 prominent soccer competitions. By comparing
PlayeRank to known algorithms for performance evaluation in soccer, and by
exploiting a dataset of players' evaluations made by professional soccer
scouts, we show that PlayeRank significantly outperforms the competitors. We
also explore the ratings produced by PlayeRank and discover interesting
patterns about the nature of excellent performances and what distinguishes the
top players from the others. Finally, we explore some applications of
PlayeRank (i.e., player search and player versatility), showing its
flexibility and efficiency, which make it suitable for the design of a
scalable platform for soccer analytics.
Explanation-Based Auditing
To comply with emerging privacy laws and regulations, it has become common
for applications like electronic health records systems (EHRs) to collect
access logs, which record each time a user (e.g., a hospital employee) accesses
a piece of sensitive data (e.g., a patient record). Using the access log, it is
easy to answer simple queries (e.g., Who accessed Alice's medical record?), but
this often does not provide enough information. In addition to learning who
accessed their medical records, patients will likely want to understand why
each access occurred. In this paper, we introduce the problem of generating
explanations for individual records in an access log. The problem is motivated
by user-centric auditing applications, and it also provides a novel approach to
misuse detection. We develop a framework for modeling explanations which is
based on a fundamental observation: For certain classes of databases, including
EHRs, the reason for most data accesses can be inferred from data stored
elsewhere in the database. For example, if Alice has an appointment with Dr.
Dave, this information is stored in the database, and it explains why Dr. Dave
looked at Alice's record. Large numbers of data accesses can be explained using
general forms called explanation templates. Rather than requiring an
administrator to manually specify explanation templates, we propose a set of
algorithms for automatically discovering frequent templates from the database
(i.e., those that explain a large number of accesses). We also propose
techniques for inferring collaborative user groups, which can be used to
enhance the quality of the discovered explanations. Finally, we have evaluated
our proposed techniques using an access log and data from the University of
Michigan Health System. Our results demonstrate that in practice we can provide
explanations for over 94% of data accesses in the log. Comment: VLDB201
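The following sketch shows how the abstract's appointment example works as an explanation template. It assumes that appointments are stored as hypothetical `(doctor, patient)` pairs and that log entries are hypothetical `Access` records. The sketch does not attempt the automatic template discovery proposed in the paper. It only shows how a given template explains accesses, how the template's support (the fraction of accesses it explains) can be measured, and how the unexplained accesses can be surfaced for misuse review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One access-log entry (hypothetical schema)."""
    user: str      # employee who opened the record
    patient: str   # patient whose record was opened


def appointment_template(appointments):
    """Hypothetical template: 'the user accessed the patient's record because
    the user has an appointment with that patient'."""
    pairs = {(doctor, patient) for doctor, patient in appointments}

    def explains(access):
        return (access.user, access.patient) in pairs

    return explains


def support(template, log):
    """Fraction of accesses in the log that the template explains."""
    if not log:
        return 0.0
    return sum(1 for access in log if template(access)) / len(log)


def unexplained(templates, log):
    """Accesses that no template explains: candidates for a closer misuse review."""
    return [access for access in log if not any(t(access) for t in templates)]


if __name__ == "__main__":
    appointments = [("dr_dave", "alice"), ("dr_erin", "bob")]
    log = [
        Access("dr_dave", "alice"),
        Access("dr_erin", "bob"),
        Access("clerk_sam", "alice"),
        Access("dr_dave", "alice"),
    ]
    template = appointment_template(appointments)
    print(support(template, log))         # prints 0.75
    print(unexplained([template], log))   # prints [Access(user='clerk_sam', patient='alice')]
```

A frequent template, in the abstract's sense, would be one whose support exceeds a chosen threshold. The threshold value is left open here.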