Predictive and prescriptive monitoring of business process outcomes
Recent years have witnessed a growing adoption of machine learning techniques for business improvement across various fields. Among other emerging applications, organizations are exploiting opportunities to improve the performance of their business processes by using predictive models for runtime monitoring. Such predictive process monitoring techniques take an event log (a set of completed business process execution traces) as input and use machine learning techniques to train predictive models. At runtime, these techniques predict either the next event, the remaining time, or the final outcome of an ongoing case, given its incomplete execution trace consisting of the events performed up to the present moment in the given case. In particular, a family of techniques called outcome-oriented predictive process monitoring focuses on predicting whether a case will end with a desired or an undesired outcome. The user of the system can use the predictions to decide whether or not to intervene, with the purpose of preventing an undesired outcome or mitigating its negative effects. Prescriptive process monitoring systems go beyond purely predictive ones by not only generating predictions but also advising the user whether and how to intervene in a running case in order to optimize a given utility function.
This thesis addresses the question of how to train, evaluate, and use predictive models for predictive and prescriptive monitoring of business process outcomes. The thesis proposes a taxonomy and performs a comparative experimental evaluation of existing techniques in the field. Moreover, we propose a framework for incorporating textual data into predictive monitoring systems. We introduce the notion of temporal stability to evaluate these systems and propose a prescriptive process monitoring framework for advising users whether and how to act upon the predictions. The results suggest that the proposed solutions complement the existing techniques and can be useful for practitioners in implementing predictive process monitoring systems in real life.
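The ideas in this abstract can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the thesis's actual method: the function names, the stability formula (one minus the mean absolute change between successive prediction scores of a case), and the cost-based intervention rule with its `effectiveness` parameter are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def prefixes(trace: Sequence[str]) -> List[List[str]]:
    """Return every non-empty proper prefix of a completed trace.

    Each prefix mimics an ongoing case; paired with the outcome of the
    full trace, it can serve as one training example.
    """
    return [list(trace[:k]) for k in range(1, len(trace))]


def temporal_stability(scores: Sequence[float]) -> float:
    """Assumed definition: 1 minus the mean absolute change between
    successive prediction scores produced for one case."""
    if len(scores) < 2:
        return 1.0
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return 1.0 - sum(diffs) / len(diffs)


def should_intervene(
    p_undesired: float,
    cost_undesired: float,
    cost_intervention: float,
    effectiveness: float,
) -> bool:
    """Hypothetical utility rule: intervene when the expected saving
    (probability x share of the cost avoided x cost of the undesired
    outcome) exceeds the cost of intervening."""
    expected_saving = p_undesired * effectiveness * cost_undesired
    return expected_saving > cost_intervention


if __name__ == "__main__":
    print(prefixes(["register", "check", "approve"]))
    print(temporal_stability([0.2, 0.4, 0.3]))
    print(should_intervene(0.8, 100.0, 20.0, 0.5))
```

In this sketch a predictive system would stop at producing `p_undesired`, while a prescriptive one would also apply a rule such as `should_intervene` to turn the score into a recommendation.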
Automatic refinement of large-scale cross-domain knowledge graphs
Knowledge graphs are a way to represent complex structured and unstructured information
integrated into an ontology, with which one can reason about the existing
information to deduce new information or highlight inconsistencies. Knowledge
graphs are divided into the terminology box (TBox), also known as the ontology, and
the assertions box (ABox). The former consists of a set of schema axioms defining
classes and properties which describe the data domain, whereas the ABox consists
of a set of facts describing instances in terms of the TBox vocabulary.
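The TBox/ABox split can be illustrated with a toy graph. The sketch below is an assumption-laden example: the vocabulary (`City`, `Place`, `birthPlace`, and so on) is invented, and domain and range axioms are treated as closed-world checks, which is a simplification (under RDFS semantics they would entail new types rather than flag errors).

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# TBox: schema axioms (all names here are hypothetical).
SUBCLASS_OF: Dict[str, str] = {"City": "Place", "Person": "Agent"}
DOMAIN: Dict[str, str] = {"birthPlace": "Person"}
RANGE: Dict[str, str] = {"birthPlace": "Place"}

# ABox: facts about instances, stated in the TBox vocabulary.
TYPES: Dict[str, Set[str]] = {"Ada": {"Person"}, "London": {"City"}}
RELATIONS: List[Triple] = [
    ("Ada", "birthPlace", "London"),
    ("London", "birthPlace", "Ada"),  # subject and object swapped
]


def with_superclasses(cls: str) -> Set[str]:
    """Return cls together with all its ancestors in the class hierarchy."""
    seen: Set[str] = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return seen


def entailed_types(entity: str) -> Set[str]:
    """Deduce new type facts: asserted types plus their superclasses."""
    result: Set[str] = set()
    for cls in TYPES.get(entity, set()):
        result |= with_superclasses(cls)
    return result


def suspicious_assertions() -> List[Triple]:
    """Flag relation assertions whose subject or object lacks the
    declared domain or range class (untyped entities are flagged too)."""
    flagged = []
    for s, p, o in RELATIONS:
        domain_ok = p not in DOMAIN or DOMAIN[p] in entailed_types(s)
        range_ok = p not in RANGE or RANGE[p] in entailed_types(o)
        if not (domain_ok and range_ok):
            flagged.append((s, p, o))
    return flagged


if __name__ == "__main__":
    print(entailed_types("London"))
    print(suspicious_assertions())
```

Here `entailed_types` stands in for deducing new information and `suspicious_assertions` for highlighting inconsistencies.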
In recent years, there have been several initiatives for creating large-scale
cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO,
and Wikidata being among the most successful free datasets. These graphs are
often constructed by extracting information from semi-structured sources,
such as Wikipedia, or from unstructured text on the web using NLP methods. It
is unlikely, in particular when heuristic methods are applied and unreliable sources
are used, that the knowledge graph is fully correct or complete. There is a tradeoff
between completeness and correctness, which is addressed differently in each
knowledge graph’s construction approach.
There is a wide variety of applications for knowledge graphs, e.g. semantic
search and discovery, question answering, recommender systems, expert systems
and personal assistants. The quality of a knowledge graph is crucial for its applications.
In order to further increase the quality of such large-scale knowledge graphs,
various automatic refinement methods have been proposed. Those methods try to
infer and add missing knowledge to the graph, or detect erroneous pieces of information.
In this thesis, we investigate the problem of automatic knowledge graph
refinement and propose methods that address the problem from two directions: automatic
refinement of the TBox and of the ABox.
In Part I we address the ABox refinement problem. We propose a method for
predicting missing type assertions using hierarchical multilabel classifiers and ingoing/outgoing
links as features. We also present an approach to detecting relation
assertion errors which exploits type and path patterns in the graph. Moreover,
we propose an approach to correction of relation errors originating from confusions
between entities. Also in the ABox refinement direction, we propose a knowledge
graph model and process for synthesizing knowledge graphs for benchmarking
ABox completion methods.
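As a rough sketch of what "ingoing/outgoing links as features" could look like, the Python below counts the properties entering and leaving an entity and expands asserted types up a class hierarchy to form multilabel targets. The function names, the `in:`/`out:` feature naming, and the toy data are assumptions for illustration; the classifier itself is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def link_features(entity: str, triples: Iterable[Triple]) -> Dict[str, int]:
    """Count properties on outgoing and ingoing links of an entity."""
    feats: Counter = Counter()
    for s, p, o in triples:
        if s == entity:
            feats[f"out:{p}"] += 1
        if o == entity:
            feats[f"in:{p}"] += 1
    return dict(feats)


def hierarchical_labels(types: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Close a set of types under the superclass relation, so that every
    label implies its ancestors (a common hierarchical multilabel setup)."""
    labels: Set[str] = set()
    for cls in types:
        current: Optional[str] = cls
        while current is not None and current not in labels:
            labels.add(current)
            current = parent.get(current)
    return labels


if __name__ == "__main__":
    triples: List[Triple] = [
        ("Ada", "birthPlace", "London"),
        ("Ada", "knows", "Bob"),
        ("Bob", "birthPlace", "London"),
    ]
    print(link_features("London", triples))
    print(hierarchical_labels({"City"}, {"City": "Place", "Place": "Thing"}))
```

An entity with many ingoing `birthPlace` links and no asserted type would then be a natural candidate for a predicted place-like type.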
In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, which are expressed using
SHACL. We introduce an ILP refinement step which exploits correlations between
numerical attributes and relations in order to efficiently learn Horn rules with
numerical attributes. Finally, we investigate the introduction of lexical information
from textual corpora into the ILP algorithm in order to improve the quality of induced
class expressions.
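One simple way to picture inducing a relation constraint from the ABox is to pick, for each relation, the object class that covers most of its assertions and accept it only above a coverage threshold, which tolerates some noisy facts. The sketch below is an assumption made for illustration and is not the method proposed in the thesis; `min_coverage`, the `ex` prefix, and the shape naming are hypothetical. The rendered text uses the SHACL terms `sh:NodeShape`, `sh:targetObjectsOf`, and `sh:class`.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def induce_range_constraints(
    triples: Iterable[Triple],
    types: Dict[str, Set[str]],
    min_coverage: float = 0.9,
) -> Dict[str, str]:
    """For each relation, return the object class with the highest
    coverage, provided it covers at least min_coverage of assertions."""
    totals: Counter = Counter()
    by_class: Dict[str, Counter] = defaultdict(Counter)
    for _, p, o in triples:
        totals[p] += 1
        for cls in types.get(o, ()):
            by_class[p][cls] += 1
    constraints: Dict[str, str] = {}
    for p, total in totals.items():
        if not by_class[p]:
            continue
        # Highest count first; ties broken alphabetically for determinism.
        cls, count = min(by_class[p].items(), key=lambda kv: (-kv[1], kv[0]))
        if count / total >= min_coverage:
            constraints[p] = cls
    return constraints


def to_shacl(prop: str, cls: str, prefix: str = "ex") -> str:
    """Render one induced range constraint as a SHACL node shape (Turtle)."""
    return (
        f"{prefix}:{prop}RangeShape a sh:NodeShape ;\n"
        f"    sh:targetObjectsOf {prefix}:{prop} ;\n"
        f"    sh:class {prefix}:{cls} ."
    )


if __name__ == "__main__":
    facts = [
        ("a", "birthPlace", "London"),
        ("b", "birthPlace", "Paris"),
        ("c", "birthPlace", "Rome"),
        ("d", "birthPlace", "Bob"),  # noisy assertion
    ]
    kinds = {"London": {"City"}, "Paris": {"City"}, "Rome": {"City"}, "Bob": {"Person"}}
    for prop, cls in induce_range_constraints(facts, kinds, min_coverage=0.7).items():
        print(to_shacl(prop, cls))
```

Lowering `min_coverage` makes the induced constraints more tolerant of erroneous facts, at the price of accepting weaker evidence.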