On the use of domain knowledge for process model repair
Process models are important for supporting organizations in documenting, understanding and monitoring their business. When
these process models become outdated, they need to be revised to accurately describe the new status quo of the processes
in the organization. Process model repair techniques help to automatically revise an existing model based on
behavior recorded in event logs. So far, such techniques have focused on identifying which parts of the model to change and how to change
them, but they do not use knowledge from practitioners to inform the revision. As a consequence, fragments of the model
may be changed in ways that violate existing regulations, or to reflect outdated information that was wrongly derived from the
event log. This paper uses concepts from theory revision to provide formal foundations for process model repair that exploits
domain knowledge. Specifically, it conceptualizes (1) which fragments of the model are unchangeable and (2) the role that
various traces in the event log should play during model repair. A usage scenario is presented that demonstrates
the benefits of this conceptualization. The current state of existing process model repair techniques is compared against the
proposed concepts. The results show that only two existing techniques partially consider the concepts presented in this paper
for model repair.
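As a rough illustration of the abstract's two concepts, the hypothetical Python sketch below marks some model fragments as frozen (unchangeable because of domain knowledge) and assigns each trace a role in the repair. All names here (TraceRole, ProcessModel, admissible_repairs) and the filtering logic are invented for illustration; they are not the paper's actual formalization.

```python
# Hypothetical sketch: domain knowledge as (a) frozen model fragments and
# (b) roles assigned to event-log traces during repair.
from dataclasses import dataclass, field
from enum import Enum, auto


class TraceRole(Enum):
    REQUIRED = auto()   # behavior the repaired model must reproduce
    FORBIDDEN = auto()  # outdated/noisy behavior the model must not adopt
    NEUTRAL = auto()    # imposes no constraint on the repair


@dataclass(frozen=True)
class Edit:
    """A candidate change to one model fragment."""
    fragment_id: str
    added_behavior: frozenset  # trace ids this edit would make the model accept


@dataclass
class ProcessModel:
    fragments: dict                         # fragment id -> set of activities
    frozen: set = field(default_factory=set)  # unchangeable fragment ids


def admissible_repairs(model, candidates, trace_roles):
    """Keep only edits that respect frozen fragments and do not
    introduce behavior carried by FORBIDDEN traces."""
    forbidden = {t for t, r in trace_roles.items() if r is TraceRole.FORBIDDEN}
    return [e for e in candidates
            if e.fragment_id not in model.frozen
            and not (e.added_behavior & forbidden)]


model = ProcessModel(fragments={"f1": {"check"}, "f2": {"approve"}},
                     frozen={"f2"})  # 'approve' is regulated: unchangeable
edits = [Edit("f1", frozenset({"t1"})), Edit("f2", frozenset({"t2"}))]
roles = {"t1": TraceRole.REQUIRED, "t2": TraceRole.FORBIDDEN}
print(admissible_repairs(model, edits, roles))  # only the edit to f1 survives
```

Under these assumptions, a repair technique would search only over admissible edits, which is one way the paper's concepts could constrain existing repair algorithms.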
Probabilistic Models for Scalable Knowledge Graph Construction
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes, and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, which stems from noise and ambiguity in the underlying data, from errors made by the extraction system, and from the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes, and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
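The core modeling idea -- MAP inference over soft truth values with a convex objective built from weighted hinge-loss potentials -- can be shown with a toy example. The Python sketch below is a minimal, hypothetical reconstruction using numpy and scipy; the facts, confidences, and rule weights are invented, and real KGI systems use dedicated grounding and ADMM-based solvers rather than a generic optimizer.

```python
# Toy MAP inference in a hinge-loss Markov random field (HL-MRF).
# Variables are truth values in [0, 1]; each weighted rule contributes a
# convex hinge-loss potential, so finding the most probable state is a
# convex minimization.
import numpy as np
from scipy.optimize import minimize

# x[0] = "kyrgyzstan is a country", x[1] = "kyrgyzstan is a bird"
extractor_conf = np.array([0.9, 0.6])  # invented noisy extractor confidences
w_extract, w_mutex = 1.0, 10.0         # invented rule weights

def hinge_sq(v):
    """Squared hinge potential: zero when the rule is satisfied."""
    return max(0.0, v) ** 2

def energy(x):
    e = 0.0
    # Rule: extracted(f) -> true(f); penalize truth below the confidence.
    for conf, xi in zip(extractor_conf, x):
        e += w_extract * hinge_sq(conf - xi)
    # Ontological constraint: 'country' and 'bird' are mutually exclusive,
    # encoded as a penalty whenever x[0] + x[1] exceeds 1.
    e += w_mutex * hinge_sq(x[0] + x[1] - 1.0)
    return e

res = minimize(energy, x0=[0.5, 0.5], bounds=[(0.0, 1.0), (0.0, 1.0)])
print(np.round(res.x, 2))  # ~[0.66, 0.36]: the mutex rule pulls the
                           # conflicting facts toward x[0] + x[1] ~ 1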