
    Native Directly Follows Operator

    Typical legacy information systems store data in relational databases. Process mining is a research discipline that analyzes this data to obtain insights into processes. Many different process mining techniques can be applied to the data. In current techniques, an XES event log serves as the basis for analysis. However, because of the static nature of an XES event log, one XES file must be created for each process mining question, which leads to overhead and inflexibility. As an alternative, process mining can be performed directly on the data source using so-called intermediate structures. In previous work, we investigated methods to build intermediate structures on source data by executing a basic SQL query on the database. However, the nested form of this SQL query can cause performance issues on the database side. Therefore, in this paper, we propose a native SQL operator for direct process discovery on relational databases. We define a native operator for the simplest form of the intermediate structure, called the "directly follows relation". The approach has been evaluated on big event data, and the experimental results show that it performs faster than state-of-the-art database approaches.
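
    For readers unfamiliar with the term, the "directly follows relation" simply pairs each event with the next event of the same case when events are ordered by time. The sketch below computes it in memory from a flat event list with assumed fields (case_id, activity, timestamp); it only illustrates the intermediate structure itself, not the nested SQL query or the native operator discussed in the paper.

        from collections import defaultdict

        def directly_follows(events):
            """Compute the 'directly follows' relation from a flat event list.

            events: iterable of (case_id, activity, timestamp) tuples.
            Returns a dict mapping (activity_a, activity_b) -> how often
            activity_b immediately follows activity_a within the same case.
            """
            # Group events per case, then order each case by timestamp.
            per_case = defaultdict(list)
            for case_id, activity, timestamp in events:
                per_case[case_id].append((timestamp, activity))

            dfr = defaultdict(int)
            for trace in per_case.values():
                trace.sort()  # order the events of one case by timestamp
                for (_, a), (_, b) in zip(trace, trace[1:]):
                    dfr[(a, b)] += 1
            return dict(dfr)

        # Toy example with two cases of a hypothetical process.
        log = [
            ("c1", "register", 1), ("c1", "check", 2), ("c1", "pay", 3),
            ("c2", "register", 1), ("c2", "pay", 2),
        ]
        print(directly_follows(log))
        # -> {('register', 'check'): 1, ('check', 'pay'): 1, ('register', 'pay'): 1}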

    Relational Algebra for In-Database Process Mining

    The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data-processing power of the database system can no longer be applied to this information, leading to constrained flexibility in the definition of mining patterns and limited execution performance in mining large logs. Enabling process mining directly on a database - instead of via intermediate storage in a flat file - therefore provides additional flexibility and efficiency. To facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can be used both to perform in-database process mining and to flexibly evaluate process-mining-related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next?" We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization, and we present time-complexity properties of the operator. In doing so, this paper formally defines the relational algebraic elements of a 'directly follows' operator that are required for implementing such an operator in a DBMS.
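
    For intuition, this kind of 'directly follows' operator can be written down in textbook set/relational-algebra style over an event relation E(case, task, time). The definition below is a hedged sketch of the idea, with assumed attribute names; it is not claimed to be the paper's exact formalization.

        % Hypothetical sketch only: the paper's exact definition may differ.
        % E(case, task, time) is an event relation; the operator pairs each
        % event with the first later event of the same case.
        \mathit{df}(E) = \{ (e_1, e_2) \in E \times E \mid
            e_1.case = e_2.case \wedge e_1.time < e_2.time \wedge
            \neg\exists e_3 \in E : ( e_3.case = e_1.case \wedge
                                      e_1.time < e_3.time < e_2.time ) \}

    On top of such a result, the quoted example query roughly amounts to grouping the pairs (e_1, e_2) by the employee who performed e_2 and counting how often the 'amount' attribute differs between the two events.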

    Towards Efficient Path Query on Social Network with Hybrid RDF Management

    The scalability and flexibility of the Resource Description Framework (RDF) model make it ideally suited for representing online social networks (OSNs). One basic operation in an OSN is finding chains of relations, such as k-hop friends. Property path queries in SPARQL can express this type of operation, but their implementation suffers from performance problems given the ever-growing size and complexity of OSN data. In this paper, we present a main-memory/disk-based hybrid RDF data management framework for efficient property path querying. Within this hybrid framework, we realize an efficient in-memory algebra operator for property path queries using graph traversal, and we estimate the cost of this operator so that it cooperates with existing cost-based optimization. Experiments on benchmark and real-world datasets demonstrate that our approach achieves a good trade-off between data loading expense and online query performance.
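
    The core idea of the described in-memory operator (evaluating a k-hop property path query by graph traversal rather than by repeated joins) can be illustrated with a minimal breadth-first traversal over an assumed adjacency-list view of the 'knows' predicate. This is a sketch of the general technique, not the authors' implementation or cost model.

        from collections import deque

        def k_hop(adj, start, k):
            """Return all nodes reachable from `start` in at most k hops.

            adj: dict mapping a node to the list of nodes it 'knows'
                 (an in-memory adjacency list built from RDF triples).
            """
            seen = {start}
            frontier = deque([(start, 0)])
            result = set()
            while frontier:
                node, depth = frontier.popleft()
                if depth == k:
                    continue  # do not expand beyond k hops
                for neighbour in adj.get(node, []):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        result.add(neighbour)
                        frontier.append((neighbour, depth + 1))
            return result

        # Example: friends-of-friends (2-hop) of 'alice'.
        knows = {"alice": ["bob"], "bob": ["carol", "dave"], "carol": ["erin"]}
        print(k_hop(knows, "alice", 2))   # {'bob', 'carol', 'dave'}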

    Impliance: A Next Generation Information Management Appliance

    Although database systems have been remarkably successful in building a large market and adapting to the changes of the last three decades, their impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems? In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.

    A layered framework for pattern-based ontology evolution

    The challenge of ontology-driven modelling of information components is well known in both academia and industry. In this paper, we present a novel approach to deal with customisation and abstraction of ontology-based model evolution. As a result of an empirical study, we identify a layered change operator framework based on the granularity, domain-specificity and abstraction of changes. The implementation of the operator framework is supported through layered change logs. Layered change logs capture the objective of ontology changes at a higher level of granularity and support a comprehensive understanding of ontology evolution. The layered change logs are formalised using a graph-based approach. We identify recurrent ontology change patterns from an ontology change log for their reuse. The identified patterns facilitate optimising and improving the definition of domain-specific change patterns.
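
    Purely as an illustration of the layering idea (the operation names and fields below are hypothetical, not taken from the paper), low-level atomic ontology changes recorded in a change log can be grouped into a higher-level, pattern-style entry that captures the intent of the edit:

        # Hypothetical example: atomic (generic) changes vs. one composite,
        # pattern-level change that records the intent behind them.
        atomic_changes = [
            {"op": "add_class",           "target": "Seminar"},
            {"op": "add_subclass_of",     "target": "Seminar", "parent": "Event"},
            {"op": "add_object_property", "target": "hasSpeaker",
             "domain": "Seminar", "range": "Person"},
        ]

        layered_entry = {
            "pattern": "add_specialised_event",  # domain-specific change pattern
            "abstraction": "composite",          # higher layer of the change log
            "intent": "introduce a new kind of Event with its speaker relation",
            "consists_of": atomic_changes,       # lower-layer (atomic) changes
        }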