53,066 research outputs found

Reverse Engineering Databases for Knowledge Discovery

Author: Stephen Mc Kearney
Publication date: 11/04/2020

Many data mining tools cannot be used directly to analyze the complex sets of relations found in large database systems. In our experience, data miners rely on a well-defined data model, or on the knowledge of a data expert, to isolate and extract candidate data sets before mining the data. For many databases, typically large legacy systems, a reliable data model is often unavailable and access to the data expert can be limited. In this paper we use reverse engineering techniques to infer a model of the database. Reverse engineering a database can be seen as knowledge discovery in its own right, and the resulting data model may be made available to data mining tools as background knowledge. In addition, minable data sets can be produced from the inferred data model and analyzed using conventional data mining tools. Our approach reduces the data miner's reliance on a well-defined data model and the data expert.

CiteSeerX

Mining Target-Oriented Sequential Patterns with Time-Intervals

Author: Chueh Hao-En
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 05/09/2010

A target-oriented sequential pattern is a sequential pattern with a concerned itemset at the end of the pattern. A time-interval sequential pattern is a sequential pattern with time-intervals between every pair of successive itemsets. In this paper we present an algorithm to discover target-oriented sequential patterns with time-intervals. To this end, the original sequences are reversed so that the last itemsets are placed at the front of the sequences. The contrasts between the reversed sequences and the concerned itemset are then used to exclude the irrelevant sequences. Clustering analysis is used together with a typical sequential pattern mining algorithm to extract the sequential patterns with time-intervals between successive itemsets. Finally, the discovered time-interval sequential patterns are reversed again to the original order for searching the target patterns. Comment: 11 pages, 9 tables
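The reverse, filter, and restore steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function names (`reverse_and_filter`, `intervals`, `restore`), the `(timestamp, itemset)` representation, and the rule "keep a sequence only if its last itemset contains the target" are all assumptions made for the example. The clustering and pattern mining stages are omitted.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: a sequence is a list of (timestamp, itemset) pairs.
Itemset = FrozenSet[str]
Sequence = List[Tuple[int, Itemset]]


def reverse_and_filter(sequences: List[Sequence], target: Itemset) -> List[Sequence]:
    """Reverse each sequence and keep those whose original last itemset
    contains the target itemset (an assumed exclusion rule)."""
    kept = []
    for seq in sequences:
        rev = list(reversed(seq))
        # After reversal the original last itemset sits at index 0.
        if rev and target <= rev[0][1]:
            kept.append(rev)
    return kept


def intervals(seq: Sequence) -> List[int]:
    """Time gaps between successive itemsets, independent of direction."""
    return [abs(b[0] - a[0]) for a, b in zip(seq, seq[1:])]


def restore(rev: Sequence) -> Sequence:
    """Put a reversed sequence back into its original order."""
    return list(reversed(rev))


sequences = [
    [(0, frozenset({"a"})), (3, frozenset({"b"})), (5, frozenset({"t"}))],
    [(0, frozenset({"a"})), (2, frozenset({"c"}))],
]
kept = reverse_and_filter(sequences, frozenset({"t"}))
```

Here only the first sequence survives the filter, since the second does not end with the target itemset.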

arXiv.org e-Print Archive

Crossref

A Local Density-Based Approach for Local Outlier Detection

Authors: He Haibo, Tang Bo
Publication date: 27/06/2016

This paper presents a simple but effective density-based outlier detection approach with the local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of using only k nearest neighbors, we further consider reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS including its expected value and false alarm probability are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods. Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letters
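The idea of scoring an object by the density of its extended neighborhood relative to its own density can be sketched as below. This is one plausible reading of the abstract, not the paper's exact formulation: the Gaussian kernel, the fixed bandwidth `h`, the definition of "shared" neighbors as points whose k-nearest-neighbor sets overlap, and all function names are assumptions made for the example.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn(points: List[Point], i: int, k: int) -> Set[int]:
    """Indices of the k nearest neighbors of point i (excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def extended_neighbors(points: List[Point], k: int) -> List[Set[int]]:
    """Union of k nearest, reverse nearest, and shared nearest neighbors."""
    n = len(points)
    nn = [knn(points, i, k) for i in range(n)]
    ext = []
    for i in range(n):
        reverse = {j for j in range(n) if j != i and i in nn[j]}
        shared = {j for j in range(n) if j != i and nn[i] & nn[j]}
        ext.append(nn[i] | reverse | shared)
    return ext


def local_density(points: List[Point], i: int, neighbors: Set[int], h: float) -> float:
    """Gaussian KDE at point i using only i and its extended neighbors."""
    d = len(points[i])
    norm = (2 * math.pi) ** (d / 2) * h ** d
    members = neighbors | {i}
    total = sum(
        math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * h * h))
        for j in members
    )
    return total / (norm * len(members))


def relative_density_scores(points: List[Point], k: int = 3, h: float = 1.0) -> List[float]:
    """Mean neighbor density divided by own density; larger means more outlying."""
    ext = extended_neighbors(points, k)
    dens = [local_density(points, i, ext[i], h) for i in range(len(points))]
    return [
        sum(dens[j] for j in ext[i]) / (len(ext[i]) * dens[i])
        for i in range(len(points))
    ]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = relative_density_scores(points, k=3, h=1.0)
```

With these toy points, the isolated point at (10, 10) receives the largest score, because its neighbors lie in a much denser region than it does.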

arXiv.org e-Print Archive

Crossref

DigitalCommons@URI

SQL Query Completion for Data Exploration

Authors: Guilly Marie Le, Petit Jean-Marc, Scuturici Vasile-Marian
Publication date: 07/02/2018

Within the big data tsunami, relational databases and SQL are still there and remain mandatory in most cases for accessing data. On the one hand, SQL is easy to use for non-specialists and makes it possible to identify pertinent initial data at the very beginning of the data exploration process. On the other hand, it is not always easy to formulate SQL queries: it is increasingly common to have several databases available for one application domain, some of them with hundreds of tables and/or attributes. Identifying the pertinent conditions to select the desired data, or even identifying relevant attributes, is far from trivial. To make it easier to write SQL queries, we propose the notion of SQL query completion: given a query, it suggests additional conditions to be added to its WHERE clause. This completion is semantic, as it relies on the data from the database, unlike current completion tools that are mostly syntactic. Since the process can be repeated over and over again, until the data analyst reaches her data of interest, SQL query completion facilitates the exploration of databases. SQL query completion has been implemented in a SQL editor on top of a database management system. For the evaluation, two questions need to be studied: first, does the completion speed up the writing of SQL queries? Second, is the completion easily adopted by users? A thorough experiment has been conducted on 70 computer science students divided into two groups (one with the completion and the other without) to answer those questions. The results are positive and very promising.

arXiv.org e-Print Archive

HAL

Hal-Diderot

Semantic business process management: a vision towards using semantic web services for business process management

Authors: Bussler Chris, Domingue John, Fensel Dieter, Hepp Martin, Leymann Frank, Wahler Alexander
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Business process management (BPM) is the approach to managing the execution of IT-supported business operations from a business expert's view rather than from a technical perspective. However, the degree of mechanization in BPM is still very limited, creating inertia in the necessary evolution and dynamics of business processes, and BPM does not provide a truly unified view on the process space of an organization. We trace the problem of mechanization of BPM back to an ontological one, i.e. the lack of machine-accessible semantics, and argue that the modeling constructs of semantic Web services frameworks, especially WSMO, are a natural fit for creating such a representation. As a consequence, we propose to combine SWS and BPM and create one consolidated technology, which we call semantic business process management (SBPM).

CiteSeerX

Open Research Online (The Open University)