A new approach for discovering business process models from event logs.
Process mining is the automated acquisition of process models from the event logs of information systems. Although process mining has many useful applications, not all of its inherent difficulties have been sufficiently solved. A first difficulty is that process mining is often limited to a setting of non-supervised learning, since negative information is often not available. Moreover, state transitions in processes are often dependent on the traversed path, which limits the appropriateness of search techniques based on local information in the event log. Another difficulty is that case data and resource properties that can also influence state transitions are time-varying properties, such that they cannot be considered as cross-sectional. This article investigates the use of first-order (ILP) classification learners for process mining and describes techniques for dealing with each of the above-mentioned difficulties. To make process mining a supervised learning task, we propose to include negative events in the event log. When event logs contain no negative information, a technique is described to add artificial negative examples to a process log. To capture history-dependent behavior, the article proposes to take advantage of the multi-relational nature of ILP classification learners. Multi-relational process mining allows searching for patterns among multiple event rows in the event log, effectively basing its search on global information. To deal with time-varying case data and resource properties, a closed-world version of the Event Calculus is added as background knowledge, effectively transforming the event log into a temporal database.
First experiments on synthetic event logs show that first-order classification learners are capable of predicting the behavior with high accuracy, even under conditions of noise.
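The idea of generating artificial negative events can be illustrated with a minimal Python sketch (an illustration only, not the authors' exact algorithm): for each position in each trace, any activity that never follows the same prefix anywhere in the log is recorded as a negative event.

```python
# Sketch of artificial negative-event generation. An event log is a list
# of traces (activity sequences). For each trace position, any activity
# that is never observed after the same prefix anywhere in the log is
# recorded as a negative event at that position.

def negative_events(log):
    alphabet = {a for trace in log for a in trace}
    prefixes = {}  # prefix -> set of activities observed after it
    for trace in log:
        for i in range(len(trace)):
            prefixes.setdefault(tuple(trace[:i]), set()).add(trace[i])
    negatives = []
    for trace in log:
        for i in range(len(trace)):
            seen = prefixes[tuple(trace[:i])]
            for a in sorted(alphabet - seen):
                negatives.append((tuple(trace[:i]), a))  # a did not occur here
    return negatives

log = [["register", "check", "pay"], ["register", "pay"]]
negs = negative_events(log)
# After the prefix ("register", "check"), only "pay" is ever observed,
# so "register" and "check" become artificial negative events there.
```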
Data mining in soft computing framework: a survey
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete/noisy data, mixed-media information, and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed-media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
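The pattern-mining step can be illustrated with a toy sketch, assuming each proof is recorded as a sequence of rule/tactic names; frequently recurring contiguous subsequences are candidates to be evolved into compound tactics (the genetic-programming stage is not modeled here):

```python
# Toy sketch of mining commonly occurring patterns from proof corpora:
# count contiguous subsequences of a fixed length across all proofs and
# keep those that recur at least min_count times.

from collections import Counter

def frequent_patterns(proofs, length=2, min_count=2):
    counts = Counter()
    for proof in proofs:
        for i in range(len(proof) - length + 1):
            counts[tuple(proof[i:i + length])] += 1
    return {p: c for p, c in counts.items() if c >= min_count}

proofs = [
    ["intro", "rewrite", "simp", "qed"],
    ["intro", "rewrite", "apply", "qed"],
    ["case", "intro", "rewrite", "qed"],
]
pats = frequent_patterns(proofs)
# ("intro", "rewrite") occurs in all three proofs, so it is a candidate
# pattern to evolve into a compound tactic.
```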
Automated synthesis of data extraction and transformation programs
Due to the abundance of data in today's data-rich world, end-users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed in a programmatic way, most end-users lack the required programming expertise to automate them and end up spending their valuable time in manually performing various data-related tasks. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language.
This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully-structured data formats like relational tables, or on semi-structured formats such as XML and JSON documents.
First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples. A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees.
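The reduction from tree transformations to path transformations can be sketched as follows, assuming trees are represented as nested dicts; the path rewrite shown (renaming one key) is a hypothetical stand-in for a learned path-transformation program:

```python
# Sketch of the path-based view of tree transformation: flatten a tree
# into its root-to-leaf paths, transform each path independently, and
# rebuild the output tree from the transformed paths.

def to_paths(tree, prefix=()):
    if not isinstance(tree, dict):
        return [(prefix, tree)]
    paths = []
    for key, sub in tree.items():
        paths.extend(to_paths(sub, prefix + (key,)))
    return paths

def from_paths(paths):
    tree = {}
    for path, value in paths:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree

def transform(tree, path_fn):
    return from_paths([(path_fn(p), v) for p, v in to_paths(tree)])

rename = lambda p: tuple("fullName" if k == "name" else k for k in p)
out = transform({"person": {"name": "Ada", "age": 36}}, rename)
# out == {"person": {"fullName": "Ada", "age": 36}}
```

The point of the reduction is that `path_fn` operates on flat sequences of keys, a much simpler search space than arbitrary tree rewrites.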
Next, we present a new methodology for learning programs that migrate tree-structured documents to relational table representations from input-output examples. Our approach achieves its goal by decomposing the synthesis task into two subproblems of (A) learning the column extraction logic, and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning which combines integer linear programming and logic minimization.
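A simplified Python sketch of this two-phase decomposition, assuming the input is a JSON-like list of nested records; phase (A) applies per-column extraction paths, while phase (B) is reduced here to one row per record (the automaton- and predicate-learning machinery from the dissertation is not modeled):

```python
# Sketch of tree-to-table migration: each column is defined by an
# extraction path into the nested record, and each record yields one row.

def get_path(record, path):
    for key in path:
        record = record[key]
    return record

def to_table(records, columns):
    # columns: list of (column_name, extraction_path) pairs
    return [
        {name: get_path(r, path) for name, path in columns}
        for r in records
    ]

docs = [
    {"user": {"id": 1, "addr": {"city": "Oslo"}}},
    {"user": {"id": 2, "addr": {"city": "Graz"}}},
]
columns = [("id", ("user", "id")), ("city", ("user", "addr", "city"))]
table = to_table(docs, columns)
# table == [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Graz"}]
```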
Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method we describe is fully automatic and database-agnostic (i.e., does not require customization for each database). Our method combines semantic parsing techniques from the NLP community with novel programming languages ideas involving probabilistic type inhabitation and automated sketch repair.
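As a highly simplified illustration of the database-agnostic idea (assuming only a table schema is available; the actual system's semantic parsing, probabilistic type inhabitation, and sketch repair are not modeled), words in the English question can be matched against schema names to assemble a query:

```python
# Toy English-to-SQL sketch: match question words against table and
# column names from the schema and assemble a SELECT statement.
# Any table/column names below are made-up example data.

def english_to_sql(question, schema):
    # schema: {table_name: [column_names]}
    words = {w.strip("?,.").lower() for w in question.split()}
    for table, cols in schema.items():
        if table.lower() in words:
            hits = [c for c in cols if c.lower() in words]
            return f"SELECT {', '.join(hits) if hits else '*'} FROM {table}"
    return None

schema = {"papers": ["title", "year"], "authors": ["name"]}
q = english_to_sql("List the title and year of all papers", schema)
# q == "SELECT title, year FROM papers"
```

Because the sketch reads table and column names from the schema at query time, it needs no per-database customization, which is the property the dissertation's (far more sophisticated) method also aims for.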
The Center for Eukaryotic Structural Genomics
The Center for Eukaryotic Structural Genomics (CESG) is a "specialized" or "technology development" center supported by the Protein Structure Initiative (PSI). CESG's mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG's platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy.