
    Discovering duplicate tasks in transition systems for the simplification of process models

    This work presents a set of methods to improve the understandability of process models. Traditionally, simplification methods trade off quality metrics such as fitness or precision. Conversely, the methods proposed in this paper produce simplified models while preserving or even increasing fidelity metrics. The first problem addressed in the paper is the discovery of duplicate tasks. A new method is proposed that avoids overfitting by working on the transition system generated by the log; it is able to discover duplicate tasks even in the presence of concurrency and choice. The second problem is the structural simplification of the model by identifying optional and repetitive tasks. These tasks are substituted by annotated events that allow the removal of silent tasks and reduce the complexity of the model. An important feature of the proposed methods is that they are independent of the actual miner used for process discovery.
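    As a concrete illustration of the first method's starting point, the following is a minimal, hypothetical sketch (not the authors' implementation): it builds a prefix-based transition system from a toy event log and flags activities that fire from several distinct states as candidates for duplicate labels. The horizon parameter and the context heuristic are our assumptions for illustration only.

        from collections import defaultdict

        # Toy event log: each trace is a sequence of activity labels.
        log = [
            ["a", "b", "c", "b", "d"],
            ["a", "c", "b", "d"],
            ["a", "b", "c", "d"],
        ]

        def build_transition_system(log, horizon=2):
            """Map each abstracted state (last `horizon` events) to its outgoing activities."""
            transitions = defaultdict(set)
            for trace in log:
                for i, activity in enumerate(trace):
                    state = tuple(trace[max(0, i - horizon):i])
                    transitions[state].add(activity)
            return transitions

        def duplicate_candidates(log, horizon=2):
            """Activities that fire from more than one distinct state are
            candidates for being split into duplicate tasks."""
            contexts = defaultdict(set)
            for trace in log:
                for i, activity in enumerate(trace):
                    contexts[activity].add(tuple(trace[max(0, i - horizon):i]))
            return {a: ctxs for a, ctxs in contexts.items() if len(ctxs) > 1}

        print(dict(build_transition_system(log)))
        print(duplicate_candidates(log))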

    An Optimal Approach for Mining Rare Causal Associations to Detect ADR Signal Pairs

    Adverse Drug Reaction (ADR) detection is one of the most important issues in the assessment of drug safety. Many adverse drug reactions are not discovered during limited premarketing clinical trials; instead, they are only observed after long-term post-marketing surveillance of drug usage. In light of this, detecting adverse drug reactions as early as possible is an important research topic for the pharmaceutical industry. Recently, the accumulation of large numbers of adverse-event reports and advances in data mining technology have motivated the development of statistical and data mining methods for the detection of ADRs. These stand-alone methods, with no integration into knowledge discovery systems, are tedious and inconvenient for users, and their exploration processes are time-consuming. This paper proposes an interactive system platform for the detection of ADRs. By integrating an ADR data warehouse and innovative data mining techniques, the proposed system not only supports OLAP-style multidimensional analysis of ADRs, but also allows the interactive discovery of associations between drugs and symptoms, called drug-ADR association rules, which can be further refined using other factors of interest to the user, such as demographic information. The experiments indicate that interesting and valuable drug-ADR association rules can be mined efficiently. The approach employs a knowledge-based method to capture the degree of causality of an event pair within each sequence, matching the data against previously recorded or suggested treatments; this supports immediate treatment of patients and can help less experienced physicians assess the relationships between a drug and its signal reactions.
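    The drug-ADR association rules described above can be sketched in a few lines. This is a hedged, self-contained illustration of support/confidence rule mining over toy spontaneous reports; the report data, thresholds, and rule shape (single drug, single symptom) are our assumptions, not the paper's system.

        from itertools import product

        # Toy post-marketing reports: (drugs taken, symptoms observed).
        reports = [
            ({"warfarin", "aspirin"}, {"bleeding"}),
            ({"warfarin"}, {"bleeding", "nausea"}),
            ({"aspirin"}, {"nausea"}),
            ({"warfarin"}, {"bleeding"}),
        ]

        def drug_adr_rules(reports, min_support=0.25, min_confidence=0.6):
            """Return drug -> symptom rules meeting the support and confidence thresholds."""
            n = len(reports)
            drugs = set().union(*(d for d, _ in reports))
            symptoms = set().union(*(s for _, s in reports))
            rules = []
            for drug, symptom in product(drugs, symptoms):
                both = sum(1 for d, s in reports if drug in d and symptom in s)
                drug_count = sum(1 for d, _ in reports if drug in d)
                support = both / n
                confidence = both / drug_count if drug_count else 0.0
                if support >= min_support and confidence >= min_confidence:
                    rules.append((drug, symptom, support, confidence))
            return rules

        for drug, adr, sup, conf in drug_adr_rules(reports):
            print(f"{drug} -> {adr}  (support={sup:.2f}, confidence={conf:.2f})")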

    Artificial Intelligence and Soft Computing


    Knowledge Discovery in Databases: An Information Retrieval Perspective

    The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data, and it describes a framework for categorising data mining systems. The author also gives an overview of the issues pertaining to data pre-processing, as well as various information-gathering methodologies and techniques. The paper covers some popular tools such as classification, clustering, and generalisation. A summary of the statistical and machine learning techniques currently in use is also provided.
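    As a small illustration of two of the tools the paper covers, the sketch below applies clustering (grouping records without labels) and classification (predicting a label for a new record) to a toy dataset. It assumes scikit-learn is available and is our example, not the author's.

        from sklearn.cluster import KMeans
        from sklearn.tree import DecisionTreeClassifier

        # Toy dataset: two numeric features per record, with known labels for classification.
        X = [[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9], [0.8, 1.2], [4.2, 4.1]]
        y = [0, 0, 1, 1, 0, 1]

        # Clustering: group the records without using the labels.
        clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print("cluster assignments:", clusters)

        # Classification: learn a rule that predicts the label of an unseen record.
        clf = DecisionTreeClassifier(random_state=0).fit(X, y)
        print("predicted class for [4.0, 4.0]:", clf.predict([[4.0, 4.0]]))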

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. As explained in the report, the term Big Data is used for a variety of data, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of these new types of data, in their availability, and in the way they are collected and disseminated are fundamental, and they constitute a paradigm shift for survey research. There is great potential in Big Data, but there are some fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research; we also describe the Big Data process and discuss its main challenges.

    Order Flow and Exchange Rate Dynamics

    Macroeconomic models of nominal exchange rates perform poorly. In sample, R2 statistics as high as 10 percent are rare; out of sample, these models are typically out-forecast by a naïve random walk. This paper presents a model of a new kind. Instead of relying exclusively on macroeconomic determinants, the model includes a determinant from the field of microstructure: order flow. Order flow is the proximate determinant of price in all microstructure models. This is a radically different approach to exchange rate determination, and it is strikingly successful in accounting for realized rates. Our model of daily exchange-rate changes produces R2 statistics above 50 percent. Out of sample, our model produces significantly better short-horizon forecasts than a random walk. For the DM/$ spot market as a whole, we find that $1 billion of net dollar purchases increases the DM price of a dollar by about 1 pfennig.
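    The hybrid model described here is usually written as a single daily regression combining a macro determinant with order flow. The equation below is the standard form in which this model is cited, in our notation rather than as a quotation from the paper:

        \Delta p_t = \beta_1 \, \Delta (i_t - i_t^{*}) + \beta_2 \, x_t + \varepsilon_t

    where Δp_t is the one-day change in the log spot rate, i_t - i_t* the interest differential, and x_t signed interbank order flow (net dollar purchases). The reported price impact of roughly 1 pfennig per $1 billion of net purchases corresponds to the scaled estimate of β_2.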

    Community Participation, Teacher Effort, and Educational Outcome: The Case of El Salvador's EDUCO Program

    Based on a principal-agent model, this paper investigates the organizational structure that made El Salvador's primary school decentralization program (the EDUCO program) successful. First, we estimate an "augmented" reduced-form educational production function that incorporates parental and community involvement as a major organizational input, and we observe consistently positive and statistically significant EDUCO participation effects on standardized test scores. We then estimate a teacher compensation function, teacher effort functions, and input demand functions by utilizing the theoretical implications of a principal (parental association)-agent (teacher) framework. While EDUCO school teachers receive a piece rate that depends on their performance, wage payment is relatively fixed in traditional schools. Empirical results indicate that the slope of the wage equation is positively affected by the degree of community participation, a finding that can be interpreted in terms of the optimal intensity of incentives. Consistent with this, teachers' effort levels in traditional schools are consistently lower than in EDUCO schools, indicating a moral hazard problem. Community participation, through parental groups' classroom visits, appears to enhance teacher effort and thus indirectly raises students' academic performance. Parental associations can affect not only teacher effort and performance, by imposing an appropriate incentive scheme, but also school-level inputs, through decentralized school management. Our empirical results support the view that decentralization of an education system should involve delegation of school administration and teacher management to the community group.

    Keywords: economic analysis of social sector reform, the optimal intensity of incentive condition, moral hazard, education production function, fixed effects instrumental variable estimation
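    The "optimal intensity of incentive" interpretation above follows the textbook linear principal-agent benchmark (Holmstrom-Milgrom); the notation below is ours, not the paper's. With a linear contract and CARA utility over normally distributed performance noise, the optimal piece rate is

        w = \alpha + \beta q, \qquad \beta^{*} = \frac{1}{1 + r \sigma^{2} c}

    where q is measured teacher performance, r the teacher's risk aversion, σ² the noise in the performance measure, and c the curvature of the effort cost. If parental classroom visits make performance easier to observe (lower σ²), the optimal slope β* rises, which is consistent with the finding that community participation steepens the wage equation.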

    GMM Estimation of Empirical Growth Models

    This paper highlights a problem with using the first-differenced GMM panel data estimator in cross-country growth regressions. When the time series are persistent, the first-differenced GMM estimator can be poorly behaved, since lagged levels of the series provide only weak instruments for subsequent first-differences. Revisiting the work of Caselli, Esquivel and Lefort (1996), we show that this problem may be serious in practice. We suggest using a more efficient GMM estimator that exploits stationarity restrictions, and this approach is shown to give more reasonable results than first-differenced GMM in our estimation of an empirical growth model.

    Keywords: convergence, growth, generalised method of moments, weak instruments
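    The weak-instrument problem can be stated compactly. In the standard dynamic panel setup (a textbook statement in our notation, not a quotation from the paper), the growth regression is

        y_{it} = \alpha \, y_{i,t-1} + \beta' x_{it} + \eta_i + v_{it}

    First-differenced GMM instruments Δy_{i,t-1} with lagged levels through the moment conditions E[y_{i,t-s} Δv_{it}] = 0 for s ≥ 2. As α approaches 1 the series becomes persistent, lagged levels carry almost no information about subsequent first-differences, and the instruments are weak. The more efficient estimator alluded to here adds, under a mean-stationarity restriction on the initial conditions, the level-equation moments E[Δy_{i,t-1}(η_i + v_{it})] = 0, which remain informative even for persistent series.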

    A valid theory on probabilistic causation

    In this paper several definitions of probabilistic causation are considered and their main drawbacks discussed. Current notions of probabilistic causality have symmetry limitations (e.g. correlation and statistical dependence are symmetric notions). To avoid the symmetry problem, non-reciprocal causality is often defined in terms of dynamic asymmetry, but such notions are liable to mistake spurious regularities for causation. In this paper we present a definition of causality that does not have symmetry inconsistencies. It is a natural extension of propositional causality in formal logics, and it can be easily analyzed with statistical inference. The modeling problems are also discussed using empirical processes.

    Keywords: causality, empirical processes and classification theory, 62M30, 62M15, 62G20
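    The symmetry limitation is easy to make precise. Under the classical probability-raising definition, C is a prima facie cause of E when P(E | C) > P(E); but this condition is symmetric in C and E:

        P(E \mid C) > P(E) \iff P(C \cap E) > P(C)\,P(E) \iff P(C \mid E) > P(C)

    so probability-raising alone cannot distinguish cause from effect. This is why many accounts fall back on temporal or dynamic asymmetry, and why the paper instead seeks an asymmetric notion extended from propositional causality in formal logic.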