
    Discovering duplicate tasks in transition systems for the simplification of process models

    This work presents a set of methods to improve the understandability of process models. Traditionally, simplification methods trade off quality metrics such as fitness or precision. Conversely, the methods proposed in this paper produce simplified models while preserving or even increasing fidelity metrics. The first problem addressed in the paper is the discovery of duplicate tasks. A new method is proposed that avoids overfitting by working on the transition system generated by the log; it is able to discover duplicate tasks even in the presence of concurrency and choice. The second problem is the structural simplification of the model by identifying optional and repetitive tasks. These tasks are substituted by annotated events that allow the removal of silent tasks and reduce the complexity of the model. An important feature of the proposed methods is that they are independent of the actual miner used for process discovery.
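    As a concrete illustration of the first method's starting point, the following is a minimal, hypothetical sketch (not the authors' implementation): it builds a prefix-based transition system from a toy event log and flags activities that fire from several distinct states as candidates for duplicate labels. The horizon parameter and the context heuristic are our assumptions for illustration only.

        from collections import defaultdict

        # Toy event log: each trace is a sequence of activity labels.
        log = [
            ["a", "b", "c", "b", "d"],
            ["a", "c", "b", "d"],
            ["a", "b", "c", "d"],
        ]

        def build_transition_system(log, horizon=2):
            """Map each abstracted state (last `horizon` events) to its outgoing activities."""
            transitions = defaultdict(set)
            for trace in log:
                for i, activity in enumerate(trace):
                    state = tuple(trace[max(0, i - horizon):i])
                    transitions[state].add(activity)
            return transitions

        def duplicate_candidates(log, horizon=2):
            """Activities that fire from more than one distinct state are
            candidates for being split into duplicate tasks."""
            contexts = defaultdict(set)
            for trace in log:
                for i, activity in enumerate(trace):
                    contexts[activity].add(tuple(trace[max(0, i - horizon):i]))
            return {a: ctxs for a, ctxs in contexts.items() if len(ctxs) > 1}

        print(dict(build_transition_system(log)))
        print(duplicate_candidates(log))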

    An Optimal Approach for Mining Rare Causal Associations to Detect ADR Signal Pairs

    Adverse Drug Reaction (ADR) detection is one of the most important issues in the assessment of drug safety. Many adverse drug reactions are not discovered during limited premarketing clinical trials; instead, they are only observed after long-term post-marketing surveillance of drug usage. In light of this, detecting adverse drug reactions as early as possible is an important research topic for the pharmaceutical industry. Recently, the accumulation of large numbers of adverse-event reports and advances in data mining technology have motivated the development of statistical and data mining methods for the detection of ADRs. These stand-alone methods, with no integration into knowledge discovery systems, are tedious and inconvenient for users, and their exploration processes are time-consuming. This paper proposes an interactive system platform for the detection of ADRs. By integrating an ADR data warehouse and innovative data mining techniques, the proposed system not only supports OLAP-style multidimensional analysis of ADRs, but also allows the interactive discovery of associations between drugs and symptoms, called drug-ADR association rules, which can be further refined using other factors of interest to the user, such as demographic information. The experiments indicate that interesting and valuable drug-ADR association rules can be mined efficiently. The approach employs a knowledge-based method to capture the degree of causality of an event pair within each sequence, matching the data against previously recorded or suggested treatments; this supports immediate treatment of patients and can help less experienced physicians assess the relationships between a drug and its signal reactions.
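    The drug-ADR association rules described above can be sketched in a few lines. This is a hedged, self-contained illustration of support/confidence rule mining over toy spontaneous reports; the report data, thresholds, and rule shape (single drug, single symptom) are our assumptions, not the paper's system.

        from itertools import product

        # Toy post-marketing reports: (drugs taken, symptoms observed).
        reports = [
            ({"warfarin", "aspirin"}, {"bleeding"}),
            ({"warfarin"}, {"bleeding", "nausea"}),
            ({"aspirin"}, {"nausea"}),
            ({"warfarin"}, {"bleeding"}),
        ]

        def drug_adr_rules(reports, min_support=0.25, min_confidence=0.6):
            """Return drug -> symptom rules meeting the support and confidence thresholds."""
            n = len(reports)
            drugs = set().union(*(d for d, _ in reports))
            symptoms = set().union(*(s for _, s in reports))
            rules = []
            for drug, symptom in product(drugs, symptoms):
                both = sum(1 for d, s in reports if drug in d and symptom in s)
                drug_count = sum(1 for d, _ in reports if drug in d)
                support = both / n
                confidence = both / drug_count if drug_count else 0.0
                if support >= min_support and confidence >= min_confidence:
                    rules.append((drug, symptom, support, confidence))
            return rules

        for drug, adr, sup, conf in drug_adr_rules(reports):
            print(f"{drug} -> {adr}  (support={sup:.2f}, confidence={conf:.2f})")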

    Artificial Intelligence and Soft Computing


    Knowledge Discovery in Databases: An Information Retrieval Perspective

    The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data, and it describes a framework for categorising data mining systems. The author also gives an overview of the issues pertaining to data pre-processing, as well as various information-gathering methodologies and techniques. The paper covers some popular tools such as classification, clustering, and generalisation. A summary of the statistical and machine learning techniques currently in use is also provided.
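    As a small illustration of two of the tools the paper covers, the sketch below applies clustering (grouping records without labels) and classification (predicting a label for a new record) to a toy dataset. It assumes scikit-learn is available and is our example, not the author's.

        from sklearn.cluster import KMeans
        from sklearn.tree import DecisionTreeClassifier

        # Toy dataset: two numeric features per record, with known labels for classification.
        X = [[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9], [0.8, 1.2], [4.2, 4.1]]
        y = [0, 0, 1, 1, 0, 1]

        # Clustering: group the records without using the labels.
        clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print("cluster assignments:", clusters)

        # Classification: learn a rule that predicts the label of an unseen record.
        clf = DecisionTreeClassifier(random_state=0).fit(X, y)
        print("predicted class for [4.0, 4.0]:", clf.predict([[4.0, 4.0]]))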

    AAPOR Report on Big Data

    In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. As explained in the report, the term Big Data is used for a variety of data, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of these new types of data, in their availability, and in the way they are collected and disseminated are fundamental, and they constitute a paradigm shift for survey research. There is great potential in Big Data, but there are some fundamental challenges that have to be resolved before its full potential can be realized. In this report we give examples of different types of Big Data and their potential for survey research; we also describe the Big Data process and discuss its main challenges.

    Order Flow and Exchange Rate Dynamics

    Macroeconomic models of nominal exchange rates perform poorly. In sample, R2 statistics as high as 10 percent are rare; out of sample, these models are typically out-forecast by a naïve random walk. This paper presents a model of a new kind. Instead of relying exclusively on macroeconomic determinants, the model includes a determinant from the field of microstructure: order flow. Order flow is the proximate determinant of price in all microstructure models. This is a radically different approach to exchange rate determination, and it is strikingly successful in accounting for realized rates. Our model of daily exchange-rate changes produces R2 statistics above 50 percent. Out of sample, our model produces significantly better short-horizon forecasts than a random walk. For the DM/$ spot market as a whole, we find that $1 billion of net dollar purchases increases the DM price of a dollar by about 1 pfennig.
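    The hybrid model described here is usually written as a single daily regression combining a macro determinant with order flow. The equation below is the standard form in which this model is cited, in our notation rather than as a quotation from the paper:

        \Delta p_t = \beta_1 \, \Delta (i_t - i_t^{*}) + \beta_2 \, x_t + \varepsilon_t

    where Δp_t is the one-day change in the log spot rate, i_t - i_t* the interest differential, and x_t signed interbank order flow (net dollar purchases). The reported price impact of roughly 1 pfennig per $1 billion of net purchases corresponds to the scaled estimate of β_2.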

    Community Participation, Teacher Effort, and Educational Outcome: The Case of El Salvador's EDUCO Program

    Based on a principal-agent model, this paper investigates the organizational structure that made El Salvador's primary school decentralization program (the EDUCO program) successful. First, we estimate an "augmented" reduced-form educational production function that incorporates parental and community involvement as a major organizational input, and we observe consistently positive and statistically significant EDUCO participation effects on standardized test scores. We then estimate a teacher compensation function, teacher effort functions, and input demand functions by utilizing the theoretical implications of a principal (parental association)-agent (teacher) framework. While EDUCO school teachers receive a piece rate that depends on their performance, wage payment is relatively fixed in traditional schools. Empirical results indicate that the slope of the wage equation is positively affected by the degree of community participation, a finding that can be interpreted in terms of the optimal intensity of incentives. Consistent with this, teachers' effort levels in traditional schools are consistently lower than in EDUCO schools, indicating a moral hazard problem. Community participation, through parental groups' classroom visits, appears to enhance teacher effort and thus indirectly raises students' academic performance. Parental associations can affect not only teacher effort and performance, by imposing an appropriate incentive scheme, but also school-level inputs, through decentralized school management. Our empirical results support the view that decentralization of an education system should involve delegation of school administration and teacher management to the community group.

    Keywords: economic analysis of social sector reform, the optimal intensity of incentive condition, moral hazard, education production function, fixed effects instrumental variable estimation
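    The "optimal intensity of incentive" interpretation above follows the textbook linear principal-agent benchmark (Holmstrom-Milgrom); the notation below is ours, not the paper's. With a linear contract and CARA utility over normally distributed performance noise, the optimal piece rate is

        w = \alpha + \beta q, \qquad \beta^{*} = \frac{1}{1 + r \sigma^{2} c}

    where q is measured teacher performance, r the teacher's risk aversion, σ² the noise in the performance measure, and c the curvature of the effort cost. If parental classroom visits make performance easier to observe (lower σ²), the optimal slope β* rises, which is consistent with the finding that community participation steepens the wage equation.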

    GMM Estimation of Empirical Growth Models

    This paper highlights a problem with using the first-differenced GMM panel data estimator in cross-country growth regressions. When the time series are persistent, the first-differenced GMM estimator can be poorly behaved, since lagged levels of the series provide only weak instruments for subsequent first-differences. Revisiting the work of Caselli, Esquivel and Lefort (1996), we show that this problem may be serious in practice. We suggest using a more efficient GMM estimator that exploits stationarity restrictions, and this approach is shown to give more reasonable results than first-differenced GMM in our estimation of an empirical growth model.

    Keywords: convergence, growth, generalised method of moments, weak instruments
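    The weak-instrument problem can be stated compactly. In the standard dynamic panel setup (a textbook statement in our notation, not a quotation from the paper), the growth regression is

        y_{it} = \alpha \, y_{i,t-1} + \beta' x_{it} + \eta_i + v_{it}

    First-differenced GMM instruments Δy_{i,t-1} with lagged levels through the moment conditions E[y_{i,t-s} Δv_{it}] = 0 for s ≥ 2. As α approaches 1 the series becomes persistent, lagged levels carry almost no information about subsequent first-differences, and the instruments are weak. The more efficient estimator alluded to here adds, under a mean-stationarity restriction on the initial conditions, the level-equation moments E[Δy_{i,t-1}(η_i + v_{it})] = 0, which remain informative even for persistent series.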

    A valid theory on probabilistic causation

    In this paper several definitions of probabilistic causation are considered and their main drawbacks discussed. Current notions of probabilistic causality have symmetry limitations (e.g. correlation and statistical dependence are symmetric notions). To avoid the symmetry problem, non-reciprocal causality is often defined in terms of dynamic asymmetry, but such notions are liable to mistake spurious regularities for causation. In this paper we present a definition of causality that does not have symmetry inconsistencies. It is a natural extension of propositional causality in formal logics, and it can be easily analyzed with statistical inference. The modeling problems are also discussed using empirical processes.

    Keywords: causality, empirical processes and classification theory, 62M30, 62M15, 62G20
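    The symmetry limitation is easy to make precise. Under the classical probability-raising definition, C is a prima facie cause of E when P(E | C) > P(E); but this condition is symmetric in C and E:

        P(E \mid C) > P(E) \iff P(C \cap E) > P(C)\,P(E) \iff P(C \mid E) > P(C)

    so probability-raising alone cannot distinguish cause from effect. This is why many accounts fall back on temporal or dynamic asymmetry, and why the paper instead seeks an asymmetric notion extended from propositional causality in formal logic.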