Artifact Lifecycle Discovery

Author: Dumas Marlon
Fahland Dirk
Popova Viara
Publication date: 01/01/2013

Artifact-centric modeling is a promising approach for modeling business processes based on so-called business artifacts: key entities that drive a company's operations and whose lifecycles define the overall business process. While artifact-centric modeling shows significant advantages, the overwhelming majority of existing process mining methods cannot be applied directly, as they are tailored to discover monolithic process models. This paper addresses the problem by proposing a chain of methods that can be applied to discover artifact lifecycle models in Guard-Stage-Milestone notation. We decompose the problem in such a way that a wide range of existing (non-artifact-centric) process discovery and analysis methods can be reused in a flexible manner. The methods presented in this paper are implemented as software plug-ins for ProM, a generic open-source framework and architecture for implementing process mining tools.
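One way to picture the decomposition is to split a mixed event log into one log per artifact type, so that each per-artifact log could then be handed to an ordinary discovery method. The sketch below is only an illustration under that assumption: the event tuple layout, the function name `split_by_artifact`, and the sample events are hypothetical and are not taken from the paper or from ProM.

```python
from collections import defaultdict

def split_by_artifact(events):
    """Group activities by artifact type and instance, keeping event order.

    `events` is an iterable of (artifact_type, instance_id, activity) tuples.
    This tuple layout is an assumption made for the example.
    """
    logs = defaultdict(lambda: defaultdict(list))
    for artifact_type, instance_id, activity in events:
        logs[artifact_type][instance_id].append(activity)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {t: dict(instances) for t, instances in logs.items()}

# Hypothetical interleaved log touching two artifact types.
events = [
    ("Order", "o1", "create"),
    ("Invoice", "i1", "issue"),
    ("Order", "o1", "ship"),
    ("Order", "o2", "create"),
    ("Invoice", "i1", "pay"),
]
logs = split_by_artifact(events)
print(logs["Order"])
```

Each inner dictionary maps one artifact instance to its trace, which is the input shape that conventional single-process discovery methods typically expect.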

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Counterexample-Guided Data Augmentation

Author: Dreossi Tommaso
Ghosh Shromona
Keutzer Kurt
Sangiovanni-Vincentelli Alberto
Seshia Sanjit A.
Yue Xiangyu
Publication date: 01/01/2018

We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model, and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model's vulnerabilities and to generate counterexamples for augmentation efficiently. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks.
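The idea of an error table can be sketched in a few lines: record the feature values of every misclassified input, then count which values recur among the failures. Everything in the example is an assumption made for illustration, including the toy model, the `brightness` and `car_position` features, and the function names; the paper's actual generator and table layout may differ.

```python
from collections import Counter
from itertools import product

def toy_model(scene):
    """Hypothetical stand-in for a detector: it finds the car only in bright scenes."""
    return scene["brightness"] >= 0.5

def ground_truth(scene):
    """Every synthetic scene in this example contains a car."""
    return True

def build_error_table(scenes):
    """Keep the feature values of each scene the model gets wrong."""
    return [s for s in scenes if toy_model(s) != ground_truth(s)]

def failure_counts(error_table, feature):
    """Count how often each value of `feature` occurs among the errors."""
    return Counter(row[feature] for row in error_table)

# Enumerate a small grid of synthetic scenes (a stand-in for a generator).
scenes = [
    {"brightness": b, "car_position": p}
    for b, p in product([0.2, 0.4, 0.6, 0.8], ["left", "center", "right"])
]
error_table = build_error_table(scenes)
print(failure_counts(error_table, "brightness"))
```

Here the counts concentrate on low brightness values and are spread evenly over car positions, which suggests where further counterexamples for augmentation might be sampled.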

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

DNA-inspired online behavioral modeling and its application to spambot detection

Author: Cresci Stefano
Di Pietro Roberto
Petrocchi Marinella
Spognardi Angelo
Tesconi Maurizio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

We propose a strikingly novel, simple, and effective approach to model online user behavior: we extract and analyze digital DNA sequences from user online actions, and we use Twitter as a benchmark to test our proposal. We obtain an incisive and compact DNA-inspired characterization of user actions. Then, we apply standard DNA analysis techniques to discriminate between genuine and spambot accounts on Twitter. An experimental campaign supports our proposal, showing its effectiveness and viability. To the best of our knowledge, we are the first to identify and adapt DNA-inspired techniques to online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media platform, our proposed methodology is platform and technology agnostic, hence paving the way for diverse behavioral characterization tasks.
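A minimal sketch of the encoding step, assuming a three-letter alphabet in which each action type maps to one character, and using the longest common substring as one example of a standard sequence-analysis tool. The alphabet mapping, the function names, and the sample timelines are illustrative assumptions, not the authors' implementation.

```python
# Assumed mapping from action type to a DNA-like character.
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}

def encode(actions):
    """Turn a chronological list of actions into a digital DNA string."""
    return "".join(ALPHABET[a] for a in actions)

def longest_common_substring(sequences):
    """Return the longest substring shared by all sequences (brute force)."""
    shortest = min(sequences, key=len)
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sequences):
                return candidate
    return ""

# Hypothetical timelines: two automated accounts and one human account.
bot_a = encode(["tweet", "retweet", "retweet", "tweet", "retweet", "retweet"])
bot_b = encode(["retweet", "tweet", "retweet", "retweet", "tweet", "retweet"])
human = encode(["tweet", "reply", "tweet", "tweet", "retweet", "reply"])

print(longest_common_substring([bot_a, bot_b]))
print(longest_common_substring([human, bot_a]))
```

In this toy data the two automated accounts share a much longer substring than the human account shares with either, which is the kind of signal a group-level similarity measure could exploit. The brute-force search is fine for short strings; long timelines would call for a suffix-based structure.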

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Università di Roma La Sapienza

PUblication MAnagement

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova

On mining complex sequential data by means of FCA and pattern structures

Author: Buzmakov Aleksey
Egho Elias
Jay Nicolas
Kuznetsov Sergei O.
Napoli Amedeo
Raïssi Chedy
Publication date: 09/04/2015

Nowadays data sets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work. Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data. Comment: An accepted publication in International Journal of General Systems. The paper was created in the wake of the conference on Concept Lattices and Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables.
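The partial order on sequences can be illustrated with the plain subsequence relation: a pattern is subsumed by a sequence when its items occur in the sequence in the same order, not necessarily contiguously. The sketch below simplifies the setting to flat sequences of single items and uses hypothetical care-step names; the paper's sequences are richer, so this is only an approximation of the idea.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)

def support(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))

# Hypothetical patient trajectories, each a list of care steps.
database = [
    ["consult", "surgery", "chemo"],
    ["consult", "chemo"],
    ["surgery", "consult"],
]

print(support(["consult", "chemo"], database))
print(is_subsequence(["chemo", "consult"], database[0]))
```

Counting support this way is the basic primitive behind ranking sequential patterns; a projection in the sense of the abstract would shrink the sequences before this step.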

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

The Bases of Association Rules of High Confidence

Author: Adaricheva Kira
Cabot-Miller Justin
Nation J. B.
Segal Oren
Sharafudinov Anuar
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 05/08/2018

We develop a new approach for distributed computing of the association rules of high confidence in a binary table. It is derived from the D-basis algorithm in K. Adaricheva and J.B. Nation (TCS 2017), which is performed on multiple sub-tables of a table given by removing several rows at a time. The resulting sets of rules are then aggregated using the same approach by which the D-basis is retrieved from a larger set of implications. This makes it possible to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute using the relevance parameter introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper focuses on the technical implementation of the new algorithm. Tests are performed on transaction data and medical data. Comment: Presented at DTMN, Sydney, Australia, July 28, 201
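For readers unfamiliar with the term, the confidence of a rule X -> y in a binary table is the fraction of rows containing all attributes of X that also contain y. The sketch below computes it on a small table and on a sub-table with one row removed. It does not implement the D-basis algorithm; the table contents and function names are hypothetical.

```python
from fractions import Fraction

def support(rows, attributes):
    """Number of rows that contain every attribute in `attributes`."""
    return sum(1 for row in rows if attributes <= row)

def confidence(rows, premise, conclusion):
    """Confidence of the rule premise -> conclusion as an exact fraction."""
    premise = set(premise)
    covered = support(rows, premise)
    if covered == 0:
        raise ValueError("premise never holds in the table")
    return Fraction(support(rows, premise | {conclusion}), covered)

# Hypothetical binary table: each row is the set of attributes equal to 1.
table = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

print(confidence(table, {"a", "b"}, "c"))      # on the full table
print(confidence(table[1:], {"a", "b"}, "c"))  # on a sub-table with the first row removed
```

Comparing the two values shows how removing rows changes a rule's confidence, which is why rules obtained from several sub-tables need an aggregation step.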

arXiv.org e-Print Archive

Crossref