17,135 research outputs found

Analytical Challenges in Modern Tax Administration: A Brief History of Analytics at the IRS

Author: Butler Jeff
Publication venue: Ohio State University. Moritz College of Law
Publication date: 01/01/2020

Process Mining of Programmable Logic Controllers: Input/Output Event Logs

Author: Darabi Houshang
Mokhtarian Ilia
Theis Julian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/03/2019

This paper presents an approach to modeling an unknown Ladder Logic-based Programmable Logic Controller (PLC) program consisting of Boolean logic and counters using Process Mining techniques. First, we tap the inputs and outputs of a PLC to create a data flow log. Second, we propose a method to translate the obtained data flow log into an event log suitable for Process Mining. Third, we propose a hybrid Petri net (PN) and neural network approach to approximate the logic of the actual underlying PLC program. We demonstrate the applicability of our proposed approach in a case study with three simulated scenarios.
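
The second step above (turning a data flow log into an event log) can be illustrated with a minimal sketch. It assumes the data flow log is a list of timestamped snapshots of Boolean I/O signals and that an event is emitted whenever a signal changes value; the function name `to_event_log`, the signal names, and the `_rise`/`_fall` labels are hypothetical and are not taken from the paper.

```python
def to_event_log(samples):
    """Convert (timestamp, {signal: bool}) snapshots into change events.

    Assumption: an "event" is a rising or falling edge of one signal.
    """
    events = []
    previous = {}  # last seen value of each signal
    for timestamp, state in samples:
        for signal in sorted(state):
            value = state[signal]
            # Emit an event only when a previously seen signal changes.
            if signal in previous and previous[signal] != value:
                label = f"{signal}_{'rise' if value else 'fall'}"
                events.append((timestamp, label))
            previous[signal] = value
    return events


# Hypothetical log: input I0 and output Q0 sampled at four time steps.
samples = [
    (0, {"I0": False, "Q0": False}),
    (1, {"I0": True, "Q0": False}),
    (2, {"I0": True, "Q0": True}),
    (3, {"I0": False, "Q0": True}),
]
events = to_event_log(samples)
```

Each resulting `(timestamp, label)` pair could then serve as one activity record in an event log for a process discovery tool.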

arXiv.org e-Print Archive

Crossref

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008

The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

Research Commons@Waikato

Call Graph Evolution Analytics over a Version Series of an Evolving Software System

Author: Chaturvedi Animesh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/10/2022

Call Graph evolution analytics can aid a software engineer when maintaining or evolving a software system. This paper proposes Call Graph Evolution Analytics to extract information from an evolving call graph ECG = CG_1, CG_2, ..., CG_N for the version series VS = V_1, V_2, ..., V_N of an evolving software system. This is done using Call Graph Evolution Rules (CGERs) and Call Graph Evolution Subgraphs (CGESs). Similar to association rule mining, the CGERs are used to capture co-occurrences of dependencies in the system. Like subgraph patterns in a call graph, the CGESs are used to capture the evolution of dependency patterns in evolving call graphs. Call graph analytics on the evolution of these patterns can identify potentially affected dependencies (or procedure calls) that need attention. The experiments are done on the evolving call graphs of 10 large evolving systems to support dependency evolution management. We also consider results from a detailed study of the evolving call graphs of Maven-Core's version series.
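
The analogy to association rule mining can be made concrete with a small sketch. It assumes each version's call graph is a set of (caller, callee) edges and treats each version as one transaction, then reports pairwise rules "edge a present implies edge b present" with their support and confidence. The function name `edge_rules`, the thresholds, and the toy data are hypothetical; this is ordinary pairwise rule mining, not the paper's CGER procedure.

```python
from itertools import permutations


def edge_rules(versions, min_support=0.5, min_confidence=0.8):
    """Mine pairwise co-occurrence rules over call-graph edges.

    versions: list of sets of (caller, callee) edges, one set per version.
    Returns (a, b, support, confidence) tuples for rules a => b.
    """
    n = len(versions)
    all_edges = sorted(set().union(*versions))
    # Number of versions in which each edge appears (always >= 1).
    count = {e: sum(e in v for v in versions) for e in all_edges}
    rules = []
    for a, b in permutations(all_edges, 2):
        both = sum(a in v and b in v for v in versions)
        support = both / n            # fraction of versions with both edges
        confidence = both / count[a]  # fraction of a's versions that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical three-version series of a tiny system.
versions = [
    {("main", "parse"), ("parse", "lex")},
    {("main", "parse"), ("parse", "lex"), ("main", "log")},
    {("main", "parse"), ("main", "log")},
]
rules = edge_rules(versions)
```

In this toy series, every version that contains `main -> log` or `parse -> lex` also contains `main -> parse`, so those two rules pass the confidence threshold.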

arXiv.org e-Print Archive

Molecular Model of Dynamic Social Network Based on E-mail communication

Author: B. Bringmann
D Liben-Nowell
D Watts
D. Braha
D. Harel
M. Bronstein
P Kazienko
RA Hill
S. Boccaletti
SH Strogatz
W Torgerson
W Weidlich
Y Shavitt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2013

In this work we consider an application of a physically inspired sociodynamical model to the modelling of the evolution of an email-based social network. Contrary to the standard approach of sociodynamics, which assumes that system dynamics are expressed with heuristically defined simple rules, we postulate the inference of these rules from real data and their application within a dynamic molecular model. We present how to embed the n-dimensional social space in a Euclidean one. Then, inspired by the Lennard-Jones potential, we define a data-driven social potential function and apply the resultant force to a real e-mail communication network in the course of a molecular simulation, with network nodes taking on the role of interacting particles. We discuss all steps of the modelling process, from data preparation, through embedding and the molecular simulation itself, to the transformation from the embedding space back to a graph structure. The conclusions, drawn from examining the resultant networks in stable, minimum-energy states, emphasize the role of the embedding process projecting the non-metric social graph into Euclidean space, the significance of the unavoidable loss of information connected with this procedure, and the resultant preservation of global rather than local properties of the initial network. We also argue for the applicability of our method to some classes of problems, while signalling the areas that require further research in order to expand this applicability domain.
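
For readers unfamiliar with the physical model that inspires the social potential, the following sketch evaluates the classical Lennard-Jones potential V(r) = 4ε[(σ/r)^12 − (σ/r)^6] and the corresponding force F(r) = −dV/dr. It shows only the textbook form; the paper's data-driven social potential is inferred from e-mail data and is not reproduced here. The function names and the default parameters ε = σ = 1 are illustrative assumptions.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Classical Lennard-Jones pair potential at distance r > 0."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force F(r) = -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# The potential is minimal (V = -epsilon, F = 0) at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
```

Particles closer than `r_min` repel and particles farther apart attract, which is the qualitative behaviour a molecular simulation relaxes toward a minimum-energy state.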

Crossref

Springer

Springer - Publisher Connector

Bournemouth University Research Online

King's Research Portal

BCFA: Bespoke Control Flow Analysis for CFA at Scale

Author: Benjamin Livshits V.
Bourdoncle François
Cobleigh Jamieson M.
Dyer Robert
Lam Patrick
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2020

Many data-driven software engineering tasks, such as discovering programming patterns and mining API specifications, perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive, and the performance of the analysis depends heavily on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs, and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, to select the optimal traversal strategy. We have implemented BCFA in Boa and evaluated it using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speed up large-scale analyses by 1%-28%. Further, BCFA has low overhead (less than 0.2%) and a low misprediction rate (less than 0.01%).
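
The idea of choosing a traversal per graph can be sketched as follows. This toy selector looks at a single CFG property, cyclicity, and picks between two hypothetical strategy labels; it is not BCFA's decision procedure, which also uses properties of the analysis itself. The CFG is assumed to be a dict mapping every node to a list of successors, with every successor also present as a key.

```python
def has_cycle(cfg):
    """Detect a cycle with an iterative depth-first search (three colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in cfg}
    for root in cfg:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(cfg[root]))]
        while stack:
            node, successors = stack[-1]
            for succ in successors:
                if color[succ] == GRAY:  # back edge: node is on the current path
                    return True
                if color[succ] == WHITE:
                    color[succ] = GRAY
                    stack.append((succ, iter(cfg[succ])))
                    break
            else:
                # All successors explored: node is finished.
                color[node] = BLACK
                stack.pop()
    return False


def choose_traversal(cfg):
    """Hypothetical rule: one ordered pass if acyclic, a worklist otherwise."""
    return "worklist" if has_cycle(cfg) else "single-pass"


# Two hypothetical CFGs: a diamond (acyclic) and a simple loop.
diamond = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
```

Here `choose_traversal(diamond)` selects the single ordered pass, while `choose_traversal(loop)` falls back to the worklist because the loop requires iterating to a fixed point.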

arXiv.org e-Print Archive

Crossref

Prediction of Emerging Technologies Based on Analysis of the U.S. Patent Citation Network

Author: A. Hargadon
A. Jaffe
A. Pyka
A. Sood
A. Usher
A. Verbeek
A. Vespignani
B. Milman
C. Chen
C. Sternitzke
C. Weng
D. Harhoff
E. Duguet
E. Garfield
F. Murray
F. Narin
G. McMillanm
G. Palla
H. Moed
H. Small
J. Alcacer
J. Hagedoorn
J. Lanjouw
J. Podolny
J. Schumpeter
J. Ward
Jan Tobochnik
K. Debackere
K. Lai
K. OuYang
K. Strandburg
Katherine Strandburg
Kinga Makovi
L. Fleming
L. Leydesdorff
László Zalányi
M. Girvan
M. Meyer
M. Mogee
M. Newman
M. Wallace
M. Weitzman
N. Shibata
O. Sorenson
P. Almeida
P. Pons
P. Saviotti
P. Érdi
P.C. Lee
Péter Volf
Péter Érdi
R. Fontana
R. Henderson
R. Kostoff
R. Tijssen
S. Chang
Y. Kajikawa
Z. Huang
Zoltán Somogyvári
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/04/2013

The network of patents connected by citations is an evolving graph, which provides a representation of the innovation process. A patent citing another implies that the cited patent reflects a piece of previously existing knowledge that the citing patent builds upon. The methodology presented here (i) identifies actual clusters of patents, i.e. technological branches, and (ii) gives predictions about the temporal changes of the structure of the clusters. A predictor, called the citation vector, is defined for characterizing technological development to show how a patent cited by other patents belongs to various industrial fields. The clustering technique adopted is able to detect new emerging recombinations and predicts emerging new technology clusters. The predictive ability of our new method is illustrated using the example of USPTO subcategory 11, Agriculture, Food, Textiles. A cluster of patents is determined based on citation data up to 1991, which shows significant overlap with class 442, formed at the beginning of 1997. These new tools of predictive analytics could support policy decision-making processes in science and technology, and help formulate recommendations for action.
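
One plausible reading of the citation vector can be sketched as follows: for a given patent, count the citations it receives from patents in each industrial category and normalize the counts to shares. The normalization, the function name `citation_vector`, and the toy data are assumptions made for illustration; the paper's exact definition may differ.

```python
from collections import Counter


def citation_vector(patent, citations, category_of, categories):
    """Share of citations received by `patent` from each category.

    citations: iterable of (citing, cited) pairs.
    category_of: dict mapping a citing patent to its category.
    categories: ordered list fixing the vector's components.
    """
    counts = Counter(
        category_of[citing] for citing, cited in citations if cited == patent
    )
    total = sum(counts.values())
    if total == 0:
        # An uncited patent gets the zero vector.
        return [0.0] * len(categories)
    return [counts[c] / total for c in categories]


# Hypothetical data: P1 is cited by two textile patents and one food patent.
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P4", "P2")]
category_of = {"P2": "textiles", "P3": "textiles", "P4": "food"}
categories = ["agriculture", "food", "textiles"]
vector = citation_vector("P1", citations, category_of, categories)
```

Patents with similar vectors could then be grouped by any standard clustering method to look for emerging technological branches.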

arXiv.org e-Print Archive

Crossref