3,106 research outputs found
Efficient Incremental Breadth-Depth XML Event Mining
Many applications log a large amount of events continuously. Extracting
interesting knowledge from logged events is an emerging active research area in
data mining. In this context, we propose an approach for mining frequent events
and association rules from logged events in XML format. This approach is
composed of two-main phases: I) constructing a novel tree structure called
Frequency XML-based Tree (FXT), which contains the frequency of events to be
mined; II) querying the constructed FXT using XQuery to discover frequent
itemsets and association rules. The FXT is constructed with a single-pass over
logged data. We implement the proposed algorithm and study various performance
issues. The performance study shows that the algorithm is efficient, for both
constructing the FXT and discovering association rules
Bidirectional Growth based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns
Web sequential patterns are important for analyzing and understanding users
behaviour to improve the quality of service offered by the World Wide Web. Web
Prefetching is one such technique that utilizes prefetching rules derived
through Cyclic Model Analysis of the mined Web sequential patterns. The more
accurate the prediction and more satisfying the results of prefetching if we
use a highly efficient and scalable mining technique such as the Bidirectional
Growth based Directed Acyclic Graph. In this paper, we propose a novel
algorithm called Bidirectional Growth based mining Cyclic behavior Analysis of
web sequential Patterns (BGCAP) that effectively combines these strategies to
generate prefetching rules in the form of 2-sequence patterns with Periodicity
and threshold of Cyclic Behaviour that can be utilized to effectively prefetch
Web pages, thus reducing the users perceived latency. As BGCAP is based on
Bidirectional pattern growth, it performs only (log n+1) levels of recursion
for mining n Web sequential patterns. Our experimental results show that
prefetching rules generated using BGCAP is 5-10 percent faster for different
data sizes and 10-15% faster for a fixed data size than TD-Mine. In addition,
BGCAP generates about 5-15 percent more prefetching rules than TD-Mine.Comment: 19 page
Analysing the behaviour of robot teams through relational sequential pattern mining
This report outlines the use of a relational representation in a Multi-Agent
domain to model the behaviour of the whole system. A desired property in this
systems is the ability of the team members to work together to achieve a common
goal in a cooperative manner. The aim is to define a systematic method to
verify the effective collaboration among the members of a team and comparing
the different multi-agent behaviours. Using external observations of a
Multi-Agent System to analyse, model, recognize agent behaviour could be very
useful to direct team actions. In particular, this report focuses on the
challenge of autonomous unsupervised sequential learning of the team's
behaviour from observations. Our approach allows to learn a symbolic sequence
(a relational representation) to translate raw multi-agent, multi-variate
observations of a dynamic, complex environment, into a set of sequential
behaviours that are characteristic of the team in question, represented by a
set of sequences expressed in first-order logic atoms. We propose to use a
relational learning algorithm to mine meaningful frequent patterns among the
relational sequences to characterise team behaviours. We compared the
performance of two teams in the RoboCup four-legged league environment, that
have a very different approach to the game. One uses a Case Based Reasoning
approach, the other uses a pure reactive behaviour.Comment: 25 page
Failure prediction for high-performance computing systems
The failure rate in high-performance computing (HPC) systems continues to escalate as the number of components in these systems increases. This affects the scalability and the performance of parallel applications in large-scale HPC systems. Fault tolerance (FT) mechanisms help mitigating the impact of failures on parallel applications. However, utilizing such mechanisms requires additional overhead. Besides, the overuse of FT mechanisms results in unnecessarily large overhead in the parallel applications. Knowing when and where failures will occur can greatly reduce the excessive overhead. As such, failure prediction is critical in order to effectively utilize FT mechanisms. In addition, it also helps in system administration and management, as the predicted failure can be handled beforehand with limited impact to the running systems.
This dissertation proposes new proficiency metrics for failure prediction based on failure impact in UPC environment that the existing proficiency metrics tire unable to reflect. Furthermore, an efficient log message clustering algorithm is proposed for system event log data preprocessing and analysis. Then, two novel association rule mining approaches are introduced and employed for HPC failure prediction. Finally, the performances of the existing and the proposed association rule mining methods are compared and analyzed
Data-driven conceptual modeling: how some knowledge drivers for the enterprise might be mined from enterprise data
As organizations perform their business, they analyze, design and manage a variety of processes represented in models with different scopes and scale of complexity. Specifying these processes requires a certain level of modeling competence. However, this condition does not seem to be balanced with adequate capability of the person(s) who are responsible for the task of defining and modeling an organization or enterprise operation.
On the other hand, an enterprise typically collects various records of all events occur during the operation of their processes. Records, such as the start and end of the tasks in a process instance, state transitions of objects impacted by the process execution, the message exchange during the process execution, etc., are maintained in enterprise repositories as various logs, such as event logs, process logs, effect logs, message logs, etc. Furthermore, the growth rate in the volume of these data generated by enterprise process execution has increased manyfold in just a few years.
On top of these, models often considered as the dashboard view of an enterprise. Models represents an abstraction of the underlying reality of an enterprise. Models also served as the knowledge driver through which an enterprise can be managed. Data-driven extraction offers the capability to mine these knowledge drivers from enterprise data and leverage the mined models to establish the set of enterprise data that conforms with the desired behaviour.
This thesis aimed to generate models or knowledge drivers from enterprise data to enable some type of dashboard view of enterprise to provide support for analysts. The rationale for this has been started as the requirement to improve an existing process or to create a new process. It was also mentioned models can also serve as a collection of effectors through which an organization or an enterprise can be managed.
The enterprise data refer to above has been identified as process logs, effect logs, message logs, and invocation logs. The approach in this thesis is to mine these logs to generate process, requirement, and enterprise architecture models, and how goals get fulfilled based on collected operational data.
The above a research question has been formulated as whether it is possible to derive the knowledge drivers from the enterprise data, which represent the running operation of the enterprise, or in other words, is it possible to use the available data in the enterprise repository to generate the knowledge drivers? .
In Chapter 2, review of literature that can provide the necessary background knowledge to explore the above research question has been presented. Chapter 3 presents how process semantics can be mined. Chapter 4 suggest a way to extract a requirements model. The Chapter 5 presents a way to discover the underlying enterprise architecture and Chapter 6 presents a way to mine how goals get orchestrated. Overall finding have been discussed in Chapter 7 to derive some conclusions
- …