Heuristics Miners for Streaming Event Data
More and more business activities are performed using information systems.
These systems produce such huge amounts of event data that existing systems are
unable to store and process them. Moreover, few processes are in steady-state
and due to changing circumstances processes evolve and systems need to adapt
continuously. Since conventional process discovery algorithms have been defined
for batch processing, it is difficult to apply them in such evolving
environments. Existing algorithms cannot cope with streaming event data and
tend to generate unreliable and obsolete results.
In this paper, we discuss the peculiarities of dealing with streaming event
data in the context of process mining. Subsequently, we present a general
framework for defining process mining algorithms in settings where it is
impossible to store all events over an extended period or where processes
evolve while being analyzed. We show how the Heuristics Miner, one of the most
effective process discovery algorithms for practical applications, can be
modified using this framework. Different stream-aware versions of the
Heuristics Miner are defined and implemented in ProM. Moreover, experimental
results on artificial and real logs are reported.
MalStone: Towards A Benchmark for Analytics on Large Data Clouds
Developing data mining algorithms that are suitable for cloud computing
platforms is currently an active area of research, as is developing cloud
computing platforms appropriate for data mining. Currently, the most common
benchmarks for cloud computing are the Terasort and related benchmarks.
Although the Terasort Benchmark is quite useful, it was not designed for data
mining per se. In this paper, we introduce a benchmark called MalStone that is
specifically designed to measure the performance of cloud computing middleware
that supports the type of data intensive computing common when building data
mining models. We also introduce MalGen, which is a utility for generating data
on clouds that can be used with MalStone.
From Linked Data to Relevant Data -- Time is the Essence
The Semantic Web initiative puts emphasis not primarily on putting data on
the Web, but rather on creating links in a way that both humans and machines
can explore the Web of data. When such users access the Web, they leave a trail
as Web servers maintain a history of requests. Web usage mining approaches have
been studied since the beginning of the Web given the log's huge potential for
purposes such as resource annotation, personalization, forecasting etc.
However, the impact of any such efforts has not really gone beyond generating
statistics detailing who, when, and how Web pages maintained by a Web server
were visited.
Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201