50,915 research outputs found

    Heuristics Miners for Streaming Event Data

    Full text link
    More and more business activities are performed using information systems. These systems produce such huge amounts of event data that existing systems are unable to store and process them. Moreover, few processes are in steady-state and due to changing circumstances processes evolve and systems need to adapt continuously. Since conventional process discovery algorithms have been defined for batch processing, it is difficult to apply them in such evolving environments. Existing algorithms cannot cope with streaming event data and tend to generate unreliable and obsolete results. In this paper, we discuss the peculiarities of dealing with streaming event data in the context of process mining. Subsequently, we present a general framework for defining process mining algorithms in settings where it is impossible to store all events over an extended period or where processes evolve while being analyzed. We show how the Heuristics Miner, one of the most effective process discovery algorithms for practical applications, can be modified using this framework. Different stream-aware versions of the Heuristics Miner are defined and implemented in ProM. Moreover, experimental results on artificial and real logs are reported

    MalStone: Towards A Benchmark for Analytics on Large Data Clouds

    Full text link
    Developing data mining algorithms that are suitable for cloud computing platforms is currently an active area of research, as is developing cloud computing platforms appropriate for data mining. Currently, the most common benchmark for cloud computing is the Terasort (and related) benchmarks. Although the Terasort Benchmark is quite useful, it was not designed for data mining per se. In this paper, we introduce a benchmark called MalStone that is specifically designed to measure the performance of cloud computing middleware that supports the type of data intensive computing common when building data mining models. We also introduce MalGen, which is a utility for generating data on clouds that can be used with MalStone

    From Linked Data to Relevant Data -- Time is the Essence

    Full text link
    The Semantic Web initiative puts emphasis not primarily on putting data on the Web, but rather on creating links in a way that both humans and machines can explore the Web of data. When such users access the Web, they leave a trail as Web servers maintain a history of requests. Web usage mining approaches have been studied since the beginning of the Web given the log's huge potential for purposes such as resource annotation, personalization, forecasting etc. However, the impact of any such efforts has not really gone beyond generating statistics detailing who, when, and how Web pages maintained by a Web server were visited.Comment: 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011) in the 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28th, 201
    • …
    corecore