9 research outputs found

    Using Answer Set Programming for pattern mining

    Get PDF
    Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.Comment: Intelligence Artificielle Fondamentale (2014

    Mining top-k regular episodes from sensor streams

    Get PDF
    International audienceThe monitoring of human activities plays an important role in health-care applications and for the data mining community. Existing approaches work on activities recognition occurring in sensor data streams. However, regular behaviors have not been studied. Thus, we here introduce a new approach to discover top-k most regular episodes from sensors streams, TKRES. The top-k approach allows us to control the size of the output, thus preventing overwhelming result analysis for the supervisor. TKRES is based on the use of a simple top-k list and a k-tree structure for maintaining the top-k episodes and their occurrence information. We also investigate and report the performances of TKRES on two real-life smart home datasets

    Mining Weighted Frequent Closed Episodes over Multiple Sequences

    Get PDF
    Frequent episode discovery is introduced to mine useful and interesting temporal patterns from sequential data. The existing episode mining methods mainly focused on mining from a single long sequence consisting of events with time constraints. However, there can be multiple sequences of different importance as the persons or entities associated with each sequence can be of different importance. Aiming to mine episodes in multiple sequences of different importance, we first define a new kind of episodes, i.e., the weighted frequent closed episodes, to take sequence importance, episode distribution and occurrence frequency into account together. Secondly, to facilitate the mining of such new episodes, we present a new concept called maximal duration serial episodes to cut a whole sequence into multiple maximum episodes using duration constraints, and discuss its properties for episode shrinking processing. Finally, based on the theoretical properties, we propose a two-phase approach to efficiently mine these new episodes. In Phase I, we adopt a level-wise episode shrinking framework to discover the candidate frequent closed episodes with the same prefixes, and in Phase II, we match the candidates with different prefixes to find the frequent close episodes. Experiments on simulated and real datasets demonstrate that the proposed episode mining strategy has good mining effectiveness and efficiency

    Mining Positional Data Streams

    Get PDF
    Abstract. We study frequent pattern mining from positional data streams. Existing approaches require discretised data to identify atomic events and are not applicable in our continuous setting. We propose an efficient trajectory-based preprocessing to identify similar movements and a distributed pattern mining algorithm to identify frequent trajectories. We empirically evaluate all parts of the processing pipeline

    Code Clone Discovery Based on Concolic Analysis

    Get PDF
    Software is often large, complicated and expensive to build and maintain. Redundant code can make these applications even more costly and difficult to maintain. Duplicated code is often introduced into these systems for a variety of reasons. Some of which include developer churn, deficient developer application comprehension and lack of adherence to proper development practices. Code redundancy has several adverse effects on a software application including an increased size of the codebase and inconsistent developer changes due to elevated program comprehension needs. A code clone is defined as multiple code fragments that produce similar results when given the same input. There are generally four types of clones that are recognized. They range from simple type-1 and 2 clones, to the more complicated type-3 and 4 clones. Numerous clone detection mechanisms are able to identify the simpler types of code clone candidates, but far fewer claim the ability to find the more difficult type-3 clones. Before CCCD, MeCC and FCD were the only clone detection techniques capable of finding type-4 clones. A drawback of MeCC is the excessive time required to detect clones and the likely exploration of an unreasonably large number of possible paths. FCD requires extensive amounts of random data and a significant period of time in order to discover clones. This dissertation presents a new process for discovering code clones known as Concolic Code Clone Discovery (CCCD). This technique discovers code clone candidates based on the functionality of the application, not its syntactical nature. This means that things like naming conventions and comments in the source code have no effect on the proposed clone detection process. CCCD finds clones by first performing concolic analysis on the targeted source code. Concolic analysis combines concrete and symbolic execution in order to traverse all possible paths of the targeted program. These paths are represented by the generated concolic output. A diff tool is then used to determine if the concolic output for a method is identical to the output produced for another method. Duplicated output is indicative of a code clone. CCCD was validated against several open source applications along with clones of all four types as defined by previous research. The results demonstrate that CCCD was able to detect all types of clone candidates with a high level of accuracy. In the future, CCCD will be used to examine how software developers work with type-3 and type-4 clones. CCCD will also be applied to various areas of security research, including intrusion detection mechanisms

    Mining Closed Episodes with Simultaneous Events

    No full text
    Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns describing events that often occur in the vicinity of each other. Episodes can impose restrictions to the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial, parallel, and injective episodes, while discovering general episodes is understudied. In this paper we extend the definition of an episode in order to be able to represent cases where events often occur simultaneously. We present an efficient and novel miner for discovering frequent and closed general episodes. Such a task presents unique challenges. Firstly, we cannot define closure based on frequency. We solve this by computing a more conservative closure that we use to reduce the search space and discover the closed episodes as a postprocessing step. Secondly, episodes are traditionally presented as directed acyclic graphs. We argue that this representation has drawbacks leading to redundancy in the output. We solve these drawbacks by defining a subset relationship in such a way that allows us to remove the redundant episodes. We demonstrate the efficiency of our algorithm and the need for using closed episodes empirically on synthetic and real-world datasets
    corecore