8 research outputs found

    A new constraint for mining sets in sequences

    SPMLS: An Efficient Sequential Pattern Mining Algorithm with candidate Generation and Frequency Testing

    Sequential pattern mining is a fundamental field of data mining because of its extensive scope of applications, ranging from forecasting user shopping patterns to scientific discovery. The objective is to discover frequently appearing sequential patterns in a given set of sequences. Many recent studies have contributed to the efficiency of sequential pattern mining algorithms. Most existing algorithms have proved effective; however, they do not work well when mining long frequent sequences in a database. In this paper, we propose an efficient pattern mining algorithm, SPMLS (Sequential Pattern Mining on Long Sequences), for mining long sequential patterns in a given database. SPMLS uses an iterative process of candidate generation followed by frequency testing in two phases, event-wise and sequence-wise. The event-wise phase presents a new candidate pruning approach that improves the efficiency of the mining process. The sequence-wise phase integrates considerations of intra-event and inter-event constraints. Simulations are carried out on both synthetic and real datasets to evaluate the performance of SPMLS.
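
The abstract above names an iterative loop of candidate generation followed by frequency testing but gives no details of SPMLS itself. As a rough illustration of that general loop only, here is a minimal level-wise sketch in Python. It is not SPMLS: the join rule is a simple GSP-style prefix/suffix join, events are single items, and the names `is_subsequence`, `support`, and `mine_frequent_sequences` are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """True if the items of candidate appear in sequence in the same order."""
    it = iter(sequence)
    # "item in it" consumes the iterator, so order is respected.
    return all(item in it for item in candidate)


def support(candidate, database):
    """Number of sequences in the database that contain the candidate."""
    return sum(is_subsequence(candidate, seq) for seq in database)


def mine_frequent_sequences(database, min_support):
    """Level-wise mining: generate candidates, then test their frequency.

    Assumption: a generic illustration, not the SPMLS pruning scheme.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({item for seq in database for item in seq})
    level = [(i,) for i in items if support((i,), database) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Candidate generation: join p and q when p minus its first item
        # equals q minus its last item.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        # Frequency testing: keep only candidates that meet the threshold.
        level = sorted(c for c in candidates
                       if support(c, database) >= min_support)
    return frequent
```

For example, `mine_frequent_sequences([list("abc"), list("abd"), list("acb")], 2)` keeps the single items `a`, `b`, `c` and the two-item sequences `(a, b)` and `(a, c)`.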

    T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data

    The trend of using large numbers of simple sensors, as opposed to a few complex sensors, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. The methods that try to discover reusable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem and extend the T-Pattern algorithm, which was previously applied to the detection of sequential patterns in the behavioural sciences. The temporal complexity of the T-Pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm for finding patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors and containing millions of events.

    A new constraint for mining sets in sequences

    Discovering interesting patterns in event sequences is a popular task in the field of data mining. Most existing methods try to do this based on some measure of cohesion to determine an occurrence of a pattern, and a frequency threshold to determine if the pattern occurs often enough. We introduce a new constraint based on a new interestingness measure combining the cohesion and the frequency of a pattern. For a dataset consisting of a single sequence, the cohesion is measured as the average length of the smallest intervals containing the pattern for each occurrence of its events, and the frequency is measured as the probability of observing an event of that pattern. We present a similar constraint for datasets consisting of multiple sequences. We present algorithms to efficiently identify the thus defined interesting patterns, given a dataset and a user-defined threshold. After applying our method to both synthetic and real-life data, we conclude that it indeed gives intuitive results in a number of applications.
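
The single-sequence measure described in the abstract above can be sketched in Python as follows. The abstract does not say how cohesion and frequency are combined or normalised, so two assumptions are made here: cohesion is taken as the pattern size divided by the mean minimal window length (so it lies between 0 and 1), and the two quantities are multiplied. The function names are hypothetical and the window search is brute force, intended only to make the definitions concrete.

```python
from statistics import mean


def min_window_length(sequence, pattern, i):
    """Length of the shortest contiguous window that covers position i
    and contains every event of pattern, or None if no such window exists."""
    best = None
    for start in range(i, -1, -1):
        seen = set()
        for end in range(start, len(sequence)):
            if sequence[end] in pattern:
                seen.add(sequence[end])
            if end >= i and seen == pattern:
                length = end - start + 1
                if best is None or length < best:
                    best = length
                break  # longer windows from this start cannot be shorter
    return best


def interestingness(sequence, pattern):
    """Frequency times cohesion for a set of events in a single sequence.

    Assumptions (not stated in the abstract): cohesion is
    len(pattern) / mean window length, and the measures are multiplied.
    """
    pattern = frozenset(pattern)
    positions = [i for i, e in enumerate(sequence) if e in pattern]
    if not positions:
        return 0.0
    windows = [min_window_length(sequence, pattern, i) for i in positions]
    if any(w is None for w in windows):
        return 0.0  # some event of the pattern never occurs
    frequency = len(positions) / len(sequence)  # P(event belongs to pattern)
    cohesion = len(pattern) / mean(windows)
    return frequency * cohesion
```

In the sequence `abxxab`, every occurrence of `a` or `b` sits in a window of length 2, so the pattern `{a, b}` is fully cohesive and its score equals its frequency, 4/6. In `axb` the only window has length 3, which lowers the cohesion to 2/3. A user-defined threshold on this score would then play the role of the constraint.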
