A network algorithm to discover sequential patterns
This paper addresses the discovery of sequential patterns in very large databases. Most existing algorithms search the pattern space using lattice structures, which are computationally very demanding, and they output a large number of rules. The aim of this work is to create a swift algorithm for the discovery of sequential patterns with low time complexity. We also want to define tools that simplify the work of the final user by offering a new visualization of the sequences, bypassing the analysis of thousands of association rules.
iWAP: A Single Pass Approach for Web Access Sequential Pattern Mining
With the explosive growth of data availability on the World Wide Web, web usage mining has become essential for improving website design, analyzing system performance and network communications, understanding user reaction and motivation, and building adaptive websites. Web Access Pattern mining (WAP-mine) is a sequential pattern mining technique for discovering frequent web log access sequences. It first stores the frequent part of the original web access sequence database in a prefix tree called the WAP-tree and then mines the frequent sequences from that tree according to a user-given minimum support threshold. As a result, this method is not applicable to incremental and interactive mining. In this paper, we propose an algorithm, improved Web Access Pattern (iWAP) mining, to find web access patterns from web logs more efficiently than the WAP-mine algorithm. Our proposed approach can discover all web access sequential patterns with a single pass over the web log database. Moreover, it is applicable to interactive and incremental mining, which WAP-mine does not support. The experimental and performance studies show that the proposed algorithm is in general an order of magnitude faster than the existing WAP-mine algorithm.
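The prefix-tree idea described for WAP-mine can be illustrated with a minimal sketch. This is not the iWAP algorithm and not the authors' implementation: it only builds a count-annotated prefix tree over the frequent events of each access sequence. The names `Node`, `frequent_events`, `build_prefix_tree`, and `prefix_count` are hypothetical, and the sketch assumes that an event's support is the number of sequences containing it.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an event label, a count, and child nodes."""

    def __init__(self, event=None):
        self.event = event
        self.count = 0
        self.children = {}


def frequent_events(sequences, min_support):
    """Return events that occur in at least min_support sequences (assumed definition)."""
    support = Counter()
    for seq in sequences:
        support.update(set(seq))  # count each event once per sequence
    return {event for event, count in support.items() if count >= min_support}


def build_prefix_tree(sequences, min_support):
    """Insert the frequent part of every sequence into a shared prefix tree."""
    keep = frequent_events(sequences, min_support)
    root = Node()
    for seq in sequences:
        node = root
        for event in seq:
            if event not in keep:
                continue  # drop infrequent events before insertion
            node = node.children.setdefault(event, Node(event))
            node.count += 1
    return root


def prefix_count(root, prefix):
    """Number of filtered sequences that start with the given prefix."""
    node = root
    for event in prefix:
        node = node.children.get(event)
        if node is None:
            return 0
    return node.count
```

Because the tree is built after filtering by a fixed threshold, changing the threshold or appending new log entries means rebuilding it, which is consistent with the limitation the abstract attributes to WAP-mine.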
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
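The general idea of receiver-initiated load balancing, where an idle worker asks a busy peer for work, can be sketched in a single-process simulation. This is a generic illustration under stated assumptions, not the paper's algorithm: the function `run_receiver_initiated`, the "ask the most loaded peer" donor policy, and the `expand` callback are all hypothetical, and real peer-to-peer messaging is replaced by direct access to in-memory queues.

```python
from collections import deque


def run_receiver_initiated(queues, expand):
    """Simulate workers exploring an irregular search tree in rounds.

    queues: one deque of pending tasks per worker.
    expand: function mapping a task to its (possibly empty) list of child tasks.
    Returns the number of tasks each worker processed.
    """
    processed = [0] * len(queues)
    while any(queues):  # a deque is falsy when empty
        for i, queue in enumerate(queues):
            if not queue:
                # Receiver-initiated step: the idle worker requests work.
                # Assumed policy: ask the peer with the longest queue.
                donor = max(queues, key=len)
                if len(donor) > 1:
                    # The donor gives away its oldest task, which tends
                    # to be a larger subtree under depth-first expansion.
                    queue.append(donor.popleft())
            if queue:
                task = queue.pop()  # depth-first on the local queue
                queue.extend(expand(task))
                processed[i] += 1
    return processed
```

A usage example with an unbalanced, Fibonacci-shaped tree, where only the first worker starts with work:

```python
def expand(n):
    return [n - 1, n - 2] if n > 1 else []

queues = [deque([10]), deque(), deque()]
print(run_receiver_initiated(queues, expand))
```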
Mining Target-Oriented Sequential Patterns with Time-Intervals
A target-oriented sequential pattern is a sequential pattern with a concerned itemset at the end of the pattern. A time-interval sequential pattern is a sequential pattern with time intervals between every pair of successive itemsets. In this paper we present an algorithm to discover target-oriented sequential patterns with time intervals. To this end, the original sequences are reversed so that the last itemsets are placed at the front of the sequences. The contrasts between the reversed sequences and the concerned itemset are then used to exclude irrelevant sequences. Clustering analysis is used with a typical sequential pattern mining algorithm to extract the sequential patterns with time intervals between successive itemsets. Finally, the discovered time-interval sequential patterns are reversed back to the original order to search for the target patterns.
Comment: 11 pages, 9 tables
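The reverse-and-filter step described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes each sequence is a list of `(timestamp, itemset)` pairs and that a sequence is relevant when its last itemset contains the target itemset. The function names `reverse_and_filter` and `time_intervals` are hypothetical, and the clustering and pattern mining stages are not shown.

```python
def reverse_and_filter(sequences, target):
    """Reverse each sequence and keep those whose first itemset contains the target.

    sequences: list of sequences, each a list of (timestamp, itemset) pairs.
    target: the concerned itemset (assumed to be matched by subset inclusion).
    """
    target = frozenset(target)
    kept = []
    for seq in sequences:
        rev = list(reversed(seq))
        if rev and target <= rev[0][1]:
            kept.append(rev)
    return kept


def time_intervals(seq):
    """Absolute time gaps between successive itemsets of one sequence."""
    return [abs(b[0] - a[0]) for a, b in zip(seq, seq[1:])]
```

After reversal the target itemset sits at the front, so irrelevant sequences can be discarded by inspecting a single itemset before any mining takes place.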