A Subsequence Interleaving Model for Sequential Pattern Mining
Recent sequential pattern mining methods have used the minimum description
length (MDL) principle to define an encoding scheme which describes an
algorithm for mining the most compressing patterns in a database. We present a
novel subsequence interleaving model based on a probabilistic model of the
sequence database, which allows us to search for the most compressing set of
patterns without designing a specific encoding scheme. Our proposed algorithm
is able to efficiently mine the most relevant sequential patterns and rank them
using an associated measure of interestingness. The efficient inference in our
model is a direct result of our use of a structural expectation-maximization
framework, in which the expectation-step takes the form of a submodular
optimization problem subject to a coverage constraint. We show on both
synthetic and real-world datasets that our model mines a set of sequential
patterns with low spuriousness and redundancy, high interpretability and
usefulness in real-world applications. Furthermore, we demonstrate that the
quality of the patterns from our approach is comparable to, if not better than,
existing state-of-the-art sequential pattern mining algorithms.
Comment: 10 pages in KDD 2016: Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining
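The abstract above mentions an optimization step subject to a coverage constraint. As a loose illustration of that kind of problem only, the sketch below runs the classic greedy heuristic for weighted set cover: repeatedly pick the candidate that covers the most still-uncovered items per unit cost. This is not the authors' algorithm or model; the function name `greedy_cover`, the candidate names, and the costs are hypothetical.

```python
def greedy_cover(universe, candidates, cost):
    """Greedy weighted set cover (illustrative only, not the paper's method).

    universe:   iterable of items that must all be covered
    candidates: dict mapping a candidate name to the items it covers
    cost:       dict mapping a candidate name to a positive cost
    Returns the chosen candidate names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Consider only candidates that still cover something new.
        useful = [n for n in candidates if uncovered & set(candidates[n])]
        if not useful:
            raise ValueError("candidates cannot cover the universe")
        # Pick the best ratio of newly covered items to cost.
        best = max(useful, key=lambda n: len(uncovered & set(candidates[n])) / cost[n])
        chosen.append(best)
        uncovered -= set(candidates[best])
    return chosen


# Hypothetical toy input: four candidate patterns over the items a..e.
toy_candidates = {
    "p1": ["a", "b", "c"],
    "p2": ["c", "d"],
    "p3": ["d", "e"],
    "p4": ["e"],
}
toy_cost = {"p1": 1.0, "p2": 1.0, "p3": 1.0, "p4": 1.0}
selection = greedy_cover("abcde", toy_candidates, toy_cost)
print(selection)  # ['p1', 'p3']
```

The greedy rule is a standard approximation for covering problems; it is shown here only to make the notion of a coverage constraint concrete.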
On the Sequential Pattern and Rule Mining in the Analysis of Cyber Security Alerts
Data mining is well known for its ability to extract concealed and indistinct patterns from data, which is a common task in the field of cyber security. However, data mining is not always used to its full potential within the cyber security community. In this paper, we discuss the usability of sequential pattern and rule mining, a subset of data mining methods, for the analysis of cyber security alerts. First, we survey the use cases of data mining, namely alert correlation and attack prediction. Subsequently, we evaluate sequential pattern and rule mining methods to find one that is both fast and provides valuable results while dealing with the peculiarities of security alerts. An experiment was performed using a dataset of real alerts from an alert sharing platform. Finally, we present lessons learned from the experiment and a comparison of the selected methods based on their performance and the soundness of their results.
Efficient chain structure for high-utility sequential pattern mining
High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining that considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from quantitative databases. Several works have been presented that reduce the computational cost through variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which stores more relevant information to improve mining performance. With the SU-chain structure, the existing pruning strategies can also be used to prune unpromising candidates early and obtain the qualifying HUSPs. Experiments comparing against state-of-the-art HUSPM algorithms show that the SU-chain-based model outperforms them in terms of runtime and the number of determined candidates.
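To make the notion of pattern utility concrete, here is a minimal sketch under simplifying assumptions: each sequence element holds a single item with a quantity, each item has a fixed unit profit, and the utility of a pattern in one sequence is taken as the maximum over all of its occurrences, which is the definition commonly used in HUSPM work. It does not implement the SU-chain structure or any pruning strategy from the paper; the names `max_utility`, `pattern_utility`, and the toy data are hypothetical.

```python
def max_utility(pattern, qseq, unit_profit):
    """Best utility of `pattern` over all its occurrences in one sequence.

    qseq is a list of (item, quantity) pairs. Returns None when the
    pattern does not occur as a subsequence.
    """
    neg = float("-inf")
    # best[k] = best utility found so far for the first k pattern items.
    best = [0] + [neg] * len(pattern)
    for item, qty in qseq:
        # Walk k downwards so one element extends each prefix at most once.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] != neg:
                best[k] = max(best[k], best[k - 1] + qty * unit_profit[item])
    return None if best[-1] == neg else best[-1]


def pattern_utility(pattern, database, unit_profit):
    """Sum the per-sequence maximum utility over sequences containing the pattern."""
    total = 0
    for qseq in database:
        u = max_utility(pattern, qseq, unit_profit)
        if u is not None:
            total += u
    return total


# Hypothetical toy quantitative database and unit profits.
profit = {"a": 2, "b": 1, "c": 3}
db = [
    [("a", 1), ("b", 2), ("a", 3), ("c", 1)],
    [("b", 1), ("c", 2)],
    [("a", 2), ("c", 1), ("c", 4)],
]
print(pattern_utility(["a", "c"], db, profit))  # 25
```

In this toy database the pattern a, c scores 9 in the first sequence (3 units of a, then 1 of c), is absent from the second, and scores 16 in the third, for a total of 25. A miner would report the pattern as high-utility if that total met a user-chosen minimum utility threshold.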