8,101 research outputs found
Sequential Patterns Post-processing for Structural Relation Patterns Mining
Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential
occurrence of items across ordered transactions over time. It has been extensively studied in the literature, and there
exists a diversity of algorithms. However, more complex structural patterns are often hidden behind sequences.
This article begins with the introduction of a model for the representation of sequential patterns—Sequential
Patterns Graph—which motivates the search for new structural relation patterns. An integrative framework for
the discovery of these patterns–Postsequential Patterns Mining–is then described which underpins the postprocessing
of sequential patterns. A corresponding data-mining method based on sequential patterns postprocessing
is proposed and shown to be effective in the search for concurrent patterns. From experiments conducted on three
component algorithms, it is demonstrated that sequential patterns-based concurrent patterns mining provides
an efficient method for structural knowledge discover
New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful
patterns in transaction databases. Many different measures of interestingness
have been proposed for association rules. However, these measures fail to take
the probabilistic properties of the mined data into account. In this paper, we
start with presenting a simple probabilistic framework for transaction data
which can be used to simulate transaction data when no associations are
present. We use such data and a real-world database from a grocery outlet to
explore the behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically influenced
by the frequency of the items in the left hand side of rules and that lift
performs poorly to filter random noise in transaction data. Based on the
probabilistic framework we develop two new interest measures, hyper-lift and
hyper-confidence, which can be used to filter or order mined association rules.
The new measures show significantly better performance than lift for
applications where spurious rules are problematic
From sequential patterns to concurrent branch patterns: a new post sequential patterns mining approach
A thesis submitted for the degree of Doctor ofPhilosophy
of the University of BedfordshireSequential patterns mining is an important pattern discovery technique used to identify frequently observed sequential occurrence of items across ordered transactions over time. It has been intensively studied and there exists a great diversity of algorithms. However, there is a major problem associated with the conventional sequential patterns mining in that patterns derived are often large and not very easy to understand or use. In addition, more complex relations among events are often hidden behind sequences.
A novel model for sequential patterns called Sequential Patterns Graph (SPG) is proposed. The construction algorithm of SPG is presented with experimental results to substantiate the concept. The thesis then sets out to define some new structural patterns such as concurrent branch patterns, exclusive patterns and iterative patterns which are generally hidden behind sequential patterns. Finally, an integrative framework, named Post Sequential Patterns Mining (PSPM), which is based on sequential patterns mining, is also proposed for the discovery and visualisation of structural patterns.
This thesis is intended to prove that discrete sequential patterns derived from traditional sequential patterns mining can be modelled graphically using SPG. It is concluded from experiments and theoretical studies that SPG is not only a minimal representation of sequential patterns mining, but it also represents the interrelation among patterns and establishes further the foundation for mining structural knowledge (i.e. concurrent branch patterns, exclusive patterns and iterative patterns). from experiments conducted on both synthetic and real datasets, it is shown that Concurrent Branch Patterns (CBP) mining is an effective and efficient mining algorithm suitable for concurrent branch patterns
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. {\em Differential privacy} has recently emerged
as the {\em de facto} standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent {\em itemsets} cannot apply in mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision
- …