696 research outputs found
Hypothesis-Driven Specialization-based Analysis of Gene Expression Association Rules
During the development of many diseases such as cancer and diabetes, the pattern of gene expression within certain cells changes. A vital part of understanding these diseases will come from understanding the factors governing gene expression. This thesis work focused on mining association rules in the context of gene expression. We designed and developed a tool that enables domain experts to interactively analyze association rules that describe relationships in genetic data. Association rules in their native form deal with sets of items and associations among them. But domain experts hypothesize that additional factors, like the relative ordering and spacing of these items, are important aspects governing gene expression. We proposed hypothesis-based specializations of association rules to identify biologically significant relationships. Our approach also alleviates the limitations inherent in conventional association rule mining, which uses a support-confidence framework, by providing filtering and reordering of association rules according to other measures of interestingness in addition to support and confidence. Our tool supports visualization of genetic data in the context of a rule, which facilitates rule analysis and rule specialization. The improvement in different measures of interestingness (e.g., confidence, lift, and p-value) enabled by our approach is used to evaluate the significance of the specialized rules.
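The interestingness measures named in this abstract (support, confidence, lift) can be illustrated with a minimal sketch. This is not the thesis's tool: the function names, the toy transactions, and the item labels `m1`, `m2`, `up`, and `down` are hypothetical, chosen only to show how the standard definitions are computed for a rule "antecedent implies consequent".

```python
from typing import Dict, FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_measures(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    # confidence = P(consequent | antecedent); lift = confidence / P(consequent)
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data: each transaction is one gene's set of items.
transactions = [
    frozenset({"m1", "m2", "up"}),
    frozenset({"m1", "m2", "up"}),
    frozenset({"m1", "down"}),
    frozenset({"m2", "up"}),
]
measures = rule_measures(transactions, {"m1", "m2"}, {"up"})
print(measures)
```

A lift above 1 suggests the antecedent and consequent co-occur more often than they would if independent, which is one reason it is used alongside support and confidence when ranking rules.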
Mining Frequent Itemsets Using Genetic Algorithm
In general, frequent itemsets are generated from large data sets by applying
association rule mining algorithms such as Apriori, Partition, Pincer-Search,
Incremental, and Border, which take too much computing time to
compute all the frequent itemsets. Using a Genetic Algorithm (GA), we can
improve on this. The major advantage of using a GA in the discovery of
frequent itemsets is that it performs a global search and its time complexity is
lower than that of other algorithms, as the genetic algorithm is based on the
greedy approach. The main aim of this paper is to find all the frequent
itemsets in given data sets using a genetic algorithm.
Discovering Unexpected Patterns in Temporal Data Using Temporal Logic
There has been much attention given recently to the task
of finding interesting patterns in temporal databases. Since there are so
many different approaches to the problem of discovering temporal patterns,
we first present a characterization of different discovery tasks and
then focus on one task of discovering interesting patterns of events in
temporal sequences. Given an (infinite) temporal database or a sequence
of events one can, in general, discover an infinite number of temporal
patterns in this data. Therefore, it is important to specify some measure
of interestingness for discovered patterns and then select only the patterns
interesting according to this measure. We present a probabilistic
measure of interestingness based on unexpectedness, whereby a pattern P
is deemed interesting if the ratio of the actual number of occurrences of
P to the expected number of occurrences of P exceeds some user-defined
threshold. We then make use of a subset of the propositional, linear temporal
logic and present an efficient algorithm that discovers unexpected
patterns in temporal data. Finally, we apply this algorithm to synthetic
data, UNIX operating system calls, and Web logfiles and present the
results of these experiments.
Information Systems Working Papers Series
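The unexpectedness criterion described in this abstract compares how often a pattern actually occurs with how often it is expected to occur. The sketch below is only an illustration of that ratio test, not the paper's algorithm: it assumes patterns are contiguous event sequences and that the expected count comes from treating events as independent draws from their observed frequencies. Both assumptions, and all function names, are hypothetical.

```python
from collections import Counter
from math import prod
from typing import Sequence


def count_occurrences(events: Sequence[str], pattern: Sequence[str]) -> int:
    """Count (possibly overlapping) contiguous occurrences of `pattern`."""
    k = len(pattern)
    target = tuple(pattern)
    return sum(
        1 for i in range(len(events) - k + 1) if tuple(events[i:i + k]) == target
    )


def expected_occurrences(events: Sequence[str], pattern: Sequence[str]) -> float:
    """Expected count under an ASSUMED independence model (not from the paper)."""
    n, k = len(events), len(pattern)
    freq = Counter(events)
    # Probability of seeing the pattern at one fixed position.
    p = prod(freq[e] / n for e in pattern)
    return max(n - k + 1, 0) * p


def is_unexpected(
    events: Sequence[str], pattern: Sequence[str], threshold: float
) -> bool:
    """True if actual / expected occurrences exceeds the user-defined threshold."""
    expected = expected_occurrences(events, pattern)
    if expected == 0:
        return False  # ratio undefined; treat as not interesting in this sketch
    return count_occurrences(events, pattern) / expected > threshold


events = list("abababab")
print(is_unexpected(events, ("a", "b"), threshold=2.0))
print(is_unexpected(events, ("b", "b"), threshold=2.0))
```

In this toy sequence, "a" followed by "b" occurs far more often than the independence model predicts, while "b" followed by "b" never occurs, so only the first pattern passes the threshold.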
Detection of Interesting Traffic Accident Patterns by Association Rule Mining
In recent years, the rate of traffic accidents has been high. Analyzing crash data and extracting useful information from it can help in taking appropriate measures to decrease this rate or prevent crashes from happening. Related research in the past has proposed various measures and algorithms to obtain interesting crash patterns from crash records. The main problem is that large numbers of patterns were produced, and a vast number of these patterns were obvious or not interesting. A deeper analysis of the data is required in order to get the interesting patterns. To overcome this situation, we have proposed a new approach to detect the most associated sequential patterns in the crash data. We also make use of the technique “Association Rule Mining” to mine interesting traffic accident patterns from the crash records. The main goal of this research is to detect the most associated sequential patterns (MASP) and mine patterns within the data sets generated by MASP using a modified FP-growth approach in regular association rule mining. We have designed and implemented data structures for efficient implementation of the algorithms. The results extracted can be further queried for pattern analysis to get a deeper understanding. Efficient memory management is one of the main objectives during the implementation of the algorithms. Linked-list-based tree structures have been used for searching the patterns. The results obtained seemed very promising: the detected MASPs contained most of the attributes, which gave a deeper insight into the crash data, and the patterns were found to be very interesting. A prototype application was developed in C# .NET.