1,921 research outputs found
Novel Algorithms for Cross-Ontology Multi-Level Data Mining
The wide spread use of ontologies in many scientific areas creates a wealth of ontologyannotated data and necessitates the development of ontology-based data mining algorithms. We have developed generalization and mining algorithms for discovering cross-ontology relationships via ontology-based data mining. We present new interestingness measures to evaluate the discovered cross-ontology relationships. The methods presented in this dissertation employ generalization as an ontology traversal technique for the discovery of interesting and informative relationships at multiple levels of abstraction between concepts from different ontologies. The generalization algorithms combine ontological annotations with the structure and semantics of the ontologies themselves to discover interesting crossontology relationships. The first algorithm uses the depth of ontological concepts as a guide for generalization. The ontology annotations are translated to higher levels of abstraction one level at a time accompanied by incremental association rule mining. The second algorithm conducts a generalization of ontology terms to all their ancestors via transitive ontology relations and then mines cross-ontology multi-level association rules from the generalized transactions. Our interestingness measures use implicit knowledge conveyed by the relation semantics of the ontologies to capture the usefulness of cross-ontology relationships. We describe the use of information theoretic metrics to capture the interestingness of cross-ontology relationships and the specificity of ontology terms with respect to an annotation dataset. Our generalization and data mining agorithms are applied to the Gene Ontology and the postnatal Mouse Anatomy Ontology. The results presented in this work demonstrate that our generalization algorithms and interestingness measures discover more interesting and better quality relationships than approaches that do not use generalization. Our algorithms can be used by researchers and ontology developers to discover inter-ontology connections. Additionally, the cross-ontology relationships discovered using our algorithms can be used by researchers to understand different aspects of entities that interest them
Statistical strategies for pruning all the uninteresting association rules
We propose a general framework to describe formally the
problem of capturing the intensity of implication for
association rules through statistical metrics.
In this framework we present properties that influence the
interestingness of a rule, analyze the conditions that
lead a measure to perform a perfect prune at a time,
and define a final proper order to sort the surviving
rules. We will discuss why none of the currently employed
measures can capture objective interestingness, and
just the combination of some of them, in a multi-step fashion,
can be reliable. In contrast, we propose a new simple modification
of the Pearson coefficient that will meet all the necessary
requirements. We statistically infer the convenient cut-off
threshold for this new metric by empirically describing its
distribution function through simulation. Final experiments
serve to show the ability of our proposal.Postprint (published version
Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
were extensively used for the analysis and the comprehension of huge amounts of
data. However, the number of generated rules is too large to be efficiently
analyzed and explored in any further process. Association rules selection is a
classical topic to address this issue, yet, new innovated approaches are
required in order to provide help to decision makers. Hence, many interesting-
ness measures have been defined to statistically evaluate and filter the
association rules. However, these measures present two major problems. On the
one hand, they do not allow eliminating irrelevant rules, on the other hand,
their abun- dance leads to the heterogeneity of the evaluation results which
leads to confusion in decision making. In this paper, we propose a two-winged
approach to select statistically in- teresting and semantically incomparable
rules. Our statis- tical selection helps discovering interesting association
rules without favoring or excluding any measure. The semantic comparability
helps to decide if the considered association rules are semantically related
i.e comparable. The outcomes of our experiments on real datasets show promising
results in terms of reduction in the number of rules
- …