Multiple Hypothesis Testing in Pattern Discovery

Author: Garriga Gemma C.
Hanhijärvi Sami
Puolamäki Kai
Publication date: 01/01/2009

The problem of multiple hypothesis testing arises when more than one hypothesis is to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis framework so that it can be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test. Comment: 28 pages

arXiv.org e-Print Archive

Aaltodoc Publication Archive
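
To make the notion of FWER control in the abstract above concrete, here is a minimal sketch of the classical Holm-Bonferroni step-down procedure, which is known to control the FWER in the strong sense. It is a standard textbook technique, not the method proposed in the paper; the function name `holm_bonferroni` and the example p-values are assumptions chosen for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard Holm step-down procedure (illustrative; not the paper's method).
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first non-rejection; all later ones are kept.
    return reject


# Hypothetical p-values, e.g. one per candidate itemset.
example_p = [0.01, 0.04, 0.03, 0.005]
example_decisions = holm_bonferroni(example_p, alpha=0.05)
```

With these made-up p-values, only the two smallest fall below their step-down thresholds, so only those two hypotheses are rejected.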

From an implicational system to its corresponding D-basis

Author: Adaricheva Kira
Cordero-Ortega Pablo
Enciso-García-Oliveros Manuel
Mora-Bonilla Angel
Rodríguez-Lorenzo Estrella
Publication date: 30/10/2015

The closure system is a fundamental concept appearing in several areas such as databases, formal concept analysis, artificial intelligence, etc. It is well known that there exists a connection between a closure operator on a set and the lattice of its closed sets. Furthermore, the closure system can be replaced by a set of implications, but this set usually contains a lot of redundancy, which induces undesired properties. In the literature, there is a common interest in the search for minimality of a set of implications because of the importance of bases. The well-known Duquenne-Guigues basis satisfies this minimality condition. However, several authors emphasize the relevance of optimality in order to reduce the size of the implications in the basis. In addition to this, some bases have been defined to improve the computation of closures relying on the directness property. The efficiency of computation with a direct basis is due to the fact that the closure is computed in one traversal. In this work, we focus on the D-basis, which is ordered-direct. An open problem is to obtain it from an arbitrary implicational system, which is our aim in this paper. We introduce a method to compute the D-basis by means of minimal generators calculated using the Simplification Logic for implications. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Supported by Grants TIN2011-28084 and TIN2014-59471-P of the Science and Innovation Ministry of Spain, which is co-financed by the European Social Fund.

Repositorio Institucional Universidad de Málaga
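
The role of directness mentioned in the abstract above can be illustrated with a small sketch. It computes the closure of an attribute set under an arbitrary list of implications by iterating to a fixpoint, and contrasts this with a single traversal, which in general is only sufficient when the implication set is direct (or ordered-direct and traversed in the right order). This is a naive illustration, not the D-basis construction from the paper; the function names and the example implications are assumptions.

```python
def closure(attributes, implications):
    """Closure of `attributes` under (premise, conclusion) pairs, by fixpoint iteration."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire the implication if its premise holds and it adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def single_pass(attributes, implications):
    """One traversal of the implications in the given order (no repetition)."""
    result = set(attributes)
    for premise, conclusion in implications:
        if premise <= result:
            result |= conclusion
    return result


# Hypothetical implicational system: b -> c listed before a -> b.
example_implications = [({"b"}, {"c"}), ({"a"}, {"b"})]
full = closure({"a"}, example_implications)         # reaches a, b and c
partial = single_pass({"a"}, example_implications)  # misses c in this order
```

In this example the single traversal misses `c` because `b -> c` is visited before `b` has been derived, which is the kind of repeated work that direct and ordered-direct bases are designed to avoid.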

Interactive Constrained Association Rule Mining

Author: Bussche Jan Van den
Goethals Bart
Publication date: 01/01/2003

We investigate ways to support interactive mining sessions in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach combines the integration of querying conditions inside the mining phase with the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000)

arXiv.org e-Print Archive

CiteSeerX
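
As a rough illustration of the second ingredient named in the abstract above, querying associations that have already been generated, the sketch below filters a stored list of rules against user conditions. It does not reproduce any of the paper's algorithms; the `Rule` type, the `query_rules` function, its parameters, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical stored association rule: antecedent => consequent."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def query_rules(rules, must_have_in_consequent=frozenset(), min_confidence=0.0):
    """Return the already generated rules that satisfy the user's conditions."""
    wanted = frozenset(must_have_in_consequent)
    return [
        r for r in rules
        if wanted <= r.consequent and r.confidence >= min_confidence
    ]


# Made-up rules standing in for the output of an earlier mining run.
example_rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.30, 0.80),
    Rule(frozenset({"beer"}), frozenset({"chips"}), 0.20, 0.60),
    Rule(frozenset({"milk"}), frozenset({"butter"}), 0.10, 0.40),
]
example_hits = query_rules(example_rules, {"butter"}, min_confidence=0.5)
```

A later, stricter query can be answered by filtering `example_hits` again rather than re-running the miner, which is the intuition behind incremental querying.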

Multi-Sorted Inverse Frequent Itemsets Mining: On-Going Research

Author: Piccolo Antonio
Saccà Domenico
Serra Edoardo
Publication venue: 'IUScholarWorks'
Publication date: 01/01/2016

Inverse frequent itemset mining (IFM) consists of generating artificial transactional databases that reflect the patterns of real ones, in particular by satisfying given frequency constraints on the itemsets. An extension of IFM, called many-sorted IFM, is introduced, in which the schemes for the datasets to be generated are those typical of Big Tables, as required in emerging big data applications, e.g., social network analytics.

Boise State University - ScholarWorks
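
The frequency constraints mentioned in the abstract above can be sketched as a simple check: given a candidate artificial database, count the support of each constrained itemset and test it against lower and upper bounds. This covers only the verification side of IFM, not the generation procedure or the many-sorted extension; the function names, the constraint representation, and the data are assumptions.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= set(t))


def satisfies_constraints(transactions, constraints):
    """Check bounds given as {itemset: (min_count, max_count)} on a database."""
    return all(
        low <= support(itemset, transactions) <= high
        for itemset, (low, high) in constraints.items()
    )


# A made-up candidate database and two hypothetical constraint sets.
example_db = [{"a", "b"}, {"a"}, {"b", "c"}]
met = satisfies_constraints(
    example_db,
    {frozenset({"a"}): (2, 2), frozenset({"a", "b"}): (1, 1)},
)
violated = satisfies_constraints(example_db, {frozenset({"c"}): (2, 3)})
```

An IFM generator would search for a database on which such a check succeeds; the check itself is the easy part.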