Search CORE

53,064 research outputs found

Computing Multi-Relational Sufficient Statistics for Large Databases

Author: Qian Zhensong
Schulte Oliver
Sun Yan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/08/2014
Field of study

Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of {\em positive and negative} relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra, that facilitates the efficient implementation of this M\"obius virtual join operation. The M\"obius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning.Comment: 11pages, 8 figures, 8 tables, CIKM'14,November 3--7, 2014, Shanghai, Chin

arXiv.org e-Print Archive

CiteSeerX

Re-mining item associations: methodology and a case study in apparel retailing

Author: Atan Tankut
Demiriz Ayhan
Ertek Gurdal
Ertek Gürdal
Kula Ufuk
Publication venue: 'Elsevier BV'
Publication date: 25/11/2010
Field of study

Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques

Sakarya Üniversitesi Kurumsal Açık Akademik Arşivi

Isik University Academic Open Access

Sabanci University Research Database

A review of associative classification mining

Author: Thabtah Fadi
Publication venue
Publication date: 01/01/2007
Field of study

Associative classification mining is a promising approach in data mining that utilizes the association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, i.e. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative classification techniques with regards to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted in this paper

CiteSeerX

University of Huddersfield Repository

Mining semantic rules for optimizing transport assignments in hospitals

Author: Bonte Pieter
De Turck Filip
Hoogstoel Eveline
Ongenae Femke
Publication venue
Publication date: 01/01/2016
Field of study

Ghent University Academic Bibliography

New probabilistic interest measures for association rules

Author: Hahsler Michael
Hornik Kurt
Publication venue
Publication date: 07/02/2008
Field of study

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start with presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left hand side of rules and that lift performs poorly to filter random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic

arXiv.org e-Print Archive

CiteSeerX