    Query-Constraint-Based Mining of Association Rules for Exploratory Analysis of Clinical Datasets in the National Sleep Research Resource

    Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on them yields a large number of rules, many of which may be uninteresting. For imbalanced datasets in particular, performing ARM directly produces uninteresting rules dominated by variables that capture general characteristics. Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on the subset of data items that satisfy a query constraint. We first perform a series of data-preprocessing steps, including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that each query constraint yields a set of unique, non-redundant rules. Results: Applying QARM to five datasets from NSRR yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables helps avoid uninteresting rules, and that removing general and subsumed rules produces a more concise and interesting set of rules. Conclusions: QARM shows potential for supporting exploratory analysis of large biomedical datasets. It also proves useful for reducing the number of uninteresting association rules generated from imbalanced datasets.
A preliminary literature-based analysis showed that some association rules have supporting evidence in the biomedical literature, while others without such evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
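The core QARM idea, mining only over the records that satisfy a query constraint and then discarding rules that a shorter rule already explains, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it enumerates single-consequent rules by brute force instead of running the TNR algorithm, it does not implement the paper's removal of "general" rules, the records and variable names are hypothetical, and it treats a rule as subsumed when a rule with a strictly smaller antecedent and the same consequent is at least as confident (one common notion of redundancy; the paper's exact criterion may differ).

```python
from itertools import combinations

# Hypothetical patient records: each record is a set of "variable=value" items.
RECORDS = [
    {"sex=F", "bmi=high", "snoring=yes", "ahi=severe"},
    {"sex=F", "bmi=high", "snoring=yes", "ahi=severe"},
    {"sex=F", "bmi=normal", "snoring=no", "ahi=normal"},
    {"sex=M", "bmi=high", "snoring=yes", "ahi=severe"},
    {"sex=M", "bmi=normal", "snoring=yes", "ahi=mild"},
    {"sex=F", "bmi=high", "snoring=no", "ahi=mild"},
]


def support_count(itemset, records):
    """Number of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r)


def mine_rules(records, query, min_conf=0.6, max_len=2):
    """Brute-force rules X -> y over the records that satisfy the query constraint."""
    # Keep only records matching the query, and drop the query items themselves.
    subset = [r - query for r in records if query <= r]
    items = sorted(set().union(*subset)) if subset else []
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            x = frozenset(antecedent)
            n_x = support_count(x, subset)
            if n_x == 0:
                continue
            for y in items:
                if y in x:
                    continue
                conf = support_count(x | {y}, subset) / n_x
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


def remove_subsumed(rules):
    """Drop X -> y when some X' -> y, with X' a proper subset of X, is at least as confident."""
    return [
        (x, y, conf)
        for x, y, conf in rules
        if not any(y2 == y and x2 < x and c2 >= conf for x2, y2, c2 in rules)
    ]
```

For example, `remove_subsumed(mine_rules(RECORDS, frozenset({"sex=F"})))` keeps `{snoring=yes} -> ahi=severe` (confidence 1.0 within the query subset) but drops `{bmi=high, snoring=yes} -> ahi=severe`, because the shorter rule is already as confident. The default `min_conf=0.6` mirrors the 60% minimum confidence mentioned above.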

    Applying Association Rule Data Mining with the Apriori Algorithm to Improve the Marketing Strategy for Yamaha Motorcycle Products

    Two-wheeled vehicles are one of the most popular means of transportation in Indonesia. The high level of competition in motorcycle sales requires the managers of PT. Arista Mitra Lestari to continuously improve the company's quality and service. Customers are a key factor in running a business, but in practice the marketing carried out by the managers of PT. Arista Mitra Lestari is still not very effective. To address this problem, the company needs a strategy that can help improve its motorcycle marketing, and one way to do so is to make use of the company's own sales data. Data mining can be used to process this sales data by searching for association rules between the customer address and vehicle product variables, using the Apriori algorithm to obtain the rules. The association rule analysis of the sales data, with a minimum support of 1% and a minimum confidence of 50%, produced 6 rules with a lift ratio greater than 1, which the authors take as showing that the rules are valid (a lift above 1 means that the antecedent and consequent occur together more often than would be expected if they were independent). Of these, 2 rules show that customers in the Compreng area tend to buy the Vixion, 2 rules show that customers in the Pusakanegara area tend to buy the Fino, and 2 rules show that customers in the Pamanukan and Pusakanegara areas tend to buy the Mio and the Vixion. The results of this analysis can serve as a tool to help the managers of PT. Arista Mitra Lestari make better decisions when marketing motorcycle products. Keywords: Apriori Algorithm, Association Rule, Data Mining, Product Marketing
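The support, confidence and lift thresholds described above can be illustrated with a compact Apriori sketch in Python. The transactions are invented for illustration (each pairs a hypothetical customer area with a purchased model) and do not reproduce the study's sales data; only the thresholds (support 1%, confidence 50%, lift > 1) come from the abstract.

```python
from itertools import combinations

# Invented transactions pairing a hypothetical customer area with a purchased model.
SALES = [
    {"area=A", "model=Vixion"}, {"area=A", "model=Vixion"}, {"area=A", "model=Mio"},
    {"area=B", "model=Fino"}, {"area=B", "model=Fino"},
    {"area=C", "model=Mio"}, {"area=C", "model=Vixion"}, {"area=C", "model=Mio"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    candidates = {frozenset([i]) for t in sets for i in t}
    frequent, k = {}, 1
    while candidates:
        # Count each candidate and keep the frequent ones at this level.
        level = {}
        for c in candidates:
            sup = sum(1 for t in sets if c <= t) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
    return frequent


def strong_rules(frequent, min_conf):
    """Yield (antecedent, consequent, support, confidence, lift) with confidence >= min_conf."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                a = frozenset(ante)
                c = itemset - a
                conf = sup / frequent[a]   # P(c | a)
                lift = conf / frequent[c]  # P(c | a) / P(c)
                if conf >= min_conf:
                    yield a, c, sup, conf, lift
```

Calling `strong_rules(apriori(SALES, 0.01), 0.5)` and keeping only rules with `lift > 1` mirrors the filtering described in the abstract. On this toy data, `{area=B} -> {model=Fino}` has confidence 1.0 and lift 4.0, while `{area=A} -> {model=Mio}` is discarded because its confidence is only 1/3.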

    Web Usage Mining with Evolutionary Extraction of Temporal Fuzzy Association Rules

    In Web usage mining, fuzzy association rules that have a temporal property can provide useful knowledge about when associations occur. However, traditional temporal fuzzy association rule mining algorithms have a problem: some rules occur at the intersection of fuzzy set boundaries, where support is lower (because membership is lower), so these rules are lost. A genetic algorithm (GA)-based solution is described that uses the flexible nature of the 2-tuple linguistic representation to discover rules that occur at the intersection of fuzzy set boundaries. The GA-based approach improves on previous work by including a graph representation and an improved fitness function. In a comparison with a traditional approach on real-world Web log data, the GA-based approach discovered rules that the traditional approach lost. The GA-based approach is recommended as complementary to existing algorithms, because it discovers extra rules. (C) 2013 Elsevier B.V. All rights reserved
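A minimal Python sketch of why the 2-tuple representation helps: pairing a linguistic label s_i with a symbolic translation alpha in [-0.5, 0.5) shifts that label's membership function sideways, so values sitting on the boundary between two labels can regain high membership. It follows the usual 2-tuple convention Delta(beta) = (s_i, alpha) with i = round(beta) and alpha = beta - i, and assumes evenly spaced triangular membership functions scored by average membership. The GA search over alpha, the graph representation and the fitness function from the paper are not reproduced, and the sample values are made up.

```python
import math


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def two_tuple(beta):
    """Delta(beta) = (i, alpha): closest label index i and translation alpha in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)
    return i, beta - i


def shifted_label(i, alpha, step):
    """Triangular set for label s_i on a grid of spacing `step`, moved sideways by alpha * step."""
    centre = (i + alpha) * step
    return centre - step, centre, centre + step


def fuzzy_support(values, label):
    """Average membership of the observed values in one fuzzy set."""
    return sum(triangular(v, *label) for v in values) / len(values)


# Made-up observations clustered on the boundary between labels s_1 and s_2.
VALUES = [14, 15, 15, 16]
STEP = 10
```

With the unshifted label s_1, `fuzzy_support(VALUES, shifted_label(1, 0.0, STEP))` is 0.5. `two_tuple(1.5)` returns `(2, -0.5)`, i.e. label s_2 moved half a step to the left, and the same values then reach an average membership of 0.95. A GA would search over such alpha values instead of fixing one by hand.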

    A Framework for High-Accuracy Privacy-Preserving Mining

    To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through an appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism in which the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here for frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors than the prior techniques.
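The matrix view of random perturbation can be illustrated with a small Python sketch. It assumes a single categorical attribute with n values and a symmetric "keep with high probability, otherwise switch uniformly" perturbation matrix A, where A[v][u] is the probability that original value u is reported as v (gamma*x on the diagonal, x elsewhere, x = 1/(gamma + n - 1)). This is one illustrative parameter setting, not necessarily any of the paper's mechanisms. Because the expected perturbed counts are Y = A X, the miner can estimate the original counts as X = A^{-1} Y, and for this particular matrix the inverse has a closed form.

```python
import random


def perturb(value, n, gamma, rng):
    """Keep `value` with probability gamma*x, else report each other value with probability x."""
    x = 1.0 / (gamma + n - 1)
    if rng.random() < gamma * x:
        return value
    return rng.choice([v for v in range(n) if v != value])


def expected_perturbed_counts(counts, gamma):
    """Y = A X for A with gamma*x on the diagonal and x elsewhere (columns sum to 1)."""
    n, total = len(counts), sum(counts)
    x = 1.0 / (gamma + n - 1)
    # Row v of A X is gamma*x*X[v] + x*(total - X[v]).
    return [x * (gamma - 1) * c + x * total for c in counts]


def reconstruct(perturbed_counts, gamma):
    """Estimate X = A^{-1} Y; closed form for this matrix, valid for gamma > 1."""
    n, total = len(perturbed_counts), sum(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    return [(y - x * total) / (x * (gamma - 1)) for y in perturbed_counts]


if __name__ == "__main__":
    rng = random.Random(0)
    original = [0] * 600 + [1] * 300 + [2] * 100  # true counts: 600, 300, 100
    noisy = [perturb(v, 3, 19.0, rng) for v in original]
    observed = [noisy.count(v) for v in range(3)]
    print(observed, reconstruct(observed, 19.0))  # estimates should lie near 600, 300, 100
```

Applying `reconstruct` to `expected_perturbed_counts(counts, gamma)` returns the original counts exactly (up to floating-point error), while applying it to a single random perturbation gives estimates scattered around the truth. The larger gamma is, the closer each report stays to the true value: accuracy improves and privacy weakens, which is the trade-off the parameter settings control.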

    Re-mining item associations: methodology and a case study in apparel retailing

    Association mining is the conventional data mining technique for analyzing market basket data, and it reveals the positive and negative associations between items. Although pricing and time information are an integral part of transaction data, earlier studies have not integrated them into market basket analysis. This paper proposes a new approach that mines price, time and domain-related attributes by re-mining the results of association mining. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated by analyzing data from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to existing techniques.
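The two-stage idea can be sketched in Python: a first pass labels each item pair as positively or negatively associated, and a second pass treats those labelled pairs as a new dataset described by a price attribute, so the attribute can be related to the sign of the association. The sketch assumes that lift > 1 marks a positive and lift < 1 a negative association, uses a hypothetical price-gap attribute (time and domain attributes are omitted), and summarizes the second stage with simple group means rather than the paper's own re-mining step. All baskets and prices are invented.

```python
from itertools import combinations

# Invented baskets and unit prices for a hypothetical apparel store.
BASKETS = [
    {"jeans", "belt"}, {"jeans", "belt"}, {"jeans", "belt", "coat"},
    {"coat"}, {"coat", "scarf"}, {"scarf"}, {"scarf"},
]
PRICE = {"jeans": 40, "belt": 15, "coat": 120, "scarf": 20}


def pair_lifts(baskets):
    """Stage 1: lift of every item pair; > 1 is read as positive, < 1 as negative."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    freq = {i: sum(1 for b in baskets if i in b) / n for i in items}
    return {
        (a, b): (sum(1 for t in baskets if a in t and b in t) / n) / (freq[a] * freq[b])
        for a, b in combinations(items, 2)
    }


def remine(lifts, price):
    """Stage 2: describe each signed pair by a price attribute and compare the two groups."""
    rows = [
        {"pair": pair, "price_gap": abs(price[pair[0]] - price[pair[1]]),
         "label": "positive" if lift > 1 else "negative"}
        for pair, lift in lifts.items()
        if lift != 1  # independent pairs carry no sign
    ]
    summary = {}
    for label in ("positive", "negative"):
        gaps = [r["price_gap"] for r in rows if r["label"] == label]
        summary[label] = sum(gaps) / len(gaps) if gaps else None
    return rows, summary
```

On this toy data, `remine(pair_lifts(BASKETS), PRICE)` finds a single positive pair, `("belt", "jeans")`, whose price gap (25) is much smaller than the average gap of the negative pairs (62.0). Surfacing that kind of pattern, which the first-stage rules alone do not explain, is the purpose of the second stage.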

    Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk

    Background. Test resources are usually limited, so it is often not possible to test an application completely before a release. To cope with scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios. Aims. We take an inverse view on defect prediction and aim to identify methods whose testing can be deferred because they contain hardly any faults, their code being "trivial". We expect that the characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions. Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk. We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level. Results. Our results show that inverse defect prediction can identify approximately 32-44% of the methods of a project as having a low fault risk; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, the identified methods are even eleven times less likely to contain a fault. Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be given lower priority in testing activities, and it is well applicable in cross-project prediction scenarios. Comment: Submitted to PeerJ C
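A minimal Python sketch of the inverse view: it evaluates candidate rules whose consequent is "not faulty", keeps those whose confidence reaches a threshold, and then reports how many times less likely the flagged methods are to contain a fault than the remaining ones (the kind of ratio behind "six times less likely"). The metric names (`sloc`, `max_nesting`), the candidate rules, the 0.85 threshold and the method records are all hypothetical; the paper mines its rules from computed code metrics rather than checking a hand-written list.

```python
# Hypothetical methods: (sloc, max_nesting, faulty), where `faulty` marks a known fault.
RAW = [
    (2, 0, False), (1, 0, False), (3, 0, False), (2, 0, False), (3, 1, True),
    (25, 3, True), (40, 2, True), (12, 1, False), (30, 4, True), (2, 0, True),
    (1, 0, False), (2, 0, False), (3, 0, False), (1, 0, False),
]
METHODS = [{"sloc": s, "max_nesting": d, "faulty": f} for s, d, f in RAW]

# Hypothetical candidate antecedents; every rule has the consequent "not faulty".
CANDIDATES = {
    "sloc<=3": lambda m: m["sloc"] <= 3,
    "sloc<=3 and max_nesting==0": lambda m: m["sloc"] <= 3 and m["max_nesting"] == 0,
    "max_nesting<=1": lambda m: m["max_nesting"] <= 1,
}


def confidence(rule, methods):
    """Share of methods matching the antecedent that are not faulty."""
    matched = [m for m in methods if rule(m)]
    return sum(not m["faulty"] for m in matched) / len(matched) if matched else 0.0


def low_risk_rules(candidates, methods, min_conf):
    """Keep the rules whose confidence for 'not faulty' reaches min_conf."""
    return {name: r for name, r in candidates.items() if confidence(r, methods) >= min_conf}


def fault_rate(group):
    """Fraction of methods in the group that are faulty."""
    return sum(m["faulty"] for m in group) / len(group)


def risk_ratio(rules, methods):
    """How many times less likely flagged methods are to be faulty than the rest."""
    flagged, others = [], []
    for m in methods:
        (flagged if any(r(m) for r in rules.values()) else others).append(m)
    if not flagged or not others:
        raise ValueError("both groups must be non-empty")
    low = fault_rate(flagged)
    return float("inf") if low == 0 else fault_rate(others) / low
```

With a 0.85 threshold, only `sloc<=3 and max_nesting==0` survives (confidence 8/9), and on this toy data the methods it flags are 7.2 times less likely to be faulty than the rest. Keeping only high-confidence "not faulty" rules trades coverage (how many methods get flagged) for a lower fault risk among the flagged methods.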