8 research outputs found

    Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems

    Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm that makes iterative passes over the database, generating candidate itemsets from the frequent itemsets found in the previous iteration and pruning clearly infrequent itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of Apriori that tries to reduce the number of passes made over a transactional database while keeping the number of itemsets counted in a pass relatively low. In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi many-core system for the case when the transactional database fits in main memory. Intel Xeon Phi provides a large number of small compute cores with vector processing units. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit a bit-based internal layout for transactions and itemsets. This technique reduces the memory needed to store the transactional database, simplifies support counting to logical bitwise operations, and allows that step to be vectorized. Experimental evaluation on the Intel Xeon CPU and the Intel Xeon Phi coprocessor with large synthetic and real databases showed good performance and scalability of the proposed algorithm. Comment: Accepted for publication in Journal of Computing and Information Technology (http://cit.fer.hr
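The bit-based layout mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's OpenMP implementation: the function names `to_bitmask` and `support_count` and the toy transactions are assumptions made for illustration, and Python integers stand in for the fixed-width bit vectors a vectorized version would use.

```python
from typing import Iterable, List


def to_bitmask(items: Iterable[int]) -> int:
    """Encode a set of item ids as an integer with one bit per item (hypothetical helper)."""
    mask = 0
    for item in items:
        mask |= 1 << item
    return mask


def support_count(transactions: List[int], itemset: int) -> int:
    """Count the transactions that contain every item of the itemset.

    A transaction t contains the itemset exactly when (t AND itemset) == itemset,
    so the containment test is a single bitwise operation per transaction.
    """
    return sum(1 for t in transactions if (t & itemset) == itemset)


# Toy database, invented for this example: four transactions over items 0..3.
transactions = [to_bitmask(t) for t in ([0, 1, 2], [0, 2], [1, 3], [0, 1, 2, 3])]
candidate = to_bitmask([0, 2])
count = support_count(transactions, candidate)
print(count)
```

Because each containment test is an independent AND plus a comparison, the loop over transactions is the kind of step that can be split across threads and vector lanes, which is the property the abstract refers to.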

    Association Rules Mining Based Clinical Observations

    Healthcare institutes add to their repositories of patients' disease-related information at an increasing rate, and this information could be more useful if relational analysis were carried out on it. Data mining algorithms have proven quite useful for exploring correlations in large data repositories. In this paper we implement a novel idea, based on association rule mining, for finding co-occurrences of diseases carried by a patient using the healthcare repository. We have developed a system prototype for Clinical State Correlation Prediction (CSCP) which extracts data from the patients' healthcare database and transforms the OLTP data into a data warehouse by generating association rules. The CSCP system helps reveal relations among the diseases. It predicts the correlations between the primary disease (the disease for which the patient visits the doctor) and the secondary diseases (other associated diseases carried by the same patient). Comment: 5 pages, MEDINFO 2010, C. Safran et al. (Eds.), IOS Pres

    Data Mining Techniques in the Diagnosis of Tuberculosis

    Early Detection and Prevention of Oral Cancer: Association Rule Mining on Investigations

    Abstract: Early detection and prevention of oral cancer is critical, as it can increase the chance of survival considerably, allow for simpler treatment, and result in a better quality of life for survivors. In this research paper, the popular association rule mining algorithm Apriori is used to find the spread of cancer with the help of various investigations and then assess the patient's chance of survival. This is achieved by extracting a set of significant rules relating laboratory tests and investigations such as FNAC of neck node, LFT, biopsy, USG, and CT scan/MRI to the survivability of oral cancer patients. The rules clearly show that if FNAC of neck node, USG, and CT scan/MRI are positive, then the chance of survival is reduced. However, if LFT is normal, the probability of survival is high. If the diagnostic biopsy shows squamous cell carcinoma, it clearly indicates oral cancer, which may lead to high mortality if appropriate treatment is not initiated. The experimental results demonstrate that all the generated rules hold the highest confidence level, making investigations essential for understanding the spread of cancer after clinical examination, for early detection and prevention of oral cancer

    Pencarian Pola Asosiasi Data Nasabah Bank Menggunakan Algoritma Apriori (Searching for Association Patterns in Bank Customer Data Using the Apriori Algorithm)

    Data mining is a technique for finding, searching for, or extracting new information or knowledge, which in turn yields useful information. Commonly used data mining functions are classification, clustering, estimation, prediction, and the discovery of association patterns (association rule mining). Association rule mining is a data mining technique for discovering hidden association patterns in a database. The patterns found take the form of association rules between itemsets, each with its own association weight. This study searches for association patterns in the customer data of a bank, involving the factors age, sex, place of residence, income, marital status, number of children, car ownership, savings account ownership, current account ownership, mortgage ownership, and Personal Equity Plan purchase status. The result is the pattern that a customer has a 100% likelihood of joining the Personal Equity Plan program if aged between 35 and 51, with one child and a current account. Conversely, a customer has a 100% likelihood of not joining the Personal Equity Plan program if earning between US$24,387 and US$43,758, married, with no children, holding a current account, and without a mortgage. In addition, a customer also has a 100% likelihood of not joining the Personal Equity Plan program if aged 52 or over, married, with no children, and holding both a savings account and a current account
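A "100%" rule of the kind reported here corresponds to a rule confidence of 1.0: every record that matches the antecedent also matches the consequent. The sketch below shows how support and confidence are commonly computed for one rule. The field names (`age_band`, `children`, `current_account`, `pep`) and the four records are invented for illustration and are not taken from the study's data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def matches(record: Record, conditions: Record) -> bool:
    """True when the record has every attribute value listed in the conditions."""
    return all(record.get(key) == value for key, value in conditions.items())


def rule_metrics(records: List[Record], antecedent: Record, consequent: Record) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical customer records, made up for this example.
records = [
    {"age_band": "35-51", "children": 1, "current_account": True, "pep": True},
    {"age_band": "35-51", "children": 1, "current_account": True, "pep": True},
    {"age_band": "52+", "children": 0, "current_account": True, "pep": False},
    {"age_band": "35-51", "children": 2, "current_account": True, "pep": False},
]
antecedent = {"age_band": "35-51", "children": 1, "current_account": True}
consequent = {"pep": True}
support, confidence = rule_metrics(records, antecedent, consequent)
print(support, confidence)
```

Confidence alone says nothing about how many records back the rule, which is why it is usually read together with support.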

    Comparing Association Rules and Deep Neural Networks on Medical Data

    Deep neural networks are today's most popular tool for building predictive models across various disciplines. A decade ago, the most popular predictive modeling technique was association rule mining. In this work, we carefully compare these two techniques in an effort to identify a more effective model with which to predict heart disease, a multi-prediction problem. Both techniques require significant knowledge, manual tuning, and experimentation to determine optimal parameters. Our goal was to build a predictive model that is at least as good as the best association rules across our entire data set. Promising results were obtained for some examples, while others remain unclear. Building predictive models from medical data continues to be a challenging problem that requires more attention from the scientific community. Computer Science, Department o

    A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms

    Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and has thus fallen short of the overall ARM objective. FI miners fail to identify the upper covers that are needed to generate a set of association rules whose size can be exploited by an end user. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and whose connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms to construct an iceberg lattice were found in the literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms to construct an iceberg concept lattice. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms. These algorithms provide incremental construction of an iceberg lattice. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in the literature. Results demonstrate that QuICL provides performance on the order of FI miners yet additionally derives the upper covers. QuICL, when combined with known algorithms to extract a basis of association rules from a lattice, offers a best known ARM solution. Beyond this, the QuICL algorithms have proved to be very efficient, providing order-of-magnitude gains over other incremental lattice construction algorithms. For example, on the Mushroom data set, QuICL completes in less than 3 seconds, while past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds, while past algorithms approach 10,000 seconds. QuICL is shown to be the best known all-around incremental lattice construction algorithm. Runtime complexity is shown to be O(l d i), where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents
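The notion of a closed item set that underlies concept lattices can be made concrete with a short sketch. This is not the QuICL algorithm, only the standard FCA closure operator: the closure of an item set is the intersection of all transactions that contain it, and an item set is closed when it equals its own closure. The function names and the toy transactions are assumptions made for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def extent(transactions: List[Itemset], itemset: Itemset) -> List[Itemset]:
    """Transactions that contain every item of the itemset."""
    return [t for t in transactions if itemset <= t]


def closure(transactions: List[Itemset], itemset: Itemset) -> Itemset:
    """Intersection of all transactions containing the itemset.

    By the usual convention, an itemset contained in no transaction
    closes to the set of all items.
    """
    covering = extent(transactions, itemset)
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def is_closed(transactions: List[Itemset], itemset: Itemset) -> bool:
    return closure(transactions, itemset) == itemset


# Toy database, invented for this example.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "d"}),
    frozenset({"c"}),
]
closed_of_a = closure(transactions, frozenset({"a"}))
print(sorted(closed_of_a))
```

In this toy data every transaction containing "a" also contains "b", so {"a"} is not closed while {"a", "b"} is, and both have the same support. An iceberg lattice in the sense of the abstract would keep only such closed sets whose support meets the minimum threshold.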