35 research outputs found

    Using Optimization-Based Classification Method for Massive Datasets

    Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, because of limits on computing power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As databases continue to grow, it is important that data mining algorithms can perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new approach that combines stratified random sampling with a majority-vote ensemble, and (2) to compare this approach with the plain MCLP approach (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that the new approach not only has the potential to handle datasets of arbitrary size, but also outperforms the plain MCLP approach and achieves classification accuracy comparable to See5.
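
    The sampling-plus-voting idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the base learner here is a simple nearest-centroid classifier standing in for MCLP, and the function names and parameters (`n_models`, `n_per_class`, `seed`) are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_per_class, rng):
    """Draw up to n_per_class row indices from each class (stratum)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for idxs in by_class.values():
        chosen.extend(rng.sample(idxs, min(n_per_class, len(idxs))))
    return chosen


def train_centroid(rows, labels):
    """Stand-in base learner (NOT MCLP): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in zip(rows, labels):
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(prev, x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in vec] for y, vec in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(model, key=lambda y: dist2(model[y]))


def train_ensemble(rows, labels, n_models=5, n_per_class=20, seed=0):
    """Train each model on its own small stratified sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idxs = stratified_sample(labels, n_per_class, rng)
        models.append(train_centroid([rows[i] for i in idxs],
                                     [labels[i] for i in idxs]))
    return models


def predict_majority(models, x):
    """Return the label that receives the most votes across models."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

    Because each model sees only a fixed-size sample, memory use per model does not grow with the full dataset; only the number of models and the sample size are tuned.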

    An Improved Approximation Algorithm for the Hard Uniform Capacitated k-median Problem

    In the $k$-median problem, given a set of locations, the goal is to select a subset of at most $k$ centers so as to minimize the total cost of connecting each location to its nearest center. We study the uniform hard capacitated version of the $k$-median problem, in which each selected center can only serve a limited number of locations. Inspired by the algorithm of Charikar, Guha, Tardos and Shmoys, we give a $(6+10\alpha)$-approximation algorithm for this problem while increasing the capacities by a factor of $2+\frac{2}{\alpha}$, $\alpha\geq 4$, which improves the previous best $(32 l^2+28 l+7)$-approximation algorithm proposed by Byrka, Fleszar, Rybicki and Spoerhase, which violates the capacities by a factor of $2+\frac{3}{l-1}$, $l\in \{2,3,4,\dots\}$. Comment: 19 pages, 1 figure
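
    For reference, the objective described in this abstract can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the paper: $V$ is the set of locations, $d$ the connection cost, $u$ the uniform capacity, and $\sigma$ the assignment of locations to open centers.

```latex
% Uncapacitated k-median: each location connects to its nearest open center.
\min_{S \subseteq V,\ |S| \le k} \; \sum_{j \in V} \min_{i \in S} d(i, j)

% Uniform hard capacitated k-median: an explicit assignment \sigma is needed,
% and each open center may serve at most u locations.
\min_{S \subseteq V,\ |S| \le k} \; \min_{\sigma : V \to S} \; \sum_{j \in V} d\bigl(\sigma(j), j\bigr)
\quad \text{subject to} \quad \bigl|\sigma^{-1}(i)\bigr| \le u \quad \text{for all } i \in S
```

    Under this notation, an algorithm that increases capacities by a factor of $2+\frac{2}{\alpha}$ is allowed $\bigl|\sigma^{-1}(i)\bigr| \le \bigl(2+\frac{2}{\alpha}\bigr)u$ while its cost is compared against the optimum that respects the original capacity $u$.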

    Approximating $k$-Median via Pseudo-Approximation

    We present a novel approximation algorithm for $k$-median that achieves an approximation guarantee of $1+\sqrt{3}+\epsilon$, improving upon the decade-old ratio of $3+\epsilon$. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an $\alpha$-approximation algorithm for $k$-median, it is sufficient to give a \emph{pseudo-approximation algorithm} that finds an $\alpha$-approximate solution by opening $k+O(1)$ facilities. This is a rather surprising result, as there exist instances for which opening $k+1$ facilities may lead to a significantly smaller cost than if only $k$ facilities were opened. Second, we give such a pseudo-approximation algorithm with $\alpha= 1+\sqrt{3}+\epsilon$. Prior to our work, it was not even known whether opening $k + o(k)$ facilities would help improve the approximation ratio. Comment: 18 pages

    Application of Market Basket Analysis to Consumer Purchasing Patterns Using the Apriori Algorithm

    Data mining is widely applied in many fields, especially in retail business. Consumer purchasing patterns are a central concern in retail. Knowledge of these patterns can be used to arrange the product layout so that customers can shop more easily. This study uses the Apriori algorithm to determine consumer purchasing patterns within a single shopping basket. The data are secondary data obtained from consumer purchase transactions at a retail store from January to February 2020, comprising 200 purchase transactions. Before association rules are sought, the support value of each item is computed; items whose support falls below the specified minimum support are pruned, while items that meet the minimum support criterion are included in the frequent k-itemsets. Once all frequent k-itemsets have been formed, association rules are derived based on the minimum confidence value; items that meet the minimum confidence criterion are used to form the purchasing rules. This study uses a minimum support of 1% and a minimum confidence of 80%, which yield 10 rules, or consumer purchasing patterns, that can serve as a reference for arranging the layout of stored goods for sale. Keywords: Data mining, Market Basket Analysis, Apriori Algorithm
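
    The two stages described in this abstract (pruning by minimum support, then keeping rules that meet minimum confidence) can be sketched as below. This is a small illustrative version of the standard Apriori procedure, not the study's code; the function names `apriori` and `rules` and the toy transactions in the usage note are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1 candidates: every single item that appears anywhere.
    current = {frozenset([i]) for t in tx for i in t}
    frequent = {}
    k = 1
    while current:
        # Keep only candidates that meet the minimum support.
        level = {}
        for cand in current:
            s = support(cand)
            if s >= min_support:
                level[cand] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                # confidence(A -> B) = support(A and B) / support(A)
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found
```

    For example, with the made-up baskets `[{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]`, a minimum support of 0.4 and a minimum confidence of 0.8, the only rule kept is butter → bread, since every basket containing butter also contains bread.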

    Application of Classification Data Mining Technique for Pattern Analysis of Student Graduation Data with Emerging Pattern Method

    Data mining has been applied in many fields because it helps extract information from large data sets. Student graduation data is one example of data that can be mined for information and used as a basis for recommendations. This study used a classification data mining technique, the Emerging Pattern method, to search for patterns in student graduation data. The data were graduation records for students of the Statistics Study Program, Faculty of Mathematics and Natural Sciences, Tanjungpura University, from 2013 to 2018. The sample comprised 186 records. Four attributes were used: gender, batch, GPA, and TUTEP score. The research began by finding the class and frequency values, then calculated the support, growth rate, and confidence values of each itemset. The highest confidence value across all attributes was 91%, obtained for the 2013 batch and 2018 batch itemsets. Female students dominated the class attribute. A TUTEP score of 425 dominated the TUTEP score attribute, and a GPA of 3.51-4.00 dominated the class with a confidence value of 60%.
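
    The three measures named in this abstract (support, growth rate, and confidence) can be computed as in the sketch below. It assumes the usual emerging-pattern definitions, where growth rate is the ratio of an itemset's support in a target class to its support in the remaining records; the function names, the `attribute=value` encoding, and the sample records are hypothetical and are not data from the study.

```python
def support(itemset, records):
    """Fraction of records that contain every item in the itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def growth_rate(itemset, background, target):
    """Support in the target class divided by support in the background."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        # Present only in the target class: growth is unbounded.
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


def confidence(itemset, background, target):
    """Share of records containing the itemset that are in the target class."""
    in_target = sum(1 for r in target if itemset <= r)
    in_background = sum(1 for r in background if itemset <= r)
    total = in_target + in_background
    return in_target / total if total else 0.0


# Made-up records, encoded as sets of "attribute=value" items.
target = [
    frozenset({"gender=F", "gpa=high"}),
    frozenset({"gender=F", "gpa=mid"}),
    frozenset({"gender=M", "gpa=high"}),
    frozenset({"gender=F", "gpa=high"}),
]
background = [
    frozenset({"gender=M", "gpa=mid"}),
    frozenset({"gender=F", "gpa=mid"}),
    frozenset({"gender=M", "gpa=high"}),
    frozenset({"gender=M", "gpa=mid"}),
]

pattern = frozenset({"gpa=high"})
rate = growth_rate(pattern, background, target)
conf = confidence(pattern, background, target)
```

    An itemset is typically reported as an emerging pattern when its growth rate exceeds a chosen threshold, so patterns with a large ratio are the ones that best separate the target class from the rest.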