35 research outputs found

Using Optimization-Based Classification Method for Massive Datasets

Authors: Chen Zhengxin, Kou Gang, Peng Yi, Shi Yong
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2005

Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, because of limits on computation power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As the size of today's databases is continuously increasing, it is highly important that data mining algorithms are able to perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new stratified random sampling and majority-vote ensemble approach, and (2) to compare this approach with the plain MCLP approach (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that this new approach not only has the potential to handle datasets of arbitrary size, but also outperforms the plain MCLP approach and achieves classification accuracy comparable to See5.
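
The sampling-and-voting scheme named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the base learner here is a simple nearest-centroid classifier standing in for MCLP, and the function names, the toy data, and the parameters `n_models` and `fraction` are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, labels, fraction, rng):
    """Draw the same fraction from every class so class proportions are kept."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    sample_rows, sample_labels = [], []
    for label, members in by_class.items():
        k = max(1, round(fraction * len(members)))
        for row in rng.sample(members, k):
            sample_rows.append(row)
            sample_labels.append(label)
    return sample_rows, sample_labels

def train_centroid(rows, labels):
    """Stand-in base learner (NOT MCLP): one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_centroid(model, row):
    """Return the class whose centroid is closest in squared distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda c: dist2(model[c]))

def train_ensemble(rows, labels, n_models=5, fraction=0.5, seed=0):
    """Train each member on its own stratified sample of the training set."""
    rng = random.Random(seed)
    return [
        train_centroid(*stratified_sample(rows, labels, fraction, rng))
        for _ in range(n_models)
    ]

def predict_majority(models, row):
    """Combine the members' predictions by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = train_ensemble(rows, labels)
```

Because each member only ever sees a sample, the memory needed per training run depends on `fraction` rather than on the full dataset size, which is the property the abstract emphasizes.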

AIS Electronic Library (AISeL)

An Improved Approximation Algorithm for the Hard Uniform Capacitated k-median Problem

Author: Li Shanfei
Publication date: 01/01/2014

In the $k$-median problem, given a set of locations, the goal is to select a subset of at most $k$ centers so as to minimize the total cost of connecting each location to its nearest center. We study the uniform hard capacitated version of the $k$-median problem, in which each selected center can only serve a limited number of locations. Inspired by the algorithm of Charikar, Guha, Tardos and Shmoys, we give a $(6+10\alpha)$-approximation algorithm for this problem that increases the capacities by a factor of $2+\frac{2}{\alpha}$, $\alpha\geq 4$. This improves on the previous best $(32 l^2+28 l+7)$-approximation algorithm, proposed by Byrka, Fleszar, Rybicki and Spoerhase, which violates the capacities by a factor of $2+\frac{3}{l-1}$, $l\in \{2,3,4,\dots\}$. Comment: 19 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Approximating $k$-Median via Pseudo-Approximation

Authors: Li Shi, Svensson Ola
Publication date: 01/11/2012

We present a novel approximation algorithm for $k$-median that achieves an approximation guarantee of $1+\sqrt{3}+\epsilon$, improving upon the decade-old ratio of $3+\epsilon$. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an $\alpha$-approximation algorithm for $k$-median, it is sufficient to give a pseudo-approximation algorithm that finds an $\alpha$-approximate solution by opening $k+O(1)$ facilities. This is a rather surprising result, as there exist instances for which opening $k+1$ facilities may lead to a significantly smaller cost than if only $k$ facilities were opened. Second, we give such a pseudo-approximation algorithm with $\alpha= 1+\sqrt{3}+\epsilon$. Prior to our work, it was not even known whether opening $k + o(k)$ facilities would help improve the approximation ratio. Comment: 18 pages
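
The remark that one extra facility can change the cost dramatically is easy to see on a tiny instance. The sketch below is only an illustration of the $k$-median objective, not of the paper's algorithm: it brute-forces the optimum on a hypothetical one-dimensional instance with three tight clusters, and it assumes that facilities may only be opened at input points.

```python
from itertools import combinations

def kmedian_cost(points, centers):
    """Total distance from each point to its nearest open center."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def best_centers(points, k):
    """Exhaustive search over all k-subsets; only viable for tiny inputs."""
    return min(combinations(points, k), key=lambda cs: kmedian_cost(points, cs))

# Hypothetical instance: three pairs of nearby points, far apart from each other.
points = [0, 1, 100, 101, 200, 201]
cost_k2 = kmedian_cost(points, best_centers(points, 2))  # one cluster must travel far
cost_k3 = kmedian_cost(points, best_centers(points, 3))  # one center per cluster
```

With `k = 3` every cluster gets its own center, while with `k = 2` some points have to be served from a distant cluster, so `cost_k2` is far larger than `cost_k3`.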

arXiv.org e-Print Archive

CiteSeerX

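APPLICATION OF MARKET BASKET ANALYSIS TO CONSUMER PURCHASING PATTERNS USING THE APRIORI ALGORITHM (PENERAPAN MARKET BASKET ANALYSIS PADA POLA PEMBELIAN BARANG OLEH KONSUMEN MENGGUNAKAN METODE ALGORITMA APRIORI)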

Authors: Perdana Hendra, Ramadana Wahyu Diyan, Satyahadewi Neva
Publication venue: Tanjungpura University
Publication date: 31/05/2022

Data mining is widely applied in many fields, especially in the retail business. Consumer purchasing patterns are a central concern for retailers. Knowledge of these patterns can be used to arrange the product layout so that purchasing is easier for consumers. This study uses the Apriori algorithm to determine consumer purchasing patterns within a single shopping basket. The data are secondary data obtained from consumer purchase transactions at a retail store from January to February 2020, comprising 200 purchase transactions. Before association rules are sought, the support value of each item is computed; items whose support is below the specified minimum support are pruned, while items that meet the minimum support criterion are included in the frequent k-itemsets. Once all frequent k-itemsets have been formed, association rules are sought based on the minimum confidence value; items that meet the minimum confidence criterion are used to derive purchasing rules. This study uses a minimum support of 1% and a minimum confidence of 80%, which yields 10 rules, or consumer purchasing patterns, that can serve as a reference for arranging the layout of stored goods for sale. Keywords: data mining, market basket analysis, Apriori algorithm
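
The two-stage procedure in this abstract (prune by minimum support, then filter rules by minimum confidence) can be sketched as follows. This is a compact illustration of the general Apriori idea rather than the study's code; the transactions, item names, and thresholds are hypothetical and much smaller than the 200 transactions the study used.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: keep k-itemsets meeting min_support, then join them."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        kept = {s: support(s, transactions) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        # Join step: union pairs of surviving k-itemsets into (k+1)-itemsets.
        size = len(next(iter(kept))) + 1 if kept else 0
        level = list({a | b for a in kept for b in kept if len(a | b) == size})
    return frequent

def association_rules(frequent, min_confidence):
    """Emit (lhs, rhs, confidence) with confidence = supp(lhs and rhs) / supp(lhs)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out

# Hypothetical baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
found = association_rules(frequent, min_confidence=0.8)
```

On these baskets the only rule that survives the 80% confidence threshold is butter → bread, since every basket containing butter also contains bread.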

BIMASTER

Application of Classification Data Mining Technique for Pattern Analysis of Student Graduation Data with Emerging Pattern Method

Authors: Handayani Aditya, Perdana Hendra, Satyahadewi Neva
Publication venue: Universitas Pattimura
Publication date: 24/04/2023

Data mining has been applied in many fields because it helps extract information from large data sets. Student graduation data is one example of data that can be mined for information and used as a basis for recommendations. This study used a classification data mining technique, the Emerging Pattern method, to search for patterns in student graduation data. The data were graduation records for students of the Statistics Study Program, Faculty of Mathematics and Natural Sciences, Tanjungpura University, from 2013 to 2018. The sample comprised 186 records. Four attributes were used: gender, batch, GPA, and TUTEP score. The analysis began by finding the class and frequency values, and continued by calculating the support, growth rate, and confidence of each itemset. The highest confidence value among all attributes was 91%, obtained for the itemsets of the 2013 batch and the 2018 batch. Female students dominated the class attribute. A score of 425 dominated the TUTEP score attribute, and a GPA of 3.51-4.00 dominated the class with a confidence value of 60%.
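
The support and growth-rate calculation mentioned in this abstract can be sketched as below. The abstract does not give the study's formulas, so this uses the usual emerging-pattern definition (growth rate as the ratio of a pattern's support in one class to its support in the other); the record fields, class labels, and values are hypothetical.

```python
import math

def class_support(pattern, records, target_class):
    """Share of records in target_class that match every attribute in pattern."""
    in_class = [r for r in records if r["class"] == target_class]
    if not in_class:
        return 0.0
    hits = sum(1 for r in in_class if all(r.get(k) == v for k, v in pattern.items()))
    return hits / len(in_class)

def growth_rate(pattern, records, target, other):
    """Support in the target class divided by support in the other class."""
    s_target = class_support(pattern, records, target)
    s_other = class_support(pattern, records, other)
    if s_other == 0:
        # Pattern never occurs in the other class: infinite growth if it occurs at all.
        return math.inf if s_target > 0 else 0.0
    return s_target / s_other

# Hypothetical graduation records (assumed field names and class labels).
records = [
    {"gender": "F", "gpa": "3.51-4.00", "class": "on_time"},
    {"gender": "F", "gpa": "3.51-4.00", "class": "on_time"},
    {"gender": "M", "gpa": "3.51-4.00", "class": "on_time"},
    {"gender": "M", "gpa": "3.00-3.50", "class": "on_time"},
    {"gender": "F", "gpa": "3.00-3.50", "class": "late"},
    {"gender": "M", "gpa": "3.51-4.00", "class": "late"},
    {"gender": "M", "gpa": "3.00-3.50", "class": "late"},
    {"gender": "M", "gpa": "3.00-3.50", "class": "late"},
]
gr_gpa = growth_rate({"gpa": "3.51-4.00"}, records, "on_time", "late")
gr_f_gpa = growth_rate({"gender": "F", "gpa": "3.51-4.00"}, records, "on_time", "late")
```

A pattern counts as emerging for a class when its growth rate exceeds a chosen threshold; here the high-GPA pattern is three times as frequent among the hypothetical on-time graduates as among the late ones.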

OJS UNPATTI Publication Center (Universitas Pattimura)

SPLITTING METHODS FOR DECISION TREE INDUCTION: A COMPARISON OF TWO FAMILIES

Authors: Giles Kendall, Osei-Bryson Kweku-Muata
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2002

AIS Electronic Library (AISeL)