    Dataset Dimension Reduction with DRC for the SVM Classification Method by Adding a Third Component

    Datasets processed by systems such as data mining, information retrieval, computer vision, or any other system that relies on a dataset as its main database for solving the cases it handles can be very large in terms of the number of features. Reducing the dimensionality brings many benefits, the key one being that many data mining algorithms work better in lower dimensions. This study extends the Dimension Reduction Technique for K-Means Clustering Algorithm (DRC) by adding a third component, z. When used to reduce dimensionality for SVM classification, the proposed method (DRC 3 DIM) maintains relatively good accuracy as long as the initial number of dimensions is still small. The computation time for both training and prediction remains tolerable for practical use, since the training and prediction times fall in the middle of the range of the baseline methods.

    A Fuzzy Clustering Algorithm for High Dimensional Streaming Data

    In this paper we propose a dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD). The algorithm can be used for high-dimensional datasets with streaming behavior. Such datasets arise in sensor networks, web click streams, and internet traffic flow data, among others. These data have two properties that set them apart from other datasets: a) they arrive as a stream, and b) they are high-dimensional. Optimized fuzzy clustering algorithms have already been proposed for datasets that are either streaming or high-dimensional. However, to the best of our knowledge, no optimized fuzzy clustering algorithm has been proposed for datasets with both properties, i.e., datasets that are high-dimensional and also arrive continuously as a stream. Experimental analysis shows that the proposed algorithm (sWFCM-HD) improves performance in terms of both memory consumption and execution time.

    Keywords: K-Means, Fuzzy C-Means, Weighted Fuzzy C-Means, Dimension Reduction, Clustering
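    The abstract does not give the update rules of sWFCM-HD. For orientation only, here is a minimal sketch of plain weighted fuzzy c-means (WFCM), assuming the standard FCM membership and centroid updates with a per-point weight w_j multiplying each point's contribution to the centroid. In single-pass streaming variants such weights typically let centroids from an earlier chunk stand in for the points they summarize. The sketch omits both the dimension-reduction step and any chunking logic; the function names, the fuzzifier default m = 2, and the farthest-first initialization are illustrative assumptions, not the authors' choices.

```python
from typing import List, Sequence, Tuple

def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _farthest_first(points: Sequence[Sequence[float]], c: int) -> List[List[float]]:
    """Deterministic init (an assumption of this sketch): start at points[0],
    then repeatedly add the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(_sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers

def weighted_fcm(points: Sequence[Sequence[float]], weights: Sequence[float],
                 c: int, m: float = 2.0, iters: int = 100,
                 tol: float = 1e-9) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical weighted fuzzy c-means; returns (centers, memberships u[i][j])."""
    n, dim = len(points), len(points[0])
    centers = _farthest_first(points, c)
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)),
        # written with squared distances, hence the exponent 1/(m-1).
        for j, x in enumerate(points):
            d = [_sq_dist(x, v) for v in centers]
            hits = [i for i in range(c) if d[i] == 0.0]
            if hits:  # point sits exactly on a center: share membership there
                for i in range(c):
                    u[i][j] = 1.0 / len(hits) if i in hits else 0.0
            else:
                for i in range(c):
                    u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0))
                                        for k in range(c))
        # Centroid update: v_i = sum_j w_j u_ij^m x_j / sum_j w_j u_ij^m
        new_centers = []
        for i in range(c):
            coeff = [weights[j] * u[i][j] ** m for j in range(n)]
            total = sum(coeff)
            if total == 0.0:  # no support at all: keep the old center
                new_centers.append(centers[i])
                continue
            new_centers.append([sum(coeff[j] * points[j][t] for j in range(n)) / total
                                for t in range(dim)])
        shift = max(_sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers stopped moving
            break
    return centers, u
```

    With all weights set to 1 this reduces to ordinary fuzzy c-means; raising the weight of a point pulls the centroids toward it as if it were several coincident points.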

    Data mining for heart failure : an investigation into the challenges in real life clinical datasets

    Clinical data presents a number of challenges, including missing data, class imbalance, high dimensionality, and non-normal distributions. A motivation for this research is to investigate and analyse how these challenges affect the performance of algorithms. The challenges were explored using a real-life heart failure clinical dataset known as Hull LifeLab, obtained from a live cardiology clinic at Hull Royal Infirmary. A Clinical Data Mining Workflow (CDMW) was designed with three intuitive stages: descriptive, predictive, and prescriptive. The name of each stage reflects the kind of analysis possible within it, so a number of different algorithms are employed. Most algorithms assume normally distributed data, yet the distribution is not used explicitly within them. Approaches based on Bayes, by contrast, use the properties of the distributions very explicitly and thus provide valuable insight into the nature of the data. The first stage of the analysis investigates whether the assumptions made by Bayes hold, e.g. the strong independence assumption and the assumption of a Gaussian distribution. The next stage investigates the role of missing values. The results showed that imputation affects performance less than the records that were initially complete do. These records are often not outliers but contain problem variables, and a method was developed to identify them. The effect of skewness in the data was also investigated within the CDMW, and methods based on Bayes were found to handle it, albeit with a small variability in performance. The thesis provides insight into why clinical data often causes problems. Even class imbalance is not a problem, since Bayes is independent of it.
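    To make the two Bayes assumptions mentioned above concrete, here is a minimal Gaussian naive Bayes sketch, not the thesis's implementation: each feature is modelled as an independent Gaussian within each class, so a class score is its log prior plus a sum of per-feature log densities. Under that independence assumption a missing feature can simply be left out of the sum. Encoding missing values as None, the small variance floor, and the function names are assumptions of this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# label -> (prior, [(mean, variance) per feature])
Model = Dict[str, Tuple[float, List[Tuple[float, float]]]]

def fit_gaussian_nb(X: Sequence[Sequence[Optional[float]]], y: Sequence[str],
                    var_floor: float = 1e-9) -> Model:
    """Estimate a class prior and one Gaussian per (class, feature).
    Missing values (None) are ignored; assumes every feature is observed
    at least once in every class."""
    by_class: Dict[str, list] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model: Model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            observed = [v for v in column if v is not None]
            mean = sum(observed) / len(observed)
            var = sum((v - mean) ** 2 for v in observed) / len(observed) + var_floor
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def predict_gaussian_nb(model: Model, x: Sequence[Optional[float]]) -> str:
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = "", -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            if v is None:  # independence lets us drop a missing feature's term
                continue
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

    Checking whether the per-class feature values actually look Gaussian and roughly independent is exactly the kind of assumption test the first CDMW stage describes.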