36,649 research outputs found

    Cluster Analysis with the Ensemble ROCK Method for Mixed Data: A Case Study of Stunting in West Java Province

    Stunting is one of the nutritional problems faced worldwide, and it mostly affects developing and poor countries. Determining whether a toddler is stunted requires many variables to be considered. Clustering can therefore be applied to the 2018 stunting data for West Java Province together with several factors that can affect stunting. The stunting data are mixed, containing both categorical and numerical variables, and mixed data are a known difficulty in cluster analysis. The ensemble ROCK (RObust Clustering using linKs) method clusters mixed data by combining the clustering outputs obtained separately for the categorical and the numerical data. ROCK is applied to the categorical data, and an agglomerative hierarchical method is applied to the numerical data. The best clustering is the one with the smallest ratio of the within-cluster standard deviation (Sw) to the between-cluster standard deviation (Sb). Based on 735 respondents, the ensemble ROCK method with θ = 0.10 yields two clusters with a ratio of 0.0145, which is the best clustering result. The characteristics of the resulting clusters indicate that cluster one is better off than cluster two, which still has many obese children and many children whose PB/TB/U (length or height for age) classification is short.
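    The selection criterion described above can be sketched for a single numerical variable. The abstract does not spell out how Sw and Sb are computed, so this sketch assumes one common definition: Sw is the mean of the per-cluster sample standard deviations, and Sb is the sample standard deviation of the cluster means. The function name and the toy data are hypothetical.

```python
from statistics import mean, stdev

def sw_sb_ratio(values, labels):
    """Ratio of within-cluster to between-cluster standard deviation.

    Assumed definitions (not given in the abstract):
      Sw = mean of the sample standard deviations of the clusters
      Sb = sample standard deviation of the cluster means
    A smaller ratio means tighter, better separated clusters.
    """
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # A singleton cluster has no spread; treat its standard deviation as 0.
    within = [stdev(c) if len(c) > 1 else 0.0 for c in clusters.values()]
    centres = [mean(c) for c in clusters.values()]
    return mean(within) / stdev(centres)

# Toy data: two well separated groups of three values each.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
good = sw_sb_ratio(values, [0, 0, 0, 1, 1, 1])
bad = sw_sb_ratio(values, [0, 1, 0, 1, 0, 1])
```

    On the toy data the partition that follows the two visible groups gets the smaller ratio, which is the sense in which the criterion picks the best clustering.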

    Ensemble ROCK and SWFM Methods for Clustering Mixed Numerical and Categorical Data in the Case of Citrus Accessions

    A problem often encountered in cluster analysis is data measured on mixed numerical and categorical scales. Methods for clustering mixed data include the ensemble method and the Similarity Weight and Filter Method (SWFM). The clustering stage of the ensemble method uses a categorical clustering algorithm, one of which is ROCK (RObust Clustering using linKs). Both methods have been studied and developed extensively, but comparisons of their performance remain limited. This study therefore compares the performance of the ensemble ROCK and ensemble SWFM methods. Both are applied to a case study on clustering citrus accessions produced by protoplast fusion, which form a mixed numerical and categorical data set. The best method is the one with the smallest ratio of the within-group standard deviation (SW) to the between-group standard deviation (SB). For the 25 objects observed in the case study, the ensemble ROCK method with θ = 0.27 yields three groups with a ratio of 0.1358, whereas the ensemble SWFM method yields two groups with a ratio of 0.3059. These results indicate that the ensemble ROCK method clusters better than the ensemble SWFM method. The ensemble ROCK groups have the following characteristics: (a) group 1 has 10 accessions with small but heavy fruit, thick skin, predominantly greenish-yellow color, a predominantly smooth surface, varied pulp texture, and medium water content; (b) group 2 has 7 accessions with medium-sized but heavy fruit, thin skin, varied skin color, varied surface, varied pulp texture, and high water content; and (c) group 3 has 3 accessions with large but light fruit, thick skin, greenish-yellow color, a smooth surface, soft pulp, and medium water content.
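    The role of the threshold θ in ROCK can be illustrated with a short sketch of its neighbour and link counts. In ROCK, two records are neighbours when their similarity is at least θ, and the link count of a pair is the number of neighbours the two records share. The sketch assumes Jaccard similarity over (attribute, value) items and, as a simplification, does not count a record as its own neighbour. The fruit records are hypothetical and are not taken from the study; θ = 0.27 only echoes the value reported above.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of (attribute, value) items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rock_links(records, theta):
    """Count ROCK links: the number of common neighbours of each pair.

    Two records are neighbours when their Jaccard similarity is >= theta.
    Assumption in this sketch: a record is not its own neighbour.
    """
    items = [set(r.items()) for r in records]
    n = len(items)
    neighbours = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if jaccard(items[i], items[j]) >= theta:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return {(i, j): len(neighbours[i] & neighbours[j])
            for i, j in combinations(range(n), 2)}

# Hypothetical categorical records (illustration only).
fruits = [
    {"skin": "thick", "color": "green", "surface": "smooth"},
    {"skin": "thick", "color": "green", "surface": "rough"},
    {"skin": "thick", "color": "yellow", "surface": "smooth"},
    {"skin": "thin", "color": "orange", "surface": "rough"},
]
links = rock_links(fruits, theta=0.27)
```

    Records 1 and 2 are not neighbours of each other here, yet they share record 0 as a neighbour and so receive one link. ROCK merges clusters by such link counts rather than by pairwise similarity alone, and raising θ removes neighbours and therefore links.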

    Fuzzy Ensemble, K-Prototypes, and Density Peaks Clustering Mixed (DPC-M) for Clustering Bidikmisi Applicant Data across East Java, 2016-2017

    Clustering in data mining differs from the conventional methods commonly used for clustering. The difference is that data mining deals with high-dimensional data, which may comprise tens of thousands or millions of records with tens or hundreds of attributes. In addition, data-mining data may contain mixed types, such as numerical and categorical data, and such mixed-scale data are a problem often encountered in cluster analysis. This study compares the clustering results of Fuzzy Ensemble, K-Prototypes, and DPC-M. The three algorithms are applied to cluster Bidikmisi scholarship applicants in East Java during 2016-2017. In general, clustering validation falls into three classes: internal validation, external validation, and relative validation. This study focuses on internal and external validity indexes. The results show that, overall, the Fuzzy Ensemble algorithm gives better clustering results than the K-Prototypes and DPC-M algorithms.

    A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

    Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a hybrid framework specific to reservoir analysis for the automatic detection of clusters in space using spatial and temporal field data, coupled with a physics-based multiscale modeling approach. In this novel hybrid approach, we couple a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive work on spatio-temporal clustering for reservoir studies that carefully considers the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling

    An ensemble approach to the analysis of weighted networks

    We present a new approach to the calculation of measures in weighted networks, based on the translation of a weighted network into an ensemble of edges. This leads to a straightforward generalization of any measure defined on unweighted networks, such as the average degree of the nearest neighbours, the clustering coefficient, the 'betweenness', the distance between two nodes and the diameter of a network. All these measures are well established for unweighted networks but have hitherto proven difficult to define for weighted networks. Further to introducing this approach we demonstrate its advantages by applying the clustering coefficient constructed in this way to two real-world weighted networks. Comment: 4 pages, 3 figures
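    One way to read the ensemble idea for the clustering coefficient is sketched below. The abstract gives no formula, so everything here is an assumption: weights are taken to be already mapped into [0, 1] and read as independent probabilities that an edge exists, and the coefficient of a node is the expected number of closed triples divided by the expected number of connected triples centred on it. The function name is hypothetical.

```python
def ensemble_clustering(p, i):
    """Clustering coefficient of node i for edge probabilities p.

    p is a symmetric matrix (list of lists) with entries in [0, 1];
    p[a][b] is read as the probability that edge a-b exists (assumption).
    Returns expected closed triples / expected connected triples at i.
    """
    others = [j for j in range(len(p)) if j != i]
    closed = 0.0
    triples = 0.0
    for j in others:
        for k in others:
            if j == k:
                continue
            triples += p[i][j] * p[i][k]
            closed += p[i][j] * p[j][k] * p[i][k]
    return closed / triples if triples else 0.0

# Unweighted check: a triangle 0-1-2 with a pendant node 3 attached to 0.
binary = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
# Weighted case: three nodes, every edge present with probability 0.5.
uniform = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
c_binary = ensemble_clustering(binary, 0)
c_uniform = ensemble_clustering(uniform, 0)
```

    With 0/1 entries the function reduces to the usual unweighted clustering coefficient, which is the property the abstract attributes to the ensemble generalization.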

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Applying weighted network measures to microarray distance matrices

    In recent work we presented a new approach to the analysis of weighted networks, by providing a straightforward generalization of any network measure defined on unweighted networks. This approach is based on the translation of a weighted network into an ensemble of edges, and is particularly suited to the analysis of fully connected weighted networks. Here we apply our method to several such networks including distance matrices, and show that the clustering coefficient, constructed by using the ensemble approach, provides meaningful insights into the systems studied. In the particular case of two data sets from microarray experiments the clustering coefficient identifies a number of biologically significant genes, outperforming existing identification approaches. Comment: Accepted for publication in J. Phys.