7 research outputs found

    Klasterisasi Menggunakan Agglomerative Hierarchical Clustering Untuk Memodelkan Wilayah Banjir

    Every year during the rainy season, flooding is a frequent disaster in East Java province. According to records from the Disaster Management Agency (BNPB), there were 574 flood disasters in East Java province from 2014 to 2015. Many factors contribute to flood disasters, among them the slow arrival of information, so a model of flood-prone areas in East Java is needed, built with a more accurate and efficient method: Agglomerative Hierarchical Clustering (AHC). This method is used to perform the grouping, with performance tested using the cophenetic correlation coefficient. The results of this study are visualized in a GIS. Based on the optimal-cluster test with the elbow method, East Java province is divided into 3 clusters of areas affected by flood potential: low, medium, and high. The cluster performance test using the cophenetic correlation coefficient shows that the average linkage method gives a better cluster solution than the other AHC methods, at 0.9

    Nuevo marco para utilizar la minería de datos y reglas de asociación para la clasificación de la gravedad de accidentes de tráfico

    Introduction: Traffic accidents are an undesirable burden on society. Every year around one million deaths and more than ten million injuries are reported due to traffic accidents. Hence, traffic accident prevention measures must be taken to reduce the accident rate. Different countries have different geographical and environmental conditions, so the factors behind accidents differ from country to country. Traffic accident data analysis is very useful in revealing the factors that affect accidents in different countries. This article was written in 2016 at the Institute of Technology & Science, Mohan Nagar, Ghaziabad, UP, India. Methodology: We propose a framework that uses association rule mining (ARM) for the severity classification of traffic accident data obtained from police records in Muzaffarnagar district, Uttar Pradesh, India. Results: The results reveal some hidden factors that help explain road accidents in this region. Conclusions: The framework enables us to find three clusters in the data set. Each cluster represents a type of accident severity, i.e. fatal, major injury, and minor/no injury. The association rules exposed different factors that are associated with road accidents in each category. The extracted information can be used to adapt preventive measures that reduce accident severity in Muzaffarnagar district

    Büyük veride hiyerarşik kümeleme yöntemlerinin kofenetik korelasyon katsayısı ile karşılaştırılması

    The aim of this study is to compare hierarchical clustering methods by the Cophenetic Correlation Coefficient (CCC) on big data. For this purpose, after giving information about big data, clustering methods, and the CCC, analyses are carried out on the data set in question. The 2015 Air Travel Consumer Report, published by the US Department of Transportation, was used as the big data in the application part of the study. Libraries of the Python programming language, installed on an Amazon cloud server that includes open-source big data technologies, were used for data analysis. Because the study involves big data, to save time and cost the variables were first reduced by a feature selection method and standardized, and the analysis was run on the final 4 different data sets. As a result of the clustering analysis, the highest CCC was obtained with the average linkage clustering method for all four data sets
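
The comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes SciPy as the Python library, uses a hypothetical helper name `compare_linkages`, and runs on synthetic stand-in data rather than the air travel data set. Which method scores highest depends on the data, so no ranking is asserted here.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def compare_linkages(X, methods=("single", "complete", "average", "ward")):
    """Return {method: cophenetic correlation coefficient} for data matrix X."""
    distances = pdist(X)  # condensed Euclidean distance matrix
    scores = {}
    for method in methods:
        Z = linkage(X, method=method)    # hierarchical merge tree
        ccc, _ = cophenet(Z, distances)  # tree distances vs. original distances
        scores[method] = float(ccc)
    return scores


# Synthetic stand-in data (an assumption, NOT the study's data): three Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4)) for c in (0.0, 3.0, 6.0)])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize, as the abstract describes

scores = compare_linkages(X)
best = max(scores, key=scores.get)  # method with the highest CCC on this data
print(scores, best)
```

A CCC close to 1 means the dendrogram preserves the original pairwise distances well, which is the criterion the abstract uses to rank the methods.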

    Data Mining Approach of Accident Occurrences Identification with Effective Methodology and Implementation

    Data mining is used in various research domains to identify new causes of effects observed in societies around the globe. This article applies data mining for the same reason: to identify accident occurrences in different regions and to find the most valid reasons why accidents happen around the globe. Data mining and advanced machine learning algorithms are used in this research approach. The article discusses hyperline, classification, pre-processing of the data, and training the machine with sample datasets collected from different regions, which contain structured and semi-structured data. We dive deep into machine learning and data mining classification algorithms to find or predict something novel about accident occurrences around the globe. To narrow the scope of the research, we concentrate mainly on two basic and important classification algorithms: the support vector machine (SVM) and the CNB classifier. The discussion covers the WEKA tool for the CNB classifier, bag-of-words identification, word count, and frequency calculation

    Neuroinformatics approach: Hierarchical cluster analysis of indonesian provinces based on people's welfare indicators in the realm of data science and network studies

    The welfare of people has always piqued our interest, and it remains the primary goal of nations around the world in their development endeavors. To effectively drive development efforts, it is critical to understand the diverse welfare features that exist in different locations. Thus, the purpose of this statistical analysis is to classify Indonesian provinces based on a comprehensive set of People's Welfare Indicators, which includes Population Density (PD), Percentage of Poor Population (PPP), Life Expectancy Rate (LER), and Average Years of Schooling (AYS). The methodology used in this study is Hierarchical Cluster Analysis, which employs five distinct techniques: Single Linkage, Average Linkage, Complete Linkage, Ward's Linkage, and the Centroid Method. The data for this study was obtained from reliable secondary sources, notably the official website of the Central Bureau of Statistics (BPS), and it provides insights into Indonesia's welfare picture in 2021. The average linkage approach emerges as the most suitable of the five hierarchical cluster analysis methods used, with the cophenetic correlation closest to 1. The analysis reveals three distinct clusters within the Indonesian context. Cluster 1 demonstrates a tendency toward low PWI (People's Welfare Index) status, while Cluster 2 exhibits a notably high PWI status. Cluster 3 occupies an intermediate position, characterized by moderate PWI status. These findings not only give a useful classification but also act as an important reference point for the Indonesian government. They provide in-depth insight into each province's distinct welfare features, supporting smart resource allocation and prioritizing aid distribution in the regions of highest need. As a result, this research is an essential resource for creating equitable and effective policies and methods to improve people's well-being throughout Indonesia

    Klasterisasi Menggunakan Agglomerative Hierarchical Clustering Untuk Memodelkan Wilayah Banjir

    Every year during the rainy season, flooding is a frequent disaster in the province of East Java. According to the records of the Disaster Management Agency (BNPB), there were 574 flood disasters in the province of East Java from 2014 to 2015. Many factors contribute to floods, including the slow arrival of information, so a model of potential flood areas in East Java is needed, built with a more accurate and efficient method: Agglomerative Hierarchical Clustering (AHC). This method is used for the grouping, with performance tested using the silhouette score and the cophenetic correlation coefficient. The results of this study are visualized in a GIS. Based on the optimal cluster results with the elbow method, the province of East Java is divided into 3 clusters of areas affected by flood potential, i.e. low, medium, and high. Cluster performance results using the cophenetic correlation coefficient indicate that the average linkage method provides a better cluster solution than the other AHC methods, at 0.92. Keywords: floods; clustering; agglomerative; elbow method; cophenetic correlation coefficient

    Evaluasi Metode Hierarchical Clustering Berbasis Linkage pada MWMOTE : Studi Kasus Data Akademik Universitas XYZ dan Data UCI

    Imbalanced data occurs in many kinds of data, including the academic data of Universitas XYZ and UCI data. This leads to misclassification because the majority class dominates the minority class, which lowers accuracy. The MWMOTE method can be an option for handling imbalanced data through weighting and clustering. This study aims to address the imbalanced academic dataset of the 2014 and 2015 cohorts at Universitas XYZ, as well as UCI data, by evaluating hierarchical clustering. This aim is achieved by evaluating three hierarchical clustering methods as one of the sub-processes in MWMOTE to produce more representative synthetic data. The result of this study is that the three AHC methods give no significant difference in the improvement of MWMOTE accuracy on the academic data and 7 UCI datasets, as tested with one-way ANOVA with a sig/alpha value > 0.0
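
The significance test mentioned in this abstract can be sketched as below. This is only an illustration under stated assumptions: the accuracy scores are synthetic, the three linkage names are hypothetical (the abstract does not name the methods), and `alpha = 0.05` is a conventional threshold chosen for the example, not the value from the truncated abstract.

```python
import numpy as np
from scipy.stats import f_oneway

# Synthetic accuracy scores (an assumption, NOT results from the study):
# 10 runs for each of three hypothetical linkage variants inside MWMOTE.
rng = np.random.default_rng(42)
accuracy = {
    name: rng.normal(loc=0.85, scale=0.02, size=10)
    for name in ("single", "complete", "average")
}

# One-way ANOVA: null hypothesis is that all three group means are equal.
f_stat, p_value = f_oneway(*accuracy.values())

alpha = 0.05  # assumed conventional significance level
significant = bool(p_value < alpha)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, significant = {significant}")
```

A p-value above the chosen threshold means the null hypothesis of equal mean accuracy is not rejected, which is the kind of outcome the abstract reports for the three methods.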