Analyzing and visualizing dengue hotspot location data

Abstract

In this paper, we explore the Dengue Hotspot Location training data set, which is publicly available at data.gov.my. The data set consists of 10,116 cases reported by district in Malaysia over 5 years, from 2011 to 2015. It contains 7 columns: Tahun, Minggu, Negeri, Daerah/Zon, Lokaliti, Jumlah Kes Terkumpul, and Tempoh Wabak Berlaku (Hari). The purpose of this study is to measure the strength of the correlation between all variables in the Dengue Hotspot Location data set. The paper also focuses on the selection of suitable variables from a large data set and on the imputation of missing values. Many statistical models are known to fail when values are missing, and many researchers have proposed ways to handle missing values. In this paper, we demonstrate our approach to analyzing the data with one machine learning classifier, Naïve Bayes. Naïve Bayes was chosen because it achieved the highest accuracy among the four machine learning classifiers evaluated in the previous paper (Abidin, Ritahani, & Emran, 2018).
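To make the two preprocessing ideas named above concrete, the following is a minimal sketch of mean imputation followed by a Pearson correlation between two of the numeric columns. It is an illustration only and not necessarily the procedure used in this study: the rows are made-up toy values rather than records from the data set, the helper names `impute_mean` and `pearson` are hypothetical, and mean imputation is assumed here simply because it is the most basic strategy.

```python
from math import sqrt
from statistics import mean

# Toy rows for illustration only; these are NOT records from data.gov.my.
# None marks a missing value in the outbreak-duration column.
rows = [
    {"Jumlah Kes Terkumpul": 10, "Tempoh Wabak Berlaku (Hari)": 30},
    {"Jumlah Kes Terkumpul": 20, "Tempoh Wabak Berlaku (Hari)": None},
    {"Jumlah Kes Terkumpul": 30, "Tempoh Wabak Berlaku (Hari)": 50},
    {"Jumlah Kes Terkumpul": 40, "Tempoh Wabak Berlaku (Hari)": 60},
]


def impute_mean(values):
    """Replace each None with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


cases = [r["Jumlah Kes Terkumpul"] for r in rows]
days = impute_mean([r["Tempoh Wabak Berlaku (Hari)"] for r in rows])
r = pearson(cases, days)
print(f"correlation after mean imputation: {r:.3f}")
```

A coefficient close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. Mean imputation tends to shrink the variance of the imputed column, so the resulting coefficient should be read with that in mind.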

    Similar works