
    Federated deep learning for automated detection of diabetic retinopathy

    Diabetic retinopathy (DR) is a primary cause of impaired vision that can lead to permanent blindness if not detected and treated early. Unfortunately, DR frequently has no early warning signs and may not produce any symptoms. According to recent figures, over 382 million people worldwide suffer from DR, with the number expected to climb to 592 million by 2030. Given the large number of DR patients and the inadequate medical resources in some places, patients may not be treated in time, resulting in missed treatment opportunities and, eventually, irreversible vision loss. Color fundus diagnosis requires highly experienced experts to recognize the presence of tiny features and their relevance to DR. Unfortunately, manually diagnosing DR is time-consuming, tedious and error-prone, and the quality of manual interpretation depends heavily on the medical expert's experience. Deep learning is a machine learning approach with the potential to detect DR. However, deep learning still suffers from high computational cost, the need for large amounts of training data, overfitting, and non-trivial hyperparameter tuning. Thus, to build a model that can compete with medical experts, deep learning algorithms must be fed a huge number of instances or pool data from other institutions. Federated learning allows deep learning algorithms to learn from a diverse set of data stored in multiple databases: models are trained on local DR patient data, and only model parameters are exchanged between medical facilities. The objective of this research is to avoid the requirement of sharing DR patient data while still expediting the development of deep learning models through federated learning. Primarily, we propose a federated learning approach that decentralizes deep learning by eliminating the need to pool data in a single location. In this research, we present a practical method for the federated learning of deep networks based on retinal images of diabetic retinopathy.
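
    The core of the approach described above is that only model parameters, never patient images, leave each site. As a rough illustration (not the paper's architecture), the following Python sketch shows federated averaging over three hypothetical clients with synthetic data and a plain logistic-regression model; the shapes, feature counts and hyperparameters are assumptions for illustration.

        # A minimal federated averaging (FedAvg) sketch; client data and model are
        # synthetic placeholders, not the paper's retinal-image pipeline.
        import numpy as np

        rng = np.random.default_rng(0)

        def local_update(weights, X, y, lr=0.1, epochs=5):
            """Train locally with plain gradient descent; only weights leave the site."""
            w = weights.copy()
            for _ in range(epochs):
                p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
                grad = X.T @ (p - y) / len(y)             # logistic-loss gradient
                w -= lr * grad
            return w

        # Three hypothetical medical facilities, each with private (synthetic) data.
        clients = []
        for _ in range(3):
            X = rng.normal(size=(200, 10))                # 10 image-derived features
            true_w = rng.normal(size=10)
            y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)
            clients.append((X, y))

        global_w = np.zeros(10)
        for rnd in range(10):                             # communication rounds
            local_ws = [local_update(global_w, X, y) for X, y in clients]
            global_w = np.mean(local_ws, axis=0)          # server averages parameters only
        print("global weights after federation:", np.round(global_w, 2))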

    Performance analysis of machine learning algorithms for missing value imputation

    Data mining requires a pre-processing task in which the data are prepared, cleaned, integrated, transformed, reduced and discretized to ensure their quality. Missing values are a universal problem in many research domains and are commonly encountered during the data cleaning process. A missing value occurs when the stored value for a variable of an observation is absent. Missing values have undesirable effects on analysis results, especially when they lead to biased parameter estimates. Data imputation is a common way to deal with missing values, in which substitutes for the missing values are derived through statistical or machine learning techniques. Nevertheless, examining the strengths (and limitations) of these techniques is important for understanding their characteristics. In this paper, the performance of three machine learning classifiers (K-Nearest Neighbors (KNN), Decision Tree, and Bayesian Networks) is compared in terms of data imputation accuracy. The results show that, among the three classifiers, the Bayesian network has the most promising performance. © 2015 The Science and Information (SAI) Organization Limited
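
    To make the comparison concrete, the sketch below illustrates one plausible way to measure classifier-based imputation accuracy: mask known values of one column, predict them from the remaining columns, and score the predictions. It is a hedged illustration only; the iris data, the 20% masking rate, and GaussianNB standing in for a Bayesian network are assumptions, not the paper's setup.

        # Hedged sketch of classifier-based imputation accuracy, not the paper's protocol.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.naive_bayes import GaussianNB
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(42)
        X, y = load_iris(return_X_y=True)                 # y acts as the column to impute

        mask = rng.random(len(y)) < 0.2                   # 20% simulated missing values
        X_obs, y_obs = X[~mask], y[~mask]                 # complete records
        X_mis, y_true = X[mask], y[mask]                  # records with the "missing" value

        for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                          ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                          ("Naive Bayes", GaussianNB())]:
            clf.fit(X_obs, y_obs)                         # learn from complete records
            acc = accuracy_score(y_true, clf.predict(X_mis))
            print(f"{name}: imputation accuracy = {acc:.3f}")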

    Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare

    Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A real collected dataset is prone to be incomplete, inconsistent, noisy and redundant due to reasons such as human errors, instrumental failures, and adverse death. Therefore, to deal accurately with incomplete data, sophisticated algorithms are needed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among these, the KNN algorithm has been widely adopted for missing data imputation due to its robustness and simplicity, and it is a promising method that can outperform other machine learning methods. This paper provides a comprehensive review of different imputation techniques used to replace missing data. The goal of the review is to bring attention to potential improvements to existing methods and to give readers a better grasp of imputation technique trends.
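
    As a concrete illustration of the KNN imputation idea discussed in the review, the following minimal sketch uses scikit-learn's KNNImputer on a toy matrix; the values and k=3 are illustrative assumptions rather than settings from any reviewed study.

        # Minimal KNN imputation sketch; the toy clinical matrix is invented.
        import numpy as np
        from sklearn.impute import KNNImputer

        # Rows are patients, columns are measurements; np.nan marks missing entries.
        data = np.array([[65.0, 1.2, np.nan],
                         [72.0, np.nan, 140.0],
                         [58.0, 0.9, 120.0],
                         [70.0, 1.1, 135.0]])

        imputer = KNNImputer(n_neighbors=3)               # average of the 3 nearest rows
        print(imputer.fit_transform(data))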

    A particle swarm optimization levy flight algorithm for imputation of missing creatinine dataset

    Clinicians could intervene at what may be a crucial stage for preventing permanent kidney injury if patients with incipient Acute Kidney Injury (AKI), and those at high risk of developing AKI, could be identified. This paper proposes an improved mechanism for machine learning imputation algorithms by introducing the Particle Swarm Optimization Levy Flight (PSOLF) algorithm, which enhances the Particle Swarm Optimization (PSO) algorithm with Levy flight. The creatinine dataset that we collected, including AKI diagnosis and staging, mortality at hospital discharge, and renal recovery, is used to test the approach and compare it with other machine learning algorithms such as the Genetic Algorithm and traditional PSO. The proposed algorithms' performance is validated with a statistical significance test. The results show that SVMPSOLF performs better than the other methods. This research could serve as an important prognostic tool for determining which patients are likely to suffer from AKI, potentially allowing clinicians to intervene before kidney damage manifests.
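
    The sketch below illustrates the general idea of combining a PSO update with Levy-flight steps (drawn via Mantegna's algorithm) on a toy minimization problem. It is not the paper's SVMPSOLF pipeline or its creatinine imputation objective; the sphere function, swarm size, and coefficients are assumptions for illustration.

        # Hedged sketch: PSO velocity update plus a Levy-flight perturbation.
        import numpy as np
        from math import gamma

        rng = np.random.default_rng(1)

        def levy_step(shape, beta=1.5):
            """Draw Levy-distributed steps using Mantegna's algorithm."""
            sigma = (gamma(1 + beta) * np.sin(np.pi * beta / 2) /
                     (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
            u = rng.normal(0, sigma, shape)
            v = rng.normal(0, 1, shape)
            return u / np.abs(v) ** (1 / beta)

        def sphere(x):                                    # toy objective to minimize
            return float(np.sum(x ** 2))

        n, dim = 20, 5
        pos = rng.uniform(-5, 5, (n, dim))
        vel = np.zeros((n, dim))
        pbest, pbest_val = pos.copy(), np.array([sphere(p) for p in pos])
        gbest = pbest[np.argmin(pbest_val)].copy()

        for it in range(100):
            r1, r2 = rng.random((n, dim)), rng.random((n, dim))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = pos + vel + 0.01 * levy_step((n, dim))  # Levy-flight perturbation
            vals = np.array([sphere(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
        print("best value found:", sphere(gbest))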

    Software project estimation with machine learning

    This project involves research on software effort estimation using machine learning algorithms. Software cost and effort estimation are crucial parts of software project development: they determine the budget, time and resources needed to develop a software project. One of the well-established software project estimation models is the Constructive Cost Model (COCOMO), developed in the 1980s. Even though this model is still in use, COCOMO has weaknesses, and software developers still face the problem of inaccurate effort and cost estimation. Inaccuracy in the estimated effort affects the schedule and cost of the whole project as well. The objective of this research is to use several machine learning algorithms to estimate the effort of software project development. The best machine learning model is then chosen and compared with COCOMO.
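
    For orientation, the sketch below contrasts the basic COCOMO effort formula (effort in person-months = a * KLOC^b, with the organic-mode constants a = 2.4, b = 1.05) with a machine-learning regressor trained on a handful of made-up size/effort pairs; the data and the choice of random forest are illustrative assumptions, not this project's actual experiment.

        # Hedged sketch: basic COCOMO formula versus an ML regressor on invented data.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def cocomo_basic(kloc, a=2.4, b=1.05):
            """Basic COCOMO (organic mode): effort in person-months = a * KLOC^b."""
            return a * kloc ** b

        kloc = np.array([10.0, 23.0, 46.0, 70.0, 110.0])          # hypothetical sizes
        effort = np.array([39.0, 95.0, 200.0, 310.0, 520.0])      # hypothetical person-months

        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(kloc.reshape(-1, 1), effort)

        for k in [15.0, 60.0]:
            print(f"KLOC={k}: COCOMO={cocomo_basic(k):.1f} PM, "
                  f"ML estimate={model.predict([[k]])[0]:.1f} PM")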

    Analyzing and visualizing data dengue hotspot location

    In this paper, we explore the Dengue Hotspot Location training dataset that is publicly available at data.gov.my. The dataset consists of 10,116 cases reported by district in Malaysia over 5 years, from 2011 until 2015. The dataset contains 7 columns: Tahun (year), Minggu (week), Negeri (state), Daerah/Zon (district/zone), Lokaliti (locality), Jumlah Kes Terkumpul (cumulative number of cases), and Tempoh Wabak Berlaku (Hari) (duration of the outbreak in days). The purpose of this study is to measure the strength of the correlation between all variables in the Dengue Hotspot Location dataset. This paper also focuses primarily on the selection of suitable variables from a large dataset and the imputation of missing values. Many statistical models have been shown to fail in the presence of missing values, and researchers have proposed various ways to handle them. In this paper, however, we demonstrate our approach to analyzing the data with one machine learning classifier, Naïve Bayes. The choice was based on the highest accuracy among the four machine learning classifiers evaluated in our previous paper (Abidin, Ritahani, & Emran, 2018).
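
    The sketch below shows one plausible shape of this analysis flow, computing pairwise correlations and fitting a Naïve Bayes classifier on a synthetic frame with similarly named columns; the synthetic values, the chosen target, and the 30-day threshold are assumptions for illustration, not the actual data.gov.my records.

        # Hedged sketch of the correlation + Naive Bayes workflow on synthetic data.
        import numpy as np
        import pandas as pd
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(7)
        df = pd.DataFrame({
            "Tahun": rng.integers(2011, 2016, 200),            # year
            "Minggu": rng.integers(1, 53, 200),                # week
            "Jumlah_Kes_Terkumpul": rng.integers(1, 60, 200),  # cumulative cases
            "Tempoh_Wabak_Hari": rng.integers(1, 90, 200),     # outbreak duration (days)
        })

        print(df.corr())                                       # strength of correlation

        # Example target: whether the locality is a prolonged hotspot (duration > 30 days).
        y = (df["Tempoh_Wabak_Hari"] > 30).astype(int)
        X = df.drop(columns="Tempoh_Wabak_Hari")
        clf = GaussianNB().fit(X, y)
        print("training accuracy:", clf.score(X, y))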

    An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data

    K-nearest neighbors (KNN) has been used extensively as an imputation algorithm to substitute missing data with plausible values. One of the strengths of KNN imputation is its ability to robustly estimate missing data from the nearest neighbors. Despite these favorable points, however, KNN still has drawbacks: it suffers from high time complexity, the difficulty of choosing the right k, and the choice of distance function. Thus, this paper proposes a novel method for imputation of missing data, named KNNGOA, which optimizes the KNN imputation technique using the grasshopper optimization algorithm (GOA). Our GOA is designed to find the best value of k and to optimize the imputed value from KNN so as to maximize imputation accuracy. Experimental evaluations were carried out on different types of datasets collected from UCI, with missing value rates of 10%, 30% and 50%. In the experiments conducted, our proposed algorithm achieved promising results, outperforming other methods especially in terms of accuracy.
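
    To convey the core idea without reproducing the full grasshopper optimization algorithm, the sketch below runs a plain grid search over k and scores each candidate by the error on artificially masked entries; the iris data, the 10% masking rate, and the RMSE criterion are illustrative assumptions standing in for the paper's GOA-driven search.

        # Simplified stand-in for KNNGOA: pick the k that best recovers masked entries.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.impute import KNNImputer

        rng = np.random.default_rng(3)
        X, _ = load_iris(return_X_y=True)

        mask = rng.random(X.shape) < 0.10                 # hide 10% of the entries
        X_missing = X.copy()
        X_missing[mask] = np.nan

        best_k, best_rmse = None, np.inf
        for k in range(1, 16):                            # candidate neighbourhood sizes
            X_imp = KNNImputer(n_neighbors=k).fit_transform(X_missing)
            rmse = np.sqrt(np.mean((X_imp[mask] - X[mask]) ** 2))
            if rmse < best_rmse:
                best_k, best_rmse = k, rmse
        print(f"best k = {best_k}, RMSE on masked entries = {best_rmse:.4f}")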

    Collaborative requirements review

    Requirements review is a formal review conducted to ensure that system requirements have been completely and clearly identified. In the conventional requirements review technique, reviewers are required to physically attend the review session and give their review feedback during the session. In such a situation, there are potential problems in scheduling the review session around the availability of the reviewers, and issues with having to physically attend the entire session. Furthermore, the review session needs to be organized manually by the review leader, and the outcome of the session needs to be compiled manually. On some occasions, more than one session has to be organized. Hence, the objectives of this project are to (1) create a means for reviewers to perform the review anytime and anywhere; (2) facilitate collaborative review sessions; (3) support checklist management for review guidance; and (4) allow a compilation of the review feedback to be generated. As a case study, a web application for collaborative requirements review has been developed and tested. Finally, the features of the application are tested and issues are documented.
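
    As a rough illustration of how such feedback might be collected and compiled, the sketch below defines a minimal, hypothetical data model in Python; the class and field names are assumptions and are not taken from the application described above.

        # Hypothetical data model for asynchronous review feedback compilation.
        from dataclasses import dataclass, field
        from collections import defaultdict

        @dataclass
        class Feedback:
            reviewer: str
            requirement_id: str
            checklist_item: str
            comment: str

        @dataclass
        class ReviewSession:
            name: str
            feedback: list = field(default_factory=list)

            def add(self, fb: Feedback):
                self.feedback.append(fb)                  # reviewers submit anytime, anywhere

            def compile_report(self):
                """Group feedback per requirement for the review leader."""
                report = defaultdict(list)
                for fb in self.feedback:
                    report[fb.requirement_id].append(
                        f"[{fb.checklist_item}] {fb.reviewer}: {fb.comment}")
                return dict(report)

        session = ReviewSession("SRS v1.0 review")
        session.add(Feedback("alice", "REQ-01", "Completeness", "Acceptance criteria missing."))
        session.add(Feedback("bob", "REQ-01", "Clarity", "Ambiguous use of 'shall'."))
        print(session.compile_report())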

    Optimized COCOMO parameters using hybrid particle swarm optimization

    Software effort and cost estimation are crucial parts of software project development. They determine the budget, time, and resources needed to develop a software project, and the success of the development depends largely on their accuracy. A poor estimate affects the outcome and worsens the project management. Various software effort estimation models have been introduced to address this problem. The COnstructive COst MOdel (COCOMO) is a well-established software project estimation model; however, it lacks accuracy in effort and cost estimation, especially for current projects. Inaccuracy and complexity in the estimated effort make it difficult to develop software efficiently and effectively, directly affecting the schedule and cost and making estimates uncertain. In this paper, Particle Swarm Optimization (PSO) is proposed as a metaheuristic optimization method to hybridize with three traditional state-of-the-art techniques, Support Vector Machine (SVM), Linear Regression (LR), and Random Forest (RF), for optimizing the parameters of COCOMO models. The proposed approach is applied to the NASA software project dataset downloaded from the PROMISE repository. The proposed approach is compared with the three traditional algorithms, and the obtained results confirm their low accuracy before hybridization with PSO. Overall, the results show that PSOSVM on the NASA software project dataset can improve effort estimation accuracy and outperform the other models.
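
    The sketch below illustrates one plausible instance of the parameter-tuning step: PSO searching for COCOMO's a and b so as to minimize the mean magnitude of relative error (MMRE) on a handful of invented size/effort pairs. It omits the SVM/LR/RF hybrid stage, and the search ranges and swarm settings are assumptions.

        # Hedged sketch: PSO tuning COCOMO parameters a, b by minimizing MMRE.
        import numpy as np

        rng = np.random.default_rng(5)
        kloc = np.array([10.0, 23.0, 46.0, 70.0, 110.0])        # hypothetical project sizes
        actual = np.array([39.0, 95.0, 200.0, 310.0, 520.0])    # hypothetical efforts (PM)

        def mmre(params):
            a, b = params
            predicted = a * kloc ** b
            return float(np.mean(np.abs(actual - predicted) / actual))

        n, dim = 30, 2
        pos = rng.uniform([0.5, 0.8], [5.0, 1.5], (n, dim))     # search ranges for a and b
        vel = np.zeros((n, dim))
        pbest = pos.copy()
        pbest_val = np.array([mmre(p) for p in pos])
        gbest = pbest[np.argmin(pbest_val)].copy()

        for _ in range(200):
            r1, r2 = rng.random((n, dim)), rng.random((n, dim))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, [0.5, 0.8], [5.0, 1.5])
            vals = np.array([mmre(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()

        print(f"tuned a={gbest[0]:.2f}, b={gbest[1]:.2f}, MMRE={mmre(gbest):.3f}")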