2,589 research outputs found

    Examination of unremitting kidney illness by utilizing machine learning classifiers

    Chronic kidney disease is a growing health problem that affects millions of people worldwide. Early detection and characterization of the disease are essential for its effective management and control. The disease is associated with several serious health risks, such as cardiovascular disease, increased risk of stroke, and end-stage renal disease, which can be prevented effectively through early detection and treatment. Medical scientists rely on machine learning algorithms to diagnose the disease accurately at its onset, and integrating these algorithms into mobile health solutions has recently become a way of adding value to healthcare. With this in mind, this paper proposes a predictive model based on three machine learning classifiers, Support Vector Machine, Decision Tree, and Multilayer Perceptron, for chronic kidney disease prediction. The model was implemented in popular machine learning tools, WEKA and RapidMiner, and its performance was assessed using the confusion matrix. Using 10-fold cross-validation in WEKA, the support vector machine achieved the highest accuracy, 98%, among the standard classifiers tested. In addition, the proposed prediction model was compared with existing models in terms of accuracy, sensitivity, and specificity. The experimental results indicate that the proposed predictive model is promising. These findings could support the development of mobile health solutions and other innovative approaches to preventing and treating this debilitating condition.
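The accuracy, sensitivity, and specificity used to compare the models all derive from the confusion matrix. The Python sketch below shows the standard formulas, assuming chronic kidney disease is the positive class and that per-fold confusion matrices from 10-fold cross-validation are pooled by summing them. The `ConfusionMatrix` class, the `pool_folds` helper, and the counts are hypothetical illustrations, not taken from the paper, WEKA, or RapidMiner.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix with chronic kidney disease (CKD) as the positive class."""
    tp: int  # CKD patients predicted as CKD
    fn: int  # CKD patients predicted as not CKD
    fp: int  # non-CKD patients predicted as CKD
    tn: int  # non-CKD patients predicted as not CKD

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # True positive rate: share of CKD patients the model detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: share of non-CKD patients correctly ruled out.
        return self.tn / (self.tn + self.fp)


def pool_folds(folds: list[ConfusionMatrix]) -> ConfusionMatrix:
    """Sum per-fold confusion matrices (one assumed way to aggregate k-fold results)."""
    return ConfusionMatrix(
        tp=sum(f.tp for f in folds),
        fn=sum(f.fn for f in folds),
        fp=sum(f.fp for f in folds),
        tn=sum(f.tn for f in folds),
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    cm = ConfusionMatrix(tp=90, fn=10, fp=5, tn=95)
    print(cm.accuracy(), cm.sensitivity(), cm.specificity())  # 0.925 0.9 0.95
```

Sensitivity and specificity are reported alongside accuracy because a model can reach high accuracy on an imbalanced sample while still missing many positive cases.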

    Comparison of Different Machine Learning Algorithms for National Flags Classification

    Each country in the world has its own combination of colors, shapes, and symbols on its flag. Some use an animal figure such as an eagle, some use an object such as a boat, and some nations prefer religious symbols such as a crescent or a cross. Several questions remain open. What factors determine a nation's flag? What factors affect the color or colors of a national flag? And why do some national flags carry symbols? In this paper, we analyze national flags and the factors that most affect their design. To identify these factors, we used a feature extraction method and then applied different machine learning algorithms to predict the religion and landmass of each country. We also show correlations among components that can appear on a national flag, such as the dominant color or colors, bars or stripes, ordinary and sacred symbols such as suns, stars, crosses, crescents, and triangles, and finally specific icons such as a boat or an animal figure. This study shows associations between certain characteristics of countries or nationalities: many factors are involved, and they are closely correlated. It also covers the classification of national flag data using the Multilayer Perceptron, CART, and C4.5 algorithms, and compares these techniques in terms of accuracy and performance in classifying national flag features.
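CART and C4.5 differ mainly in how they score candidate splits: CART uses the Gini index, while C4.5 uses the gain ratio (information gain normalized by split information). The Python sketch below computes both criteria for a single categorical flag attribute. The attribute and labels are hypothetical stand-ins for the flag data, and the sketch scores a multiway split by attribute value, which coincides with CART's binary split only when the attribute is binary.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(labels, feature):
    """Group class labels by the value the feature takes for each sample."""
    groups = {}
    for y, v in zip(labels, feature):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def gini_gain(labels, feature):
    """CART-style criterion: reduction in weighted Gini impurity."""
    n = len(labels)
    children = sum(len(g) / n * gini(g) for g in partition(labels, feature))
    return gini(labels) - children


def gain_ratio(labels, feature):
    """C4.5-style criterion: information gain divided by split information."""
    n = len(labels)
    groups = partition(labels, feature)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: whether the flag shows a crescent, and the country's religion.
    religion = ["Muslim", "Muslim", "Muslim", "Christian", "Christian", "Other"]
    has_crescent = [1, 1, 0, 0, 0, 0]
    print(gini_gain(religion, has_crescent), gain_ratio(religion, has_crescent))
```

Dividing by split information is what stops C4.5 from favoring attributes with many distinct values, a bias that plain information gain has.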

    Automatic classification of oranges using image processing and data mining techniques

    Data mining is the discovery of patterns and regularities in large amounts of data using machine learning algorithms, and it can be applied to object recognition through image processing techniques. In fruit and vegetable production lines, quality assurance is done by trained people who inspect the fruit as it moves along a conveyor belt and sort it into several categories based on visual features. In this paper we present an automatic orange classification system that uses visual inspection to extract features from images captured with a digital camera. These features are used to train several data mining algorithms that classify the fruit into one of three pre-established categories. The algorithms used are five decision trees (J48, Classification and Regression Tree (CART), Best First Tree, Logistic Model Tree (LMT), and Random Forest), two artificial neural networks (Multilayer Perceptron with Backpropagation and Radial Basis Function Network (RBF Network)), a Support Vector Machine trained with Sequential Minimal Optimization (SMO), and a classification rule (1Rule). The results are encouraging, given the good accuracy achieved by the classifiers and their low computational cost.
    Workshop de Agentes y Sistemas Inteligentes (WASI), Red de Universidades con Carreras en Informática (RedUNCI)
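The classification rule (1Rule) is the simplest learner in the list: for each attribute it builds one rule that maps every attribute value to the most frequent category for that value, then keeps the attribute whose rule makes the fewest training errors. The Python sketch below assumes the image features have already been discretized; the attribute values and category names are hypothetical, and unlike WEKA's OneR it does not discretize numeric attributes itself.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Learn a 1R model: (attribute index, value -> class rule, training errors)."""
    best = None
    for j in range(len(rows[0])):
        # Count class frequencies for each value of attribute j.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[j]][y] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (j, rule, errors)
    return best


def predict(model, row, default=None):
    """Apply the single rule; unseen values fall back to `default`."""
    j, rule, _ = model
    return rule.get(row[j], default)


if __name__ == "__main__":
    # Hypothetical discretized features: (peel color, size); hypothetical categories A/B/C.
    rows = [("orange", "large"), ("orange", "small"), ("yellow", "large"),
            ("yellow", "small"), ("green", "small"), ("green", "large")]
    labels = ["A", "A", "B", "B", "C", "C"]
    model = one_r(rows, labels)
    print(model[0], model[2], predict(model, ("yellow", "large")))  # 0 0 B
```

Because 1Rule trains in a single pass over the data, it is a useful low-cost baseline against which to judge the more complex trees and networks.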

    Machine learning and statistical techniques : an application to the prediction of insolvency in Spanish non-life insurance companies

    Predicting the insolvency of insurance companies has become an important problem in financial research. Most methods applied to this problem in the past are traditional statistical techniques that use financial ratios as explanatory variables. However, these variables often do not satisfy the statistical assumptions those techniques require, which complicates their application. In this paper, we carry out a comparative study of the performance of two non-parametric machine learning techniques, See5 and Rough Set. We apply both methods to predicting the insolvency of Spanish non-life insurance companies on the basis of a set of financial ratios. We also compare them with three classical, well-known techniques: one from the field of machine learning (Multilayer Perceptron) and two statistical ones (Linear Discriminant Analysis and Logistic Regression). The results indicate that the machine learning techniques perform better. Furthermore, See5 and Rough Set provide easily understandable and interpretable decision models, which shows that these methods can be a useful tool for evaluating the insolvency of insurance firms.
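Rough Set models rest on lower and upper approximations. Firms that share identical (discretized) ratio values are indiscernible; the lower approximation of the insolvent set contains the groups that are entirely insolvent, and the upper approximation contains every group with at least one insolvent firm. The Python sketch below computes these approximations. The ratio names, discretized values, and firms are hypothetical, and a full Rough Set analysis would go on to compute reducts and decision rules, which this sketch omits.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group object ids that have identical values on the chosen attributes."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough set approximations of the target set."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(objects, attributes):
        if cls <= target:   # every member is in the target: certainly in it
            lower |= cls
        if cls & target:    # some member is in the target: possibly in it
            upper |= cls
    return lower, upper


if __name__ == "__main__":
    # Hypothetical firms described by two discretized financial ratios.
    firms = {
        "f1": {"solvency_margin": "low", "claims_ratio": "high"},
        "f2": {"solvency_margin": "low", "claims_ratio": "high"},
        "f3": {"solvency_margin": "low", "claims_ratio": "low"},
        "f4": {"solvency_margin": "high", "claims_ratio": "low"},
        "f5": {"solvency_margin": "high", "claims_ratio": "high"},
    }
    insolvent = {"f1", "f3"}
    lower, upper = approximations(firms, ["solvency_margin", "claims_ratio"], insolvent)
    print(sorted(lower), sorted(upper), sorted(upper - lower))
```

Here f1 and f2 share the same ratio values but only f1 is insolvent, so both land in the boundary region (upper minus lower): these two attributes alone cannot decide their status.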

    Multivariate discretization of continuous valued attributes.

    The field of knowledge discovery and data mining is growing rapidly. Feature discretization is a crucial issue in Knowledge Discovery in Databases (KDD), or data mining, because most data sets used in real-world applications have features with continuous values. Discretization is performed as a preprocessing step so that data mining techniques can be applied to these data sets. This thesis addresses the discretization problem by proposing a multivariate discretization (MVD) algorithm. It begins with a number of common discretization algorithms: equal-width discretization, equal-frequency discretization, Naïve, entropy-based discretization, chi-square discretization, and orthogonal hyperplanes. It then compares the accuracy achieved with the MVD algorithm against that of the other algorithms. The thesis is divided into six chapters. It covers several common discretization algorithms, tests them on real-world data sets that vary in size and complexity, and shows how data visualization techniques can help determine the degree of complexity of a given data set. We examined the MVD algorithm on the same data sets. We then classified the discretized data using artificial neural networks: a single-layer perceptron and a multilayer perceptron trained with the backpropagation algorithm. We trained the classifier on the training data set and tested its accuracy on the testing data set. Our experiments yielded higher accuracy on some data sets and lower accuracy on others, depending on the degree of data complexity. We then compared the accuracy obtained with the MVD algorithm to that obtained with the other discretization algorithms and found that the MVD algorithm produces good accuracy in comparison with them.
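The two simplest unsupervised methods in the list, equal-width and equal-frequency discretization, can be sketched as follows. This is a generic Python illustration, not the thesis's MVD algorithm; each function maps every value to a bin index, and the sample values are hypothetical.

```python
def equal_width_bins(values, k):
    """Split [min, max] into k intervals of equal width; return a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: one bin
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign roughly len(values) / k values to each bin, by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical attribute with one outlier
    print(equal_width_bins(data, 2))      # [0, 0, 0, 0, 1]
    print(equal_frequency_bins(data, 2))  # [0, 0, 0, 1, 1]
```

The outlier stretches the equal-width intervals so that four of the five values share one bin, whereas equal-frequency binning splits by rank. Note that with tied values, this rank-based version can place equal values in different bins.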

    Machine Learning Applications in Graduation Prediction at the University of Nevada, Las Vegas

    Graduation rates of four-year institutions are an increasingly important metric for incoming students and for university rankings. To increase completion rates, universities must analyze available student data to understand the trends and factors that lead to graduation. Using predictive modeling, incoming students can be assessed for their likelihood of completing a degree. If a student is predicted to be likely to drop out, interventions can be enacted to increase retention and completion rates. At the University of Nevada, Las Vegas (UNLV), the four-year graduation rate is 15% and the six-year graduation rate is 39%. To improve these rates, we gathered seven years' worth of data on UNLV students who began in the fall 2010 semester or later, up to the summer of 2017; the data include information from admissions applications, financial aid, and first-year academic performance. The student group reported federally consists of first-time, full-time freshmen beginning in the summer or fall. Our data set includes all freshmen and transfer students within this time frame who meet our criteria. We applied data analysis and visualization techniques to this data set of 16,074 student profiles to produce actionable results for higher education staff and faculty. Predictive models, including logistic regression, decision trees, support vector machines, and neural networks, are applied to predict whether a student will graduate. In this analysis, decision trees give the best performance.
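Of the models listed, logistic regression is the easiest to show compactly. The Python sketch below fits a one-feature logistic regression by batch gradient descent on the mean log-loss and returns a function estimating the probability of graduating. The feature (first-year GPA), the data, the learning rate, and the epoch count are all hypothetical; a real study would use many features and a library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * z + b), where z is the standardized x."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mean) / std for x in xs]
    w = b = 0.0
    for _ in range(epochs):
        # Gradient of the mean log-loss: average of (prediction - label) * input.
        errs = [sigmoid(w * z + b) - y for z, y in zip(zs, ys)]
        w -= lr * sum(e * z for e, z in zip(errs, zs)) / n
        b -= lr * sum(errs) / n

    def predict_proba(x):
        return sigmoid(w * (x - mean) / std + b)

    return predict_proba


if __name__ == "__main__":
    gpa = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]  # hypothetical first-year GPAs
    graduated = [0, 0, 0, 1, 1, 1]        # hypothetical outcomes
    p = fit_logistic(gpa, graduated)
    print(round(p(2.0), 3), round(p(3.0), 3))
```

Standardizing the feature before gradient descent keeps the step size well behaved regardless of the feature's scale; a student whose estimated probability falls below a chosen threshold would be a candidate for the interventions described above.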

    Machine learning based data pre-processing for the purpose of medical data mining and decision support

    Building an accurate and reliable predictive model for different application domains is one of the most significant challenges in knowledge discovery and data mining. Sometimes improved data quality is itself the goal of the analysis, usually to improve processes in a production database and the design of decision support. As medicine moves forward, there is a need for sophisticated decision support systems that use data mining to support more orthodox knowledge engineering and health informatics practice. However, real-life medical data rarely comply with the requirements of various data mining tools. They are often inconsistent and noisy, contain redundant attributes, come in an unsuitable format, contain missing values, and are imbalanced with respect to the outcome class label.
    Many real-life data sets are incomplete, with missing values, and in medical data mining missing values have become a challenging problem. In many clinical trials, medical report pro formas allow some attributes to be left blank, either because they are inappropriate for some class of illness or because the person providing the information feels it is not appropriate to record values for some attributes. The research reported in this thesis explored the use of machine learning techniques as missing value imputation methods and proposed a new way of imputing missing values by supervised learning: a classifier was trained to learn the data patterns from a complete subset of the data, and the model was then used to predict the missing values for the full data set. The proposed machine learning based imputation was applied to the thesis data, and the results were compared with traditional Mean/Mode imputation. The experimental results show that all the machine learning methods explored outperformed the statistical method (Mean/Mode).
    The class imbalance problem has been found to hinder the performance of learning systems.
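The contrast between Mean/Mode imputation and the supervised imputation described in this abstract can be sketched in Python. The abstract does not name a single classifier, so this sketch assumes a k-nearest-neighbor vote trained on the records whose target value is present; the field names and patient records are hypothetical.

```python
from collections import Counter


def mode_impute(rows, target):
    """Fill missing values of `target` with its most frequent observed value."""
    observed = [r[target] for r in rows if r[target] is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [{**r, target: fill} if r[target] is None else r for r in rows]


def knn_impute(rows, target, features, k=3):
    """Predict missing `target` values from the k nearest complete records.

    Assumes every row has numeric values for all `features`.
    """
    complete = [r for r in rows if r[target] is not None]

    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)

    result = []
    for r in rows:
        if r[target] is None:
            neighbors = sorted(complete, key=lambda c: dist(r, c))[:k]
            vote = Counter(c[target] for c in neighbors).most_common(1)[0][0]
            result.append({**r, target: vote})
        else:
            result.append(r)
    return result


if __name__ == "__main__":
    patients = [  # hypothetical records; diabetes is missing for the last one
        {"age": 30, "bp": 110, "diabetes": "no"},
        {"age": 35, "bp": 115, "diabetes": "no"},
        {"age": 32, "bp": 112, "diabetes": "no"},
        {"age": 40, "bp": 118, "diabetes": "no"},
        {"age": 70, "bp": 160, "diabetes": "yes"},
        {"age": 68, "bp": 155, "diabetes": "yes"},
        {"age": 72, "bp": 165, "diabetes": "yes"},
        {"age": 71, "bp": 158, "diabetes": None},
    ]
    print(mode_impute(patients, "diabetes")[-1]["diabetes"])  # no
    print(knn_impute(patients, "diabetes", ["age", "bp"])[-1]["diabetes"])  # yes
```

Mode imputation fills in the overall majority value, while the supervised version uses the record's other attributes, which illustrates how a learned model can outperform Mean/Mode imputation.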
    In fact, most medical data sets are found to be highly imbalanced in their class label. The solution to this problem is to reduce the gap between the number of minority class samples and the number of majority class samples. Over-sampling can be applied to increase the number of minority class samples and balance the data; the alternative is under-sampling, which reduces the number of majority class samples. The thesis proposed a cluster-based under-sampling technique to reduce the gap between majority and minority samples, and different under-sampling and over-sampling techniques were explored as ways to balance the data. The experimental results show that, for the thesis data, the newly proposed modified cluster-based under-sampling technique performed better than the other class balancing techniques.
    Further research found that the class imbalance problem not only affects classification performance but also has an adverse effect on feature selection. The thesis proposed a new framework for feature selection on class-imbalanced data sets. The research found that, using the proposed framework, the classifier needs fewer attributes to achieve high accuracy, and that more attributes are needed if the data are highly imbalanced.
    The research described in the thesis makes the following four main novel contributions:
    a) an improved data mining methodology for mining medical data;
    b) a machine learning based missing value imputation method;
    c) a cluster-based semi-supervised class balancing method;
    d) a feature selection framework for class-imbalanced data sets.
    The performance analysis and comparative study show that the proposed missing value imputation method, class balancing method, and feature selection framework can provide an effective approach to data preparation for building medical decision support systems.
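Cluster-based under-sampling can be illustrated with a common generic variant: cluster the majority class into as many groups as there are minority samples, then keep one representative (the member nearest each centroid) per cluster. The Python sketch below uses k-means with a deterministic farthest-first initialization. It is not the thesis's modified technique, whose details are not given in this abstract, and the points are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization: repeatedly take the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def kmeans(points, k, iters=50):
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for c, cl in zip(centroids, clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def cluster_undersample(majority, n_keep):
    """Keep the majority sample nearest to each of n_keep k-means centroids."""
    centroids, clusters = kmeans(majority, n_keep)
    return [min(cl, key=lambda p: sq_dist(p, c))
            for c, cl in zip(centroids, clusters) if cl]


if __name__ == "__main__":
    majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
                (20, 0), (20, 1), (21, 0)]  # hypothetical majority-class samples
    minority_size = 3
    print(sorted(cluster_undersample(majority, minority_size)))  # [(0, 0), (10, 10), (20, 0)]
```

Keeping real samples nearest to each centroid, rather than the centroids themselves, means the balanced data set contains only observed records; the function can return fewer than `n_keep` samples if a cluster ends up empty.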