
    RB-Bayes algorithm for the prediction of diabetic in Pima Indian dataset

    Diabetes is a major concern all over the world, and its prevalence is increasing at a fast pace. People may be able to avoid diabetes if their risk is identified at an early stage, without any test. The goal of this paper is to predict, at an early stage, the probability that a person is at risk of diabetes, which could have a great impact on their quality of life. The datasets are the Pima Indians diabetes and Cleveland coronary illness datasets and consist of 768 records. Although a number of solutions are available for extracting information from huge datasets and predicting the possibility of diabetes, the accuracy of their mining process is far from satisfactory. To achieve the highest accuracy, the zero-probability problem commonly faced by naïve Bayes analysis must be addressed. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy while overcoming the zero-probability problem, and compares its accuracy with other methods such as Support Vector Machine, Naive Bayes, and K-Nearest Neighbor. We used the mean to handle missing data and calculated the probabilities for yes (positive) and no (negative); the higher of the two decides the class of the tuple. This approach is mostly used in text classification. The results on the Pima Indian diabetes dataset show that the proposed methodology improves precision compared with other supervised procedures. The accuracy of the proposed methodology on the large dataset is 72.9%.
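The abstract does not spell out how RB-Bayes avoids zero probabilities. For comparison, the sketch below shows the standard remedy for naïve Bayes, Laplace (add-one) smoothing, together with mean imputation of missing values. It assumes the numeric Pima attributes have already been discretized into categories (e.g. "high"/"low"); `impute_mean`, `train_nb` and `predict_nb` are illustrative names, not the paper's code.

```python
from collections import Counter, defaultdict
from math import log


def impute_mean(rows):
    """Replace missing entries (None) in each numeric column with that column's mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def train_nb(X, y, alpha=1.0):
    """Count class frequencies and per-class value frequencies for categorical features."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (class, feature index) -> counts of each value
    vocab = defaultdict(set)             # feature index -> distinct values seen in training
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(label, j)][v] += 1
            vocab[j].add(v)
    return class_counts, value_counts, vocab, alpha


def predict_nb(model, row):
    """Return the class ("yes"/"no", "pos"/"neg", ...) with the highest smoothed log-posterior."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for j, v in enumerate(row):
            # Laplace (add-alpha) smoothing: a value never seen with class c gets a small
            # non-zero probability, so one unseen value cannot zero out the whole product.
            k = len(vocab[j])
            score += log((value_counts[(c, j)][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


X = [["high", "yes"], ["high", "no"], ["low", "no"], ["low", "no"]]
y = ["pos", "pos", "neg", "neg"]
model = train_nb(X, y)
print(predict_nb(model, ["high", "no"]))  # pos
```

Without smoothing, the "neg" class would get probability zero for any tuple containing "high", because that value never co-occurs with "neg" in the toy data.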

    Layered genetic programming for feature extraction in classification problems

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Genetic programming has been proven to be a successful technique for feature extraction in various applications. In this thesis, we present a Layered Genetic Programming system that implements a genetic-programming-based feature extraction mechanism. The proposed system uses a layered structure in which, instead of evolving just one population of individuals, several populations are evolved sequentially. Each population transforms the input data received from the previous population into a lower-dimensional space with the aim of improving classification performance. The performance of the proposed system was experimentally tested on 5 real-world problems using different dimensionality reduction step sizes and different classifiers. The proposed method outperformed a simple classifier applied directly to the original data on two problems. On the remaining problems, the classifier performed better using the original data. The best solutions were often obtained in the first few layers, which implied that increasing the size of the system, i.e. adding more layers, was not useful. However, the layered structure allowed control of the size of individuals.
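A minimal sketch of the layered structure, assuming each layer's individuals have already been evolved and are represented as plain Python callables (a hypothetical representation; the evolutionary search itself is omitted). Each layer must map its input to fewer features than it receives, mirroring the dimensionality reduction step size described above.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: each evolved GP individual is a callable that maps
# the current feature vector to one new feature value.
Individual = Callable[[Sequence[float]], float]


def apply_layer(layer: List[Individual], x: Sequence[float]) -> List[float]:
    """One population (layer) turns an input vector into one value per individual."""
    return [ind(x) for ind in layer]


def apply_layers(layers: List[List[Individual]], x: Sequence[float]) -> List[float]:
    """Feed the data through the layers in order; each layer must shrink the dimensionality."""
    current = list(x)
    for layer in layers:
        if len(layer) >= len(current):
            raise ValueError("each layer must map to a lower-dimensional space")
        current = apply_layer(layer, current)
    return current


# Two illustrative layers (4 -> 3 -> 2 features, i.e. a step size of 1).
layers = [
    [lambda v: v[0] + v[1], lambda v: v[2] * v[3], lambda v: v[0] - v[3]],
    [lambda v: v[0] * v[1], lambda v: v[2] + v[1]],
]
print(apply_layers(layers, [1, 2, 3, 4]))  # [36, 9]
```

The output of the last layer is what a classifier would then be trained on; stopping after fewer layers corresponds to the observation that the best solutions often appeared early.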

    Pattern recognition using genetic programming for classification of diabetes and modulation data

    The field of science whose goal is to assign each input object to one of a given set of categories is called pattern recognition. A standard pattern recognition system can be divided into two main components: feature extraction and pattern classification. During feature extraction, the information relevant to the problem is extracted from raw data, prepared as features and passed to a classifier, which assigns a label. Generally, the extracted feature vector has a fairly large number of dimensions, on the order of hundreds to thousands, which increases the computational complexity significantly. Feature generation, which filters out unwanted features, is introduced to handle this problem. Feature generation has become very important in modern pattern recognition systems because it not only reduces the dimensionality of the data but also increases classification accuracy. In this thesis, a genetic programming (GP) based framework is used for feature generation. GP is a process based on biological evolution in which combinations of the original features are evolved. Stronger features propagate through this evolution while weaker features are discarded. The evolution is optimised to improve the discriminatory power of the features in every new generation. The final generated features have more discriminatory power than the original features, making the classifier's job easier. One of the main problems in GP is a tendency towards suboptimal convergence. In this thesis, the response of the features to each input instance, which gives insight into their strengths and weaknesses, is used to avoid suboptimal convergence. These strengths and weaknesses are used to find the right partners during the crossover operation, which not only helps to avoid suboptimal convergence but also makes the evolution more effective.
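The abstract does not give the exact rule for pairing parents. One plausible reading, sketched here purely as an assumption, records for each individual which training instances it handles well and picks as crossover partner the candidate that is strongest on the instances where the first parent is weak. `choose_partner` and the boolean "hit" vectors are hypothetical.

```python
from typing import List, Sequence


def choose_partner(parent_hits: Sequence[bool], candidate_hits: List[Sequence[bool]]) -> int:
    """Return the index of the candidate that is strongest where the parent is weakest.

    parent_hits[i] is True if the parent's feature handles training instance i well
    (e.g. the instance is classified correctly); the same holds for each candidate.
    """
    weak = [i for i, ok in enumerate(parent_hits) if not ok]

    def coverage(hits: Sequence[bool]) -> int:
        # Number of the parent's weak instances that this candidate handles well.
        return sum(1 for i in weak if hits[i])

    # max() keeps the first candidate on ties.
    return max(range(len(candidate_hits)), key=lambda k: coverage(candidate_hits[k]))


parent = [True, True, False, False]
candidates = [[True, True, False, False], [False, False, True, True], [True, False, True, False]]
print(choose_partner(parent, candidates))  # 1
```

Pairing complementary individuals in this way keeps crossover from repeatedly recombining features that fail on the same instances, which is one route to premature (suboptimal) convergence.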
In order to thoroughly examine the capabilities of GP for feature generation and to cover different scenarios, different combinations of GP are designed. Each combination differs in how the ability of the features to solve the problem (the fitness function) is evaluated. In this research, the Fisher criterion, Support Vector Machine and Artificial Neural Network are used to evaluate the fitness function for binary classification problems, while a K-nearest neighbour classifier is used for fitness evaluation in multi-class classification problems. Two real-world classification problems (diabetes detection and modulation classification) are used to evaluate the performance of GP for feature generation. These two problems belong to different categories: diabetes detection is a binary classification problem, while modulation classification is a multi-class classification problem. Applying GP to both problems makes it possible to evaluate its performance in both categories. A series of experiments is conducted to evaluate and compare the results obtained using GP. The results demonstrate the superiority of GP-generated features compared with features generated by conventional methods.
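The Fisher criterion mentioned above scores how well one generated feature separates two classes: the squared distance between the class means divided by the sum of the class variances. A small sketch for 0/1 labels, using population variances (an assumption; the thesis may normalize differently):

```python
from statistics import fmean, pvariance
from typing import Sequence


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Fisher criterion (m1 - m0)^2 / (s1^2 + s0^2) of one feature for labels 0/1."""
    pos = [float(v) for v, y in zip(values, labels) if y == 1]
    neg = [float(v) for v, y in zip(values, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    gap = (fmean(pos) - fmean(neg)) ** 2
    if spread == 0:
        # Perfectly tight classes: infinitely separable if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


print(round(fisher_score([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]), 6))  # 27.0
print(fisher_score([1, 9, 1, 9], [0, 0, 1, 1]))  # 0.0
```

Used as a GP fitness, higher scores favour evolved features whose class-conditional values are far apart relative to their spread.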

    A voting-based machine learning approach for classifying biological and clinical datasets.

    BACKGROUND: Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously proposed optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with prior methods. RESULTS: The results demonstrated that the Trader algorithm could select a near-optimal subset of features with a significance level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values of fivefold cross-validated accuracy, precision, recall, specificity, and F-measure. CONCLUSION: Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers design practical diagnostic health care systems and offer effective treatment plans.
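The Trader feature-selection step is not described in enough detail here to reproduce, but the voting stage can be illustrated generically. The sketch below performs plain hard (majority) voting over the labels predicted by several already-trained classifiers; it is a generic example, not the framework's actual voting scheme.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-model predictions; predictions[m][i] is model m's label for sample i."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # most_common orders equal counts by first appearance, so ties go to the
        # label predicted by the earliest-listed model.
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three models, three samples.
print(majority_vote([[1, 0, 0], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 0]
```

Using an odd number of binary classifiers avoids ties entirely; with an even number, the tie-breaking rule above becomes a design decision worth stating explicitly.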

    Cognitive development optimization algorithm based support vector machines for determining diabetes

    The definition, diagnosis and classification of Diabetes Mellitus and its complications are very important. The World Health Organization (WHO) and other societies, as well as scientists, have conducted many studies on this subject. One of the most important research interests in this area is computer-supported decision systems for diagnosing diabetes. In such systems, Artificial Intelligence techniques are often used for disease diagnosis to streamline the diagnostic process in daily routine and avoid misdiagnosis. In this study, a diabetes diagnosis system built from both Support Vector Machines (SVM) and the Cognitive Development Optimization Algorithm (CoDOA) is proposed. During SVM training, CoDOA was used to determine the sigma parameter of the Gaussian (RBF) kernel function, and a classification was then performed on the Pima Indians diabetes data set. The proposed approach offers an alternative solution in the field of Artificial Intelligence based diabetes diagnosis and contributes to the related literature on diagnosis processes.
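The Gaussian (RBF) kernel whose width sigma is being tuned is commonly written as K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); the exact parametrization used in the paper is an assumption here. CoDOA itself is not reproduced: the sketch substitutes a plain log-uniform random search over sigma, driven by a caller-supplied `score` function (for example, cross-validated SVM accuracy), only to show where such an optimizer plugs in.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def tune_sigma(score: Callable[[float], float], low: float = 0.01, high: float = 10.0,
               n_trials: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Random search stand-in for CoDOA: return (best sigma, best score)."""
    rng = random.Random(seed)
    best_sigma, best_score = low, float("-inf")
    for _ in range(n_trials):
        # Sample log-uniformly, since useful sigma values span several orders of magnitude.
        sigma = math.exp(rng.uniform(math.log(low), math.log(high)))
        value = score(sigma)
        if value > best_score:
            best_sigma, best_score = sigma, value
    return best_sigma, best_score


print(rbf_kernel([0.0, 0.0], [0.0, 0.0], 1.0))  # 1.0
```

A small sigma makes the kernel very local (prone to overfitting), while a large sigma makes all points look similar; that trade-off is why the parameter is worth optimizing at all.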

    Hybridizing Cartesian Genetic Programming and Harmony Search for Adaptive Feature Construction in Supervised Learning Problems

    The advent of the so-called Big Data paradigm has motivated a flurry of research aimed at enhancing machine learning models by following very diverse approaches. In this context, this work focuses on the automatic construction of features in supervised learning problems, which differs from conventional feature selection in that new characteristics with enhanced predictive power are inferred from the original dataset. In particular, this manuscript proposes a new iterative feature construction approach based on a self-learning meta-heuristic algorithm (Harmony Search) and a solution encoding strategy (Cartesian Genetic Programming) suited to representing combinations of features by means of constant-length solution vectors. The proposed feature construction algorithm, coined Adaptive Cartesian Harmony Search (ACHS), incorporates modifications that allow exploiting the estimated predictive importance of intermediate solutions and, ultimately, attaining a better convergence rate in its iterative learning procedure. The performance of the proposed ACHS scheme is assessed and compared to that of the state of the art in a toy example and three practical use cases from the literature. The excellent performance figures obtained in these problems shed light on the widespread applicability of the proposed scheme to supervised learning with legacy datasets composed of already refined characteristics.
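To make the constant-length encoding concrete, here is a minimal Cartesian Genetic Programming decoder. It assumes a single row of two-input nodes and a hypothetical function set of addition, subtraction and multiplication; the Harmony Search loop that would evolve these integer vectors in ACHS is not shown.

```python
import operator
from typing import Callable, List, Sequence

# Hypothetical function set; real CGP setups often add protected division and more.
FUNCTIONS: List[Callable[[float, float], float]] = [operator.add, operator.sub, operator.mul]


def decode(genome: Sequence[int], n_inputs: int, n_nodes: int) -> Callable[[Sequence[float]], float]:
    """Turn a constant-length integer genome into a constructed-feature function.

    Layout: three genes per node (function index, first input, second input),
    followed by one output gene. Node k may read the original features
    (indices 0..n_inputs-1) or any earlier node (indices n_inputs..n_inputs+k-1).
    """
    if len(genome) != 3 * n_nodes + 1:
        raise ValueError("genome length must be 3 * n_nodes + 1")
    for k in range(n_nodes):
        f, a, b = genome[3 * k: 3 * k + 3]
        if not (0 <= f < len(FUNCTIONS) and 0 <= a < n_inputs + k and 0 <= b < n_inputs + k):
            raise ValueError(f"invalid genes for node {k}")
    if not 0 <= genome[-1] < n_inputs + n_nodes:
        raise ValueError("invalid output gene")

    def feature(x: Sequence[float]) -> float:
        values = list(x[:n_inputs])
        # Evaluates every node for simplicity; CGP implementations usually skip inactive ones.
        for k in range(n_nodes):
            f, a, b = genome[3 * k: 3 * k + 3]
            values.append(FUNCTIONS[f](values[a], values[b]))
        return values[genome[-1]]

    return feature


# Node 2 = x0 + x1, node 3 = node2 * x0; the output gene selects node 3.
new_feature = decode([0, 0, 1, 2, 2, 0, 3], n_inputs=2, n_nodes=2)
print(new_feature([2.0, 3.0]))  # 10.0
```

Because every genome has the same length regardless of the expression it encodes, a vector-based metaheuristic such as Harmony Search can operate on it gene by gene.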

    Altered developmental programming of the mouse mammary gland in female offspring following perinatal dietary exposures : a systems-biology perspective.

    Mishaps in prenatal development can influence mammary gland development and, ultimately, affect susceptibility to factors that cause breast cancer. This research was based on the hypothesis that maternal dietary composition during pregnancy can alter developmental (fetal) programming of the mammary gland. We used a computational systems-biology approach and a Bayesian stochastic search variable selection algorithm (SSVS) to identify differentially expressed genes and biological themes and pathways. Postnatal growth trajectories and gene expression in the mammary gland of female mice at 10 weeks of age were investigated following different maternal diet exposures during prenatal, lactational and early-juvenile development. The analysis correlated a decrease in the expression of energy pathways with a reciprocal increase in cytokine and inflammatory-signaling pathways. These findings suggest that maternal dietary fat exposure significantly influences postnatal growth trajectories, metabolic programming, and signaling networks in the mammary gland of female offspring. In addition, the adipocytokine pathway may be a sensitive trigger responding to dietary changes and may influence or enhance activation of an immune response, a key event in cancer development.

    Prediction of Concurrent Hypertensive Disorders in Pregnancy and Gestational Diabetes Mellitus Using Machine Learning Techniques

    Gestational diabetes mellitus and hypertensive disorders in pregnancy are serious maternal health conditions with immediate and lifelong mother-child health consequences. These obstetric pathologies have been widely investigated, but mostly in silos, while studies focusing on their simultaneous occurrence rarely exist. This is especially the case in the machine learning domain. This retrospective study sought to investigate, construct, evaluate, compare, and isolate a supervised machine learning predictive model for the binary classification of co-occurring gestational diabetes mellitus and hypertensive disorders in pregnancy in a cohort of otherwise healthy pregnant women. To accomplish these aims, this study analyzed an extract (n=4624, n_features=38) of a labelled maternal perinatal dataset (n=9967, n_fields=79) collected by the PeriData.Net® database from a participating community hospital in Southeast Wisconsin between 2013 and 2018. The datasets were named "WiseSample" and "WiseSubset", respectively, in this study. Thirty-three models were constructed with the six supervised machine learning algorithms explored on the extracted dataset: logistic regression, random forest, decision tree, support vector machine, StackingClassifier, and KerasClassifier (a deep learning classification algorithm); all were evaluated using stratified k-fold cross-validation (StratifiedKFold, k=10). The Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data to resolve the class imbalance noted in the sub-sample during preprocessing. A wide range of evidence-based feature selection techniques were used to identify the best predictors of the comorbidity under investigation. The model performance metrics employed to quantitatively evaluate and compare model quality include accuracy, F1, precision, recall, and the area under the receiver operating characteristic curve.
Support Vector Machine objectively emerged as the most generalizable model for identifying the gravidae in WiseSubset who may develop concurrent gestational diabetes mellitus and hypertensive disorders in pregnancy, scoring 100.00% (mean) in recall. The model consisted of 9 predictors extracted by recursive feature elimination with cross-validation using random forest. The findings of this study show that appropriate machine learning methods can reliably predict comorbid gestational diabetes and hypertensive disorders in pregnancy using readily available routine prenatal attributes. Six of the nine most predictive factors of the comorbidity were also among the top 6 selections of at least one other feature selection method examined. These six predictors are healthy-weight prepregnancy BMI, mother's educational status, husband's educational status, husband's occupation in the year before the current pregnancy, mother's blood group, and mother's age between 34 and 44 years. Insight from this analysis would support the clinical decision making of obstetric experts when caring for (1) nulliparous women, who have no obstetric history that could prompt their care providers to initiate feto-maternal medical surveillance; and (2) experienced mothers with no obstetric history suggestive of any of the diseases under study. Hence, among other benefits, the artificial-intelligence-backed tool designed in this research would likely improve maternal and child care quality outcomes.
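The study applied SMOTE to the training data only, within stratified 10-fold cross-validation. The sketch below reproduces that ordering with plain NumPy rather than the library calls the study used: folds are built per class so each keeps the class ratio, and synthetic minority samples are interpolated between a minority sample and one of its nearest minority neighbours inside the training split only, so the held-out fold never contains synthetic points. The function names are illustrative.

```python
import random

import numpy as np


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments between minority samples and their neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    out = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbours[i, rng.integers(k)]
        out[t] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return out


def balance_training_fold(X, y, train_idx, minority=1, seed=0):
    """Oversample the minority class of the training split only, up to the majority count."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    Xt, yt = X[train_idx], y[train_idx]
    n_min = int((yt == minority).sum())
    n_maj = len(yt) - n_min
    if n_min < 2 or n_min >= n_maj:
        return Xt, yt
    synth = smote(Xt[yt == minority], n_maj - n_min, seed=seed)
    return np.vstack([Xt, synth]), np.concatenate([yt, np.full(len(synth), minority)])


y = [1] * 4 + [0] * 16
X = [[float(i), float(i % 3)] for i in range(20)]
folds = stratified_folds(y, k=2)
held_out = set(folds[0])
train_idx = [i for i in range(20) if i not in held_out]
Xb, yb = balance_training_fold(X, y, train_idx)
print(int((yb == 1).sum()), int((yb == 0).sum()))  # 8 8
```

Oversampling before splitting would let synthetic copies of held-out patients leak into training, which inflates metrics such as recall.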

    Novel Hypertrophic Cardiomyopathy Diagnosis Index Using Deep Features and Local Directional Pattern Techniques

    Hypertrophic cardiomyopathy (HCM) is a genetic disorder with a wide spectrum of clinical presentations, including sudden death. Early diagnosis and intervention may avert the latter. Left ventricular hypertrophy on heart imaging is an important diagnostic criterion for HCM, and the most common imaging modality is heart ultrasound (US). US is operator-dependent, and its interpretation is subject to human error and variability. We proposed an automated computer-aided diagnostic tool to discriminate HCM from healthy subjects on US images. We used a local directional pattern and the ResNet-50 pretrained network to classify heart US images acquired from 62 known HCM patients and 101 healthy subjects. Deep features were ranked using Student's t-test, and the most significant feature (SigFea) was identified. An integrated index derived from the simulation was defined as $100 \log_{10}(\mathrm{SigFea}/\sqrt{2})$ for each subject, and a diagnostic threshold value was empirically calculated as the mean of the minimum integrated index among HCM subjects and the maximum integrated index among healthy subjects. An integrated index above a threshold of 0.5 separated HCM from healthy subjects with 100% accuracy in our test dataset.
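The index and threshold rule can be written out directly. The sketch assumes SigFea is a positive scalar per subject; the example numbers are made up for illustration and are not taken from the study.

```python
import math
from typing import Sequence


def integrated_index(sig_fea: float) -> float:
    """Integrated index 100 * log10(SigFea / sqrt(2)); SigFea must be positive."""
    if sig_fea <= 0:
        raise ValueError("SigFea must be positive for the logarithm")
    return 100.0 * math.log10(sig_fea / math.sqrt(2.0))


def diagnostic_threshold(hcm_indices: Sequence[float], healthy_indices: Sequence[float]) -> float:
    """Mean of the smallest HCM index and the largest healthy index."""
    return (min(hcm_indices) + max(healthy_indices)) / 2.0


def classify(index: float, threshold: float) -> str:
    """Indices above the threshold are labelled HCM."""
    return "HCM" if index > threshold else "healthy"


# Made-up indices: HCM subjects score 3.0 and 5.0, healthy subjects 1.0 and 2.0.
t = diagnostic_threshold([3.0, 5.0], [1.0, 2.0])
print(t, classify(3.0, t), classify(2.0, t))  # 2.5 HCM healthy
```

Placing the threshold midway between the two groups' closest values gives the widest margin on the data it was fitted to, which is also why perfect separation on that data does not by itself guarantee the same accuracy on new subjects.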