8 research outputs found

    The Application of Genetic Algorithms in the Biological Medical Diagnostic Research

    In this paper, a genetic algorithm is used to determine the Mean Corpuscular Volume (MCV) value that serves as the optimal decision criterion for iron-deficiency anemia, based on diagnostic test results from patients with this anemia. With the aim of attaining maximum sensitivity and specificity for the cost, the paper studies the impact of the cost ratio on the optimal decision criterion and compares the mathematical derivation with the binomial model method, in order to discuss the use of genetic algorithms for finding optimal diagnostic criteria and to provide a practical method for studying diagnostic tests.
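
The role of the cost ratio in choosing a cutoff can be illustrated with a small sketch. It is not the paper's genetic algorithm: it swaps in a plain exhaustive search over candidate cutoffs, and the function name `best_cutoff`, the cost formula, and the MCV values are illustrative assumptions (synthetic toy numbers, not patient data). The only domain fact relied on is that iron-deficiency anemia is associated with low MCV, so a sample is flagged when its MCV is at or below the cutoff.

```python
def best_cutoff(mcv_diseased, mcv_healthy, cost_ratio):
    """Return (cutoff, cost) minimising a cost-weighted error.

    cost_ratio is the assumed cost of a false negative relative to a
    false positive. A sample is flagged as iron deficient when its
    MCV is <= cutoff.
    """
    candidates = sorted(set(mcv_diseased) | set(mcv_healthy))
    best = None
    for cutoff in candidates:
        # Sensitivity: diseased samples correctly flagged.
        sensitivity = sum(v <= cutoff for v in mcv_diseased) / len(mcv_diseased)
        # Specificity: healthy samples correctly left unflagged.
        specificity = sum(v > cutoff for v in mcv_healthy) / len(mcv_healthy)
        cost = cost_ratio * (1 - sensitivity) + (1 - specificity)
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


# Synthetic toy values (fL), for illustration only.
diseased = [68, 72, 75, 79, 83]
healthy = [78, 84, 88, 90, 95]

equal_costs = best_cutoff(diseased, healthy, cost_ratio=1.0)
cheap_misses = best_cutoff(diseased, healthy, cost_ratio=0.1)
```

On this toy data, lowering the cost ratio moves the chosen cutoff downward, because missed cases are penalised less and false alarms dominate the cost. A genetic algorithm would search the same objective stochastically rather than exhaustively.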

    Machine Learning Approaches for Breast Cancer Survivability Prediction

    Breast cancer is one of the leading causes of cancer death in women. If not diagnosed early, the 5-year survival rate of patients is only about 26%. Furthermore, patients with similar phenotypes can respond differently to the same therapies, which means the therapies might not work well for some of them. Identifying biomarkers that can help predict a cancer class with high accuracy is at the heart of breast cancer studies because they are targets of treatments and drug development. Genomics data have been shown to carry useful information for breast cancer diagnosis and prognosis, as well as for uncovering the disease’s mechanism. Machine learning methods are powerful tools for finding such information. Feature selection methods are often used in supervised and unsupervised learning tasks to deal with data containing a large number of features, of which only a small portion are useful for the classification task. On the other hand, analyzing only one type of data, without reference to existing knowledge about the disease and its therapies, might lead to misleading findings. Effective data integration approaches are necessary to understand this complex disease. In this thesis, we apply and develop machine learning methods to identify meaningful biomarkers for breast cancer survivability prediction after a certain treatment. They include applying feature selection methods to gene-expression data to derive gene signatures, where the initial genes are collected with regard to the mechanism of some drugs used in breast cancer therapies. We also propose a new feature selection method, named PAFS, and apply it to discover accurate biomarkers. In addition, there is increasing support for the view that sub-network biomarkers are more robust and accurate than gene biomarkers. We propose two network-based approaches to identify sub-network biomarkers for breast cancer survivability prediction after a treatment. 
They integrate gene-expression data with protein-protein interactions during the optimal sub-network search and use cancer-related genes and pathways to prioritize the extracted sub-networks. The sub-network search space is usually huge, and many proteins interact with thousands of other proteins. Thus, we apply some heuristics to avoid generating and evaluating redundant sub-networks.

    Information gain directed genetic algorithm wrapper feature selection for credit rating

    Financial credit scoring is one of the most crucial processes in the finance industry for assessing the creditworthiness of individuals and enterprises. Various statistics-based machine learning techniques have been employed for this task. The “Curse of Dimensionality” is still a significant challenge for machine learning techniques. Some research has been carried out on Feature Selection (FS) using a genetic algorithm as a wrapper to improve the performance of credit scoring models. However, the challenge lies in finding an overall best method for credit scoring problems and in speeding up the time-consuming process of feature selection. In this study, the credit scoring problem is investigated through feature selection to improve classification performance. This work proposes a novel approach to feature selection in credit scoring applications, called the Information Gain Directed Feature Selection algorithm (IGDFS), which ranks features by information gain and propagates the top m features through the GA wrapper (GAW) algorithm, using three classical machine learning algorithms, KNN, Naïve Bayes and Support Vector Machine (SVM), for credit scoring. The first stage of information-gain-guided feature selection can help reduce the computing complexity of the GA wrapper, and the information gain of the features selected with IGDFS can indicate their importance to decision making.
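
The first stage described above, ranking features by information gain and keeping the top m, can be sketched as follows. This is a minimal illustration, not the IGDFS implementation: it assumes discrete feature values, omits the GA wrapper stage entirely, and the function names, feature names, and toy records are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_m_features(columns, labels, m):
    """Rank feature columns by information gain and return the best m names."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:m], gains


# Hypothetical toy records: four applicants, three discrete features.
labels = ["good", "good", "bad", "bad"]
columns = {
    "region": ["n", "s", "n", "s"],            # unrelated to the label
    "has_default": [0, 0, 1, 1],               # separates the classes fully
    "income_band": ["hi", "hi", "hi", "lo"],   # partially informative
}

selected, gains = top_m_features(columns, labels, m=2)
```

In a two-stage scheme of this kind, only the `selected` features would be passed on to the wrapper search, which shrinks the space of feature subsets the genetic algorithm has to explore.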

    Swarm intelligence algorithms adaptation for various search spaces

    Nowadays there are many swarm intelligence algorithms that are successfully used to solve various hard optimization problems. The common elements of all these algorithms are a local search (exploitation) operator, which searches around promising solutions already found, and a global search (exploration) operator, which helps escape from local optima. Swarm intelligence algorithms are usually first tested on unconstrained, constrained, or high-dimensional sets of standard benchmark functions. They can then be improved, adapted, modified, hybridized, or combined with local search. The ultimate purpose is to use such metaheuristics to optimize real-world problems. The solution domains, that is, the search spaces, of practical hard optimization problems can differ. Solutions can be vectors of real numbers or of integers, but they can also be more complex structures. Swarm intelligence algorithms must be adapted to different search spaces. This can be a simple tuning of the algorithm's parameters, or an adaptation to integer solutions by simply rounding the real-valued solutions obtained, but for some search spaces an almost complete rewrite of the algorithm is needed, including the exploitation and exploration operators, retaining only the guidance process, that is, the swarm intelligence. The dissertation presents several swarm intelligence algorithms, their adaptation to different search spaces, and their application to practical problems. The dissertation aims to analyze and adapt swarm intelligence algorithms depending on the objective function and the solution space. Its subject includes a comprehensive survey of existing implementations of swarm intelligence algorithms. 
The dissertation also includes a comparative analysis, showing the weaknesses and strengths of the algorithms relative to one another, together with research on adapting swarm intelligence algorithms to different search spaces and applying them to practical problems. It considers problems with real-valued solutions, such as support vector machine optimization and data clustering; problems with integer solutions, such as digital image segmentation; and problems whose solutions have a special structure, such as robot path planning and minimum weight triangulation. The modified and adapted swarm intelligence algorithms, applied to practical problems over different search spaces, were tested on standard benchmark data sets and compared with other state-of-the-art methods from the literature for the problems considered. Successful adaptations of swarm intelligence algorithms to various search spaces are demonstrated. The algorithms adapted in this way achieved better results in all cases than methods from the literature, which leads to the conclusion that swarm intelligence algorithms can be adapted to various search spaces, including complex structures, and can achieve better results than methods from the literature.

    Benefit maximizing classification using feature intervals

    For a long time, classification algorithms have focused on minimizing the number of prediction errors by assuming that each possible error has identical consequences. However, in many real-world situations, this assumption does not hold. For instance, in a medical diagnosis domain, misdiagnosing a sick patient as healthy is much more serious than the opposite. For this reason, there is a great need for new classification methods that can handle asymmetric cost and benefit constraints of classifications. In this thesis, we discuss cost-sensitive classification concepts and propose a new classification algorithm called Benefit Maximization with Feature Intervals (BMFI) that uses the feature-projection-based knowledge representation. In the framework of BMFI, we introduce five different voting methods that are shown to be effective over different domains. A number of generalization and pruning methodologies based on benefits of classification are implemented and evaluated experimentally. Empirical evaluation of the methods has shown that BMFI exhibits promising performance compared to recent wrapper cost-sensitive algorithms, despite the fact that classifier performance is highly dependent on the benefit constraints and class distributions in the domain. In order to evaluate cost-sensitive classification techniques, we describe a new metric, benefit accuracy, which computes the relative accuracy of the total benefit obtained with respect to the maximum possible benefit achievable in the domain. İkizler, Nazlı. M.S.
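
One plausible reading of the benefit accuracy metric, as the ratio of the benefit actually obtained to the maximum benefit achievable, can be sketched as below. The thesis's exact definition may differ (for example in how negative benefits are normalised); the function name, the benefit table, and the class labels are illustrative assumptions.

```python
def benefit_accuracy(actual, predicted, benefit):
    """Ratio of obtained benefit to the best achievable benefit.

    benefit maps (predicted_class, actual_class) to a numeric payoff;
    negative values represent costs. Assumed definition, see lead-in.
    """
    classes = {p for p, _ in benefit}
    obtained = sum(benefit[(p, a)] for p, a in zip(predicted, actual))
    # Best case: for every instance, the prediction with the highest payoff.
    best = sum(max(benefit[(p, a)] for p in classes) for a in actual)
    if best == 0:
        raise ValueError("maximum achievable benefit is zero")
    return obtained / best


# Hypothetical asymmetric payoffs: missing a sick patient is far worse
# than raising a false alarm.
benefit = {
    ("sick", "sick"): 10,
    ("healthy", "sick"): -50,
    ("sick", "healthy"): -5,
    ("healthy", "healthy"): 1,
}

actual = ["sick", "sick", "healthy", "healthy"]
predicted = ["sick", "sick", "healthy", "sick"]

score = benefit_accuracy(actual, predicted, benefit)
perfect = benefit_accuracy(actual, actual, benefit)
```

Under this reading a perfect classifier scores 1.0, and a classifier with the same error count can score very differently depending on which kind of error it makes.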

    Data mining in computational finance

    Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models, focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding current processes. One of the objectives of this research is to present the state of the art in data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions about the usefulness of features within the datasets. Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem in which models are built using information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms on the problem of credit scoring. Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods using three benchmark datasets. One of the tasks of financial analysts is to estimate a company’s worth. In the last piece of work, this study predicts the growth rate of company earnings using three machine learning techniques. 
We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task and transformed the time series forecasting problem into a supervised learning problem. This work was applied to a private time series dataset.
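
The lagged-feature transformation mentioned above can be sketched in a few lines. This is a generic illustration, not the thesis's code: the function name `make_lagged_dataset` and the sample series are hypothetical, and the private dataset used in the study is not reproduced.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a time series into (X, y) pairs for supervised learning.

    Each row of X holds the n_lags values that precede the target in y,
    so n_lags controls how much recent history the model sees.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the previous n_lags values
        y.append(series[t])                   # the value to predict
    return X, y


# Hypothetical earnings growth rates for five consecutive periods.
growth = [1.0, 1.2, 1.5, 1.4, 1.8]
X, y = make_lagged_dataset(growth, n_lags=2)
```

Any standard regressor can then be fitted on `X` and `y`. Varying `n_lags` trades off the amount of history available to the model against the number of training rows that remain.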

    Effects of municipal smoke-free ordinances on secondhand smoke exposure in the Republic of Korea

    Objective: To reduce premature deaths due to secondhand smoke (SHS) exposure among non-smokers, the Republic of Korea (ROK) adopted changes to the National Health Promotion Act, which allowed local governments to enact municipal ordinances to strengthen their authority to designate smoke-free areas and levy penalty fines. In this study, we examined national trends in SHS exposure after the introduction of these municipal ordinances at the city level in 2010. Methods: We used interrupted time series analysis to assess whether the trends of SHS exposure in the workplace and at home, and the primary cigarette smoking rate changed following the policy adjustment in the national legislation in ROK. Population-standardized data for selected variables were retrieved from a nationally representative survey dataset and used to study the policy action’s effectiveness. Results: Following the change in the legislation, SHS exposure in the workplace reversed course from an increasing (18% per year) trend prior to the introduction of these smoke-free ordinances to a decreasing (−10% per year) trend after adoption and enforcement of these laws (β2 = 0.18, p-value = 0.07; β3 = −0.10, p-value = 0.02). SHS exposure at home (β2 = 0.10, p-value = 0.09; β3 = −0.03, p-value = 0.14) and the primary cigarette smoking rate (β2 = 0.03, p-value = 0.10; β3 = 0.008, p-value = 0.15) showed no significant changes in the sampled period. Although analyses stratified by sex showed that the allowance of municipal ordinances resulted in reduced SHS exposure in the workplace for both males and females, they did not affect the primary cigarette smoking rate as much, especially among females. Conclusion: Strengthening the role of local governments by giving them the authority to enact and enforce penalties on SHS exposure violation helped ROK to reduce SHS exposure in the workplace. 
However, smoking behaviors and related activities seemed to shift to less restrictive areas such as on the streets and in apartment hallways, negating some of the effects due to these ordinances. Future studies should investigate how smoke-free policies beyond public places can further reduce the SHS exposure in ROK.