8 research outputs found

    Evolutionary algorithms and decision trees for predicting poor outcome after endovascular treatment for acute ischemic stroke

    Despite the large overall beneficial effects of endovascular treatment in patients with acute ischemic stroke, severe disability or death still occurs in almost one-third of patients. These patients, who might not benefit from treatment, have previously been identified with traditional logistic regression models, which may oversimplify relations between characteristics and outcome, or with machine learning techniques, which may be difficult to interpret. We developed and evaluated a novel evolutionary algorithm for fuzzy decision trees to accurately identify patients with poor outcome after endovascular treatment, defined as a modified Rankin Scale (mRS) score greater than or equal to 5. The resulting decision trees are comprehensible, easily interpretable models, which makes their predictions easy to explain to patients and practitioners. Insight into the reasons for a predicted outcome can encourage acceptance and adaptation in practice and help manage expectations after treatment. We compared our proposed method to CART, the benchmark decision tree algorithm, on classification accuracy and interpretability. The fuzzy decision tree significantly outperformed CART: using 5-fold cross-validation with on average 1090 patients in the training set and 273 patients in the test set, the fuzzy decision tree misclassified on average 77 (standard deviation of 7) patients compared to 83 (+/- 7) using CART. The mean number of nodes (decision and leaf nodes) in the fuzzy decision tree was 11 (+/- 2) compared to 26 (+/- 1) for CART decision trees. With an average accuracy of 72% and far fewer nodes than CART, the developed evolutionary algorithm for fuzzy decision trees might be used to gain insights into the predictive value of patient characteristics and can contribute to the development of more accurate medical outcome prediction methods with improved clarity for practitioners and patients.
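The accuracy figure in this abstract can be checked against the per-fold counts it quotes. The following is a minimal arithmetic sketch, assuming the reported averages of 273 test patients and 77 (fuzzy decision tree) or 83 (CART) misclassifications per fold; the function name is ours and does not come from the paper.

```python
def accuracy_from_errors(n_test: int, n_misclassified: int) -> float:
    """Return the fraction of test cases that were classified correctly."""
    return 1.0 - n_misclassified / n_test


# Average per-fold counts quoted in the abstract.
fuzzy_acc = accuracy_from_errors(273, 77)
cart_acc = accuracy_from_errors(273, 83)

print(f"fuzzy decision tree: {fuzzy_acc:.1%}")  # roughly 72%
print(f"CART: {cart_acc:.1%}")                  # roughly 70%
```

Under these assumptions the fuzzy tree's figure agrees with the stated 72%, and the gap to CART is about two percentage points.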

    Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets

    Decision-tree induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing decision trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of decision trees: instead of proposing a new manually designed method for inducing decision trees, we propose automatically designing decision-tree induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose a hyper-heuristic evolutionary algorithm for designing decision-tree algorithms (HEAD-DT) that evolves design components of top-down decision-tree induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better decision-tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known decision-tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed decision-tree algorithms regarding predictive accuracy and F-measure.

    Computational strategies to identify, prioritize and design potential antimalarial agents from natural products

    Philosophiae Doctor - PhD. Introduction: There is an exigent need to develop novel antimalarial drugs in view of the mounting disease burden and emergent resistance to the presently used drugs against the malarial parasites. A large number of natural products, especially those used in ethnomedicine for malaria, have shown varying in-vitro antiplasmodial activities. Facilitating antimalarial drug development from this wealth of natural products is an imperative and laudable mission to pursue. However, the limited resources, high cost, low prospects and high cost of failure during preclinical and clinical studies might militate against the pursuit of this mission. Chemoinformatics techniques can simulate and predict essential molecular properties required to characterize compounds, thus eliminating the cost of equipment and reagents to conduct essential preclinical studies, especially on compounds that may fail during drug development. Therefore, applying chemoinformatics techniques to natural products with in-vitro antiplasmodial activities may facilitate identification and prioritization of those natural products with potential for a novel mechanism of action, desirable pharmacokinetics and a high likelihood of development into antimalarial drugs. In addition, unique structural features mined from these natural products may be templates to design new potential antimalarial compounds. Method: Four chemoinformatics techniques were applied to a collection of selected natural products with in-vitro antiplasmodial activity (NAA) and currently registered antimalarial drugs (CRAD): molecular property profiling, molecular scaffold analysis, machine learning and design of a virtual compound library. Molecular property profiling included computation of key molecular descriptors, physicochemical properties, molecular similarity analysis, estimation of drug-likeness, in-silico pharmacokinetic profiling and exploration of the structure-activity landscape.
Analysis of variance was used to assess statistically significant differences in these parameters between NAA and CRAD. Next, molecular scaffold exploration and diversity analyses were performed on three datasets (NAA, CRAD and malarial data from Medicines for Malaria Venture (MMV)) using scaffold counts and cumulative scaffold frequency plots. Scaffolds from the NAA were compared to those from CRAD and MMV. A Scaffold Tree was also generated for all the datasets. Thirdly, machine learning approaches were used to build four regression and four classifier models from bioactivity data of NAA using molecular descriptors and molecular fingerprints. Models were built and refined by leave-one-out cross-validation and evaluated with an independent test dataset. The applicability domain (AD), which defines the limit of reliable predictability by the models, was estimated from the training dataset and validated with the test dataset. Possible chemical features associated with reported antimalarial activities of the compounds were also extracted. Lastly, virtual compound libraries were generated with the unique molecular scaffolds identified from the NAA. The virtual compounds generated were characterized by evaluating selected molecular descriptors, toxicity profile, structural diversity from CRAD and prediction of antiplasmodial activity. Results: From the molecular property profiling, a total of 1040 natural products were selected and a total of 13 molecular descriptors were analyzed. Significant differences were observed between the natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) for at least 11 of the molecular descriptors. Molecular similarity and chemical space analysis identified NAA that were structurally diverse from CRAD. Over 50% of NAA with desirable drug-like properties were identified. However, nearly 70% of NAA were identified as potentially "promiscuous" compounds.
Structure-activity landscape analysis highlighted compound pairs that formed "activity cliffs". In all, prioritization strategies for the natural products with in-vitro antiplasmodial activities were proposed. The scaffold exploration and analysis results revealed that CRAD exhibited the greatest scaffold diversity, followed by NAA and MMV respectively. Unique scaffolds that were not contained in any other compounds in the CRAD datasets were identified in NAA. The Scaffold Tree showed the preponderance of ring systems in NAA and identified virtual scaffolds, which may be potential bioactive compounds or elucidate possible synthetic routes for the NAA. From the machine learning study, the regression and classifier models that were most suitable for NAA were identified as model tree M5P (correlation coefficient = 0.84) and Sequential Minimal Optimization (accuracy = 73.46%) respectively. The test dataset fitted into the applicability domain (AD) defined by the training dataset. The “amine” group was observed to be essential for antimalarial activity in both the NAA and MMV datasets, but hydroxyl and carbonyl groups may also be relevant in the NAA dataset. The results of the characterization of the virtual compound library showed significant difference (p value 90%) of the virtual compound library. The virtual compound libraries showed sufficient diversity in structures and the majority were structurally diverse from currently registered antimalarial drugs. Finally, up to 70% of the virtual compounds were predicted as active antiplasmodial agents. Conclusions: Molecular property profiling of natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) produced a wealth of information that may guide decisions and facilitate antimalarial drug development from natural products, and led to a prioritized list of natural products with in-vitro antiplasmodial activities.
Molecular scaffold analysis identified unique scaffolds and virtual scaffolds from NAA that possess desirable drug-like properties, which make them ideal starting points for molecular antimalarial drug design. The machine learning study built, evaluated and identified sufficiently accurate regression and classifier models that were used for virtual screening of natural compound libraries to mine possible antimalarial compounds without the expense of bioactivity assays. Finally, a good number of the virtual compounds generated were structurally diverse from currently registered antimalarial drugs and potentially active antiplasmodial agents. Filtering and optimization may lead to a collection of virtual compounds with unique chemotypes that may be synthesized and added to the screening deck against Plasmodium.
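The Method section of this abstract states that models were built and refined by leave-one-out cross-validation. The procedure can be sketched as follows, assuming a single molecular descriptor and an ordinary least-squares line as a stand-in model; the thesis itself used M5P and other learners. The function names and the data values are invented for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def loocv_rmse(xs, ys):
    """Hold out each compound once, fit on the rest, and pool the errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return mean(squared_errors) ** 0.5


# Made-up descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(round(loocv_rmse(descriptor, activity), 3))
```

Each compound is predicted by a model that never saw it, so the pooled error is a less optimistic estimate than the training error. It still does not replace the independent test set the abstract mentions.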

    Evolutionary model tree induction

    Previous issue date: 2009-12-10. Model trees are a particular case of decision trees applied to regression problems, in which the variable to be predicted is continuous. They have the advantage of presenting an interpretable output, helping the user of the system to have more confidence in the prediction and providing the basis for the user to gain new insights about the data, confirming or rejecting previously formed hypotheses. In addition, model trees present an acceptable level of predictive performance when compared to most techniques used to solve regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms use a greedy, top-down, divide-and-conquer strategy, which may not converge to the globally optimal solution. This work proposes the use of the evolutionary algorithms paradigm as an alternative heuristic for generating model trees. The new approach is tested on public UCI regression data sets, and the results are compared to those generated by traditional greedy model tree induction algorithms. The results show that the new approach offers a good trade-off between predictive performance and the generation of easily interpretable models, an advantage that is often crucial in data mining applications.

    Evolutionary model tree induction

    Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output with an acceptable level of predictive performance. Since generating optimal model trees is an NP-Complete problem, the traditional model tree induction algorithms make use of a greedy heuristic, which may not converge to the global optimal solution. We propose the use of the evolutionary algorithms paradigm (EA) as an alternate heuristic to generate model trees in order to improve the convergence to global optimal solutions. We test the predictive performance of this new approach using public UCI datasets, and compare the results with traditional greedy regression/model tree induction algorithms.

    Evolutionary model trees for handling continuous classes in machine learning

    Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to gain more confidence in the prediction and providing the basis for the end-user to gain new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); European Research Consortium for Informatics and Mathematics (ERCIM).
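To make the idea of evolving a model tree concrete, here is a deliberately small sketch. It is not E-Motion: it assumes a tree with a single split on one feature, a linear model in each leaf, squared error as fitness, and truncation selection with Gaussian mutation. All names, parameters and the synthetic data are ours.

```python
import random
from statistics import mean


def fit_line(points):
    """Least-squares line (intercept, slope) through (x, y) points."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope


def sse(points, threshold):
    """Fitness: squared error of a one-split model tree with linear leaves."""
    left = [p for p in points if p[0] <= threshold]
    right = [p for p in points if p[0] > threshold]
    if len(left) < 2 or len(right) < 2:
        return float("inf")  # reject degenerate splits
    total = 0.0
    for leaf in (left, right):
        a, b = fit_line(leaf)
        total += sum((y - (a + b * x)) ** 2 for x, y in leaf)
    return total


def evolve_threshold(points, pop_size=20, generations=40, sigma=0.5, seed=0):
    """Evolve the split threshold by truncation selection and Gaussian mutation."""
    rng = random.Random(seed)
    lo = min(x for x, _ in points)
    hi = max(x for x, _ in points)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: sse(points, t))
        parents = population[: pop_size // 2]              # better half survives
        children = [rng.gauss(t, sigma) for t in parents]  # one mutant per survivor
        population = parents + children
    return min(population, key=lambda t: sse(points, t))


# Synthetic piecewise-linear data with a change of slope at x = 5.
data = [(x / 2, x / 2 if x / 2 <= 5 else 20 - 3 * (x / 2)) for x in range(21)]
best = evolve_threshold(data)
print(best, sse(data, best))
```

A greedy inducer would evaluate candidate splits exhaustively at each node. The evolutionary variant samples and perturbs whole candidate solutions instead, and that is the property the abstract relies on when a tree has many interacting splits.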