773 research outputs found

    EapGAFS: Microarray Dataset for Ensemble Classification for Diseases Prediction

    Get PDF
    Microarray data stores the measured expression levels of thousands of genes simultaneously which helps the researchers to get insight into the biological and prognostic information. Cancer is a deadly disease that develops over time and involves the uncontrolled division of body cells. In cancer, many genes are responsible for cell growth and division. But different kinds of cancer are caused by a different set of genes. So to be able to better understand, diagnose and treat cancer, it is essential to know which of the genes in the cancer cells are working abnormally. The advances in data mining, machine learning, soft computing, and pattern recognition have addressed the challenges posed by the researchers to develop computationally effective models to identify the new class of disease and develop diagnostic or therapeutic targets. This paper proposed an Ensemble Aprior Gentic Algorithm Feature Selection (EapGAFS) for microarray dataset classification. The proposed algorithm comprises of the genetic algorithm implemented with aprior learning for the microarray attributes classification. The proposed EapGAFS uses the rule set mining in the genetic algorithm for the microarray dataset processing. Through framed rule set the proposed model extract the attribute features in the dataset. Finally, with the ensemble classifier model the microarray dataset were classified for the processing. The performance of the proposed EapGAFS is conventional classifiers for the collected microarray dataset of the breast cancer, Hepatities, diabeties, and bupa. The comparative analysis of the proposed EapGAFS with the conventional classifier expressed that the proposed EapGAFS exhibits improved performance in the microarray dataset classification. The performance of the proposed EapGAFS is improved ~4 ā€“ 6% than the conventional classifiers such as Adaboost and ensemble

    MACHINE LEARNING AND DEEP LEARNING APPROACHES FOR GENE REGULATORY NETWORK INFERENCE IN PLANT SPECIES

    Get PDF
    The construction of gene regulatory networks (GRNs) is vital for understanding the regulation of metabolic pathways, biological processes, and complex traits during plant growth and responses to environmental cues and stresses. The increasing availability of public databases has facilitated the development of numerous methods for inferring gene regulatory relationships between transcription factors and their targets. However, there is limited research on supervised learning techniques that utilize available regulatory relationships of plant species in public databases. This study investigates the potential of machine learning (ML), deep learning (DL), and hybrid approaches for constructing GRNs in plant species, specifically Arabidopsis thaliana, poplar, and maize. Challenges arise due to limited training data for gene regulatory pairs, especially in less-studied species such as poplar and maize. Nonetheless, our results demonstrate that hybrid models integrating ML and artificial neural network (ANN) techniques significantly outperformed traditional methods in predicting gene regulatory relationships. The best-performing hybrid models achieved over 95% accuracy on holdout test datasets, surpassing traditional ML and ANN models and also showed good accuracy on lignin biosynthesis pathway analysis. Employing transfer learning techniques, this study has also successfully transferred the known knowledge of gene regulation from one species to another, substantially improving performance and manifesting the viability of cross-species learning using deep learning-based approaches. This study contributes to the methodology for growing body of knowledge in GRN prediction and construction for plant species, highlighting the value of adopting hybrid models and transfer learning techniques. This study and the results will help to pave a way for future research on how to learn from known to unknown and will be conductive to the advance of modern genomics and bioinformatics

    Graph-based Regularization in Machine Learning: Discovering Driver Modules in Biological Networks

    Get PDF
    Curiosity of human nature drives us to explore the origins of what makes each of us different. From ancient legends and mythology, Mendel\u27s law, Punnett square to modern genetic research, we carry on this old but eternal question. Thanks to technological revolution, today\u27s scientists try to answer this question using easily measurable gene expression and other profiling data. However, the exploration can easily get lost in the data of growing volume, dimension, noise and complexity. This dissertation is aimed at developing new machine learning methods that take data from different classes as input, augment them with knowledge of feature relationships, and train classification models that serve two goals: 1) class prediction for previously unseen samples; 2) knowledge discovery of the underlying causes of class differences. Application of our methods in genetic studies can help scientist take advantage of existing biological networks, generate diagnosis with higher accuracy, and discover the driver networks behind the differences. We proposed three new graph-based regularization algorithms. Graph Connectivity Constrained AdaBoost algorithm combines a connectivity module, a deletion function, and a model retraining procedure with the AdaBoost classifier. Graph-regularized Linear Programming Support Vector Machine integrates penalty term based on submodular graph cut function into linear classifier\u27s objective function. Proximal Graph LogisticBoost adds lasso and graph-based penalties into logistic risk function of an ensemble classifier. Results of tests of our models on simulated biological datasets show that the proposed methods are able to produce accurate, sparse classifiers, and can help discover true genetic differences between phenotypes

    Development and evaluation of machine learning algorithms for biomedical applications

    Get PDF
    Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain full understanding of the effective treatment on patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches. This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches

    Inferring Network Mechanisms: The Drosophila melanogaster Protein Interaction Network

    Get PDF
    Naturally occurring networks exhibit quantitative features revealing underlying growth mechanisms. Numerous network mechanisms have recently been proposed to reproduce specific properties such as degree distributions or clustering coefficients. We present a method for inferring the mechanism most accurately capturing a given network topology, exploiting discriminative tools from machine learning. The Drosophila melanogaster protein network is confidently and robustly (to noise and training data subsampling) classified as a duplication-mutation-complementation network over preferential attachment, small-world, and other duplication-mutation mechanisms. Systematic classification, rather than statistical study of specific properties, provides a discriminative approach to understand the design of complex networks.Comment: 19 pages, 5 figure
    • ā€¦
    corecore