1,780 research outputs found

    Computational models and approaches for lung cancer diagnosis

    Full text link
    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to developed novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results

    Computational Intelligence Based Classifier Fusion Models for Biomedical Classification Applications

    Get PDF
    The generalization abilities of machine learning algorithms often depend on the algorithms’ initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance largely relies on whether the selected kernel functions are suitable for real application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models using computational intelligence knowledge to combine different classifiers. The first fusion model called T1FFSVM combines multiple SVM classifiers through constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions of linguistic variables using genetic algorithms. The improved model is called GFFSVM. To better handle uncertainties existing in fuzzy MFs and in classification data, T1FFSVM can also be improved by applying type-2 fuzzy logic to construct a type-2 fuzzy classifier fusion model (T2FFSVM). T1FFSVM, GFFSVM, and T2FFSVM use accuracy as a classifier performance measure. AUC (the area under an ROC curve) is proved to be a better classifier performance metric. As a comparison study, AUC-based classifier fusion models are also proposed in the dissertation. The experiments on biomedical datasets demonstrate promising performance of the proposed classifier fusion models comparing with the individual composing classifiers. The proposed classifier fusion models also demonstrate better performance than many existing classifier fusion methods. The dissertation also studies one interesting phenomena in biology domain using machine learning and classifier fusion methods. That is, how protein structures and sequences are related each other. The experiments show that protein segments with similar structures also share similar sequences, which add new insights into the existing knowledge on the relation between protein sequences and structures: similar sequences share high structure similarity, but similar structures may not share high sequence similarity

    Probabilistic grammatical model of protein language and its application to helix-helix contact site classification

    Get PDF
    BACKGROUND: Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pairs configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix‐helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists

    Bioinformatics Applications Based On Machine Learning

    Get PDF
    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and has considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created from the adaptation and application of existing techniques to the creation of new methodologies to solve existing problems

    Applying Machine Learning Algorithms for the Analysis of Biological Sequences and Medical Records

    Get PDF
    The modern sequencing technology revolutionizes the genomic research and triggers explosive growth of DNA, RNA, and protein sequences. How to infer the structure and function from biological sequences is a fundamentally important task in genomics and proteomics fields. With the development of statistical and machine learning methods, an integrated and user-friendly tool containing the state-of-the-art data mining methods are needed. Here, we propose SeqFea-Learn, a comprehensive Python pipeline that integrating multiple steps: feature extraction, dimensionality reduction, feature selection, predicting model constructions based on machine learning and deep learning approaches to analyze sequences. We used enhancers, RNA N6- methyladenosine sites and protein-protein interactions datasets to evaluate the validation of the tool. The results show that the tool can effectively perform biological sequence analysis and classification tasks. Applying machine learning algorithms for Electronic medical record (EMR) data analysis is also included in this dissertation. Chronic kidney disease (CKD) is prevalent across the world and well defined by an estimated glomerular filtration rate (eGFR). The progression of kidney disease can be predicted if future eGFR can be accurately estimated using predictive analytics. Thus, I present a prediction model of eGFR that was built using Random Forest regression. The dataset includes demographic, clinical and laboratory information from a regional primary health care clinic. The final model included eGFR, age, gender, body mass index (BMI), obesity, hypertension, and diabetes, which achieved a mean coefficient of determination of 0.95. The estimated eGFRs were used to classify patients into CKD stages with high macro-averaged and micro-averaged metrics

    Evolutionary Granular Kernel Machines

    Get PDF
    Kernel machines such as Support Vector Machines (SVMs) have been widely used in various data mining applications with good generalization properties. Performance of SVMs for solving nonlinear problems is highly affected by kernel functions. The complexity of SVMs training is mainly related to the size of a training dataset. How to design a powerful kernel, how to speed up SVMs training and how to train SVMs with millions of examples are still challenging problems in the SVMs research. For these important problems, powerful and flexible kernel trees called Evolutionary Granular Kernel Trees (EGKTs) are designed to incorporate prior domain knowledge. Granular Kernel Tree Structure Evolving System (GKTSES) is developed to evolve the structures of Granular Kernel Trees (GKTs) without prior knowledge. A voting scheme is also proposed to reduce the prediction deviation of GKTSES. To speed up EGKTs optimization, a master-slave parallel model is implemented. To help SVMs challenge large-scale data mining, a Minimum Enclosing Ball (MEB) based data reduction method is presented, and a new MEB-SVM algorithm is designed. All these kernel methods are designed based on Granular Computing (GrC). In general, Evolutionary Granular Kernel Machines (EGKMs) are investigated to optimize kernels effectively, speed up training greatly and mine huge amounts of data efficiently

    Gene Expression Analysis Methods on Microarray Data a A Review

    Get PDF
    In recent years a new type of experiments are changing the way that biologists and other specialists analyze many problems. These are called high throughput experiments and the main difference with those that were performed some years ago is mainly in the quantity of the data obtained from them. Thanks to the technology known generically as microarrays, it is possible to study nowadays in a single experiment the behavior of all the genes of an organism under different conditions. The data generated by these experiments may consist from thousands to millions of variables and they pose many challenges to the scientists who have to analyze them. Many of these are of statistical nature and will be the center of this review. There are many types of microarrays which have been developed to answer different biological questions and some of them will be explained later. For the sake of simplicity we start with the most well known ones: expression microarrays
    corecore