1,963 research outputs found

    A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

    Get PDF
    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm

    L2-norm multiple kernel learning and its application to biomedical data fusion

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as <it>L</it><sub>∞</sub>, <it>L</it><sub>1</sub>, and <it>L</it><sub>2 </sub>MKL. In particular, <it>L</it><sub>2 </sub>MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing <it>L</it><sub>∞ </sub>MKL method. In real biomedical applications, <it>L</it><sub>2 </sub>MKL may have more advantages over sparse integration method for thoroughly combining complementary information in heterogeneous data sources.</p> <p>Results</p> <p>We provide a theoretical analysis of the relationship between the <it>L</it><sub>2 </sub>optimization of kernels in the dual problem with the <it>L</it><sub>2 </sub>coefficient regularization in the primal problem. Understanding the dual <it>L</it><sub>2 </sub>problem grants a unified view on MKL and enables us to extend the <it>L</it><sub>2 </sub>method to a wide range of machine learning problems. We implement <it>L</it><sub>2 </sub>MKL for ranking and classification problems and compare its performance with the sparse <it>L</it><sub>∞ </sub>and the averaging <it>L</it><sub>1 </sub>MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. <it>L</it><sub>2 </sub>MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel <it>L</it><sub>2 </sub>MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large scale data sets processing.</p> <p>Conclusions</p> <p>This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid a "winner-takes-all" effect seen in <it>L</it><sub>∞ </sub>MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing <it>L</it><sub>2 </sub>kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has comparable performance as the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.</p> <p>Availability</p> <p>The MATLAB code of algorithms implemented in this paper is downloadable from <url>http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html</url>.</p

    Deterministic Classifiers Accuracy Optimization for Cancer Microarray Data

    Get PDF
    The objective of this study was to improve classification accuracy in cancer microarray gene expression data using a collection of machine learning algorithms available in WEKA. State of the art deterministic classification methods, such as: Kernel Logistic Regression, Support Vector Machine, Stochastic Gradient Descent and Logistic Model Trees were applied on publicly available cancer microarray datasets aiming to discover regularities that provide insights to help characterization and diagnosis correctness on each cancer typology. The implemented models, relying on 10-fold cross-validation, parameterized to enhance accuracy, reached accuracy above 90%. Moreover, although the variety of methodologies, no significant statistic differences were registered between them, at significance level 0.05, confirming that all the selected methods are effective for this type of analysis.info:eu-repo/semantics/publishedVersio

    Classification of Dengue Fever Patients Based on Gene Expression Data Using Support Vector Machines

    Get PDF
    Background: Symptomatic infection by dengue virus (DENV) can range from dengue fever (DF) to dengue haemorrhagic fever (DHF), however, the determinants of DF or DHF progression are not completely understood. It is hypothesised that host innate immune response factors are involved in modulating the disease outcome and the expression levels of genes involved in this response could be used as early prognostic markers for disease severity. Methodology/Principal Findings: mRNA expression levels of genes involved in DENV innate immune responses were measured using quantitative real time PCR (qPCR). Here, we present a novel application of the support vector machines (SVM) algorithm to analyze the expression pattern of 12 genes in peripheral blood mononuclear cells (PBMCs) of 28 dengue patients (13 DHF and 15 DF) during acute viral infection. The SVM model was trained using gene expression data of these genes and achieved the highest accuracy of ,85% with leave-one-out cross-validation. Through selective removal of gene expression data from the SVM model, we have identified seven genes (MYD88, TLR7, TLR3, MDA5, IRF3, IFN-a and CLEC5A) that may be central in differentiating DF patients from DHF, with MYD88 and TLR7 observed to be the most important. Though the individual removal of expression data of five other genes had no impact on the overall accuracy, a significant combined role was observed when the SVM model of the two main genes (MYD88 and TLR7) was re-trained to include the five genes, increasing the overall accuracy to ,96%. Conclusions/Significance: Here, we present a novel use of the SVM algorithm to classify DF and DHF patients, as well as to elucidate the significance of the various genes involved. It was observed that seven genes are critical in classifying DF and DHF patients: TLR3, MDA5, IRF3, IFN-a, CLEC5A, and the two most important MYD88 and TLR7. While these preliminary results are promising, further experimental investigation is necessary to validate their specific roles in dengue disease

    A Hybrid Metaheuristics based technique for Mutation Based Disease Classification

    Get PDF
    Due to recent advancements in computational biology, DNA microarray technology has evolved as a useful tool in the detection of mutation among various complex diseases like cancer. The availability of thousands of microarray datasets makes this field an active area of research. Early cancer detection can reduce the mortality rate and the treatment cost. Cancer classification is a process to provide a detailed overview of the disease microenvironment for better diagnosis. However, the gene microarray datasets suffer from a curse of dimensionality problems also the classification models are prone to be overfitted due to small sample size and large feature space. To address these issues, the authors have proposed an Improved Binary Competitive Swarm Optimization Whale Optimization Algorithm (IBCSOWOA) for cancer classification, in which IBCSO has been employed to reduce the informative gene subset originated from using minimum redundancy maximum relevance (mRMR) as filter method. The IBCSOWOA technique has been tested on an artificial neural network (ANN) model and the whale optimization algorithm (WOA) is used for parameter tuning of the model. The performance of the proposed IBCSOWOA is tested on six different mutation-based microarray datasets and compared with existing disease prediction methods. The experimental results indicate the superiority of the proposed technique over the existing nature-inspired methods in terms of optimal feature subset, classification accuracy, and convergence rate. The proposed technique has illustrated above 98% accuracy in all six datasets with the highest accuracy of 99.45% in the Lung cancer dataset

    Gene selection for cancer classification with the help of bees

    Full text link
    corecore