    Innovative Hybridisation of Genetic Algorithms and Neural Networks in Detecting Marker Genes for Leukaemia Cancer

    Methods for extracting marker genes that trigger the growth of cancerous cells from highly complex microarray data are of much interest to the computing community. Through the identified genes, the pathology of cancerous cells can be revealed and early precautions can be taken to prevent further proliferation. In this paper, we propose an innovative hybridised gene identification framework based on genetic algorithms and neural networks to identify marker genes for leukaemia. Our approach confirms that high classification accuracy does not ensure that the optimal set of genes has been identified, and our model delivers a more promising set of genes even with a lower classification accuracy

    Optimal classifier selection and negative bias in error rate estimation: An empirical study on high-dimensional prediction

    In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this kind of bias. In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. We then assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly. We conclude that the strategy to present only the optimal result is not acceptable, and suggest alternative approaches for properly reporting classification accuracy
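The selection effect described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the study's procedure: it assumes each classifier variant is an independent random guesser on non-informative binary labels, whereas the real variants are correlated and evaluated by cross-validation. The function names, the sample size of 50 and the repeat count are hypothetical; only the figure of 124 variants comes from the abstract.

```python
import random


def min_error_over_variants(n_samples, n_variants, rng):
    """Smallest observed error rate among n_variants random-guess classifiers.

    Labels carry no signal, so every variant has a true error rate of 0.5.
    """
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best = 1.0
    for _ in range(n_variants):
        # Assumption: each variant is modelled as an independent coin flip per sample.
        preds = [rng.randint(0, 1) for _ in range(n_samples)]
        err = sum(p != y for p, y in zip(preds, labels)) / n_samples
        best = min(best, err)
    return best


def simulate(n_samples=50, n_variants=124, n_repeats=200, seed=0):
    """Return (mean error of one prespecified variant, mean of the a posteriori minimum)."""
    rng = random.Random(seed)
    single = [min_error_over_variants(n_samples, 1, rng) for _ in range(n_repeats)]
    selected = [min_error_over_variants(n_samples, n_variants, rng) for _ in range(n_repeats)]
    return sum(single) / n_repeats, sum(selected) / n_repeats


mean_single, mean_min = simulate()
print(f"one prespecified variant: {mean_single:.3f}")
print(f"best of 124 variants:     {mean_min:.3f}")
```

Under these assumptions the reported minimum falls well below the true error rate of 0.5 even though no variant has any predictive value, which is the kind of optimistic bias the abstract warns about.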

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years, the area of data mining has experienced a considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research investigation, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks

    GliomaPredict: A Clinically Useful Tool for Assigning Glioma Patients to Specific Molecular Subtypes

    Background: Advances in generating genome-wide gene expression data have accelerated the development of molecular-based tumor classification systems. Tools that allow the translation of such molecular classification schemas from research into clinical applications are still missing in the emerging era of personalized medicine. Results: We developed GliomaPredict as a computational tool that allows the fast and reliable classification of glioma patients into one of six previously published stratified subtypes based on sets of extensively validated classifiers derived from hundreds of glioma transcriptomic profiles. Our tool utilizes a principal component analysis (PCA)-based approach to generate a visual representation of the analyses, quantifies the confidence of the underlying subtype assessment and presents results as a printable PDF file. The GliomaPredict tool is implemented as a plugin application for the widely used GenePattern framework. Conclusions: GliomaPredict provides a user-friendly, clinically applicable novel platform for instantly assigning gene expression-based subtypes to patients with gliomas, thereby aiding in clinical trial design and therapeutic decision-making. Implemented as a user-friendly diagnostic tool, we expect that in time GliomaPredict, and tools like it, will become routinely used in translational/clinical research and in the clinical care of patients with gliomas

    Robustness of Random Forest-based gene selection methods

    Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. The comparison of post-selection accuracy in the validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods substantially differed with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm's computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal optimal selection methods were found to select a high fraction of false positives
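One way to make "stability of selection" and "consistently selected genes" concrete is sketched below. The abstract does not say which stability measure the authors used, so the mean pairwise Jaccard index here is an assumption chosen for illustration, and the gene identifiers and the three example runs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: size of the overlap divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(runs):
    """Return (mean pairwise Jaccard index, genes selected in every run).

    `runs` holds one collection of selected genes per repetition of the
    selection procedure, for example per resample of the data.
    """
    if len(runs) < 2:
        raise ValueError("need at least two selection runs to assess stability")
    pairs = list(combinations(runs, 2))
    mean_jaccard = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    consistent = set.intersection(*(set(r) for r in runs))
    return mean_jaccard, consistent


# Hypothetical selections from three repetitions of a gene selection method.
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3", "g5"}]
mean_jaccard, consistent = selection_stability(runs)
print(mean_jaccard, sorted(consistent))
```

A method can reach the same post-selection accuracy as another while returning very different gene sets from run to run, so reporting both numbers separates classifier performance from selection quality.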

    Biomarkers in solid organ transplantation: establishing personalized transplantation medicine.

    Technological advances in molecular and in silico research have enabled significant progress towards personalized transplantation medicine. It is now possible to conduct comprehensive biomarker development studies of transplant organ pathologies, correlating genomic, transcriptomic and proteomic information from donor and recipient with clinical and histological phenotypes. Translation of these advances to the clinical setting will allow assessment of an individual patient's risk of allograft damage or accommodation. Transplantation biomarkers are needed for active monitoring of immunosuppression, to reduce patient morbidity, and to improve long-term allograft function and life expectancy. Here, we highlight recent pre- and post-transplantation biomarkers of acute and chronic allograft damage or adaptation, focusing on peripheral blood-based methodologies for non-invasive application. We then critically discuss current findings with respect to their future application in routine clinical transplantation medicine. Complement-system-associated SNPs present potential biomarkers that may be used to indicate the baseline risk for allograft damage prior to transplantation. The detection of antibodies against novel, non-HLA, MICA antigens, and the expression of cytokine genes and proteins and cytotoxicity-related genes have been correlated with allograft damage and are potential post-transplantation biomarkers indicating allograft damage at the molecular level, although these do not have clinical relevance yet. Several multi-gene expression-based biomarker panels have been identified that accurately predicted graft accommodation in liver transplant recipients and may be developed into a predictive biomarker assay