4,921 research outputs found

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.

    Texture analysis in gel electrophoresis images using an integrative kernel-based approach

    Texture information could be used in proteomics to improve the quality of the image analysis of proteins separated on a gel. To evaluate the best technique for identifying relevant textures, we use several kernel-based machine learning techniques to classify proteins in 2-DE images into spot and noise. We evaluate the classification accuracy of each technique with proteins extracted from ten 2-DE images of different types of tissues and different experimental conditions. We found that the best classification model was FSMKL, a data integration method using multiple kernel learning, which achieved AUROC values above 95% while using a reduced number of features. This technique allows us to increase the interpretability of the complex combinations of textures and to weight the importance of each particular feature in the final model. In particular, the Inverse Difference Moment exhibited the highest discriminating power. A higher value can be associated with a homogeneous structure, as this feature describes the homogeneity; the larger the value, the more symmetric. The final model is built from a combination of different groups of textural features. Here we demonstrated the feasibility of combining different groups of textures in 2-DE image analysis for spot detection.
    Funding: Instituto de Salud Carlos III (PI13/00280); United Kingdom, Medical Research Council (G10000427, MC_UU_12013/8); Galicia, Consellería de Economía e Industria (10SIN105004P).

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered; only recently has this issue received growing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.
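
To make "stability with respect to sampling variations" concrete, here is a minimal sketch that scores how similar the feature subsets selected on different resamples are. The abstract does not name a stability measure; the average pairwise Jaccard index used here is an assumption chosen for simplicity, and the marker names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature subsets (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard index over subsets selected on different resamples."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical marker names selected on three resamples of the same cohort.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
stability = selection_stability(runs)
```

A score near 1 means the selector returns nearly the same markers regardless of which samples it sees; a score near 0 means the selected set depends heavily on the particular sample.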

    New trends in data mining.

    Trends; Data; Data mining;

    A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection

    Background: Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high-dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With this data dimensionality and sample size, the risk of over-fitting and selection bias is pervasive. Therefore the development of bioinformatics methods based on unsupervised feature extraction can lead to general tools that can be applied to several fields of predictive proteomics. Results: We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high-resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 sample spectra divided in 115 cancer and 91 control samples. The overall accuracy averaged over a large cross validation study is 98.18. The area under the ROC curve of the best selected model is 0.9962. Conclusion: We improved on previously known results for this problem on the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm

    Multicentric validation of proteomic biomarkers in urine specific for diabetic nephropathy

    Background: Urine proteome analysis is rapidly emerging as a tool for diagnosis and prognosis in disease states. For diagnosis of diabetic nephropathy (DN), urinary proteome analysis was successfully applied in a pilot study. The validity of the previously established proteomic biomarkers with respect to the diagnostic and prognostic potential was assessed on a separate set of patients recruited at three different European centers. In this case-control study of 148 Caucasian patients with diabetes mellitus type 2 and duration ≥ 5 years, cases of DN were defined as albuminuria >300 mg/d and diabetic retinopathy (n = 66). Controls were matched for gender and diabetes duration (n = 82). Methodology/Principal Findings: Proteome analysis was performed blinded using high-resolution capillary electrophoresis coupled with mass spectrometry (CE-MS). Data were evaluated employing the previously developed model for DN. Upon unblinding, the model for DN showed 93.8% sensitivity and 91.4% specificity, with an AUC of 0.948 (95% CI 0.898-0.978). Of 65 previously identified peptides, 60 were significantly different between cases and controls of this study. In <10% of cases and controls, classification by proteome analysis did not entirely match the expected clinical outcome. Analysis of patients' subsequent clinical course revealed later progression to DN in some of the control patients classified as false positives. Conclusions: These data provide the first independent confirmation that profiling of the urinary proteome by CE-MS can adequately identify subjects with DN, supporting the generalizability of this approach. The data further establish urinary collagen fragments as biomarkers for diabetes-induced renal damage that may serve as earlier and more specific biomarkers than the currently used urinary albumin.
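
As a reminder of how the sensitivity and specificity figures quoted in this abstract are defined, the following short sketch computes both from confusion-matrix counts. The counts used are purely hypothetical and are not taken from the study, which reports only the resulting percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual cases that the classifier flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual controls that the classifier flags as controls."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not the study's data):
# 50 cases, of which 45 are detected; 100 controls, of which 80 are cleared.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=80, false_pos=20)
```

Both ratios are computed within one group (cases or controls), so neither depends on how many cases there are relative to controls in the cohort.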

    Severe childhood malaria syndromes defined by plasma proteome profiles

    BACKGROUND Cerebral malaria (CM) and severe malarial anemia (SMA) are the most serious life-threatening clinical syndromes of Plasmodium falciparum infection in childhood. Therefore it is important to understand the pathology underlying the development of CM and SMA, as opposed to uncomplicated malaria (UM). Different host responses to infection are likely to be reflected in plasma proteome-patterns that associate with clinical status and therefore provide indicators of the pathogenesis of these syndromes. METHODS AND FINDINGS Plasma and comprehensive clinical data for discovery and validation cohorts were obtained as part of a prospective case-control study of severe childhood malaria at the main tertiary hospital of the city of Ibadan, an urban and densely populated holoendemic malaria area in Nigeria. A total of 946 children participated in this study. Plasma was subjected to high-throughput proteomic profiling. Statistical pattern-recognition methods were used to find proteome-patterns that defined disease groups. Plasma proteome-patterns accurately distinguished children with CM and with SMA from those with UM, and from healthy or severely ill malaria-negative children. CONCLUSIONS We report that an accurate definition of the major childhood malaria syndromes can be achieved using plasma proteome-patterns. Our proteomic data can be exploited to understand the pathogenesis of the different childhood severe malaria syndromes.