Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication date: 17/10/2005

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.

arXiv.org e-Print Archive

HAL-MINES ParisTech

Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy.

Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides the frameworks for future studies examining deviations implicated in pregnancy-related pathologies including preterm birth and preeclampsia. Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering at term. The datasets included measurements from the immunome, transcriptome, microbiome, proteome and metabolome of samples obtained simultaneously from the same patients. Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the ability of each dataset to predict gestational age. Using stacked generalization, these datasets were combined into a single model. This model not only significantly increased predictive power by combining all datasets, but also revealed novel interactions between different biological modalities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo analysis of immune-modulating interventions based on the mechanisms identified. Availability and implementation: Datasets and scripts for reproduction of results are available through: https://nalab.stanford.edu/multiomics-pregnancy/. Supplementary information: Supplementary data are available at Bioinformatics online.

eScholarship - University of California

PolyPublie

Texture analysis in gel electrophoresis images using an integrative kernel-based approach

Author: Campbell I C G
Dorado Julian
Fernandez-Lozano Carlos
Gaunt Tom R
Gestal Marcos
Pazos Alejandro
Seoane Jose A. A
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Texture information could be used in proteomics to improve the quality of the image analysis of proteins separated on a gel. In order to evaluate the best technique to identify relevant textures, we use several different kernel-based machine learning techniques to classify proteins in 2-DE images into spot and noise. We evaluate the classification accuracy of each of these techniques with proteins extracted from ten 2-DE images of different types of tissues and different experimental conditions. We found that the best classification model was FSMKL, a data integration method using multiple kernel learning, which achieved AUROC values above 95% while using a reduced number of features. This technique allows us to increase the interpretability of the complex combinations of textures and to weight the importance of each particular feature in the final model. In particular, the Inverse Difference Moment exhibited the highest discriminating power. A higher value can be associated with a homogeneous structure, as this feature describes the homogeneity; the larger the value, the more symmetric. The final model combines different groups of textural features. Here we demonstrated the feasibility of combining different groups of textures in 2-DE image analysis for spot detection. Instituto de Salud Carlos III; PI13/00280. United Kingdom. Medical Research Council; G10000427, MC_UU_12013/8. Galicia. Consellería de Economía e Industria; 10SIN105004P

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

PubMed Central

Explore Bristol Research

Stable Feature Selection for Biomarker Discovery

Author: He Zengyou
Yu Weichuan
Publication date: 01/01/2010

Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.
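
To make "stability with respect to sampling variations" concrete, the following is a minimal sketch, not a method taken from the article. It assumes a very simple selector (ranking features by the absolute difference of class means), stratified subsampling, and the mean pairwise Jaccard index as the stability score. All function names, parameters, and the synthetic data are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def top_k_features(X, y, k):
    """Rank features by |mean(class 1) - mean(class 0)| and return the top-k indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two non-empty feature sets."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k, n_rounds=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of the feature sets selected on random subsamples."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    subsets = []
    for _ in range(n_rounds):
        # Stratified subsample so that both classes are always represented.
        idx = rng.sample(pos_idx, max(1, int(frac * len(pos_idx))))
        idx += rng.sample(neg_idx, max(1, int(frac * len(neg_idx))))
        subsets.append(top_k_features([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_toy_data(n_per_class=30, n_features=50, n_informative=3, seed=1):
    """Synthetic data (an assumption): the first n_informative features are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("stability (k=3): ", selection_stability(X, y, k=3))
    print("stability (k=10):", selection_stability(X, y, k=10))
```

A score near 1 means the same features are selected regardless of which samples are drawn; a score near 0 means the selection is driven mostly by sampling noise. In this toy setting, asking for more features than are truly informative forces the selector to pick noise features, which would be expected to lower the score.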

arXiv.org e-Print Archive

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

New trends in data mining.

Author: Baesens Bart
Denys K
Huysmans Johan
Martens David
Vanthienen Jan

Trends; Data; Data mining;

Research Papers in Economics

A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection

Author: Antonio d'Acierno
Ceccarelli Michele
Angelo Facchiano
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high-dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With data of this dimensionality and sample size, the risk of over-fitting and selection bias is pervasive. Therefore the development of bioinformatics methods based on unsupervised feature extraction can lead to general tools which can be applied to several fields of predictive proteomics. Results: We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high-resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 sample spectra divided in 115 cancer and 91 control samples. The overall accuracy averaged over a large cross-validation study is 98.18. The area under the ROC curve of the best selected model is 0.9962. Conclusion: We improved on previously known results for the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm
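
The abstract does not spell out the scale-space procedure, so the following is only a rough illustration of the general idea and not the authors' MATLAB code. It assumes that a peak is kept when a local maximum of the raw spectrum still has a matching local maximum after Gaussian smoothing at several widths. The function names, the choice of scales, the tolerance, and the toy spectrum are all hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalised discrete Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def local_maxima(signal):
    """Indices of interior points that rise from the left and do not rise to the right."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def persistent_peaks(signal, sigmas=(1.0, 2.0, 4.0), tol=2):
    """Raw local maxima that have a maximum within `tol` samples at every smoothing scale."""
    candidates = local_maxima(signal)
    maxima_per_scale = [local_maxima(smooth(signal, s)) for s in sigmas]
    return [p for p in candidates
            if all(any(abs(p - q) <= tol for q in maxima) for maxima in maxima_per_scale)]


def toy_spectrum(n=200):
    """Two broad peaks (indices 50 and 150) plus a one-sample spike on a shoulder (index 60)."""
    signal = [math.exp(-((i - 50) ** 2) / 128.0) + math.exp(-((i - 150) ** 2) / 128.0)
              for i in range(n)]
    signal[60] += 0.1
    return signal


if __name__ == "__main__":
    spectrum = toy_spectrum()
    print("raw local maxima: ", local_maxima(spectrum))
    print("persistent peaks: ", persistent_peaks(spectrum))
```

In the toy spectrum the one-sample spike is a local maximum of the raw signal, but it is absorbed by the slope of the neighbouring broad peak once the signal is smoothed, so only the two broad peaks survive. No class labels are used at any point, which is what makes this kind of selection unsupervised.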

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multicentric validation of proteomic biomarkers in urine specific for diabetic nephropathy

Background: Urine proteome analysis is rapidly emerging as a tool for diagnosis and prognosis in disease states. For diagnosis of diabetic nephropathy (DN), urinary proteome analysis was successfully applied in a pilot study. The validity of the previously established proteomic biomarkers with respect to the diagnostic and prognostic potential was assessed on a separate set of patients recruited at three different European centers. In this case-control study of 148 Caucasian patients with diabetes mellitus type 2 and duration >= 5 years, cases of DN were defined as albuminuria >300 mg/d and diabetic retinopathy (n = 66). Controls were matched for gender and diabetes duration (n = 82). Methodology/Principal Findings: Proteome analysis was performed blinded using high-resolution capillary electrophoresis coupled with mass spectrometry (CE-MS). Data were evaluated employing the previously developed model for DN. Upon unblinding, the model for DN showed 93.8% sensitivity and 91.4% specificity, with an AUC of 0.948 (95% CI 0.898-0.978). Of 65 previously identified peptides, 60 were significantly different between cases and controls of this study. In <10% of cases and controls, classification by proteome analysis did not fully match the expected clinical outcome. Analysis of the patients' subsequent clinical course revealed later progression to DN in some of the control patients falsely classified as DN. Conclusions: These data provide the first independent confirmation that profiling of the urinary proteome by CE-MS can adequately identify subjects with DN, supporting the generalizability of this approach. The data further establish urinary collagen fragments as biomarkers for diabetes-induced renal damage that may serve as earlier and more specific biomarkers than the currently used urinary albumin.

Public Library of Science (PLOS)

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

PubMed Central

Enlighten

Dissertations of the University of Groningen

Severe childhood malaria syndromes defined by plasma proteome profiles

Author: A Hodgetts
Adebola E. Orimadegun
AE Orimadegun
AK Rowe
AL Conroy
Barry K. Ely
Biobele J. Brown
C Wu
D Agranoff
Delmiro Fernandez-Reyes
Dimitrios Athanasakis
FE Lovegrove
Felix O. Akinbami
Florence Burté
Francesca Battaglia
Francis Akinkunmi
FX Wu
G Sandhu
J Amzat
J Bryce
K Marsh
Kevin KA. Tetteh
Kikelomo Osinusi
LK Erdman
MC Papadopoulos
Nathaniel K. Afolabi
Olayinka Kowobari
Olugbemiro Sodeinde
RT Pang
RW Snow
S Rojas-Galeano
Samuel Omokhodion
SI Hay
SJ Ceesay
Wasiu A. Ajetunmobi
Wuraola A. Shokunbi
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Background: Cerebral malaria (CM) and severe malarial anemia (SMA) are the most serious life-threatening clinical syndromes of Plasmodium falciparum infection in childhood. Therefore it is important to understand the pathology underlying the development of CM and SMA, as opposed to uncomplicated malaria (UM). Different host responses to infection are likely to be reflected in plasma proteome-patterns that associate with clinical status and therefore provide indicators of the pathogenesis of these syndromes. Methods and Findings: Plasma and comprehensive clinical data for discovery and validation cohorts were obtained as part of a prospective case-control study of severe childhood malaria at the main tertiary hospital of the city of Ibadan, an urban and densely populated holoendemic malaria area in Nigeria. A total of 946 children participated in this study. Plasma was subjected to high-throughput proteomic profiling. Statistical pattern-recognition methods were used to find proteome-patterns that defined disease groups. Plasma proteome-patterns accurately distinguished children with CM and with SMA from those with UM, and from healthy or severely ill malaria-negative children. Conclusions: We report that an accurate definition of the major childhood malaria syndromes can be achieved using plasma proteome-patterns. Our proteomic data can be exploited to understand the pathogenesis of the different childhood severe malaria syndromes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

FigShare