110 research outputs found

Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery

Author: D Mantini
E Petricoin
Henry Han
HW Ressom
J Nocedal
JS Yu
KR Coombes
M Gonen
M Hauskrecht
P Hoyer
R Lilien
R Zass
T Alexandrov
V Vapnik
X Han
X Han
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use is not yet fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data is still a challenge in machine learning and oncology research. As a well-established dimension reduction technique, PCA is widely integrated in pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This may lead to difficulty in achieving high-performance proteomic pattern discovery, because only features interpreting global data behavior are used to train a learning machine. Methods: In this study, we develop a nonnegative principal component analysis algorithm and present a nonnegative principal component analysis based support vector machine algorithm with sparse coding to conduct high-performance proteomic pattern classification. Moreover, we also propose a nonnegative principal component analysis based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles. Results: We demonstrate the superiority of the proposed algorithm by comparison with six peer algorithms on four benchmark datasets. Moreover, we illustrate that nonnegative principal component analysis can be effectively used to capture meaningful biomarkers. Conclusion: Our analysis suggests that nonnegative principal component analysis effectively conducts local feature selection for mass spectral profiles and contributes to improved sensitivities and specificities in the subsequent classification, and to meaningful biomarker discovery.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data

Author: AC Sauve
DI Malyarenko
EF Petricoin
ET Fung
GL Wright Jr
H Hong
JJ Goeman
Jos H Beijnen
JS Morris
Judith YMN Engwegen
JYMN Engwegen
KA Baggerly
L Breiman
Lodewyk FA Wessels
M de Noo
M Dijkstra
Marcel JT Reinders
Marie-Christine W Gast
ME de Noo
OJ Semmes
VN Vapnik
Wouter Meuleman
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Mass spectrometry for biological data analysis is an active field of research, providing an efficient way of high-throughput proteome screening. A popular variant of mass spectrometry is SELDI, which is often used to measure sample populations with the goal of developing (clinical) classifiers. Unfortunately, not only is the data resulting from such measurements quite noisy, but variance between replicate measurements of the same sample can be high as well. Normalisation of spectra can greatly reduce the effect of this technical variance and further improve the quality and interpretability of the data. However, it is unclear which normalisation method yields the most informative result. Results: In this paper, we describe the first systematic comparison of a wide range of normalisation methods, using two objectives that should be met by a good method. These objectives are minimisation of inter-spectra variance and maximisation of signal with respect to class separation. The former is assessed using an estimation of the coefficient of variation, the latter using the classification performance of three types of classifiers on real-world datasets representing two-class diagnostic problems. To obtain a maximally robust evaluation of a normalisation method, both objectives are evaluated over multiple datasets and multiple configurations of baseline correction and peak detection methods. Results are assessed for statistical significance and visualised to reveal the performance of each normalisation method, in particular with respect to using no normalisation. The normalisation methods described have been implemented in the freely available MASDA R-package. Conclusion: In the general case, normalisation of mass spectra is beneficial to the quality of data. The majority of methods we compared performed significantly better than the case in which no normalisation was used.
We have shown that normalisation methods that scale spectra by a factor based on the dispersion (e.g., standard deviation) of the data clearly outperform those where a factor based on the central location (e.g., mean) is used. Additional improvements in performance are obtained when these factors are estimated locally, using a sliding window within spectra, instead of globally, over full spectra. The underperforming category of methods using a globally estimated factor based on the central location of the data includes the method used by the majority of SELDI users.
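
The three families of scaling factor contrasted in this abstract (global central location, global dispersion, locally estimated dispersion) can be illustrated with a minimal Python sketch. This is not the MASDA R-package implementation: the function names, the `half_window` parameter, and the choice to map a zero-variance window to 0.0 are all assumptions made here for illustration.

```python
from statistics import mean, pstdev


def scale_by_mean(spectrum):
    """Global central-location scaling: divide every intensity by the spectrum mean.

    Assumes the mean is non-zero.
    """
    m = mean(spectrum)
    return [x / m for x in spectrum]


def scale_by_std(spectrum):
    """Global dispersion scaling: divide every intensity by the spectrum's
    population standard deviation.

    Assumes the spectrum is not constant (non-zero standard deviation).
    """
    s = pstdev(spectrum)
    return [x / s for x in spectrum]


def scale_by_local_std(spectrum, half_window=2):
    """Local dispersion scaling: divide each intensity by the standard
    deviation of a sliding window centred on it.

    The window is truncated at the spectrum edges. A window with zero
    dispersion yields 0.0 (an illustrative convention, not taken from the paper).
    """
    n = len(spectrum)
    scaled = []
    for i, x in enumerate(spectrum):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        s = pstdev(spectrum[lo:hi])
        scaled.append(x / s if s > 0 else 0.0)
    return scaled
```

After `scale_by_mean` the spectrum has mean 1, and after `scale_by_std` it has unit standard deviation, so spectra normalised the same way become comparable on that statistic.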

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

On consensus biomarker selection

Author: A Gambin
A Gambin
Anna Gambin
B Scholkopf
B Wu
BL Adam
C Dwork
CA Smith
EF Petricoin
GA Jones
IJ Jacobs
IT Jolliffe
J Li
Janusz Dutkowski
JS Yu
L Breiman
M Luksza
P Geurts
P Pokarowski
R Tibshirani
RH Lilien
T Hastie
T Speed
V Vapnik
WK Grassmann
WN Venables
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Recent developments in mass spectrometry technology have enabled the analysis of complex peptide mixtures. A lot of effort is currently devoted to the identification of biomarkers in human body fluids such as serum or plasma, based on which new diagnostic tests for different diseases could be constructed. Various biomarker selection procedures have been exploited in recent studies. It has been noted that they often lead to different biomarker lists and, as a consequence, the patient classification may also vary. Results: Here we propose a new approach to the biomarker selection problem: to apply several competing feature ranking procedures and compute a consensus list of features based on their outcomes. We validate our methods on two proteomic datasets for the diagnosis of ovarian and prostate cancer. Conclusion: The proposed methodology can improve the classification results and at the same time provide a unified biomarker list for further biological examination and interpretation.
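
The idea of merging several feature rankings into one consensus list can be sketched as follows. The abstract does not say how the consensus is computed, so this example assumes a simple mean-rank (Borda-style) aggregation; the function name `consensus_ranking` and the alphabetical tie-break are hypothetical choices, not the authors' method.

```python
def consensus_ranking(rankings):
    """Merge several rankings of the same features (best first) into one list.

    Each feature is scored by the sum of its positions across all rankings
    (equivalent to ordering by mean rank); ties are broken alphabetically.
    """
    features = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != features:
            raise ValueError("all rankings must cover the same features")

    total_position = {feature: 0 for feature in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            total_position[feature] += position

    return sorted(features, key=lambda f: (total_position[f], f))
```

For instance, three procedures ranking peaks as `["a", "b", "c"]`, `["b", "a", "c"]` and `["a", "c", "b"]` give position sums of 1, 3 and 5, so the consensus order is a, b, c.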

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information

Author: A Armon
A Pintar
A Pintar
A Smith
A Yamaguchi
AJ Bordner
AT Laurie
B Huang
B Rost
C Fufezan
CJ Reedy
DG Levitt
DT Jones
F Glaser
FP Guengerich
GJ Bartlett
HB Gray
HR Ansari
HX Zhou
IB Kuznetsov
J Liang
J Mihel
JA Capra
JC Nebel
Jianjun Hu
JS Chauhan
JS Chauhan
JS Sodhi
LJ Smith
M Brylinski
M Hendlich
M Paoli
M Weisel
N Igarashi
NB Terwilliger
NK Mishra
O Schueler-Furman
RA Laskowski
Rong Liu
RR Thangudu
S Henrich
S Jones
S Schneider
SF Altschul
SM Mense
T Guo
T Pupko
V Sobolev
V Sobolev
VN Vapnik
W De Laurentis
W Kabsch
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, there is an urgent need to develop a computational method for recognizing these important residues. Results: Here we introduce HemeBIND, an efficient algorithm for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. It was found that several sequence and structural attributes such as evolutionary conservation, solvent accessibility, depth and protrusion clearly illustrate the differences between heme binding and non-binding residues. These features can then be used separately or combined to build structure-based classifiers using a support vector machine (SVM). The results showed that the information contained in these features is largely complementary and that their combination achieved the best performance. To further improve the performance, we developed a post-processing procedure to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profiles as an alternative when only sequence information can be used. Finally, we employed a voting method to combine the outputs of the structure-based and sequence-based classifiers, which demonstrated remarkably better performance than either individual classifier alone.
Conclusions: HemeBIND is the first specialized algorithm used to predict binding residues in protein structures for heme ligands. Extensive experiments indicated that both the structure-based and sequence-based methods effectively identify heme binding residues, while the complementary relationship between them can result in a significant improvement in prediction performance. The value of our method is highlighted through the development of the HemeBIND web server, which is freely accessible at http://mleg.cse.sc.edu/hemeBIND/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Scholar Commons - Institutional Repository of the University of South Carolina

An Ensemble Analysis of Electromyographic Activity during Whole Body Pointing with the Use of Support Vector Machines

Author: A David
A Ghazanfar
A Tolambiya
A Tolambiya
AC Rencher
Arvind Tolambiya
B Berret
Bastien Berret
C Cortes
C Hart
C Papaxanthis
Carletta Jean
D Howell
DA Winter
DA Winter
E Chiovetto
E Thomas
Eleni Vasilaki
Elizabeth Thomas
Enrico Chiovetto
ER Kandell
F Horak
FP Kendall
G Bosco
G Cheron
HJ Hufschmidt
J Massion
JF Soechting
JS Thomas
K Chan
L Fautrelle
LG Grimm
MA Nicolelis
MF Bear
PJ Cordo
PJ Stapley
PR Hinton
R Begg
R Fisher
R Poppele
R Shadmehr
RA Schmidt
S Nair
S Theodoridis
T Pozzo
Thierry Pozzo
TR Kaminski
TR Kaminski
V Cherkassky
Vapnik
VN Vapnik
Y Saeys
YP Ivanenko
Z Wang
Publication venue: Public Library of Science
Publication date: 01/01/2011

We explored the use of support vector machines (SVM) to analyze the ensemble activities of 24 postural and focal muscles recorded during a whole body pointing task. Because of the large number of variables involved in motor control studies, such multivariate methods have much to offer over the standard univariate techniques that are currently employed in the field to detect modifications. The SVM was used to uncover the principal differences underlying several variations of the task. Five variants of the task were used: one unconstrained reaching task, two constrained at the focal level and two at the postural level. Using the electromyographic (EMG) data, the SVM proved capable of distinguishing all the unconstrained from the constrained conditions with a success rate of approximately 80% or above. In all cases, including those with focal constraints, the collective postural muscle EMGs were as good as or better than those from focal muscles for discriminating between conditions. This was unexpected, especially in the case with focal constraints. In trying to rank the importance of particular features of the postural EMGs, we found the maximum amplitude to be more discriminative than the moment at which it occurred. A classification using the muscles one at a time permitted us to identify some of the postural muscles that are significantly altered between conditions. In this case, the use of a multivariate method also permitted the use of the entire muscle EMG waveform rather than the difficult process of defining and extracting any particular variable. The best accuracy was obtained from muscles of the leg rather than from the trunk. By identifying the features that are important in discrimination, the use of the SVM permitted us to identify some of the features that are adapted when constraints are placed on a complex motor task.

HAL-uB

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

ε-Distance Weighted Support Vector Regression

Author: AF Izmailov
B Demir
B Scholkopf
CC Chang
DN Guo
GX Yuan
JD Brown
JS Marron
LH Dicker
Léon Bottou
N Srivastava
PK Rajaraman
RE Fan
V Vapnik
XY Qiao
Y Ke
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

We gratefully thank Dr Teng Zhang and Prof Zhi-Hua Zhou for providing the source code of “LDM” and for their kind technical assistance. We also thank Prof Chih-Jen Lin's team for providing the LIBSVM and LIBLINEAR packages and for their support. This work is supported by the National Natural Science Foundation of China (Grant Nos. 61472159, 61572227) and the Development Project of Jilin Province of China (Grant Nos. 20140101180JC, 20160204022GX, 20180414012GH). This work is also partially supported by the 2015 Scottish Crucible Award funded by the Royal Society of Edinburgh and the 2016 PECE bursary provided by the Scottish Informatics & Computer Science Alliance (SICSA). Postprint

Aberdeen University Research

Heriot Watt Pure

Crossref

Protein ligand-specific binding residue predictions by an ensemble classifier

Author: A Roy
AT Laurie
B Huang
B Panwar
BH Dessailly
BK Dukka
C Fang
C-C Chang
C-H Ngan
CH Lu
D Xu
DW Buchan
GY Wong
HR Ansari
I Mayrose
J Konc
J Yang
J Yang
J Yang
JA Capra
JA Capra
JA Horst
JS Chauhan
JS Chauhan
K Chen
K Chen
Kai Wang
L Fu
M Brylinski
N Shu
NK Mishra
P Chen
PW Rose
Q Dong
Qiwen Dong
R Liu
R Wang
S Leis
S Wu
S Wu
SF Altschul
T Gallo Cassarino
T Pupko
T Schmidt
U Consortium
V Sobolev
V Sobolev
VN Vapnik
W Nemoto
X Ma
Xiuzhen Hu
Y Freund
Z Zhang
ZR Xie
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

Multimodal microscopy for automated histologic analysis of prostate cancer

Author: A Jemal
A Madabhushi
A Tabesh
AW Wetzel
C Beleites
C Petibois
D Helm
DC Fernandez
DC Malins
DF Gleanson
DI Ellis
DM Berney
E Ly
EKW Schulte
ER Dougherty
G Budinova
H Fabian
HC Peng
IW Levin
J Diamond
JA Nelder
JI Epstein
JI Epstein
Jin Tae Kwak
JS Lee
K Jafari-Khouzani
K Morik
L Mulrane
LG Brown
M Arif
M Diem
M Roula
MN Gurcan
N Landwehr
P Pudil
PA Humphrey
PF Pinsky
PH Bartels
PW Huang
R Bhargava
R Bhargava
R Bhargava
R Farjam
R Farjam
R Stotzka
RA Shaw
Rohit Bhargava
S Doyle
S Naik
Saurabh Sinha
SJ Jacobsen
SM Gilbert
SM Pizer
Stephen M Hewitt
VN Vapnik
Y Smith
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Prostate cancer is the single most prevalent cancer in US men, and the gold standard for its diagnosis is histologic assessment of biopsies. Manual assessment of stained tissue of all biopsies limits speed and accuracy in clinical practice and in research on prostate cancer diagnosis. We sought to develop a fully automated multimodal microscopy method to distinguish cancerous from non-cancerous tissue samples. Methods: We recorded chemical data from an unstained tissue microarray (TMA) using Fourier transform infrared (FT-IR) spectroscopic imaging. Using pattern recognition, we identified epithelial cells without user input. We fused the cell type information with the corresponding stained images commonly used in clinical practice. Extracted morphological features, optimized by a two-stage feature selection method using a minimum-redundancy-maximal-relevance (mRMR) criterion and sequential floating forward selection (SFFS), were applied to classify tissue samples as cancer or non-cancer. Results: We achieved high accuracy (area under ROC curve (AUC) >0.97) in cross-validations on each of two data sets that were stained under different conditions. When the classifier was trained on one data set and tested on the other data set, an AUC value of ~0.95 was observed. In the absence of IR data, the performance of the same classification system dropped for both data sets and between data sets. Conclusions: We were able to achieve very effective fusion of the information from two different images that provide very different types of data with different characteristics. The method is entirely transparent to a user and does not involve any adjustment or decision-making based on spectral data. By combining the IR and optical data, we achieved highly accurate classification.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

Author: A Schlessinger
A Schlessinger
A Schlessinger
AG de Brevern
B Rost
B Rost
B Rost
B Xue
C Bystroff
C Haynes
C Mooney
C Zhang
C Zheng
Christian Schönbach
D Xie
DT Jones
E Faraggi
E Faraggi
G Helles
Geoffrey I. Webb
GN Ramachandran
GP Raghava
H Zhang
H Zhang
Hao Tan
HJ Dyson
HS Kang
J Cheng
J Gao
J Gsponer
J Song
J Song
J Song
J Song
J Song
J Song
Jiangning Song
JJ Ward
JS Chauhan
K Chen
K Chen
K Chen
L Chen
L Kurgan
M Kumar
Mingjun Wang
MJ Mizianty
MJ Rooman
MJ Wood
MJ Wood
MK Kalita
MN Nguyen
MN Nguyen
MV Berjanskii
O Dor
O Dor
O Zimmermann
P Chen
P Kountouris
P Kountouris
P Sliz
PC Chen
R Gaudet
R Karchin
R Kuang
R Verma
S Ahmad
S Ahmad
S Liang
S Qiu
S Wu
S Wu
SF Altschul
T Ishida
T Zhang
T Zhang
Tatsuya Akutsu
V Vapnik
V Vapnik
W Kabsch
W Liu
W Zhang
X Miao
X Wang
XY Pan
Y Ofran
Y Ofran
YM Huang
Z Markovic-Housley
Z Yuan
Z Yuan
Z Yuan
Z Yuan
Publication venue: Public Library of Science
Publication date: 01/01/2012

Protein backbone torsion angles (Phi) and (Psi) are the two rotation angles around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained using ANGLOR, one of the state-of-the-art prediction tools. Moreover, the prediction of TANGLE is significantly better than that of a random predictor built on an amino acid-specific basis, with p-values <1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/
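
The mean absolute error reported for angle prediction can be sketched in Python as below. The abstract does not state how the periodicity of torsion angles is handled, so this example assumes each error is measured along the shorter arc of the circle (so that 170° and -170° differ by 20°, not 340°); the function name `angular_mae` is hypothetical.

```python
def angular_mae(predicted, observed):
    """Mean absolute error between two equal-length lists of angles in degrees.

    Assumption: each error is taken along the shorter arc, so it never
    exceeds 180 degrees.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(predicted)
```

With predictions of 170° and -60° against observed values of -170° and -50°, the per-residue errors are 20° and 10°, giving an MAE of 15°.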

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Monash University Research Portal

Maximizing upgrading and downgrading margins for ordinal regression

Author: A Shashua
AP Bradley
Belen Martin-Barragan
C Cortes
DJ Hand
E Bredensteiner
E Carrizosa
E Carrizosa
E Carrizosa
E Carrizosa
E Grigoroudis
EL Allwein
Emilio Carrizosa
F Plastria
G Ballarino
H Nakayama
J Mercer
J Shawe-Taylor
JC Platt
JP Pedroso
JS Cardoso
L Li
MA Kupinski
N Cristianini
NM Adams
OL Mangasarian
R Herbrich
R Lall
RM Everson
T Hastie
T Jiao
V Vapnik
V Vapnik
W Chu
W Waegeman
Y Guermeur
Y Jin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2011

In ordinal regression, a score function and threshold values are sought to classify a set of objects into a set of ranked classes. Classifying an individual in a class with higher (respectively lower) rank than its actual rank is called an upgrading (respectively downgrading) error. Since upgrading and downgrading errors may not have the same importance, they should be considered as two different criteria to be taken into account when measuring the quality of a classifier. In Support Vector Machines, margin maximization is used as an effective and computationally tractable surrogate for the minimization of misclassification errors. As an extension, we consider in this paper the maximization of upgrading and downgrading margins as a surrogate for the minimization of upgrading and downgrading errors, and we address the biobjective problem of finding a classifier that simultaneously maximizes the two margins. The whole set of Pareto-optimal solutions of such a biobjective problem is described as translations of the optimal solutions of a scalar optimization problem. For the most popular case, in which the Euclidean norm is considered, the scalar problem has a unique solution, so that all the Pareto-optimal solutions of the biobjective problem are translations of each other. Hence, the Pareto-optimal solutions can easily be provided to the analyst, who, after inspecting the misclassification errors caused, can choose the most convenient classifier at a later stage. The consequence of this analysis is that it provides a theoretical foundation for a popular strategy among practitioners, based on the so-called ROC curve, which is shown here to equal the set of Pareto-optimal solutions of simultaneously maximizing the downgrading and upgrading margins.
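
The definitions of upgrading and downgrading errors can be made concrete with a short Python sketch. It only illustrates how a score and a set of thresholds yield a ranked class and how the two error types are counted; it does not implement the margin maximization studied in the paper. The function names, the 0-based class indices, and the convention that a score equal to a threshold falls in the higher class are assumptions.

```python
from bisect import bisect_right


def classify(score, thresholds):
    """Map a score to a ranked class index in 0..len(thresholds).

    `thresholds` must be sorted in ascending order. The class is the number
    of thresholds at or below the score.
    """
    return bisect_right(thresholds, score)


def count_errors(scores, true_ranks, thresholds):
    """Return (upgrading, downgrading) error counts.

    An upgrading error assigns a rank above the true rank; a downgrading
    error assigns a rank below it.
    """
    upgrading = downgrading = 0
    for score, true_rank in zip(scores, true_ranks):
        predicted = classify(score, thresholds)
        if predicted > true_rank:
            upgrading += 1
        elif predicted < true_rank:
            downgrading += 1
    return upgrading, downgrading
```

Shifting all thresholds by the same amount trades one error type against the other, which is the kind of translation between classifiers that the abstract describes.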

Crossref

Edinburgh Research Explorer

idUS. Depósito de Investigación Universidad de Sevilla