Search CORE

1,963 research outputs found

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011
Field of study

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

L2-norm multiple kernel learning and its application to biomedical data fusion

Author: A Daemen
A Daemen
Anneleen Daemen
AY Ng
B Schölkopf
Bart De Moor
C Bottomley
C Leslie
DMJ Tax
ED Andersen
FR Bach
G Condous
G Thomas
GC Cawley
GRG Lanckriet
GRG Lanckriet
J Gudmundsson
J Shawe-Taylor
JAK Suykens
JAK Suykens
Johan AK Suykens
JP Ye
K Tretyakov
K Veropoulos
Leon-Charles Tranchevent
M Grant
M Grant
M Kloft
M Kloft
M Kowalski
O Gevaert
R Hettich
R Reemtsen
RA Eeles
S Aerts
S Sonnenburg
S Yu
Shi Yu
SJ Kim
T De Bie
T van den Bosch
Tillmann Falck
V Vapnik
Y Zheng
Yves Moreau
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as <it>L</it>∞, <it>L</it>1, and <it>L</it>2 MKL. In particular, <it>L</it>2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing <it>L</it>∞ MKL method. In real biomedical applications, <it>L</it>2 MKL may have more advantages over sparse integration method for thoroughly combining complementary information in heterogeneous data sources. Results We provide a theoretical analysis of the relationship between the <it>L</it>2 optimization of kernels in the dual problem with the <it>L</it>2 coefficient regularization in the primal problem. Understanding the dual <it>L</it>2 problem grants a unified view on MKL and enables us to extend the <it>L</it>2 method to a wide range of machine learning problems. We implement <it>L</it>2 MKL for ranking and classification problems and compare its performance with the sparse <it>L</it>∞ and the averaging <it>L</it>1 MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. <it>L</it>2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel <it>L</it>2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large scale data sets processing. Conclusions This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid a "winner-takes-all" effect seen in <it>L</it>∞ MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing <it>L</it>2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has comparable performance as the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL. Availability The MATLAB code of algorithms implemented in this paper is downloadable from <url>http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html</url>.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Deterministic Classifiers Accuracy Optimization for Cancer Microarray Data

Author: B Smith
D Castillo
D Singh
DB Allison
E Glaab
I Polaka
J Cao
J Friedman
J Quackenbush
JD Hoheisel
JR Landis
K Kaliyappan
L Rokach
L Shen
LFA Wessels
M Zahurak
MA Shipp
N Landwehr
N Raghavachari
O Dagliyan
SA Armstrong
T Saito
U Scherf
VN Vapnik
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

The objective of this study was to improve classification accuracy in cancer microarray gene expression data using a collection of machine learning algorithms available in WEKA. State of the art deterministic classification methods, such as: Kernel Logistic Regression, Support Vector Machine, Stochastic Gradient Descent and Logistic Model Trees were applied on publicly available cancer microarray datasets aiming to discover regularities that provide insights to help characterization and diagnosis correctness on each cancer typology. The implemented models, relying on 10-fold cross-validation, parameterized to enhance accuracy, reached accuracy above 90%. Moreover, although the variety of methodologies, no significant statistic differences were registered between them, at significance level 0.05, confirming that all the selected methods are effective for this type of analysis.info:eu-repo/semantics/publishedVersio

Crossref

Biblioteca Digital do IPB

Identification of a small optimal subset of CpG sites as bio-markers from high-throughput DNA methylation profiles

Author: Li Guoya
Meng Hailong
Murrelle Edward L
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Classification of Dengue Fever Patients Based on Gene Expression Data Using Support Vector Machines

Author: Ana Lisa V. Gomes
Anita Brandstaetter
Asif M. Khan
C Burges
C Chang
C Cortes
Carlos E. Calzavara-Silva
D Knipe
EC Holmes
EJM Nascimento
EK Tang
Ernesto T. A. Marques
JL Muñoz-Jordán
K Honda
K McKenna
KJ Livak
L Tanner
Laura H. V. G. Gil
Lawrence J. K. Wee
LC Platanias
LJK Wee
LL Coffey
M Jones
M Yoneyama
MD de Kruif
MS Diamond
MT Cordeiro
MT Cordeiro
P Sun
PC Chen
S Ubol
SB Halstead
ST Chen
T Kawai
Tin Wee Tan
Y Tang
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

Background: Symptomatic infection by dengue virus (DENV) can range from dengue fever (DF) to dengue haemorrhagic fever (DHF), however, the determinants of DF or DHF progression are not completely understood. It is hypothesised that host innate immune response factors are involved in modulating the disease outcome and the expression levels of genes involved in this response could be used as early prognostic markers for disease severity. Methodology/Principal Findings: mRNA expression levels of genes involved in DENV innate immune responses were measured using quantitative real time PCR (qPCR). Here, we present a novel application of the support vector machines (SVM) algorithm to analyze the expression pattern of 12 genes in peripheral blood mononuclear cells (PBMCs) of 28 dengue patients (13 DHF and 15 DF) during acute viral infection. The SVM model was trained using gene expression data of these genes and achieved the highest accuracy of ,85% with leave-one-out cross-validation. Through selective removal of gene expression data from the SVM model, we have identified seven genes (MYD88, TLR7, TLR3, MDA5, IRF3, IFN-a and CLEC5A) that may be central in differentiating DF patients from DHF, with MYD88 and TLR7 observed to be the most important. Though the individual removal of expression data of five other genes had no impact on the overall accuracy, a significant combined role was observed when the SVM model of the two main genes (MYD88 and TLR7) was re-trained to include the five genes, increasing the overall accuracy to ,96%. Conclusions/Significance: Here, we present a novel use of the SVM algorithm to classify DF and DHF patients, as well as to elucidate the significance of the various genes involved. It was observed that seven genes are critical in classifying DF and DHF patients: TLR3, MDA5, IRF3, IFN-a, CLEC5A, and the two most important MYD88 and TLR7. While these preliminary results are promising, further experimental investigation is necessary to validate their specific roles in dengue disease

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

D-Scholarship@Pitt

ScholarBank@NUS

A Hybrid Metaheuristics based technique for Mutation Based Disease Classification

Author: Kumar Dharmender
Phogat Manu
Publication venue: Faculty of Electrical Engineering, J.J. Strossmayer University of Osijek
Publication date: 01/01/2023
Field of study

Due to recent advancements in computational biology, DNA microarray technology has evolved as a useful tool in the detection of mutation among various complex diseases like cancer. The availability of thousands of microarray datasets makes this field an active area of research. Early cancer detection can reduce the mortality rate and the treatment cost. Cancer classification is a process to provide a detailed overview of the disease microenvironment for better diagnosis. However, the gene microarray datasets suffer from a curse of dimensionality problems also the classification models are prone to be overfitted due to small sample size and large feature space. To address these issues, the authors have proposed an Improved Binary Competitive Swarm Optimization Whale Optimization Algorithm (IBCSOWOA) for cancer classification, in which IBCSO has been employed to reduce the informative gene subset originated from using minimum redundancy maximum relevance (mRMR) as filter method. The IBCSOWOA technique has been tested on an artificial neural network (ANN) model and the whale optimization algorithm (WOA) is used for parameter tuning of the model. The performance of the proposed IBCSOWOA is tested on six different mutation-based microarray datasets and compared with existing disease prediction methods. The experimental results indicate the superiority of the proposed technique over the existing nature-inspired methods in terms of optimal feature subset, classification accuracy, and convergence rate. The proposed technique has illustrated above 98% accuracy in all six datasets with the highest accuracy of 99.45% in the Lung cancer dataset

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Gene selection for cancer classification with the help of bees

Author: A Balmain
A Banharnsakun
A Bhattacharjee
A Brazma
A Choudhary
A Dussutour
A Farji-Brener
A Statnikov
A Statnikov
AG Karegowda
AI Su
AV Tinker
B Wu
BJ Norton
BK Verma
C Giallourakis
C Lazar
C Xu
CA Markowski
CC Chang
CJ Tu
CL Nutt
CM Bishop
D Chen
D Karaboga
D Karaboga
D Karaboga
D Karaboga
D Karaboga
D Karaboga
D Singh
D Teodorovic
DM Gordon
DM Gordon
DM Gordon
DV Nguyen
EL Lehmann
ER Dougherty
F Ahmade
F Emmert-Streib
F Kang
F Kang
F Roces
F Roces
F Wilcoxon
FJ Rodriguez
G George
G Li
G Stephanopoulos
G Xu
G Yan
G Zhu
GEP Box
H Drias
H Hu
H Liu
H Shah
H Sharma
H Torres-Contreras
H Yu
H Zhang
HF Wedde
I Eksin
I Guyon
I Guyon
I Inza
J Hamidi
J Ji
J Kennedy
J Khan
J Kiefer
J Wang
J Xu
J-Q Li
JC Bansal
JC Bansal
JC Chang
JD Gibbons
JE Staunton
JG Zhang
JH Cho
JJ Howard
JJ Liu
JL Deneubourg
Johra Muhammad Moosa
JW Lee
L Breiman
L Deng
L Lan
L Li
L Wang
LW Jacobs
LY Chuang
LY Chuang
LY Chuang
LY Chuang
M Bollazzi
M Dorigo
M Hollander
M Kefayat
M Mohamad
M Pirooznia
M Schena
MA Shipp
MA Tahir
MH Kashan
MJ Greene
Mohammad Kaykobad
Mohammad Sohel Rahman
MS Mohamad
MS Mohamad
MS Mohamad
N Todorovic
OK Erol
P Mukherjee
PA Devijver
PE Lønning
PW TSai
PY Kumbhar
Q Shen
Q Zhou
QK Pan
QK Pan
R Akbari
R Cai
R Debnath
R Díaz-Uriarte
R Hooke
R Kohavi
R Kohavi
R Mallika
R Murugan
R Ruiz
Rameen Shakur
RJ Schafer
RN Khushaba
S Bicciato
S Bitam
S Dudoit
S Guo
S Knudsen
S Kumar
S Kumar
S Li
S Omkar
S Pavlidis
S Ramaswamy
S Siegel
S Sundar
S Wang
S Yang
SA Armstrong
SL Pomeroy
SL Wang
SP Fodor
SS Jadon
SS Jeffrey
T Davidović
T Li
T Stützle
TK Sharma
TM Cover
TR Golub
TS Furey
V Saravanan
V Tereshko
V Tereshko
V Tereshko
VN Vapnik
W Li
W Li
W Szeto
W-F Gao
WH Au
WH Kruskal
WH Press
X Wang
X Yan
X Yu
X Zhou
Y Leung
Y Lu
Y Saeys
Y Tan
Y Wang
Y Wang
Y Xu
Y Zhang
Y Zhang
Z Liu
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref