
arXiv.org e-Print Archive

Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication date: 17/10/2005

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and a strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
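To make the point about non-vectorial data concrete, here is a minimal sketch of one widely used sequence kernel, the k-mer spectrum kernel, in Python. It is an illustrative example chosen for this listing, not code from the chapter; the function names, the default k = 3 and the toy peptide strings are assumptions. The kernel compares two sequences through the inner product of their k-mer count vectors, so a kernel machine can work directly on strings without a fixed-length feature table.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Inner product of the k-mer count vectors of x and y.

    Only k-mers present in both sequences contribute, so the (potentially
    huge) feature vectors are never materialized.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for missing keys, so absent k-mers add nothing.
    return float(sum(n * cy[kmer] for kmer, n in cx.items()))


def normalized_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized spectrum kernel, so that K(x, x) = 1 when x has k-mers."""
    denom = sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    # Sequences shorter than k have no k-mers; treat them as orthogonal.
    return spectrum_kernel(x, y, k) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy peptide fragments (hypothetical data, for illustration only).
    seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHPPWWT"]
    gram = [[normalized_kernel(a, b) for b in seqs] for a in seqs]
    for row in gram:
        print(" ".join(f"{v:.2f}" for v in row))
```

The resulting Gram matrix could, for instance, be passed to a kernel machine that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.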

HAL-MINES ParisTech

Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers

Author: Adamec Jiri
Biegert Greyson
Helikar Tomáš
Mohammed Akram
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 21/09/2017

Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent “OMICS” studies that include a variety of cancer and normal tissue samples, combined with machine learning approaches, have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multiclass model classified the sample both as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers, and they point to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.
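The evaluation protocol described above (ten-fold cross-validation, with performance reported as accuracy, sensitivity and specificity) can be sketched in plain Python as follows. This is a generic illustration under stated assumptions, not the authors' pipeline: the random forest itself is omitted, labels are assumed to be coded 1 = cancer and 0 = normal, and the helper names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one; each fold is the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity), coding 1 = cancer, 0 = normal (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of cancers detected; specificity: fraction of normals called normal.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity
```

Each (train, test) pair would be used to fit a classifier on the training indices and score it on the held-out fold. A stratified split, which keeps the cancer/normal ratio similar across folds, is a common refinement when classes are imbalanced.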

DigitalCommons@University of Nebraska

An algorithm for classifying tumors based on genomic aberrations and selecting representative tumor models

Author: AA Alizadeh
AF Gazdar
B Fisher
Charles Van Sant
CL Vogel
D Hanahan
DD Lee
Dimitri Semizarov
DJ Slamon
DR Carrasco
EA Maher
F Cappuzzo
G Hodgson
G Schwarz
J Lapointe
J Schneiderman
JA Hartigan
John Coon
JP Brunet
K Jong
Ke Zhang
L Breiman
L Chin
LJ van 't Veer
M Baudis
M Harris
MA Shipp
P Laurent-Puig
P Paatero
Q Wang
RR Sokal
S Myllykangas
SJ Cleator
T Takano
TR Golub
X Zhao
Xin Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Cancer is a heterogeneous disease caused by genomic aberrations and characterized by significant variability in clinical outcomes and response to therapies. Several subtypes of common cancers have been identified based on alterations of individual cancer genes, such as HER2, EGFR, and others. However, cancer is a complex disease driven by the interaction of multiple genes, so the copy number status of individual genes is not sufficient to define cancer subtypes and predict responses to treatments. A classification based on genome-wide copy number patterns would be better suited for this purpose.
Method: To develop a more comprehensive cancer taxonomy based on genome-wide patterns of copy number abnormalities, we designed an unsupervised classification algorithm that identifies genomic subgroups of tumors. This algorithm is based on a modified genomic Non-negative Matrix Factorization (gNMF) algorithm and includes several additional components, namely a pilot hierarchical clustering procedure to determine the number of clusters, a multiple random initiation scheme, a new stop criterion for the core gNMF, as well as a 10-fold cross-validation stability test for quality assessment.
Result: We applied our algorithm to identify genomic subgroups of three major cancer types: non-small cell lung carcinoma (NSCLC), colorectal cancer (CRC), and malignant melanoma. High-density SNP array datasets for patient tumors and established cell lines were used to define genomic subclasses of the diseases and identify cell lines representative of each genomic subtype. The algorithm was compared with several traditional clustering methods and showed improved performance. To validate our genomic taxonomy of NSCLC, we correlated the genomic classification with disease outcomes. Overall survival time and time to recurrence were shown to differ significantly between the genomic subtypes.
Conclusions: We developed an algorithm for cancer classification based on genome-wide patterns of copy number aberrations and demonstrated its superiority to existing clustering methods. The algorithm was applied to define genomic subgroups of three cancer types and identify cell lines representative of these subgroups. Our data enabled the assembly of representative cell line panels for testing drug candidates.
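The core factorization step can be illustrated with standard Lee-Seung multiplicative-update NMF, written here in dependency-free Python. This is plain NMF, not the authors' modified gNMF: the pilot hierarchical clustering and the 10-fold stability test are not reproduced, the relative-error stopping rule and its tolerance are assumptions, and samples are assigned to the factor with the largest coefficient in H, a common convention for NMF clustering. Rows of V are genomic features and columns are tumor samples; V must be non-negative (raw log2 copy number ratios would first need a non-negative encoding, which this sketch does not address).

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]
EPS = 1e-12  # guards against division by zero in the updates


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sq_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v: Matrix, rank: int, seed: int, max_iter: int = 1000,
        tol: float = 1e-8) -> Tuple[Matrix, Matrix, float]:
    """Lee-Seung multiplicative updates for V ~ WH with W, H >= 0.

    Stops when the relative decrease of the error falls below tol (assumed rule).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    prev: Optional[float] = None
    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(rank)] for i in range(n)]
        err = sq_error(v, w, h)
        if prev is not None and prev - err <= tol * prev:
            break
        prev = err
    return w, h, sq_error(v, w, h)


def cluster_samples(v: Matrix, rank: int, restarts: int = 5) -> List[int]:
    """Keep the best of several random initializations, then assign each sample
    (column of V) to the factor with the largest coefficient in H."""
    _, h, _ = min((nmf(v, rank, seed) for seed in range(restarts)), key=lambda r: r[2])
    return [max(range(rank), key=lambda k: h[k][j]) for j in range(len(v[0]))]
```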

Springer - Publisher Connector


Genetic alteration and gene expression modulation during cancer progression

Author: Buys Timon PH
Garnis Cathie
Lam Wan L
Publication venue: BioMed Central
Publication date: 01/03/2004

Cancer progresses through a series of histopathological stages. Progression is thought to be driven by the accumulation of genetic alterations and consequently gene expression pattern changes. The identification of genes and pathways involved will not only enhance our understanding of the biology of this process, it will also provide new targets for early diagnosis and facilitate treatment design. Genomic approaches have proven to be effective in detecting chromosomal alterations and identifying genes disrupted in cancer. Gene expression profiling has led to the subclassification of tumors. In this article, we will describe the current technologies used in cancer gene discovery, the model systems used to validate the significance of the genes and pathways, and some of the genes and pathways implicated in the progression of preneoplastic and early-stage cancer.

The molecular basis of lung cancer: molecular abnormalities and therapeutic implications

Author: Carbone David P
Massion Pierre P
Publication venue: BioMed Central
Publication date: 01/10/2003

Lung cancer is the number one cause of cancer-related death in the western world. Its incidence is highly correlated with cigarette smoking, and about 10% of long-term smokers will eventually be diagnosed with lung cancer, underscoring the need for strengthened anti-tobacco policies. Among the 10% of patients who develop lung cancer without a smoking history, the environmental or inherited causes of lung cancer are usually unclear. There is no validated screening method for lung cancer even in high-risk populations, and the overall five-year survival has not changed significantly in the last 20 years. However, major progress has been made in the understanding of the disease, and we are beginning to see this knowledge translated into the clinic. In this review, we will summarize the current state of knowledge regarding the cascade of events associated with lung cancer development. From subclinical DNA damage to overt invasive disease, the mechanisms leading to clinically and molecularly heterogeneous tumors are being unraveled. These lesions allow cells to escape the normal regulation of cell division, apoptosis and invasion. While all subtypes of non-small cell lung cancer have historically been treated the same, stage-for-stage, recent technological advances have allowed a better understanding of the molecular classification of the disease and provide hypotheses for molecular early detection and targeted therapeutic strategies.

Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential

Author: A Berchuck
A Colaprico
A Dobrovic
A Muniategui
A Oulas
A Schumacher
A Sewer
A Sharma
A Subramanian
AB Poplawski
AE Pasquinelli
AG Knudson
AJ Lowery
AL Smith
AM Cleton-Jansen
AM Gonzalez-Angulo
AP Bird
AP Feinberg
AP Kumar
AP Trapé
AS Knoop
AV Ivshina
B Futcher
B Liu
B Orsetti
B Phipson
B Vogelstein
B Zhang
BG Masayesva
BN Hannafon
BS Wittner
BZ Ring
C Alkan
C Ambroise
C Blenkiron
C Cava
C Corzo
C Costa
C Desmedt
C Mayr
C Previti
C Ragan
C Rodriguez
C Soneson
C Sotiriou
C Wang
C Xue
Cancer Genome Atlas Network
CB Kingsley
CD Mayer
CJ Sherr
CJ Vaske
CK Zoon
Claudia Cava
CM Marson
D Beck
D Carling
D Chen
D Hanahan
D Li
D Lipson
D Luo
D Madhavan
D Malkin
D Samantarrai
D Subramaniam
D Tsafrir
D Xu
DD Taylor
DE Hallahan
DJ Gordon
DJ Slamon
DP Bartel
DP Pandey
DR Hurst
DW Thomson
E Berezikov
E Dudziec
E Hervouet
E Hyman
E O'Day
E de Rinaldis
EC Lai
EC Robanus-Maandag
EJ Faivre
ER Fearon
F Andre
F Eckhardt
F Holst
F Mar-Aguilar
F Meng
F Meric-Bernstam
F Mohn
F Wessely
F Wu
F Xiao
F Yu
FC Stingo
FP O'Malley
G Bertoli
G Imataka
G Maire
G Sales
G Song
G Terai
G Viale
GA Calin
GK Scott
Gloria Bertoli
H Bengtsson
H Dvinge
H Konishi
H Lee
H Liu
H Nagai
H Park
H Si
H Solvang
H Wang
H Wu
HJ Peltier
HM Muller
HS Eo
I Ali
I Van der Auwera
I Bentwich
I Gonzalez
IL Hofacker
IS Oh
Isabella Castiglioni
J Allmer
J Baselga
J Fullgrabe
J Hertel
J Huang
J Nie
J Pollack
J Staaf
J Xu
J Yu
J Yun
J Zhang
JA Berger
JA Nielsen
JB Patel
JB Weidhaas
JC Alwine
JC Engelmann
JC Huang
JD Pollock
JE Eckel-Passow
JG Paez
JJ Goeman
JL Phillips
JM Bartlett
JM Bueno-de-Mesquita
JM Korn
JS Parker
JT Bell
JW Nam
JZ Xu
K Chin
K Lundgren
K Polyak
K Salari
K-C Chen
KJ Png
KL Ng
KW Tsai
L Cascione
L Chin
L He
L Li
L Lu
L Ma
L Yu
L Zhang
L Zhong
LF Sempere
LJ van 't Veer
LP Lim
LX Yan
LY Chuang
M Bhasin
M Billam
M Buyse
M Chen
M Chimonidou
M Ehrlich
M Inomata
M Korpal
M Lindow
M Negrini
M Ortiz-Estevez
M Sachdeva
M Salman
M Schäfer
M Szyf
M Taniguchi
M Tanner
M Wanderley
M Wolf
M Yousef
M Zhang
M Zhou
MA Taylor
MA Valasek
MA van de Wiel
MC Pouliot
MD Edmonds
MD Mattie
ME Thompson
MJ Aryee
MJ Lodes
MJ van de Vijver
ML Si
MM Desouki
MR Aure
MS Stark
MV Iorio
N Dias
N Huang
N Rosenfeld
N Srivastava
O Alter
O Bornachea
O Kan
O Monni
P Bertheau
P Du
P Jafari
P Medvedev
P Rizzolo
P de Souza Rocha Simonini
PA Gregory
PA Jones
PH Westfall
PM Neilsen
PN Munster
PS Yan
Q Li
Q Wu
QQ Li
R Battiti
R Beroukhim
R Kodzius
R Louhimo
R Menezes
R Nogales-Cadenas
R Pinto
R Radpour
R Shen
RA Veitia
RC Lee
RC Thompson
RC Zeng
RJ Webster
RS Gitan
RT Barfield
S Aulmann
S Bicciato
S Brenner
S Cho
S Fan
S Kadri
S Paik
S Sarkar
S Streit
S Tsutsui
S Valastyan
S Vasudevan
S Volinia
S-D Hsu
SA Bustin
SA Leon
SD Reddy
SF Chin
SF Tavazoie
SM Hammond
T Dhingra
T Oskarsson
TA Farazi
TA Harris
TH Huang
TR Golub
U Hamann
UD Akavia
V Birgisdottir
V Bolón-Canedo
V Jayaswal
V Stearns
V Tarasov
VA Gennarino
VE Velculescu
VG Tusher
VK Mootha
VN Kristensen
W Chen
W Li
W Ritchie
W Scheuer
W Walther
W Zhang
WL Tam
WN Wieringen
X Gai
X Li
X Liu
X Zhao
XF Li
Y Assenov
Y Grad
Y Huang
Y Li
Y Nannya
Y Saeys
Y Sun
Y Wang
Y Zhang
YH Hsiao
Z Herceg
Z Hu
Z Li
Z Wang
Z Yu
Publication venue: Springer Science and Business Media LLC

A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data

Author: Deng Youping
Devanarayan Viswanath
Donald Sens
Xie Linglin
Yang Yi
Zhang Ke
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The recent advancement in array CGH (aCGH) research has significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two of the major challenges for developing aCGH sample clustering are the high spatial correlation between aCGH markers and the low computing efficiency. A mixture hidden Markov model-based algorithm was developed to address these two challenges.
Results: The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented, and real data analysis on glioma aCGH data has shown that it converges to the optimal cluster rapidly and that the computation time is proportional to the sample size. Simulation results showed that this HMM-based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for glioma data were significantly associated with clinical outcomes.
Conclusions: We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The software for HMMC in both R and C++ is available on the ND INBRE website: http://ndinbre.org/programs/bioinformatics.php.
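The idea of using an HMM to capture spatial correlation between neighboring aCGH markers can be sketched with a three-state (loss, neutral, gain) model decoded by the Viterbi algorithm. This covers only the segmentation of a single sample; the mixture-of-HMMs clustering in HMMC is not reproduced, and the state means, shared standard deviation and self-transition probability below are hypothetical values chosen for illustration. A high self-transition probability is what encodes spatial correlation: changing state between adjacent markers is penalized, so isolated noisy values are less likely to be called as aberrations.

```python
import math
from typing import List, Sequence

STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)  # hypothetical log2-ratio mean per state
SD = 0.2                  # hypothetical shared emission standard deviation
STAY = 0.98               # hypothetical P(same state at the next marker)


def log_gauss(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(log2_ratios: Sequence[float]) -> List[str]:
    """Most probable state path for markers ordered along a chromosome."""
    if not log2_ratios:
        return []
    n_states = len(STATES)
    switch = (1.0 - STAY) / (n_states - 1)  # remaining mass split over other states
    log_trans = [[math.log(STAY if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]
    # Uniform initial state distribution (assumption).
    score = [math.log(1.0 / n_states) + log_gauss(log2_ratios[0], MEANS[s], SD)
             for s in range(n_states)]
    back: List[List[int]] = []
    for x in log2_ratios[1:]:
        ptr, new = [], []
        for j in range(n_states):
            # Best predecessor state for entering state j at this marker.
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best)
            new.append(score[best] + log_trans[best][j] + log_gauss(x, MEANS[j], SD))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

For example, a run of markers with log2 ratios near 0.5 flanked by values near 0 would be decoded as a contiguous gain segment between two neutral segments.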

Springer - Publisher Connector