Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

Author: A Buness
A Subramanian
AL Boulesteix
Annalisa Barla
B Di Camillo
Barbara Di Camillo
C Ambroise
C Desmedt
C Furlanello
C Sotiriou
CA Davis
Cesare Furlanello
Claudio Cobelli
D Cai
DS Oh
Francesco Sambo
G Jurman
Gianna Toffolo
Giuseppe Jurman
HY Chuang
I Guyon
JE Larkin
Jo-Ann L. Stanton
JP Ioannidis
L Ein-Dor
L Shi
LD Miller
M Zucknick
Margherita Squillario
Matteo Martini
ML Siegal
P Baldi
RA Irizarry
S Riccadonna
SY Kim
T Abeel
Tiziana Sanavia
VG Tusher
VK Mootha
VN Vapnik
X Solé
Y Benjamini
Y Sun
Z He
Publication venue: Public Library of Science
Publication date: 01/01/2012

MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline advantages and drawbacks of different methods across multiple studies and varying numbers of samples and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.

Archivio della ricerca - Fondazione Bruno Kessler

Archivio istituzionale della ricerca - Università di Genova

Archivio istituzionale della ricerca - Università di Padova

Institutional Research Information System University of Turin

FigShare

Physical properties of naked DNA influence nucleosome positioning and correlate with transcription start and termination sites in yeast

BACKGROUND: In eukaryotic organisms, DNA is packaged into chromatin, where most of the DNA is wrapped into nucleosomes. DNA compaction and nucleosome positioning have clear functional implications, since they modulate the accessibility of genomic regions to regulatory proteins. Despite the intensive research effort focused on this area, the rules defining nucleosome positioning and the location of DNA regulatory regions remain elusive. RESULTS: Naked (histone-free) and nucleosomal DNA from yeast were digested by micrococcal nuclease (MNase) and sequenced genome-wide. MNase cutting preferences were determined for both naked and nucleosomal DNA. Integration of their sequencing profiles with DNA conformational descriptors derived from atomistic molecular dynamics simulations enabled us to extract the physical properties of DNA on a genomic scale and to correlate them with chromatin structure and gene regulation. The local structure of DNA around regulatory regions was found to be unusually flexible and to display a unique pattern of nucleosome positioning. Ab initio physical descriptors derived from molecular dynamics were used to develop a computational method that accurately predicts nucleosome-enriched and nucleosome-depleted regions. CONCLUSIONS: Our experimental and computational analyses jointly demonstrate a clear correlation between sequence-dependent physical properties of naked DNA and regulatory signals in the chromatin structure. These results demonstrate that nucleosome positioning around the TSS (Transcription Start Site) and TTS (Transcription Termination Site), at least in yeast, is strongly dependent on DNA physical properties, which can define a basal regulatory mechanism of gene expression.

Springer - Publisher Connector

Springer

eScholarship - University of California

Comparative and Functional Genomics of Rhodococcus opacus PD630 for Biofuels Development

Author: A Arakaki
A Argyrou
A Marchler-Bauer
A Pohlmann
A Stamatakis
AF Alvarez
AI Saeed
AJ Enright
AK Pandey
AL Delcher
Alex L. B. Leach
AM Waterhouse
Anthony C. DeBono
Anthony J. Sinskey
AR Horswill
Brian Desany
Bruce W. Birren
C Kaddor
Chinnappa D. Kodira
Christine Dancel
Christopher A. Desjardins
D Jendrossek
D Portevin
D Post
DE Vance
Dirk Gevers
DL Rainwater
DP MacEachran
E Puglisi
E Schwartz
E Schweizer
E Severi
E Vimr
ER Goncalves
F Abascal
F David
F-F Hsu
G Timmins
HM Alvarez
I Letunic
I Matsunaga
IB Lomakin
IC Sutcliffe
Ion Ghiviriga
J Hughes
J Rengarajan
Jason P. Affourtit
Jason W. Holder
Jeremy Zucker
Jil C. Ulrich
JM Mathieu
K Isono
K Katoh
K Kurosawa
K Lagesen
K Raman
KC Yam
KR Robrock
L Diacovich
L Li
M Brudno
M Green
M Hernandez
M Seto
M Wu
MA Larkin
MJ de Hoon
MP Mansour
MP McLeod
O Lenz
O Zimhony
OP Peoples
PA Lessard
Paul A. Godfrey
Paul M. Richardson
PD Karp
PR Romero
Qiandong Zeng
R Edgar
R Gande
R Kalscheuer
R Van der Geize
RD Finn
RL Hunter
S Griffiths-Jones
S Guindon
S Kikuchi
S Rajakumari
SC Slater
SK Parker
T Chopra
T Lee
T Sirakova
TD Sirakova
Thomas Abeel
TM Lowe
U Grafe
X Yang
Y Hu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

The Actinomycetales bacteria Rhodococcus opacus PD630 and Rhodococcus jostii RHA1 bioconvert a diverse range of organic substrates through lipid biosynthesis into large quantities of energy-rich triacylglycerols (TAGs). To describe the genetic basis of the Rhodococcus oleaginous metabolism, we sequenced and performed comparative analysis of the 9.27 Mb R. opacus PD630 genome. Metabolic reconstruction assigned 2017 enzymatic reactions to the 8632 R. opacus PD630 genes we identified. Of these, 261 genes were implicated in the R. opacus PD630 TAGs cycle by metabolic reconstruction and gene family analysis. Rhodococcus synthesizes uncommon straight-chain odd-carbon fatty acids in high abundance and stores them as TAGs. We have identified these to be pentadecanoic, heptadecanoic, and cis-heptadecenoic acids. To identify bioconversion pathways, we screened R. opacus PD630, R. jostii RHA1, Ralstonia eutropha H16, and C. glutamicum 13032 for growth on 190 compounds. The results of the catabolic screen, phylogenetic analysis of the TAGs cycle enzymes, and metabolic product characterizations were integrated into a working model of prokaryotic oleaginy. Funding: Cambridge-MIT Institute; Massachusetts Institute of Technology (Seed Grant program); Shell Oil Company; National Institute of Allergy and Infectious Diseases (U.S.); United States. National Institutes of Health; National Institutes of Health. Department of Health and Human Services (Contract No. HHSN272200900006C)

CiteSeerX

DSpace@MIT

Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer

Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case-control comparison, genome annotation, and visual validation, which require multiple processing steps and the use of various tools and scripts. To this end, we have introduced an interactive, user-friendly, multi-platform software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out with a desktop computer in ~10 min with a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Identification of long non-coding transcripts with feature selection: a comparative study

Author: A Li
A Pauli
A Siepel
A-CC Haury
AA Bazzini
AJ de Koning
AM Michel
Antonietta Spagnuolo
B Panwar
C Trapnell
D Chalopin
DW Chung
F Musacchia
G Menardi
Giovanna M. M. Ventola
H Glover
H Shimodaira
H Zou
I Guyon
I Ulitsky
J Davis
J Lv
J Piriyapongsa
J Ponjavic
J Ruiz-Orera
JTY Kung
JW Fickett
K Boyd
K Kaushik
K Sobczak
K Sun
K Zhang
KS Pollard
L Flintoft
L Kong
L Ma
L Wang
LA Pray
LG Wilming
Luigi Cerulo
M Guttman
M Kozak
M Muñoz-López
MF Lin
Michele Ceccarelli
MN Cabili
N Ingolia
N Meinshausen
N Sela
P Carninci
R Johnson
R Tibshirani
S Diederichs
S Russell
S Washietl
Salvatore D’Aniello
SJ Grzegorski
SR Wessler
T Abeel
T Derrien
T Hubbard
T Li
Teresa M. R. Noviello
TR Mercer
UA Orom
XN Fan
Y Saeys
Z Ji
Publication venue: Springer Science and Business Media LLC

Evolution of extensively drug-resistant tuberculosis over four decades: whole genome sequencing and dating analysis of Mycobacterium tuberculosis isolates from KwaZulu-Natal.

Author: A Cattamanchi
A Roetzer
A Stamatakis
Abigail Manson McGuire
AJ Drummond
Alexander S. Pym
Ashlee M. Earl
B Müller
BJ Walker
Bruce J. Walker
Bruce W. Birren
C Demay
CB Ford
CC Boehme
Christopher A. Desjardins
Clinton Howarth
CM Denkinger
CRE McEvoy
Deepak V. Almeida
DR Sherman
E Bonnet
Eamon Y. Duffy
F Coll
FM Cohan
GJ Churchyard
H Li
H Safi
H Zhang
HL Mills
I Comas
J Guerra-Assunção
Jacques Grosset
Jeffrey D. Larimer
Jennifer Wortman
JM Bryant
John Z Metcalfe
JW Inman
K Wallengren
Kashmeel Maharaj
Keira A. Cohen
Koleka P. Mlisana
Lucia Alvarado
M Coscolla
M De Vos
M Ester
M Jassal
M Merker
M Pillay
Margaret E. Priest
Matthew D. Pearson
Max R. O'Donnell
MD Iseman
MG Reynolds
MH Larsen
Michael G. Fitzgerald
MR Farhat
MR O’Donnell
N Bantubani
N Casali
Nesri Padayatchi
Nomonde R. Mvelase
Nonkqubela Bantubani
NR Dlamini-Mvelase
NR Gandhi
P Moodley
Pamla Govender
Qiandong Zeng
R Hershberg
R Koenig
S Gagneux
Sarah K. Young
Sharvari Gujja
Sinéad B. Chapman
SJ Schrag
Susanna Hamilton
T Cohen
T Song
Terrance P. Shea
Thomas Abeel
TM Walker
TR Ioerger
V Eldholm
Vanisha Munsamy
VN Chihota
William R. Bishai
Z Ma
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

CAPRISA, 2015. Abstract available in PDF.

Harvard University - DASH

ResearchSpace@UKZN

FigShare

Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis

Author: A Alinaghizade
A Bateman
A Dobin
A Krogh
A Marchler-Bauer
A Schlacht
A Seyer
A Stamatakis
A Stechmann
AC Smith
Alexander Schlacht
Anastasios D. Tsaousis
Andrew J. Roger
Arthur W. Pightling
AS Jacob
B Chevreux
B Eisenhaber
B Langmead
Bernard Henrissat
BJ Haas
BL Cantarel
Bruce A. Curtis
C Audebert
C Aurrecoechea
C Noel
C Nourrisson
C Rotte
C. Graham Clark
CC Caswell
CH Zierdt
CL Will
CM Klinger
Courtney W. Stairs
CR Stensvold
CW Stairs
D Bhattacharya
D Morse
D Posada
D Sirim
Darren M. Soanes
Dayana E. Salas-Leiva
DC Rio
DR Nelson
E Elhaik
E Hamann
E Roman
E Shoguchi
EL Jarroll
Eleni Gentekaki
Emily K. Herman
EP Nawrocki
EV Armbrust
F Brossier
F Denoeud
F Suomi
FA Simao
FJ Rendon-Gandarilla
FM Tonelli
Frédérique Hilliou
G Manning
G Michel
G Vlahou
GK Paterson
H Hu
H Li
H Yoshikawa
Hiroshi Suga
I Letunic
I Wawrzyniak
IB Rogozin
IJ Anderson
J Castresana
J Jerlstrom-Hultqvist
J.J. Doyle
JC Mottram
JG Ellis 4th
JJ Turunen
JM Carlton
Joel B. Dacks
John M. Archibald
Joseph Heitman
JS Bonifacino
JW Harper
K Katoh
K Kremer
K Nito
L Chang
L Eme
L Kall
LA Baxt
LA Dunn
Laura Eme
LO Andersen
M Boetzer
M Elias
M Fischer
M Klemba
M Muller
M Rossi
M Sajid
M Stanke
MA Alfellani
Marek Eliáš
Maria C. Arias
Mark van der Giezen
Martin Kolisko
Mary J. Klute
MB Rogers
MD Lanuza
ME Hodges
MG Claros
MG Grabherr
MJ Paul
MK Puthia
ML Philippi
MN Price
N Kienle
N Lartillot
ND Rawlings
NH Gonzalez
O Emanuelsson
P Horton
P Sylvestre
PD Scanlan
PJ Keeling
PR Crocker
R Feyereisen
R Nagel
R Wisedpanichkij
R Zhang
RC Edgar
Richard A. Rachubinski
RJ Perry
RP Baker
RT Mohamed
S Ball
S Boisvert
S Guindon
S Muller
S Sivakumar
S Tempel
S Yamaoka
SB Hua
SB Malik
SF Altschul
Shehre-Banoo Malik
SQ Le
SR Eddy
SS Ajjampur
ST Furlong
Steven G. Ball
T Abeel
T Gabaldon
T Robert
T Roberts
T Schwartz
TJ Treangen
TJ Wheeler
V Klimes
V Perez-Brocal
V Zaman
Vladimír Klimeš
WL Chuang
XQ Chen
Y Lantsman
Z Wang
Z Wu
Z Yu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2017

Blastocystis is the most prevalent eukaryotic microbe colonizing the human gut, infecting approximately 1 billion individuals worldwide. Although Blastocystis has been linked to intestinal disorders, its pathogenicity remains controversial because most carriers are asymptomatic. Here, the genome sequence of Blastocystis subtype (ST) 1 is presented and compared to previously published sequences for ST4 and ST7. Despite a conserved core of genes, there is unexpected diversity between these STs in terms of their genome sizes, guanine-cytosine (GC) content, intron numbers, and gene content. ST1 has 6,544 protein-coding genes, which is several hundred more than reported for ST4 and ST7. The percentage of proteins unique to each ST ranges from 6.2% to 20.5%, greatly exceeding the differences observed within parasite genera. Orthologous proteins also display extreme divergence in amino acid sequence identity between STs (i.e., 59%–61% median identity), on par with observations of the most distantly related species pairs of parasite genera. The STs also display substantial variation in gene family distributions and sizes, especially for protein kinase and protease gene families, which could reflect differences in virulence. It remains to be seen to what extent these inter-ST differences persist at the intra-ST level. A full 26% of genes in ST1 have stop codons that are created on the mRNA level by a novel polyadenylation mechanism found only in Blastocystis. Reconstructions of pathways and organellar systems revealed that ST1 has a relatively complete membrane-trafficking system and a near-complete meiotic toolkit, possibly indicating a sexual cycle. Unlike some intestinal protistan parasites, Blastocystis ST1 has near-complete de novo pyrimidine, purine, and thiamine biosynthesis pathways and is unique amongst studied stramenopiles in being able to metabolize α-glucans rather than β-glucans. It lacks all genes encoding heme-containing cytochrome P450 proteins. Predictions of the mitochondrion-related organelle (MRO) proteome reveal an expanded repertoire of functions, including lipid, cofactor, and vitamin biosynthesis, as well as proteins that may be involved in regulating mitochondrial morphology and MRO/endoplasmic reticulum (ER) interactions. In sharp contrast, genes for peroxisome-associated functions are absent, suggesting Blastocystis STs lack this organelle. Overall, this study provides an important window into the biology of Blastocystis, showcasing significant differences between STs that can guide future experimental investigations into differences in their virulence and clarifying the roles of these organisms in gut health and disease.

Lund University Publications

LSHTM Research Online

HAL AMU