MidA is a putative methyltransferase that is required for mitochondrial complex I function

Author: B. Roidl
B. Roidl
B. Roidl
C.W. Rowley
D. Deprés
D. Saile
J. Boris
J. Forsythe
J. Janssen
L.R. Rollstin
M. Bitter
M. Bitter
M. Meinke
M. Pamiès
M.S. Liou
N. Alkishriwi
N. Jarrin
P. Meliga
P.J. Schmid
R. Benay
R. Sandberg
R. Sandberg
T. Mathur
V. Statnikov
V. Statnikov
W. Bannink
Publication venue: Company of Biologists
Publication date: 01/01/2010

10 pages, 6 figures. Dictyostelium and human MidA are homologous proteins that belong to a family of proteins of unknown function called DUF185. Using yeast two-hybrid screening and pull-down experiments, we showed that both proteins interact with the mitochondrial complex I subunit NDUFS2. Consistent with this, Dictyostelium cells lacking MidA showed a specific defect in complex I activity, and knockdown of human MidA in HEK293T cells resulted in reduced levels of assembled complex I. These results indicate a role for MidA in complex I assembly or stability. A structural bioinformatics analysis suggested the presence of a methyltransferase domain; this was further supported by site-directed mutagenesis of specific residues from the putative catalytic site. Interestingly, this complex I deficiency in a Dictyostelium midA- mutant causes a complex phenotypic outcome, which includes phototaxis and thermotaxis defects. We found that these aspects of the phenotype are mediated by a chronic activation of AMPK, revealing a possible role of AMPK signaling in complex I cytopathology. This work was supported by grants BMC2006-00394 and BMC2009-09050 to R.E. from the Spanish Ministerio de Ciencia e Innovación; to P.R.F. from the Thyne Reid Memorial Trusts and the Australian Research Council; to A.V. and O.G. from the Spanish National Bioinformatics Institute (www.inab.org), a platform of Genome Spain; to R.G. from the Fondo de Investigaciones Sanitarias, Instituto de Salud Carlos III, Spain (PI070167) and from the Comunidad de Madrid (GEN-0269/2006). S.C. is supported by a research contract from Consejería de Educación de la Comunidad de Madrid y del Fondo Social Europeo (FSE). Peer reviewed.

Public Library of Science (PLOS)

Digital.CSIC

Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

Author: A Bhattacharjee
A Butte
A Dupuy
A Potti
A Rosenwald
A Statnikov
A Statnikov
A Statnikov
Alexander Statnikov
AM Glas
B Freidlin
Bryan E. Shepherd
CF Aliferis
Constantin F. Aliferis
CX Ling
DG Beer
DJ Hand
EJ Yeoh
EL Lehmann
FE Harrell Jr
Frank E. Harrell
G Casella
Ioannis Tsamardinos
JA Sparano
Jonathan S. Schildcrout
JP Ioannidis
KK Dobbin
KK Dobbin
L Ein-Dor
L Shi
LA Habel
LJ van't Veer
M Saerens
MD Radmacher
ME Burczynski
MJ Marton
ML Lee
N Iizuka
P Baldi
PI Good
R Kohavi
R Simon
RE Fan
S Michiels
S Mukherjee
S Paik
S Paik
S Ramaswamy
SL Pomeroy
T Bammler
T Hastie
TR Golub
TS Furey
UM Braga-Neto
Vladimir B. Bajic
VN Vapnik
W Jiang
Publication venue: Public Library of Science
Publication date: 17/03/2009

Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. The findings of the present study have two important practical implications. First, high-throughput studies, by avoiding under-powered data analysis protocols, can achieve substantial economies in the sample size required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. Much smaller samples than previously thought may be sufficient for exploratory studies as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.
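To make the idea of testing a signature's statistical significance concrete, here is a minimal sketch of a label-permutation test on the area under the ROC curve (AUC). It is not the protocol used in the paper: the choice of AUC as the error metric, the function names `auc` and `permutation_p_value`, and the toy scores are all assumptions made for illustration. A rigorous analysis would also re-run the whole model-building pipeline on each permuted label set, whereas this simplification permutes labels against fixed held-out scores.

```python
import random


def auc(scores, labels):
    """AUC via the rank-sum identity: the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """Estimate how often shuffled labels give an AUC at least as large
    as the observed one (one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical held-out classifier scores for 4 cases (1) and 4 controls (0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
observed_auc, p_value = permutation_p_value(toy_scores, toy_labels)
print(observed_auc, p_value)
```

Swapping the metric (for example, proportion of misclassifications instead of AUC) only requires replacing `auc`, which is one way to explore how the choice of error metric changes the outcome of the test.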

Bridging a translational gap: using machine learning to improve the prediction of PTSD

Author: A Statnikov
A Statnikov
A Statnikov
Alexander Statnikov
AP Bradley
Arieh Y Shalev
AY Shalev
AY Shalev
AY Shalev
AY Shalev
AY Shalev
AY Shalev
AY Shalev
B Kleim
BE Boser
C-C Chang
CJ Bryan
CR Brewin
CR Marmar
D Forbes
EB Binder
EB Foa
EB Foa
EJ Ozer
G Orrù
IR Galatzer-Levy
Isaac R Galatzer-Levy
J Difede
JA Boscarino
JA Haagsma
Karen-Inge Karstoft
KC Koenen
KC Koenen
L Breiman
N Breslau
OJ Bienvenu
RA Bryant
RA Bryant
RC Kessler
RC Kessler
RH Segman
RS Lazarus
S Visweswaran
SA Freedman
SB Norman
TA Mellman
W Guy
Zhiguo Li
Publication venue: Springer Science and Business Media LLC

Multiplicity: an organizing principle for cancers and somatic mutations

Author: A de la Chapelle
A Statnikov
B Markman
B Vogelstein
C Greenman
C-H Yeang
CJ Sherr
ER Fearon
FE Bleeker
FS Collins
G Tabatabai
H Ledford
JH Ward
JM Hall
JW Arends
K Berns
K Naruse
KW Kinzler
LD Wood
Lewis J Frey
M Kanehisa
M-d-M Inda
Mary E Edgerton
P Topcu-Yilmaz
PA Futreal
R Reiter
R Wooster
S Forbes
S Ortega
SB Edge
SR Piccolo
Stephen R Piccolo
T Kamada
TJ Hudson
WD Nooy
Wellcome Trust Sanger Institute
Y Liang
Y Sun
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: With the advent of whole-genome analysis for profiling tumor tissue, a pressing need has emerged for principled methods of organizing the large amounts of resulting genomic information. We propose the concept of multiplicity measures on cancer and gene networks to organize the information in a clinically meaningful manner. Multiplicity applied in this context extends Fearon and Vogelstein's multi-hit genetic model of colorectal carcinoma across multiple cancers. Methods: Using the Catalogue of Somatic Mutations in Cancer (COSMIC), we construct networks of interacting cancers and genes. Multiplicity is calculated by evaluating the number of cancers and genes linked by the measurement of a somatic mutation. The Kamada-Kawai algorithm is used to find a two-dimensional minimum energy solution with multiplicity as an input similarity measure. Cancers and genes are positioned in two dimensions according to this similarity. A third dimension is added to the network by assigning a maximal multiplicity to each cancer or gene. Hierarchical clustering within this three-dimensional network is used to identify similar clusters in somatic mutation patterns across cancer types. Results: The clustering of genes in a three-dimensional network reveals a similarity in acquired mutations across different cancer types. Surprisingly, the clusters separate known causal mutations. The multiplicity clustering technique identifies a set of causal genes with an area under the ROC curve of 0.84 versus 0.57 when clustering on gene mutation rate alone. The cluster multiplicity value and number of causal genes are positively correlated via Spearman's Rank Order correlation (rs(8) = 0.894, Spearman's t = 17.48, p < 0.05). A clustering analysis of cancer types segregates different types of cancer. All blood tumors cluster together, and the cluster multiplicity values differ significantly (Kruskal-Wallis, H = 16.98, df = 2, p < 0.05). Conclusion: We demonstrate the principle of multiplicity for organizing somatic mutations and cancers in clinically relevant clusters. These clusters of cancers and mutations provide representations that identify segregations of cancer and genes driving cancer progression.
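The multiplicity counts described in the Methods can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes that a gene's multiplicity is the number of distinct cancers in which a mutation of that gene was recorded, and that a cancer's multiplicity is the number of distinct genes recorded as mutated in it. The function name `multiplicities` and the toy records are hypothetical and are not taken from COSMIC.

```python
from collections import defaultdict


def multiplicities(records):
    """Count distinct partners on each side of (cancer, gene) mutation records.

    Returns (gene_mult, cancer_mult), where gene_mult[g] is the number of
    distinct cancers linked to gene g and cancer_mult[c] is the number of
    distinct genes linked to cancer c. Repeated records are counted once.
    """
    cancers_per_gene = defaultdict(set)
    genes_per_cancer = defaultdict(set)
    for cancer, gene in records:
        cancers_per_gene[gene].add(cancer)
        genes_per_cancer[cancer].add(gene)
    gene_mult = {g: len(cs) for g, cs in cancers_per_gene.items()}
    cancer_mult = {c: len(gs) for c, gs in genes_per_cancer.items()}
    return gene_mult, cancer_mult


# Hypothetical toy records: (cancer type, mutated gene).
toy_records = [
    ("cancer_1", "gene_A"),
    ("cancer_1", "gene_B"),
    ("cancer_1", "gene_C"),
    ("cancer_2", "gene_A"),
    ("cancer_2", "gene_B"),
    ("cancer_3", "gene_A"),
    ("cancer_2", "gene_A"),  # duplicate observation, counted once
]
gene_mult, cancer_mult = multiplicities(toy_records)
print(gene_mult)
print(cancer_mult)
```

Counts of this kind could then serve as the similarity input to a graph layout step such as Kamada-Kawai; that step is omitted here.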

Predictive integration of gene functional similarity and co-expression defines treatment response of endothelial progenitor cells

Author: A Alexeyenko
A Ceol
A Dasgupta
A Peled
A Siddique
A Statnikov
B Aranda
B Turner
BJ Oh
C Pesquita
C Urbich
D Barrell
D Lin
D Szklarczyk
D Warde-Farley
Daniel R Wagner
DC Kirouac
DH Walter
DW Huang
E Frank
E Novikov
E Schutyser
EC Keeley
F Azuaje
F Azuaje
F Azuaje
F Browne
Francisco J Azuaje
Frédérique Léonard
H Wang
H Wang
Haiying Wang
Huiru Zheng
HY Chuang
IW Taylor
J Chen
J De Sutter
J Hur
L Salwinski
L Salwinski
L Statnikov
Lu Zhang
M Gnecchi
Magali Rolland-Turner
MC Montesinos
N Bolshakova
P Shannon
PC Boutros
RJ Medina
S Rafii
S Ryzhov
VG Tusher
Y Chen
Yvan Devaux
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Endothelial progenitor cells (EPCs) have been implicated in different processes crucial to vasculature repair, which may offer the basis for new therapeutic strategies in cardiovascular disease. Despite advances facilitated by functional genomics, there is a lack of systems-level understanding of treatment response mechanisms of EPCs. In this research we aimed to characterize the EPC response to adenosine (Ado), a cardioprotective factor, based on the systems-level integration of gene expression data and prior functional knowledge. Specifically, we set out to identify novel biosignatures of Ado-treatment response in EPCs. Results: The predictive integration of gene expression data and standardized functional similarity information enabled us to identify new treatment response biosignatures. Gene expression data originated from Ado-treated and -untreated EPC samples, and functional similarity was estimated with Gene Ontology (GO)-based similarity information. These information sources enabled us to implement and evaluate an integrated prediction approach based on the concept of k-nearest neighbours learning (kNN). The method can be executed by expert- and data-driven input queries to guide the search for biologically meaningful biosignatures. The resulting integrated kNN system identified new candidate EPC biosignatures that can offer high classification performance (areas under the operating characteristic curve > 0.8). We also showed that the proposed models can outperform those discovered by standard gene expression analysis. Furthermore, we report an initial independent in vitro experimental follow-up, which provides additional evidence of the potential validity of the top biosignature. Conclusion: Response to Ado treatment in EPCs can be accurately characterized with a new method based on the combination of gene co-expression data and GO-based similarity information. It also exploits the incorporation of human expert-driven queries as a strategy to guide the automated search for candidate biosignatures. The proposed biosignature improves the systems-level characterization of EPCs. The new integrative predictive modeling approach can also be applied to other phenotype characterization or biomarker discovery problems.
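For readers unfamiliar with k-nearest neighbours learning, the following is a minimal sketch of a plain kNN classifier with majority voting. It does not reproduce the integrated kNN system from the paper: the abstract does not say how expression data and GO-based similarity are combined, so this sketch assumes ordinary Euclidean distance on hypothetical two-feature expression profiles. The function name `knn_predict` and the toy data are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    Distance is plain Euclidean distance (an assumption of this sketch);
    a different dissimilarity could be substituted at this point.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical expression profiles (two features per sample).
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["untreated", "untreated", "untreated",
           "treated", "treated", "treated"]

print(knn_predict(train_x, train_y, (0.5, 0.5)))
print(knn_predict(train_x, train_y, (5.5, 5.5)))
```

An odd `k` avoids tied votes in two-class problems such as treated versus untreated.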

Approaches to working in high-dimensional data spaces: gene expression microarrays

Author: A Dupuy
A Statnikov
AK Jain
B Efron
BJ Frey
C Lai
CF Aliferis
D J Miller
D Miller
DB Allison
DF Ransohoff
DF Ransohoff
EP Xing
GV Trunk
I Guyon
I Guyon
J Novovicova
J Wang
JA Swets
JD Storey
KA Shedden
KY Yeung
L Ein-Dor
MW Graham
R Clarke
R Clarke
RO Duda
S Ramaswamy
T Lange
TR Golub
VN Vapnik
Y Wang
Z Wang
Publication venue: Nature Publishing Group

This review provides a focused summary of the implications of high-dimensional data spaces produced by gene expression microarrays for building better models of cancer diagnosis, prognosis, and therapeutics. We identify the unique challenges posed by high dimensionality to highlight methodological problems and discuss recent methods in predictive classification, unsupervised subclass discovery, and marker identification.

University of Essex Research Repository

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Author: A Kikuchi
A Statnikov
A Ultsch
Andrew Harrison
Aris Perperoglou
Asma Gul
B Lausen
Berthold Lausen
C Cortes
C Ding
C Ma
C Müssel
C Zou
D Apiletti
D Apiletti
DA Notterman
DeAndresSA Díaz‐Uriarte R
DG Altman
E Baralis
GJ Gordon
H Peng
H‐C Liu
J Fan
J Fan
J Lu
K‐H Chen
L Breiman
L Breiman
L Lausser
M Dramiński
M Marczyk
Metodi V Metodiev
N De Jay
Osama Mahmoud
P Alhopuro
P Laiho
RN Jorissen
RS Croner
RS Croner
S Chiaretti
S Michiels
T Cover
T Jirapech‐Umpai
TR Golub
VG Tusher
W Talloen
Y Saeys
Y Su
Zardad Khan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Microarray technology, as well as other functional genomics experiments, allows simultaneous measurements of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature's relevance to a classification task. Results: We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance. Conclusions: A novel gene selection method, POS, is proposed. POS analyzes the expression overlap across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

Explore Bristol Research

Small-Sample Error Estimation for Bagged Classification Rules

Author: A Assareh
A Bhattacharjee
A Statnikov
B Efron
B Efron
B Efron
B Wu
B Zhang
B-L Adam
EC Gunther
G Izmirlian
G Martínez-Muñoz
HJ Issaq
L Breiman
L Breiman
L Xu
LJ Van't Veer
MJ van de Vijver
P Geurts
R Díaz-Uriarte
RE Banfield
RE Schapire
RO Duda
S Alvarez
T Bylander
TT Vu
U Braga-Neto
U Braga-Neto
U Braga-Neto
UM Braga-Neto
W Tong
Y Freund
Publication venue: Hindawi Limited
Publication date: 01/01/2010

OAKTrust Digital Repository (Texas A&M Univ)

A comparative analysis of predictive models of morbidity in intensive care unit after cardiac surgery – Part I: model planning

Author: A Agresti
A Azzalini
A Statnikov
AH Murphy
AJ Petros
AK Jain
B Biagioli
B Bridgewater
BG Tabachnick
Bonizella Biagioli
BW Silverman
CM Bishop
DA Harrison
DC Bamber
DL Reich
DM Shahian
DS Sivia
DW Hosmer
E Artioli
Emanuela Barbini
ER DeLong
EW Steyerberg
F Jaimes
FE Harrell Jr
FH Edwards
G Asimakopoulos
G Cevenini
G Marshall
G Marshall
GA Diamond
Gabriele Cevenini
GD Friedman
GK van Wermeskerken
GT O'Connor
HJ Geisser
HM Krumholz
J Ellenius
J Ivanov
JA Hanley
JH Schafer
JL Moran
JR Le Gall
JV Tu
K Fukunaga
K-Y Liang
MK Campbell
MW Knuiman
NA Obuchowski
O Pitkanen
P Armitage
P Itskowitz
P Schulman
Paolo Barbini
Pierpaolo Giomarelli
PM Lee
R Murphy-Filkins
RI Jennrich
RO Duda
RP Lippmann
RZ Omar
S Arya
S den Boer
S Dreiseitl
S Gangopadhyay
S Le Cessie
S Lemeshow
Sabino Scolletta
SJ Mason
TA Lasko
TA Ryan
TL Higgins
TL Higgins
TL Higgins
VN Vapnik
WA Knaus
WA Knaus
WJ Krzanowski
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Different methods have recently been proposed for predicting morbidity in intensive care units (ICU). The aim of the present study was to critically review a number of approaches for developing models capable of estimating the probability of morbidity in ICU after heart surgery. The study is divided into two parts. In this first part, popular models used to estimate the probability of class membership are grouped into distinct categories according to their underlying mathematical principles. Modelling techniques and intrinsic strengths and weaknesses of each model are analysed and discussed from a theoretical point of view, in consideration of clinical applications. Methods: Models based on Bayes rule, k-nearest neighbour algorithm, logistic regression, scoring systems and artificial neural networks are investigated. Key issues for model design are described. The mathematical treatment of some aspects of model structure is also included for readers interested in developing models, though a full understanding of mathematical relationships is not necessary if the reader is only interested in perceiving the practical meaning of model assumptions, weaknesses and strengths from a user point of view. Results: Scoring systems are very attractive due to their simplicity of use, although this may undermine their predictive capacity. Logistic regression models are trustworthy tools, although they suffer from the principal limitations of most regression procedures. Bayesian models seem to be a good compromise between complexity and predictive performance, but model recalibration is generally necessary. k-nearest neighbour may be a valid non-parametric technique, though computational cost and the need for large data storage are major weaknesses of this approach. Artificial neural networks have intrinsic advantages with respect to common statistical models, though the training process may be problematic. Conclusion: Knowledge of model assumptions and the theoretical strengths and weaknesses of different approaches are fundamental for designing models for estimating the probability of morbidity after heart surgery. However, a rational choice also requires evaluation and comparison of actual performances of locally developed competitive models in the clinical scenario to obtain satisfactory agreement between local needs and model response. In the second part of this study the above predictive models will therefore be tested on real data acquired in a specialized ICU.
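As an illustration of how one of the reviewed model families turns patient data into a probability, here is a minimal sketch of the logistic regression prediction step. The intercept, coefficients and feature values below are hypothetical and are not taken from the study; the function name `logistic_probability` is likewise an assumption of this sketch.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Probability of the event under a fitted logistic regression model:
    p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical model with two predictors (values chosen for illustration).
b0 = -2.0
b = [0.5, 1.0]
patient = [2.0, 1.0]
print(logistic_probability(b0, b, patient))
```

Because the output is a probability rather than a class label, a decision threshold still has to be chosen separately, and, as with the other models discussed, calibration should be checked on local data.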

Archivio della Ricerca - Università degli Studi di Siena