Search CORE


Inferring Multiple Graphical Structures

Author: A. Argyriou
C. Ambroise
C. Drummond
Christophe Ambroise
H. Toh
J. Baxter
J. Friedman
J.H. Friedman
Julien Chiquet
K. Sachs
M. Nikolova
M. Yuan
M.R. Osborne
N. Meinshausen
O. Banerjee
P. Ravikumar
R. Caruana
Y. Kim
Yves Grandvalet
Publication date: 12/05/2010

Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. Recently, this tool has attracted considerable interest for the discovery of biological networks. The literature focuses on the case where a single network is inferred from a set of measurements, but, as wet-lab data are typically scarce, several assays in which the experimental conditions affect the interactions are usually merged to infer a single network. In this paper, we propose two approaches for estimating multiple related graphs, by translating the assumption that the graphs are close into either an empirical prior or group penalties. We provide quantitative results demonstrating the benefits of the proposed approaches. The methods presented in this paper are embedded in the R package 'simone' from version 1.0-0 onward.
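
To make the group-penalty idea concrete, here is a minimal Python sketch of the proximal (group soft-thresholding) step behind a group-lasso coupling of several graphs. It is not the 'simone' implementation, which is an R package: for each off-diagonal position (i, j), the entries of the T precision matrices are treated as one group and shrunk jointly, so an edge is either removed from every graph or shrunk by the same factor in every graph. The function name `group_soft_threshold`, the list-of-lists matrix representation, the toy matrices and the penalty level `lam` are illustrative assumptions; a full estimator would call such a step inside an iterative solver.

```python
import math
from typing import List

Matrix = List[List[float]]

def group_soft_threshold(thetas: List[Matrix], lam: float) -> List[Matrix]:
    """Jointly shrink the off-diagonal entries of T square matrices.

    For each position (i, j) with i != j, the values thetas[t][i][j]
    (t = 0..T-1) form one group. The group is scaled by max(0, 1 - lam / norm),
    where norm is its Euclidean norm, so the edge is either removed from every
    matrix or shrunk by the same factor in every matrix.
    Diagonal entries are copied unchanged (unpenalized).
    """
    n_tasks = len(thetas)
    p = len(thetas[0])
    out = [[row[:] for row in theta] for theta in thetas]  # copy; inputs untouched
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            norm = math.sqrt(sum(thetas[t][i][j] ** 2 for t in range(n_tasks)))
            scale = 0.0 if norm <= lam else 1.0 - lam / norm
            for t in range(n_tasks):
                out[t][i][j] = scale * thetas[t][i][j]
    return out

theta_1 = [[1.0, 0.3], [0.3, 1.0]]  # toy precision matrix, condition 1 (hypothetical)
theta_2 = [[1.0, 0.4], [0.4, 1.0]]  # toy precision matrix, condition 2 (hypothetical)
print(group_soft_threshold([theta_1, theta_2], lam=0.2))
print(group_soft_threshold([theta_1, theta_2], lam=0.6))
```

Here the pair (0.3, 0.4) has Euclidean norm 0.5: with `lam = 0.2` both entries are scaled by 0.6 (to 0.18 and 0.24), while with `lam = 0.6` the edge is dropped from both matrices at once, which is the coupling effect a group penalty is meant to produce.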

arXiv.org e-Print Archive

Determining appropriate approaches for using data in feature selection

Author: A Kalousis
C Ambroise
DW Aha
F Wilcoxon
G Chandrashekar
H Liu
J Reunanen
JC Platt
JR Quinlan
L Yu
M Lecocke
MA Hall
P Somol
V Bolón-Canedo
Y Han
Y Saeys
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2015

Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. whether to use ALL or only PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using only part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a preset number of relevant features, varying numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing bias when the dataset is small, but starts to lose its advantage as the dataset size increases.
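
The Average Tanimoto Index used as the reliability measure is computed from the feature subsets selected on different runs (for example, on different resamples of the data). The minimal Python sketch below assumes the common definition: the mean, over all unordered pairs of selected subsets, of |A ∩ B| / |A ∪ B|. The function names and the convention of scoring two empty subsets as identical are illustrative assumptions, not taken from the paper; the Inter-method variant, which compares subsets produced by different selection methods, is not shown.

```python
from itertools import combinations
from typing import Hashable, List, Set

def tanimoto(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty selections count as identical
    return len(a & b) / len(union)

def average_tanimoto_index(subsets: List[Set[Hashable]]) -> float:
    """Mean pairwise Tanimoto similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Feature indices selected on three hypothetical runs.
runs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}]
print(average_tanimoto_index(runs))  # (0.5 + 1.0 + 0.5) / 3, about 0.667
```

A value of 1 means every run selected exactly the same features; lower values indicate a less stable selection.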

Crossref

Springer - Publisher Connector

University of East Anglia digital repository

Mundo. Mapas generales (World: General Maps) (1856)

Author: Estruch y Jordán, Domingo
Tardieu, Ambroise, 1788-1841
Publication date: 01/01/1835

Includes the following maps: a world map in two hemispheres, a planisphere, a map of the world known to the ancients, Europe, the British Isles, France, Gaul, the Low Countries, Sweden and Denmark, Switzerland, Italy and Illyria, ancient Italy, Germany, Germania, Pannonia, Dacia and Sarmatia, Prussia, a general map of Poland, the Kingdom of Poland, European Russia, European Turkey and Greece, ancient Greece, plans for the history of ancient Greece, and maps of Asia, among others. Contains maps of the ancient and modern world. Title, statement of responsibility and publication details taken from various reference works consulted. Most of the maps were engraved by Ramón and Pablo Alabern and Domingo Estruch, and were produced between 1830 and 1834. Digital copy. Madrid: Ministerio de Cultura. Dirección General del Libro, Archivos y Bibliotecas, 201

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

Author: A Ivshina
Anne-Claire Haury
C Ambroise
C Fan
C Lai
C Sotiriou
F Reyal
G Abraham
H Zou
I Guyon
J Bi
J Mairal
J Wang
Jean-Philippe Vert
JPA Ioannidis
L Ein-Dor
M Dai
Muy-Teck Teh
N Meinshausen
P Wirapati
Pierre Gestraud
R Kohavi
R Shen
R Simon
R Tibshirani
RA Irizarry
S Michiels
T Abeel
T Barrett
T Iwamoto
W Shi
Y Benjamini
Y Pawitan
Y Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 23/06/2011

Motivation: Biomarker discovery from high-dimensional data is a crucial problem with many applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of the predictive performance, stability and functional interpretability of the signatures they produce. Results: We observe that the choice of feature selection method has a significant influence on the accuracy, stability and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection generally has no positive effect. Overall, a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/
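
Since a Student's t-test filter performed best overall, it is worth seeing how little such a filter involves. The following minimal Python sketch is not the authors' released code: it scores each feature by the absolute two-sample Student's t statistic (pooled variance) between two classes and keeps the top k. The function names, the 0/1 label encoding, the toy data, and the handling of zero within-class variance are illustrative assumptions.

```python
import math
from typing import List, Sequence

def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    if pooled_var == 0.0:
        # Assumption: no spread at all -> 0 if means agree, otherwise infinite.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))

def t_test_filter(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Indices of the k features with the largest |t| between classes 1 and 0."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(student_t(pos, neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy expression matrix: rows are samples, columns are genes (hypothetical data).
X = [[5.0, 1.0, 0.2], [5.2, 1.1, 0.1], [4.9, 0.9, 0.3],
     [1.0, 1.0, 0.2], [1.1, 1.2, 0.1], [0.9, 0.8, 0.3]]
y = [1, 1, 1, 0, 0, 0]
print(t_test_filter(X, y, 1))  # [0]
```

The filter is univariate: each gene is scored independently, which is what makes it cheap and, per the abstract, surprisingly competitive.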

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

HAL-MINES ParisTech

Identification of disease-causing genes using microarray data mining and gene ontology

Author: A Mohammadi
A Zhang
AA Alizadeh
Azadeh Mohammadi
B Duval
BF Souza
C Ambroise
C Ding
C Tago
D Lin
D Singh
E Martinez
FM Couto
I Guyon
I Inza
J Jaeger
JJ Jiang
L Li
L Yu
L Ziaei
Mansoor Salehi
Mohammad H Saraee
N Cristianini
P Pavlidis
P Resnik
PA Mundra
PJ Park
R Genuer
RF Weaver
S Li
TM Huang
TR Golub
TS Furey
U Alon
W Xu
Y Ding
Y Saeys
Y Wang
YL Chin
Z Xie
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Background: One of the most accurate methods for identifying disease-causing genes is to monitor gene expression values in different samples using microarray technology. A shortcoming of microarray data is that the number of samples is small relative to the number of genes. This reduces the classification accuracy of existing methods, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can suffer from small sample sizes, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection that exploits the strengths of conventional methods and addresses their weaknesses. Specifically, we combine the Fisher method and SVMRFE to benefit from both a filter method and an embedded method. Furthermore, we add a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology, a reliable source of information on genes. Using Gene Ontology can partly compensate for the limitations of microarrays, such as the small number of samples and erroneous measurements. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method improves classification performance in terms of accuracy, sensitivity and specificity. In addition, a study of the molecular functions of the selected genes supported the hypothesis that these genes are involved in cancer growth.
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancer.
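
The filter-plus-redundancy-reduction idea can be illustrated without the SVMRFE and Gene Ontology stages. The minimal Python sketch below ranks features by a Fisher score, (mean_1 - mean_0)^2 / (var_1 + var_0), then walks down that ranking and skips any feature whose absolute Pearson correlation with an already selected feature exceeds a threshold. The exact score definition, the correlation-based redundancy rule, the threshold `max_corr = 0.9`, the toy data and all function names are assumptions for illustration; the paper's own redundancy stage may differ.

```python
import math
from typing import List, Sequence

def fisher_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """(mean_pos - mean_neg)^2 / (var_pos + var_neg), population variances."""
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    vp = sum((x - mp) ** 2 for x in pos) / len(pos)
    vn = sum((x - mn) ** 2 for x in neg) / len(neg)
    denom = vp + vn
    if denom == 0.0:
        return 0.0 if mp == mn else math.inf  # assumption for zero spread
    return (mp - mn) ** 2 / denom

def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0.0 or sv == 0.0 else cov / (su * sv)

def select_non_redundant(X: List[List[float]], y: List[int], k: int,
                         max_corr: float = 0.9) -> List[int]:
    """Rank features by Fisher score, then greedily skip redundant ones."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    scores = [fisher_score([v for v, l in zip(col, y) if l == 1],
                           [v for v, l in zip(col, y) if l == 0]) for col in cols]
    order = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
    selected: List[int] = []
    for j in order:
        if all(abs(pearson(cols[j], cols[s])) <= max_corr for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return selected

y = [1, 1, 1, 0, 0, 0]
f0 = [3.0, 3.2, 2.8, 1.0, 1.2, 0.8]   # strongly discriminative
f1 = [2.0 * v for v in f0]            # redundant: exact multiple of f0
f2 = [1.0, 2.0, 0.0, 0.0, 1.0, -1.0]  # weaker, but not redundant with f0
X = [list(row) for row in zip(f0, f1, f2)]
print(select_non_redundant(X, y, k=2))
```

Features 0 and 1 have the same Fisher score up to rounding, so the result contains exactly one of them; the other is rejected as redundant, and the second slot goes to the weaker but largely uncorrelated feature 2.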

University of Salford Institutional Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Protective ventilation in ARDS: as soon as possible. An immediate use of HFOV

Author: AC Bryan
Ambroise A Montcriol
Bertrand B Prunet
CW Bollen
D Dreyfuss
ER Schmid
Eric E Meaudre
J Downar
Julien J Bordes
MB Amato
MM Treggiari
Philippe Ph Goutorbe
RM Kacmarek
Y Imai
Yves Y Asencio
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

Roc632: An overview

Author: A Rosenwald
C Ambroise
EW Steyerberg
H Liu
LY Geer
P Collinson
T Fawcett
T Vu
WJ Krzanowski
Y Foucher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The present paper analyzes and explores the ROC632 package, describing its main characteristics and functions. More specifically, the goal of this study is to evaluate the effectiveness of the package and its strengths and weaknesses. The package was created to deal with the information missing from incomplete (censored) time-to-event data, by adapting the 0.632+ bootstrap estimator to the evaluation of time-dependent ROC curves. Applying the package to a specific dataset (DLBCLpatients) makes it possible to test it on real data and to determine whether it analyzes complete and incomplete data efficiently and without bias.
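
For readers unfamiliar with the 0.632+ estimator, the Python sketch below shows its standard form for a prediction error rate, not the time-dependent AUC version that ROC632 implements, which must also handle censoring. The inputs are assumed to be precomputed: `err_train` is the apparent (training-set) error, `err_oob` the leave-one-out bootstrap error measured on out-of-bag samples, and `gamma` the no-information error rate. The function name and example values are hypothetical.

```python
def err_632_plus(err_train: float, err_oob: float, gamma: float) -> float:
    """0.632+ bootstrap estimate of prediction error.

    err_train: apparent error on the training data
    err_oob:   leave-one-out bootstrap error (out-of-bag samples only)
    gamma:     no-information error rate
    """
    # Cap the out-of-bag error at the no-information rate.
    err_oob = min(err_oob, gamma)
    # Relative overfitting rate R, in [0, 1].
    if err_oob > err_train and gamma > err_train:
        r = (err_oob - err_train) / (gamma - err_train)
    else:
        r = 0.0
    # Weight moves from 0.632 (no overfitting) toward 1 (maximal overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_train + w * err_oob

print(err_632_plus(0.10, 0.30, 0.50))  # about 0.255
```

With R = 0 the weight is 0.632 and the estimate reduces to the ordinary .632 estimator; with R = 1 the weight is 1 and the estimate equals the capped out-of-bag error, which is how the "+" correction guards against the optimism of heavily overfitted models.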

Universidade do Minho: RepositoriUM

Crossref

Optimality Driven Nearest Centroid Classification from Genomic Data

Author: A Alizadeh
Alan R. Dabney
AR Dabney
B Efron
C Ambroise
C Stein
D Ross
I Hedenfalk
J Khan
J Schäfer
Ji Zhu
John D. Storey
JW Lee
K Mardia
P Bickel
R Shen
R Tibshirani
RJ McKay
S Dudoit
T Golub
TH Bø
Y Guo
Publication venue: Public Library of Science
Publication date: 01/01/2007

Nearest-centroid classifiers have recently been employed successfully in high-dimensional applications such as genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without considering how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that is instead based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that it can outperform existing nearest-centroid classifiers.
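
As background for this abstract, a nearest-centroid classifier fits in a few lines of Python: each class is summarized by its centroid, and a new sample is assigned to the class whose centroid is closest over a chosen set of features. This sketch does not implement the paper's optimality-driven greedy selection. The fixed shrinkage factor `shrink`, which pulls each class centroid toward the overall mean, is a simple hypothetical stand-in for the high-dimensional shrinkage estimates the authors investigate, and the function names and toy data are illustrative.

```python
from typing import Dict, List, Optional, Sequence

def fit_centroids(X: List[List[float]], y: List[str],
                  shrink: float = 0.0) -> Dict[str, List[float]]:
    """Class centroids (sample means), optionally shrunk toward the overall mean."""
    n_features = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    centroids: Dict[str, List[float]] = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        mean = [sum(row[j] for row in rows) / len(rows) for j in range(n_features)]
        # shrink = 0 keeps the sample mean; shrink = 1 collapses onto the overall mean.
        centroids[c] = [(1.0 - shrink) * m + shrink * o for m, o in zip(mean, overall)]
    return centroids

def predict(centroids: Dict[str, List[float]], x: Sequence[float],
            features: Optional[Sequence[int]] = None) -> str:
    """Assign x to the class with the nearest centroid (squared Euclidean distance),
    using only the listed feature indices if `features` is given."""
    idx = range(len(x)) if features is None else features
    return min(centroids,
               key=lambda c: sum((x[j] - centroids[c][j]) ** 2 for j in idx))

X = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
y = ["a", "a", "b", "b"]
cents = fit_centroids(X, y)
print(predict(cents, [1.0, 1.0]))                # a
print(predict(cents, [3.0, 4.0], features=[0]))  # b
```

The `features` argument is where a feature selection step, such as the greedy search described in the abstract, would plug in.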

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

OAKTrust Digital Repository (Texas A&M Univ)

Gene selection with multiple ordering criteria

Author: BA Rosenzweig
C Ambroise
CA Tsai
Chen-An Tsai
Chun-Houh Chen
G Fleury
GS Akerman
H Liu
I Guyon
James J Chen
JH Cho
JM Perket
L Breiman
L Li
M de Berg
M Dettling
MAQC Consortium
O Barndorff-Nielsen
S Michiels
SE Choe
SH Jung
ShengLi Tzeng
U Alon
V Tusher
W Jin
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: A microarray study may select different sets of differentially expressed genes depending on the selection criterion. For example, the fold-change and the p-value are two commonly used criteria for selecting differentially expressed genes under two experimental conditions. These two criteria often yield incompatible gene sets. Also, in a two-factor experiment (say, treatment by time), the investigator may be interested in a single gene list that responds to both treatment and time effects. RESULTS: We propose three layer ranking algorithms, point-admissible, line-admissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to three univariate ranking criteria: fold-change, p-value, and frequency of selection by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (less than 20 per group) and detecting a 4-fold change or less, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard p-value ranking. Three applications are presented. The first illustrates how the layer rankings can potentially improve predictive accuracy. The second concerns a two-factor experiment involving two dose levels and two time points, in which the layer rankings are applied to select differentially expressed genes related to the dose and time effects. In the third, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations, to provide a ranking system for a long list of differentially expressed genes generated from the three dilution concentrations.
CONCLUSION: The layer ranking algorithms can help investigators select the most promising genes from multiple gene lists generated by different filter, normalization, or analysis methods for various objectives.
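
Of the three algorithms, Pareto layer ranking is the simplest to sketch: the first layer contains the genes that no other gene beats on every criterion, the second layer is found the same way after removing the first, and so on. The minimal Python sketch below assumes every criterion is oriented so that smaller is better (the p-value as is, the fold-change entered as -|log2 fold change|); the gene names and scores are made up for illustration, and the point-admissible and convex layers are not shown.

```python
from typing import Dict, List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one
    (all criteria oriented so that smaller is better)."""
    return all(x <= z for x, z in zip(a, b)) and any(x < z for x, z in zip(a, b))

def pareto_layers(scores: Dict[str, Tuple[float, ...]]) -> List[List[str]]:
    """Peel off successive non-dominated fronts; layer 1 is the most preferred."""
    remaining = dict(scores)
    layers: List[List[str]] = []
    while remaining:
        front = [g for g, s in remaining.items()
                 if not any(dominates(t, s) for h, t in remaining.items() if h != g)]
        layers.append(sorted(front))
        for g in front:
            del remaining[g]
    return layers

# (p-value, -|log2 fold change|) for four hypothetical genes.
scores = {"g1": (0.001, -3.0), "g2": (0.01, -4.0),
          "g3": (0.02, -1.0), "g4": (0.005, -2.0)}
print(pareto_layers(scores))  # [['g1', 'g2'], ['g4'], ['g3']]
```

Here g1 has the best p-value and g2 the largest fold change, so neither dominates the other and both form the first layer, while g4 and g3 fall into later layers.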

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Impact of the spotted microarray preprocessing method on fold-change compression and variance stability

Author: A Tarca
Annie Robert
B Durbin
Benoît Macq
Bernadette Govaerts
Bertrand Bearzatto
C Kooperberg
D Allison
D Edwards
D Zhang
G Hardiman
G Lee
G Smyth
H Parsons
J Quackenbush
Jean-Luc Gala
Jérôme Ambroise
L Shi
M Ritchie
M Yang
P de Cremoux
P Tran
R Gentleman
R Muller
R Scharpf
R Shippy
S Dudoit
S Lin
T Barrett
T Han
T Patterson
W Huber
X Cui
Y Leung
Y Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The standard approach for preprocessing spotted microarray data is to subtract the local background intensity from the spot foreground intensity, apply a log2 transformation, and normalize the data with a global median or a lowess normalization. Although well motivated, the standard approaches to background correction and transformation have been widely criticized because they produce high variance at low intensities. Whereas various alternatives to the standard background correction methods and to the log2 transformation have been proposed, the impacts of these two successive preprocessing steps had not been compared objectively. Results: In this study, we assessed the impact of eight preprocessing methods, combining four background correction methods with two transformations (log2 and glog), using data from the MAQC study. The results indicate that most preprocessing methods produce fold-change compression at low intensities. Fold-change compression was minimized by the Standard and the Edwards background correction methods coupled with a log2 transformation. The drawback of both methods is a high variance at low intensities, which in turn produced poor p-value estimates. Conversely, the glog transformation effectively stabilized the variance and gave better p-value estimates. Conclusion: As both fold-change magnitudes and p-values are important in microarray class comparison studies, we recommend combining the Edwards correction with a hybrid transformation method that uses the log2 transformation to estimate fold-change magnitudes and the glog transformation to estimate p-values.
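
The trade-off behind the hybrid recommendation can be seen on a toy example. The Python sketch below compares a log2 ratio with a glog-based ratio after standard background subtraction. It uses one common parameterization, glog2(x) = log2((x + sqrt(x^2 + c)) / 2), which behaves like log2(x) for large x and stays finite for zero or negative background-corrected intensities; the constant `c = 1e4`, the intensity values and the function names are arbitrary assumptions, and the paper's exact glog parameterization and the Edwards correction are not reproduced here.

```python
import math

def standard_background(foreground: float, background: float) -> float:
    """Standard correction: subtract local background (may reach zero or below)."""
    return foreground - background

def glog2(x: float, c: float) -> float:
    """Generalized log, base 2; approaches log2(x) when x * x >> c."""
    return math.log2((x + math.sqrt(x * x + c)) / 2.0)

def log_ratio(a: float, b: float, transform) -> float:
    """Estimated log fold change of a versus b under a given transform."""
    return transform(a) - transform(b)

C = 1e4  # hypothetical glog tuning constant
high = (standard_background(8100.0, 100.0), standard_background(2100.0, 100.0))
low = (standard_background(180.0, 100.0), standard_background(120.0, 100.0))

for name, (a, b) in [("high", high), ("low", low)]:
    print(name,
          round(log_ratio(a, b, math.log2), 3),
          round(log_ratio(a, b, lambda v: glog2(v, C)), 3))
```

Both corrected pairs have a 4-fold ratio, so the log2 ratio is 2 in each case. The glog ratio stays close to 2 at high intensity but is strongly compressed at low intensity, which is why using log2 for fold-change magnitudes and glog only for variance-stabilized testing can make sense.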

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DIAL UCLouvain