Determining appropriate approaches for using data in feature selection
Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. whether to use ALL or only PART of a dataset, has become a serious and tricky issue. While the conventional practice of using all the data in feature selection may lead to selection bias, using only part of the data may underestimate the relevant features under some conditions. This paper investigates the two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. Computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective at reducing selection bias when a dataset is small, but starts to lose its advantage as the dataset size increases.
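The Average Tanimoto Index mentioned above measures how stable a selector is across repeated runs. A minimal sketch of that computation, assuming feature selection has already been repeated over resamples of the data (the subsets below are invented for illustration, not taken from the paper):

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) index between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def average_tanimoto_index(subsets):
    """Mean pairwise Tanimoto index over feature subsets selected in
    repeated runs; values near 1 indicate stable selection."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two selection runs")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Example: subsets selected on three resamples of the same dataset.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f3", "f4"}]
print(average_tanimoto_index(runs))  # 0.5
```

Going by its name, the Inter-method variant would apply the same pairwise comparison to subsets chosen by different selection methods rather than different resamples.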
Infrastructure expansion challenges sustainable development in Papua New Guinea
The island of New Guinea hosts the third largest expanse of tropical rainforest on the planet. Papua New Guinea, comprising the eastern half of the island, plans to nearly double its national road network (from 8,700 to 15,000 km) over the next three years to spur economic growth. We assessed these plans using fine-scale biophysical and environmental data. We identified numerous environmental and socioeconomic risks associated with these projects, including the dissection of 54 critical biodiversity habitats and diminished forest connectivity across large expanses of the island. Key habitats of globally endangered species, including Goodfellow's tree-kangaroo (Dendrolagus goodfellowi), Matschie's tree-kangaroo (D. matschiei), and several birds of paradise, would also be bisected by roads and opened up to logging, hunting, and habitat conversion. Many planned roads would traverse rainforests and carbon-rich peatlands, contradicting Papua New Guinea's international commitments to promote low-carbon development and forest conservation for climate-change mitigation. Planned roads would also create new deforestation hotspots via rapid expansion of logging, mining, and oil-palm plantations. Our study suggests that several planned road segments in steep, high-rainfall terrain would be extremely expensive to construct and maintain, creating unanticipated economic challenges and public debt. The net environmental, social, and economic risks of several planned projects, such as the Epo-Kikori, Madang-Baiyer, and Wau-Malalaua links and other planned projects in the Western and East Sepik Provinces, could easily outstrip their overall benefits. Such projects should be reconsidered on broader environmental, economic, and social grounds, rather than on short-term economic considerations alone.
The Dexi-SH* model for a multivariate assessment of agro-ecological sustainability of dairy grazing systems
Dexi-SH* is an ex ante multivariate model for assessing the sustainability of dairy cow grazing systems. The model is composed of three sub-models that evaluate the impact of such systems on (i) biotic resources, (ii) abiotic resources, and (iii) pollution risks. The structure of the hierarchical tree was inspired by that of the Masc model. The choice of criteria and their aggregation modalities were discussed within a multi-disciplinary group of scientists. For each cluster, a utility function was established to determine the weighting and priority of its criteria. The model can take local and regional conditions and standards into account by adjusting criterion categories to the agro-ecological context, and the specific views of decision makers by changing the weighting of criteria.
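Dexi-SH* itself aggregates qualitative criteria through DEXi-style decision rules rather than numeric sums; the weighted-sum sketch below only illustrates the general idea of hierarchical aggregation over clusters of criteria. The tree, scores, and weights are invented, not the model's actual structure or utility functions:

```python
# Illustrative hierarchical aggregation: each internal node combines its
# children's scores (0-1) with per-cluster weights, recursively.

def aggregate(node, scores, weights):
    """Aggregate leaf scores up the criterion tree."""
    if isinstance(node, str):          # leaf criterion
        return scores[node]
    total = 0.0
    for child, w in zip(node["children"], weights[node["name"]]):
        total += w * aggregate(child, scores, weights)
    return total

tree = {"name": "sustainability",
        "children": [
            {"name": "biotic", "children": ["grassland_diversity", "hedgerows"]},
            "abiotic_resource_use",
            "pollution_risk",
        ]}
scores = {"grassland_diversity": 0.7, "hedgerows": 0.5,
          "abiotic_resource_use": 0.6, "pollution_risk": 0.8}
weights = {"sustainability": [0.4, 0.3, 0.3], "biotic": [0.6, 0.4]}
print(round(aggregate(tree, scores, weights), 3))  # 0.668
```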
Identification of disease-causing genes using microarray data mining and gene ontology
Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. A shortcoming of microarray data is that they provide few samples relative to the number of genes. This reduces classification accuracy, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods, but its performance can be degraded by the small sample size and noisy data, and it does not remove redundant genes.
Methods: We propose a novel framework for gene selection that uses the advantageous features of conventional methods while addressing their weaknesses. Specifically, we combine the Fisher method and SVM-RFE to exploit the advantages of a filter method as well as an embedded method, and we add a redundancy-reduction stage to address the weakness of the Fisher method and SVM-RFE. In addition to gene expression values, the proposed method uses Gene Ontology, a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as the small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weaknesses of conventional methods by adding a redundancy-reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL, and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for these cancers.
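A minimal sketch of the filter-plus-embedded core of such a pipeline (Fisher scoring followed by SVM-RFE), using scikit-learn on synthetic data; the redundancy-reduction and Gene Ontology stages described above are omitted, and all dimensions and cut-offs are illustrative:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def fisher_scores(X, y):
    """Two-class Fisher score per gene: (mu1 - mu2)^2 / (s1^2 + s2^2)."""
    X1, X2 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0) + 1e-12
    return num / den

# Toy expression matrix: 40 samples x 500 genes, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :10] += 1.5                     # make the first 10 genes informative

# Stage 1 (filter): keep the 100 genes with the highest Fisher score.
keep = np.argsort(fisher_scores(X, y))[::-1][:100]

# Stage 2 (embedded): SVM-RFE down to a 10-gene signature.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X[:, keep], y)
signature = keep[rfe.support_]
print(sorted(signature))                  # should recover mostly genes 0-9
```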
The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures
Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, yet surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene expression datasets for breast cancer prognosis, in terms of the predictive performance, stability, and functional interpretability of the signatures they produce. Results: We observe that the feature selection method has a significant influence on the accuracy, stability, and interpretability of signatures. Simple filter methods generally outperform more complex embedded or wrapper methods, and ensemble feature selection generally has no positive effect. Overall, a simple Student's t-test seems to provide the best results. Availability: Code and data are publicly available at http://cbio.ensmp.fr/~ahaury/
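A t-test filter of the kind the study found most effective can be expressed in a few lines. This is a generic sketch on synthetic data, not the authors' published code (which is at the URL above):

```python
import numpy as np
from scipy.stats import ttest_ind

def t_test_signature(X, y, size=30):
    """Rank features by absolute two-sample t statistic; keep the top `size`."""
    t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False)
    return np.argsort(np.abs(t))[::-1][:size]

# Toy expression data: 60 samples x 1000 genes, binary outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 1000))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, :5] += 2.0                        # 5 truly prognostic genes
print(t_test_signature(X, y, size=10)[:5])  # should surface genes 0-4
```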
Overview of Current Directions in Boredom Research
The concluding chapter of this book represents a collaborative effort between the editors and all contributing authors, resulting in a comprehensive overview of the current directions in boredom research. Summaries of each chapter not only underscore the multitude of perspectives on boredom but also elucidate the diverse approaches employed in its study. Furthermore, the chapter directs attention to both known and unknown aspects of boredom, providing a foundation for future research in the field
Gene selection with multiple ordering criteria
BACKGROUND: A microarray study may select different differentially expressed gene sets because of different selection criteria. For example, fold-change and p-value are two commonly used criteria for selecting differentially expressed genes under two experimental conditions, and they often result in incompatible selected gene sets. Also, in a two-factor (say, treatment-by-time) experiment, the investigator may be interested in one gene list that responds to both treatment and time effects. RESULTS: We propose three layer-ranking algorithms, point-admissible, line-admissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon dataset as an example, the layer-ranking algorithms are applied to three univariate ranking criteria: fold-change, p-value, and frequency of selection by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (fewer than 20 per group) and a 4-fold change or less to detect, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard p-value ranking. Three applications are presented. The first illustrates a use of the layer rankings to potentially improve predictive accuracy. The second illustrates an application to a two-factor experiment involving two dose levels and two time points, with the layer rankings applied to selecting differentially expressed genes relating to the dose and time effects. In the third, the layer rankings are applied to a benchmark dataset consisting of three dilution concentrations, to provide a ranking system for the long list of differentially expressed genes generated from the three concentrations. CONCLUSION: The layer-ranking algorithms are useful for helping investigators select the most promising genes from multiple gene lists generated by different filter, normalization, or analysis methods for various objectives.
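Of the three algorithms, the Pareto layer ranking is the simplest to sketch: genes are peeled off in successive non-dominated fronts with respect to the two criteria (here, larger fold-change and smaller p-value are both preferred). The gene names and values below are invented for illustration; the point-admissible and convex variants are not shown:

```python
def pareto_layers(items):
    """Partition (name, fold_change, p_value) tuples into Pareto layers.
    Gene a dominates gene b if a's fold-change is >= and p-value is <=,
    with at least one strict inequality."""
    def dominates(a, b):
        return (a[1] >= b[1] and a[2] <= b[2]) and (a[1] > b[1] or a[2] < b[2])
    remaining, layers = list(items), []
    while remaining:
        front = [g for g in remaining
                 if not any(dominates(h, g) for h in remaining if h is not g)]
        layers.append(front)
        remaining = [g for g in remaining if g not in front]
    return layers

genes = [("g1", 4.0, 0.001), ("g2", 2.5, 0.0005),
         ("g3", 3.0, 0.01), ("g4", 1.2, 0.04)]
for i, layer in enumerate(pareto_layers(genes), 1):
    print(i, [g[0] for g in layer])   # layer 1: g1, g2; layer 2: g3; layer 3: g4
```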
Optimally splitting cases for training and testing high dimensional classifiers
Background: We consider the problem of designing a study to develop a predictive classifier from high-dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set, and how this proportion impacts the mean squared error (MSE) of the prediction accuracy estimate. Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study to better understand the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets, we show that for linear classifiers the optimal proportion depends on the full dataset size (n) and the degree of differential expression between the classes: higher classification accuracy and smaller n result in a larger proportion assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full-dataset accuracy). In general, we recommend our non-parametric resampling approach for determining the optimal split; it can be applied to any dataset, with any predictor development method, to determine the best split.
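The paper's non-parametric algorithm is not reproduced here, but the underlying idea, repeatedly splitting at each candidate proportion and scoring the squared deviation of the hold-out accuracy estimate from a reference accuracy, can be sketched as follows. The 10-fold cross-validation reference is a stand-in for the unknown true accuracy, and the classifier choice is arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

def split_mse(X, y, fractions=(0.5, 2/3, 0.8), reps=200):
    """Approximate, per training fraction, the MSE of the hold-out
    accuracy estimate around a reference 'full data' accuracy."""
    clf = LogisticRegression(max_iter=1000)
    reference = cross_val_score(clf, X, y, cv=10).mean()  # proxy for truth
    results = {}
    for f in fractions:
        accs = []
        for seed in range(reps):
            Xtr, Xte, ytr, yte = train_test_split(
                X, y, train_size=f, stratify=y, random_state=seed)
            accs.append(clf.fit(Xtr, ytr).score(Xte, yte))
        accs = np.asarray(accs)
        results[f] = ((accs - reference) ** 2).mean()  # ~ bias^2 + variance
    return results

# Example on toy data: report the fraction with the smallest estimated MSE.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)
print(min(split_mse(X, y, reps=50).items(), key=lambda kv: kv[1]))
```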
- …