
    Large-scale Nonlinear Variable Selection via Kernel Random Features

    We propose a new method for input variable selection in nonlinear regression. The method is embedded into a kernel regression machine that can model general nonlinear functions, rather than being a priori limited to additive models. This is the first kernel-based variable selection method applicable to large datasets. It sidesteps the typically poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features. The algorithm discovers the variables relevant to the regression task while learning the prediction model, by learning appropriate nonlinear random feature maps. We demonstrate the outstanding performance of our method on a set of large-scale synthetic and real datasets. (Comment: Final version for proceedings of ECML/PKDD 2018.)
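    As a rough illustration of the idea (not the paper's exact algorithm), the sketch below builds a random Fourier feature map in which each input dimension is scaled by a relevance weight; learning a sparse weight vector is what would select variables. Here the weights are simply fixed at one and a ridge model is fit on top. All names (theta, W, the toy data) are illustrative.

```python
# Hypothetical sketch: variable selection via weighted random Fourier features.
# A sparse relevance vector theta would select input dimensions; theta, W and
# the toy data below are placeholders, not the paper's notation.
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, theta, W, b):
    """Map theta-scaled inputs through random Fourier features that
    approximate a Gaussian kernel on the selected variables."""
    Z = np.cos((X * theta) @ W + b)               # (n, D) feature map
    return np.sqrt(2.0 / W.shape[1]) * Z

# Toy data: only the first 2 of 10 inputs matter.
n, d, D = 500, 10, 200
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=n)

theta = np.ones(d)                                # relevance weights (to be learned)
W = rng.normal(size=(d, D))                       # random frequencies
b = rng.uniform(0, 2 * np.pi, size=D)             # random phases

Z = random_fourier_features(X, theta, W, b)
alpha = np.linalg.solve(Z.T @ Z + 1e-2 * np.eye(D), Z.T @ y)  # ridge fit
print("train MSE:", np.mean((Z @ alpha - y) ** 2))
```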

    Analyzing sensory data using non-linear preference learning with feature subset selection

    15th European Conference on Machine Learning, Pisa, Italy, September 20-24, 2004. The quality of food can be assessed from different points of view. In this paper, we deal with those aspects that can be appreciated through sensory impressions. When we aim to induce a function that maps object descriptions into ratings, we must consider that consumers’ ratings are just a way to express their preferences about the products presented in the same testing session. We therefore propose to learn from consumers’ preference judgments instead of using an approach based on regression. This requires the use of special-purpose kernels and feature subset selection methods. We illustrate the benefits of our approach on two families of real-world databases.
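    One common way to realize learning from preference judgments (not necessarily the authors' exact kernels) is to turn each within-session pair of differently rated products into a preference, then train a ranking SVM on feature differences. The sketch below assumes toy data; sessions, ratings and features are all illustrative.

```python
# Minimal sketch of pairwise preference learning: within each tasting
# session, every pair of products with different ratings yields a
# preference; a linear ranking SVM is trained on feature differences.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                      # product descriptions
ratings = X @ np.array([1.0, -0.5, 0, 0, 2]) + 0.1 * rng.normal(size=30)
sessions = np.repeat(np.arange(6), 5)             # 6 sessions of 5 products

pairs, labels = [], []
for s in np.unique(sessions):
    idx = np.where(sessions == s)[0]
    for i, j in combinations(idx, 2):
        if ratings[i] != ratings[j]:              # only comparable pairs
            pairs.append(X[i] - X[j])
            labels.append(1 if ratings[i] > ratings[j] else -1)

# No intercept: a ranking SVM scores difference vectors through the origin.
clf = LinearSVC(C=1.0, fit_intercept=False).fit(np.array(pairs), np.array(labels))
print("pairwise accuracy:", clf.score(np.array(pairs), np.array(labels)))
```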

    Evolving training sets for improved transfer learning in brain computer interfaces

    A new proof-of-concept method for optimising the performance of brain-computer interfaces (BCIs) while minimising the quantity of required training data is introduced. This is achieved by using an evolutionary approach to rearrange the distribution of training instances prior to the construction of an Ensemble Learning Generic Information (ELGI) model. The training data from a population was optimised to emphasise the generality of the models derived from it, prior to recombination with participant-specific data via the ELGI approach and training of classifiers. Evidence is given to support the adoption of this approach in the more difficult BCI conditions: smaller training sets, and those suffering from temporal drift. This paper serves as a case study to lay the groundwork for further exploration of this approach.
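    The sketch below is only a loose analogue of the evolutionary step, not the ELGI pipeline itself: a genetic algorithm evolves a binary mask over population training instances so that a classifier trained on the selected instances scores well on participant-specific data. Dataset shapes, the LDA classifier, and all hyperparameters are assumptions.

```python
# Illustrative GA over training-instance subsets (not the authors' ELGI model).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
Xp, yp = rng.normal(size=(200, 8)), rng.integers(0, 2, 200)   # population data
Xs, ys = rng.normal(size=(40, 8)), rng.integers(0, 2, 40)     # participant data

def fitness(mask):
    if mask.sum() < 10:                      # avoid degenerate training sets
        return 0.0
    clf = LinearDiscriminantAnalysis().fit(Xp[mask], yp[mask])
    return clf.score(Xs, ys)                 # generality proxy: participant accuracy

pop = rng.random((20, len(yp))) < 0.5        # initial population of masks
for gen in range(30):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]  # keep the 10 fittest masks
    children = parents.copy()
    flip = rng.random(children.shape) < 0.02 # point mutation
    children[flip] = ~children[flip]
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected", best.sum(), "of", len(yp), "population instances")
```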

    ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regression

    We applied several regression and deep learning methods to predict fluid intelligence scores from T1-weighted MRI scans as part of the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel intensities and the probabilistic tissue-type labels derived from them as features to train the models. The best predictive performance (lowest mean-squared error) came from Kernel Ridge Regression (KRR; λ = 10), which produced a mean-squared error of 69.7204 on the validation set and 92.1298 on the test set. This placed our group fifth on the validation leader board and first on the final (test) leader board. (Comment: Winning entry in the ABCD Neurocognitive Prediction Challenge at MICCAI 2019. 7 pages plus references, 3 figures, 1 table.)
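    A minimal sketch of the winning model class follows: kernel ridge regression with regularization λ = 10 on image-derived features. The feature extraction and toy targets below are stand-ins for the challenge pipeline, and the linear kernel is an assumption.

```python
# Kernel ridge regression with alpha = 10, as a stand-in for the
# challenge model; features and scores here are synthetic placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 500))   # e.g. flattened tissue-probability features
y = rng.normal(size=100)          # fluid intelligence scores (toy values)

krr = KernelRidge(alpha=10.0, kernel="linear").fit(X, y)
pred = krr.predict(X)
print("train MSE:", np.mean((pred - y) ** 2))
```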

    Feature selection for chemical sensor arrays using mutual information

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established the best features and upper bounds on classification performance. We selected feature sets that exhibit maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features whose classification performance approaches the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian networks gave the best performance. Finally, we compared the observed classification performances with those of classifiers using randomly selected features, and found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
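    The sketch below shows the filter pattern on toy data: rank features by their mutual information with the chemical identity, keep the top k, and classify with an SVM. Note it ranks features individually, a simpler variant of selecting whole sets with maximal joint mutual information; the dataset and k are placeholders.

```python
# Mutual-information filter for feature selection (per-feature ranking,
# a simplified variant of set-wise MI selection). Toy data throughout.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20))              # sensor-array features
y = rng.integers(0, 5, 300)                 # chemical identity labels

mi = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(mi)[-5:]                 # keep the 5 most informative features
clf = SVC(kernel="rbf").fit(X[:, top_k], y)
print("selected features:", sorted(top_k), "train acc:", clf.score(X[:, top_k], y))
```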

    Benchopt: Reproducible, efficient and collaborative optimization benchmarks

    Numerical validation is at the core of machine learning research, as it makes it possible to assess the actual impact of new methods and to confirm the agreement between theory and practice. Yet the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, and tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: ℓ2-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community, hence improving the reproducibility of research findings. (Comment: Accepted in proceedings of NeurIPS 22; Benchopt library documentation is available at https://benchopt.github.io)
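    The sketch below is not the Benchopt API; it just shows, in plain Python, the kind of timed solver comparison on an ℓ2-regularized logistic regression objective that the framework automates and standardizes. The solvers, step sizes and data are illustrative.

```python
# Hand-rolled version of what an optimization benchmark records:
# objective value versus wall-clock time for competing solver settings.
import time
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 50))
y = rng.integers(0, 2, 500) * 2 - 1          # labels in {-1, +1}
lam = 0.1                                    # l2 regularization strength

def objective(w):
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + 0.5 * lam * w @ w

def gradient(w):
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigmoid(-y * margin)
    return -(X.T @ (y * p)) / len(y) + lam * w

def run_gd(step, n_iter=200):
    w, trace = np.zeros(X.shape[1]), []
    t0 = time.perf_counter()
    for _ in range(n_iter):
        w -= step * gradient(w)
        trace.append((time.perf_counter() - t0, objective(w)))
    return trace

for name, step in [("gd, small step", 0.05), ("gd, large step", 1.0)]:
    t, obj = run_gd(step)[-1]
    print(f"{name}: final objective {obj:.4f} after {t:.3f}s")
```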

    Enhanced protein fold recognition through a novel data integration approach

    Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold-discriminatory data sources which use physicochemical and structural properties, as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold-discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources.

    Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations, which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to solve MKLdiv-dc efficiently by a difference of convex (DC) programming method and MKLdiv-conv by a projected gradient descent algorithm. The effectiveness of the proposed approaches is evaluated on a benchmark dataset for protein fold recognition and a yeast protein function prediction problem.

    Conclusion: Our proposed methods MKLdiv-dc and MKLdiv-conv are able to achieve state-of-the-art performance on the SCOP PDB-40D benchmark dataset for protein fold prediction and provide useful insights into the relative significance of informative data sources. In particular, MKLdiv-dc further improves the fold discrimination accuracy to 75.19%, a more than 5% improvement over competitive Bayesian probabilistic and SVM margin-based kernel learning methods. Furthermore, we report competitive performance on the yeast protein function prediction problem.
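    As a hedged sketch of the underlying idea, the code below scores a weighted combination of input kernel matrices by its KL divergence to a label-derived output kernel, treating both matrices as covariances of zero-mean Gaussians. A crude grid search over weights stands in for the paper's DC programming and projected gradient methods; all data and candidate weights are placeholders.

```python
# KL divergence between kernel matrices viewed as Gaussian covariances:
# KL(N(0, K_out) || N(0, K_in)). Grid search replaces the paper's solvers.
import numpy as np

def kl_gaussian(K_out, K_in, eps=1e-6):
    n = K_out.shape[0]
    K_out = K_out + eps * np.eye(n)          # jitter for rank-deficient kernels
    K_in = K_in + eps * np.eye(n)
    return 0.5 * (np.trace(np.linalg.solve(K_in, K_out)) - n
                  + np.linalg.slogdet(K_in)[1] - np.linalg.slogdet(K_out)[1])

rng = np.random.default_rng(6)
n = 40
y = rng.integers(0, 3, n)
K_out = (y[:, None] == y[None, :]).astype(float)   # ideal label-derived kernel

kernels = []
for _ in range(3):
    F = rng.normal(size=(n, 5))                    # one heterogeneous data view
    kernels.append(F @ F.T)

candidates = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1/3, 1/3, 1/3)]
scores = {w: kl_gaussian(K_out, sum(wi * K for wi, K in zip(w, kernels)))
          for w in candidates}
best = min(scores, key=scores.get)
print("best weights:", best, "KL:", round(scores[best], 3))
```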

    Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network

    Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge: identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits, such as clinical metrics and gene expression levels, in association analysis. Our approach represents the correlation information among the quantitative traits explicitly as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. As a result, genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most traditional methods examine each phenotype independently and combine the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on HapMap consortium data and an asthma dataset, we compared the performance of our method with that of single-marker analyses and regression-based methods that do not use any of the relational information in the traits. We found that our method showed increased power in detecting causal variants affecting correlated traits. Our results show that when the correlation patterns among traits in a QTN are considered explicitly and directly during structured multivariate genome association analysis using the proposed methods, the power to detect true causal SNPs with possibly pleiotropic effects increases significantly without compromising performance on non-pleiotropic SNPs.
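    To make the structured penalty concrete, here is a simplified GFlasso-style objective: squared loss plus a lasso penalty plus a graph-guided fusion penalty that ties the coefficients of correlated traits. The edge weights, regularization parameters and toy data are placeholders, and only the objective is shown, not the paper's optimization procedure.

```python
# Simplified GFlasso-style objective: fusion term encourages SNP j's
# effects on correlated traits m and l to agree (up to the sign of r_ml).
import numpy as np

def gflasso_objective(B, X, Y, edges, lam, gamma):
    """B: (n_snps, n_traits) coefficients; edges: [(m, l, r_ml), ...]
    with r_ml the correlation between traits m and l."""
    loss = 0.5 * np.sum((Y - X @ B) ** 2)
    lasso = lam * np.abs(B).sum()
    fusion = gamma * sum(abs(r) * np.abs(B[:, m] - np.sign(r) * B[:, l]).sum()
                         for m, l, r in edges)
    return loss + lasso + fusion

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 20))                    # genotypes
B_true = np.zeros((20, 3))
B_true[0, :2] = 1.0                               # one SNP hits two correlated traits
Y = X @ B_true + 0.1 * rng.normal(size=(100, 3))  # quantitative traits
edges = [(0, 1, 0.9)]                             # QTN edge: traits 0 and 1 correlated

print("objective at truth:", gflasso_objective(B_true, X, Y, edges, 0.1, 0.1))
```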