7 research outputs found

Monte Carlo Feature Selection and Interdependency Discovery in Supervised Classification

Author: A. Gyenesei
A.A. Alizadeh
C. Lu
D. Harris
H. Jonckheere
J. Ren
J.D. Bauman
K. Chrysostomou
K.J. Archer
Kaushik
L. Menédez-Arias
M. Dramiński
R. Tibshirani
R. Tibshirani
S. Dudoit
S. Sarafianos
S.Y. Rhee
T.R. Golub
Valverde-Garduño
W.R. Rudnicki
Y. Li
Y. Saeys
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Crossref

Recommended from our members

R.ROSETTA: an interpretable machine learning framework.

Author: Baltzer Nicholas
Borneloev Susanne
Diamanti Klev
Feuk Lars
Garbulowski Mateusz
Komorowski Jan
Smolińska Karolina
Stoll Patricia
Øhrn Aleksander
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 07/03/2021
Field of study

Funder: Uppsala Universitet; doi: http://dx.doi.org/10.13039/501100007051
Funder: Polska Akademia Nauk; doi: http://dx.doi.org/10.13039/501100004382
Funder: Uppsala University

BACKGROUND: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, viz. in Life Sciences, it is often more important to understand how a prediction was obtained rather than knowing what prediction was made. To this end so-called interpretable machine learning has been recently advocated. In this study, we implemented an interpretable machine learning package based on the rough set theory. An important aim of our work was provision of statistical properties of the models and their components.
RESULTS: We present the R.ROSETTA package, which is an R wrapper of ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA . To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.
CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.

Apollo (Cambridge)

R.ROSETTA: an interpretable machine learning framework.

Author: Baltzer Nicholas
Bornelöv Susanne
Diamanti Klev
Feuk Lars
Garbulowski Mateusz
Komorowski Jan
Smolińska Karolina
Stoll Patricia
Øhrn Aleksander
Publication venue: BMC Bioinformatics
Publication date: 01/01/2021
Field of study

Repository for Publications and Research Data

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Apollo (Cambridge)

NORA - Norwegian Open Research Archives

Protein Networks as Logic Functions in Development and Cancer

Author: A Bureau
A Chariot
A Ransick
AS Sultan
B Alberts
BJ Frey
BME Moret
C Kingsford
C Lefebvre
CL Smith
CW Roberts
D Hanahan
D Lang
D Opitz
DR Rhodes
E Lee
E Segal
E Segal
EH Davidson
EH Davidson
EV Prochownik
F Rapaport
FJ Muller
H Aizawa
HJ Cordell
HS Phillips
HY Chuang
I Ulitsky
I Ulitsky
I Ulitsky
IA Stasinopoulos
IW Taylor
J Lessard
JA Blake
Janusz Dutkowski
KG Becker
L Bei
L Breiman
L Breiman
L Ein-Dor
L Ein-Dor
L Ho
L Ho
L Meng
LH Hartwell
LM Bundy
M Ashburner
M Dramiński
M Kang
M Wozniak
ME Higgins
MJ van de Vijver
MQ Hassan
MS Carro
MS Cline
R Ren
RK Nibbe
Russ B. Altman
S Efroni
SA Chowdhury
SC Materna
SH Li
T Hwang
T Ideker
T Ideker
T Ravasi
TM Williams
Trey Ideker
W Huang da
X Yang
Y Kwon
Y Ono
Y Wang
Publication venue: Public Library of Science
Publication date: 01/09/2011
Field of study

Many biological and clinical outcomes are based not on single proteins, but on modules of proteins embedded in protein networks. A fundamental question is how the proteins within each module contribute to the overall module activity. Here, we study the modules underlying three representative biological programs related to tissue development, breast cancer metastasis, or progression of brain cancer, respectively. For each case we apply a new method, called Network-Guided Forests, to identify predictive modules together with logic functions which tie the activity of each module to the activity of its component genes. The resulting modules implement a diverse repertoire of decision logic which cannot be captured using the simple approximations suggested in previous work such as gene summation or subtraction. We show that in cancer, certain combinations of oncogenes and tumor suppressors exert competing forces on the system, suggesting that medical genetics should move beyond cataloguing individual cancer genes to cataloguing their combinatorial logic.
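The abstract's claim that some decision logic cannot be captured by gene summation or subtraction can be illustrated with a toy case. The sketch below is not taken from the paper and does not implement Network-Guided Forests. It assumes two hypothetical binary genes and checks, over a small grid of weights and thresholds chosen only for illustration, whether a thresholded weighted sum can reproduce a given logic function. AND can be reproduced this way, while XOR (active when exactly one gene is on) cannot.

```python
from itertools import product

def and_gate(a, b):
    """Module is active only when both genes are active."""
    return int(a and b)

def xor_gate(a, b):
    """Module is active when exactly one gene is active."""
    return int(a != b)

def representable_by_weighted_sum(logic):
    """Search a small grid for w1*a + w2*b > t that matches `logic` on all inputs.

    Weights of +1/-1 correspond to plain summation or subtraction; the grid
    (an assumption of this sketch) also includes 0 and +/-2.
    """
    weights = (-2, -1, 0, 1, 2)
    thresholds = [k + 0.5 for k in range(-5, 5)]  # -4.5, -3.5, ..., 4.5
    states = list(product((0, 1), repeat=2))      # all four gene states
    for w1, w2, t in product(weights, weights, thresholds):
        if all(int(w1 * a + w2 * b > t) == logic(a, b) for a, b in states):
            return True
    return False

if __name__ == "__main__":
    print("AND representable:", representable_by_weighted_sum(and_gate))
    print("XOR representable:", representable_by_weighted_sum(xor_gate))
```

The grid search is only a demonstration. That XOR is not linearly separable is a standard result and holds for any choice of weights and threshold.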

Crossref

Directory of Open Access Journals

PubMed Central

rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery

Author: Jacek Koronacki
Michał Dramiński
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/07/2018
Field of study

We describe the R package rmcfs that implements an algorithm for ranking features from high dimensional data according to their importance for a given supervised classification task. The ranking is performed prior to addressing the classification task per se. This R package is a new and extended version of the MCFS (Monte Carlo feature selection) algorithm, an early version of which was published in 2005. The package provides an easy R interface, a set of tools to review results and the new ID (interdependency discovery) component. The algorithm can be used on continuous and/or categorical features (e.g., gene expression and phenotypic data) to produce an objective ranking of features with a statistically well-defined cutoff between informative and non-informative ones. Moreover, the directed ID graph that presents interdependencies between informative features is provided.

Directory of Open Access Journals

Journal of Statistical Software

Monte Carlo feature selection and interdependency discovery in supervised classification

Author: A. Gyenesei
A.A. Alizadeh
C. Lu
D. Harris
H. Jonckheere
J. Ren
J.D. Bauman
K. Chrysostomou
K.J. Archer
Kaushik
L. Menédez-Arias
M. Dramiński
R. Tibshirani
R. Tibshirani
S. Dudoit
S. Sarafianos
S.Y. Rhee
T.R. Golub
Valverde-Garduño
W.R. Rudnicki
Y. Li
Y. Saeys
Publication venue: Heidelberg : Springer
Publication date: 01/01/2010
Field of study

Applications of machine learning techniques in Life Sciences are the main applications forcing a paradigm shift in the way these techniques are used. Rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying distinct classes and what are the interdependencies between the features. To this end we significantly extend our earlier work [Dramiński et al. (2008)] that introduced an effective and reliable method for ranking features according to their importance for classification. We begin with adding a method for finding a cut-off between informative and non-informative features and then continue with a development of a methodology and an implementation of a procedure for determining interdependencies between informative features. The reliability of our approach rests on multiple construction of tree classifiers. Essentially, each classifier is trained on a randomly chosen subset of the original data using only a fraction of all of the observed features. This approach is conceptually simple yet computer-intensive. The methodology is validated on a large and difficult task of modelling HIV-1 reverse transcriptase resistance to drugs which is a good example of the aforementioned paradigm shift. We construct a classifier but of the main interest is the identification of mutation points (i.e. features) and their combinations that model drug resistance.
Keywords: feature selection, interdependency discovery, MCFS-ID, biological sequence analysis
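The resampling scheme described in the abstract above (many classifiers, each trained on a random subset of the data using only a fraction of the features) can be sketched as follows. This is a simplified illustration, not the authors' MCFS-ID implementation: it assumes binary features and labels, replaces full decision trees with one-level stumps, and credits each selected feature with plain held-out accuracy rather than the importance measure used in the paper. The cut-off and interdependency discovery steps are not covered. All function names and default parameters are hypothetical.

```python
import random
from collections import defaultdict

def stump_fit(rows, labels, features):
    """Pick the (feature, polarity) pair with the best training accuracy.

    The stump predicts class 1 when the feature value equals `polarity`.
    """
    best = None
    for f in features:
        for polarity in (0, 1):
            correct = sum(int(r[f] == polarity) == y for r, y in zip(rows, labels))
            acc = correct / len(rows)
            if best is None or acc > best[0]:
                best = (acc, f, polarity)
    return best[1], best[2]

def stump_accuracy(stump, rows, labels):
    """Fraction of rows the stump classifies correctly."""
    f, polarity = stump
    return sum(int(r[f] == polarity) == y for r, y in zip(rows, labels)) / len(rows)

def monte_carlo_feature_scores(rows, labels, n_subsets=200, subset_size=3,
                               n_splits=5, train_frac=0.66, seed=0):
    """Rank features by accumulated held-out accuracy over random projections."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scores = defaultdict(float)
    indices = list(range(len(rows)))
    for _ in range(n_subsets):
        # Use only a random fraction of the features for this round.
        features = rng.sample(range(n_features), subset_size)
        for _ in range(n_splits):
            # Random train/test split of the observations.
            rng.shuffle(indices)
            cut = int(train_frac * len(indices))
            train, test = indices[:cut], indices[cut:]
            stump = stump_fit([rows[i] for i in train],
                              [labels[i] for i in train], features)
            acc = stump_accuracy(stump, [rows[i] for i in test],
                                 [labels[i] for i in test])
            scores[stump[0]] += acc  # credit the feature the stump used
    return sorted(((scores[f], f) for f in range(n_features)), reverse=True)

def make_toy_data(n_rows=120, n_features=10, seed=1):
    """Synthetic data in which only feature 0 determines the label."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [r[0] for r in rows]
    return rows, labels

if __name__ == "__main__":
    rows, labels = make_toy_data()
    for score, feature in monte_carlo_feature_scores(rows, labels)[:3]:
        print(f"feature {feature}: score {score:.1f}")
```

On the toy data the informative feature accumulates far more credit than the noise features, because it wins every round in which it is sampled, whereas a noise feature is credited only with near-chance accuracy.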

Crossref

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Monte Carlo feature selection and interdependency discovery in supervised classification

Author: Dramiński Michał
Kierczak Marcin
Komorowski Jan
Koronacki Jacek
Publication venue: Heidelberg : Springer
Publication date
Field of study