15 research outputs found

rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning

Author: Kursa Miron B.
Publication date: 01/11/2014

In this paper I present an extended implementation of the Random ferns algorithm contained in the R package rFerns. It differs from the original in its ability to consume categorical and numerical attributes instead of only binary ones. Also, instead of using a simple attribute subspace ensemble, it employs bagging and thus produces an error approximation and a variable importance measure modelled after the Random forest algorithm. I also present benchmark results which show that, although Random ferns' accuracy is mostly lower than that achieved by Random forest, its speed and the good quality of the importance measure it provides make rFerns a reasonable choice for specific applications.

arXiv.org e-Print Archive

Directory of Open Access Journals

Journal of Statistical Software

Robustness of Random Forest-based gene selection methods

Author: Kursa Miron B.
Publication date: 18/10/2013

Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. The comparison of post-selection accuracy in the validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods substantially differed with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm's computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal optimal selection methods were found to select a high fraction of false positives.

arXiv.org e-Print Archive

Springer - Publisher Connector

Feature Selection with the Boruta Package

Author: Miron B. Kursa
Witold R. Rudnicki

This article describes an R package, Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.

Research Papers in Economics

Feature Selection with the Boruta Package

Author: Kursa Miron B.
Rudnicki Witold R.
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 16/09/2010

This article describes an R package, Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.

Directory of Open Access Journals

Journal of Statistical Software

Inferring causal molecular networks: empirical assessment through a community-based effort

Author: Afsari Bahman
Al-Ouran Rami
Anton Bernat
Arodz Tomasz
Bagheri Neda
Berlow Noah
Bisberg Alexander J.
Bivol Adrian
Bohler Anwesha
Bonet Jaume
Bonneau Richard
Budak Gungor
Bunescu Razvan
Caglar Mehmet
Cai Binghuang
Cai Chunhui
Carlin Daniel E.
Carlon Azzurra
Chen Lujia
Ciaccio Mark F.
Cokelaer Thomas
Cooper Gregory
Coort Susan
Creighton Chad J.
Daneshmand Seyed-Mohammad-Hadi
Danilova Ludmila V.
De La Fuente Alberto
Di Camillo Barbara
Dutta-Moscato Joyeeta
Emmett Kevin
Evelo Chris
Fassia Mohammad-Kasim H.
Favorov Alexander V.
Fertig Elana J.
Finkle Justin D.
Finotello Francesca
Friend Stephen
Gao Jean
Gao Xi
Ghosh Samik
Giaretta Alberto
Graim Kiley
Gray Joe W.
Großeholz Ruth
Guan Yuanfang
Guinney Justin
Hafemeister Christoph
Hahn Oliver
Haider Saad
Hase Takeshi
Heiser Laura M.
Hill Steven M.
Hodgson Jay
Hoff Bruce
Hsu Chih Hao
Hu Chenyue W.
Hu Ying
Huang Xun
Jalili Mahdi
Jiang Xia
Kacprowski Tim
Kaderali Lars
Kang Mingon
Kannan Venkateshan
Kellen Michael
Kikuchi Kaito
Kim Dong-Chul
Kitano Hiroaki
Knapp Bettina
Koeppl Heinz
Komatsoulis George
Krämer Andreas
Kursa Miron Bartosz
Kutmon Martina
Lee Wai Shing
Li Yichao
Liang Xiaoyu
Linger Michael
Liu Yu
Liu Zhaoqi
Long Byron L.
Lu Songjian
Lu Xinghua
Manfrini Marco
Matos Marta R. A.
Meerzaman Daoud
Mills Gordon B.
Min Wenwen
Mukherjee Sach
Müller Christian Lorenz
Neapolitan Richard E.
Nesser Nicole K.
Noren David P.
Norman Thea
Oliva Baldo
Opiyo Stephen Obol
Pal Ranadip
Palinkas Aljoscha
Paull Evan O.
Planas-Iglesias Joan
Poglayen Daniel
Qutub Amina A.
Saez-Rodriguez Julio
Sambo Francesco
Sanavia Tiziana
Sharifi-Zarchi Ali
Sichani Omid Askari
Slawek Janusz
Sokolov Artem
Song Mingzhou
Spellman Paul T.
Stolovitzky Gustavo
Streck Adam
Strunz Sonja
Stuart Joshua M.
Taylor Dane
Tegnér Jesper
Thobe Kirste
Toffolo Gianna Maria
Trifoglio Emanuele
Unger Michael
Wan Qian
Wang Haizhou
Welch Lonnie
Wong Chris K.
Wu Jia J.
Xue Albert Y.
Yamanaka Ryota
Yan Chunhua
Zairis Sakellarios
Zengerling Michael
Zenil Hector
Zhang Yang
Zhu Fan
Zi Zhike
Publication date: 01/01/2016

Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks

Carolina Digital Repository

Inferring causal molecular networks: empirical assessment through a community-based effort

Author: Adam Streck
Afsari Bahman
Albert Y. Xue
Alberto de la Fuente
Ali Sharifi Zarchi
Aljoscha Palinkas
Andreas Krämer
Anwesha Bohler
Azzurra Carlon
Baldo Oliva
Bernat Anton
Bettina Knapp
Binghuang Cai
Bisberg Alexander J
Bivol Adrian
Bruce Hoff
Carlin Daniel E
Chad J. Creighton
Chenyue W Hu
Chih Hao Hsu
Chris Evelo
Christian Lorenz Müller
Christoph Hafemeister
Chunhua Yan
Chunhui Cai
Cokelaer Thomas
Daniel Poglayen
Danilova Ludmila V
Daoud Meerzaman
Di Camillo Barbara
Dong Chul Kim
Dutta Moscato
Favorov Alexander V
Fertig Elana J
Finotello Francesca
Friend Stephen
Gao Xi
George Komatsoulis
Giaretta Alberto
Graim Kiley
Gray Joe W
Gregory Cooper
Guan Yuanfang
Gungor Budak
Hector Zenil
Heiser Laura M
Hill Steven M
Hiroaki Kitano
HPN-DREAM Consortium: Rami Al Ouran
Janusz Slawek
Jaume Bonet
Javier Garcia Garcia
Jay Hodgson
Jean Gao
Jesper Tegnér
Jia J. Wu
Joan Planas Iglesias
Justin D Finkle
Justin Guinney
Kaito Kikuchi
Kellen Michael
Kevin Emmett
Kirste Thobe
Koeppl Heinz
Lars Kaderali
Lee Wai Shing
Liu Yu
Long Byron L
Lonnie Welch
Lujia Chen
Mahdi Jalili
Manfrini Marco
Mark F. Ciaccio
Marta R. A Matos
Martina Kutmon
Mehmet Caglar
Michael Zengerling
Mills Gordon B
Mingon Kang
Miron Bartosz Kursa
Mohammad Kasim H. Fassia
Mukherjee Sach
Neda Bagheri
Nesser Nicole K
Noah Berlow
Noren David P
Norman Thea
Oliver Hahn
Omid Askari Sichani
Paull Evan O
Qian Wan
Qutub Amina A
Ranadip Pal
Razvan Bunescu
Richard E Neapolitan
Richard Bonneau
Ruth Großeholz
Ryota Yamanaka
Saad Haider
Saez Rodriguez Julio
Sakellarios Zairis
Sambo Francesco
Samik Ghosh
Sanavia Tiziana
Seyed Mohammad Hadi Daneshmand
Shihua Zhang
Sokolov Artem
Song Mingzhou
Songjian Lu
Sonja Strunz
Spellman Paul T
Stephen Obol Opiyo
Stolovitzky Gustavo
Stuart Joshua M
Susan Coort
Takeshi Hase
Taylor Dane
Tim Kacprowski
Toffolo Gianna Maria
Tomasz Arodz
Trifoglio Emanuele
Unger Michael
Venkateshan Kannan
Wang Haizhou
Wenwen Min
Wong Chris K
Xia Jiang
Xiaoyu Liang
Xinghua Lu
Xun Huang
Yichao Li
Ying Hu
Zhang Yang
Zhaoqi Liu
Zhike Zi
Zhu Fan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense

Institutional Research Information System University of Turin

Archivio istituzionale della ricerca - Università di Padova