7 research outputs found

rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery

Author: Jacek Koronacki
Michał Dramiński
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/07/2018

We describe the R package rmcfs, which implements an algorithm for ranking features from high-dimensional data according to their importance for a given supervised classification task. The ranking is performed prior to addressing the classification task per se. The package implements a new and extended version of the MCFS (Monte Carlo feature selection) algorithm, an early version of which was published in 2005. It provides an easy R interface, a set of tools for reviewing results, and the new ID (interdependency discovery) component. The algorithm can be used on continuous and/or categorical features (e.g., gene expression and phenotypic data) to produce an objective ranking of features with a statistically well-defined cutoff between informative and non-informative ones. Moreover, a directed ID graph that presents interdependencies between informative features is provided.

Directory of Open Access Journals

Journal of Statistical Software

Incremental document map formation: multi-stage approach

Author: Kłopotek Mieczysław A.
Ciesielski Krzysztof
Czerski Dariusz
Dramiński Michał
Wierzchoń Sławomir T.
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 01/01/2006

The paper presents a methodology for incremental map formation in the multi-stage process of a search engine with a map-based user interface. The architecture of the experimental system allows for comparative evaluation of different constituent technologies at various stages of the process. The quality of the map generation process has been investigated using a number of clustering and classification measures. Some conclusions concerning the impact of various technological solutions on map quality are presented.

Biblioteka Nauki - repozytorium artykułów

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

A Rough Set-Based Model of HIV-1 Reverse Transcriptase Resistome

Author: Dramiński Michał
Ginalski Krzysztof
Kierczak Marcin
Komorowski Jan
Koronacki Jacek
Rudnicki Witold
Publication venue: Libertas Academica
Publication date: 01/01/2009

Reverse transcriptase (RT) is a viral enzyme crucial for HIV-1 replication. Currently, 12 drugs are targeted against the RT. The low fidelity of RT-mediated transcription leads to the quick accumulation of drug-resistance mutations. The sequence-resistance relationship remains only partially understood. Using publicly available data collected from over 15 years of HIV proteome research, we have created a general and predictive rule-based model of HIV-1 resistance to eight RT inhibitors. Our rough set-based model considers changes in the physicochemical properties of a mutated sequence as compared to the wild-type strain. Thanks to the application of the Monte Carlo feature selection method, the model takes into account only the properties that significantly contribute to the resistance phenomenon. The obtained results show that drug resistance is determined in a more complex way than previously believed. We confirmed the importance of many resistance-associated sites, found some sites to be less relevant than formerly postulated and, more importantly, identified several previously neglected sites as potentially relevant. By mapping some of the newly discovered sites onto the 3D structure of the RT, we were able to suggest possible molecular mechanisms of drug resistance. Importantly, our model is able to generalize predictions to previously unseen cases. The study is an example of how computational biology methods can increase our understanding of the HIV-1 resistome.

Directory of Open Access Journals

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Monte Carlo feature selection and interdependency discovery in supervised classification

Author: Dramiński Michał
Kierczak Marcin
Komorowski Jan
Koronacki Jacek
Publication venue: Heidelberg : Springer
Publication date

Applications in the life sciences are the main force behind a paradigm shift in the way machine learning techniques are used. Rather than obtaining the best possible supervised classifier, the life scientist needs to know which features contribute best to classifying distinct classes and what the interdependencies between the features are. To this end we significantly extend our earlier work [Dramiński et al. (2008)], which introduced an effective and reliable method for ranking features according to their importance for classification. We begin by adding a method for finding a cut-off between informative and non-informative features and then continue with the development of a methodology and an implementation of a procedure for determining interdependencies between informative features. The reliability of our approach rests on the multiple construction of tree classifiers. Essentially, each classifier is trained on a randomly chosen subset of the original data using only a fraction of all of the observed features. This approach is conceptually simple yet computer-intensive. The methodology is validated on a large and difficult task of modelling HIV-1 reverse transcriptase resistance to drugs, which is a good example of the aforementioned paradigm shift. We construct a classifier, but the main interest lies in identifying mutation points (i.e., features) and their combinations that model drug resistance.

Keywords: feature selection, interdependency discovery, MCFS-ID, biological sequence analysis
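
The resampling idea in the abstract above (many classifiers, each trained on a random subset of the data and a random fraction of the features) can be illustrated with a minimal Python sketch. This is not the authors' MCFS-ID implementation and not the rmcfs API. It assumes depth-1 trees (decision stumps) in place of full tree classifiers, and it scores a feature by summing the held-out accuracy of the stumps that split on it, a deliberate simplification of the published importance measure. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from collections import defaultdict


def fit_stump(rows, labels, features):
    """Fit a depth-1 tree: best (accuracy, feature, threshold, flipped) on training data."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            # hits for the rule "predict class 1 when value > t"
            hits = sum((row[f] > t) == (y == 1) for row, y in zip(rows, labels))
            # the flipped rule "predict class 1 when value <= t" gets the rest
            for flipped, h in ((False, hits), (True, len(rows) - hits)):
                acc = h / len(rows)
                if best is None or acc > best[0]:
                    best = (acc, f, t, flipped)
    return best  # None if no feature has two distinct values


def stump_accuracy(stump, rows, labels):
    """Accuracy of a fitted stump on held-out rows."""
    _, f, t, flipped = stump
    hits = sum(((row[f] > t) != flipped) == (y == 1) for row, y in zip(rows, labels))
    return hits / len(rows)


def monte_carlo_ranking(rows, labels, n_subsets=200, subset_size=2,
                        train_fraction=0.66, seed=0):
    """Rank feature indices by accumulated held-out accuracy (simplified score)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scores = defaultdict(float)
    indices = list(range(len(rows)))
    for _ in range(n_subsets):
        # random fraction of the features
        features = rng.sample(range(n_features), subset_size)
        # random train/test split of the observations
        rng.shuffle(indices)
        cut = int(train_fraction * len(indices))
        train, test = indices[:cut], indices[cut:]
        stump = fit_stump([rows[i] for i in train],
                          [labels[i] for i in train], features)
        if stump is None:
            continue
        acc = stump_accuracy(stump, [rows[i] for i in test],
                             [labels[i] for i in test])
        scores[stump[1]] += acc  # credit the feature the stump split on
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)


def make_data(n=120, n_features=6, seed=1):
    """Synthetic two-class data in which only feature 0 is informative."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * y  # shift feature 0 by class; the rest are pure noise
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_data()
    print(monte_carlo_ranking(rows, labels))
```

On the synthetic data, feature 0 is expected to head the ranking, since it is the only column that depends on the class label. The sketch stops at the ranking: it includes neither the statistically defined cut-off between informative and non-informative features nor the interdependency discovery step described in the abstract.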