    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
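
    To make the local-computation idea concrete, the sketch below shows one forward iteration in the spirit of PFBP, under assumptions the abstract does not fix: continuous data, a partial-correlation conditional independence test, and Fisher's method for the meta-analytic combination of per-partition p-values. The function names are hypothetical; this is not the authors' implementation.

```python
# A minimal sketch of PFBP's core idea, not the paper's code.
import numpy as np
from scipy import stats

def partial_corr_pvalue(x, y, Z):
    """p-value for H0: x is independent of y given the columns of Z."""
    if Z.shape[1] > 0:
        # Residualize x and y on the conditioning set via least squares.
        x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    # Fisher z-transform of the (partial) correlation; ~N(0,1) under H0.
    z = np.sqrt(max(len(x) - Z.shape[1] - 3, 1)) * np.arctanh(np.clip(r, -0.9999, 0.9999))
    return 2 * stats.norm.sf(abs(z))

def pfbp_style_step(partitions, candidates, selected, alpha=0.05):
    """One forward iteration: combine per-partition p-values with Fisher's
    method, Early-Drop clear independences, and return the winner."""
    combined = {}
    for j in candidates:
        local = [partial_corr_pvalue(X[:, j], y, X[:, selected])
                 for X, y in partitions]      # computed locally per partition
        combined[j] = stats.combine_pvalues(local, method="fisher")[1]
    # Early Dropping: features judged independent of the target given
    # `selected` are removed from all subsequent iterations.
    remaining = [j for j in candidates if combined[j] <= alpha]
    winner = min(remaining, key=combined.get, default=None)
    return winner, remaining
```

    Repeating this step until no candidates remain yields a forward phase; PFBP adds a backward phase plus the Early Stopping and Early Return heuristics on top of this skeleton.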

    Using Feature Selection Methods to Discover Common Users’ Preferences for Online Recommender Systems

    Recommender systems have taken over users' task of choosing the items/services they want from online markets, where large volumes of merchandise are traded. Collaborative filtering-based recommender systems use user opinions and preferences. Determining the common attributes that influence preferences, for prediction and subsequent recommendation of unknown or new items, is a significant objective when developing recommender engines. In conventional systems, a study of user behavior to learn their likes and dislikes over items would be carried out. This paper presents feature selection methods to mine such preferences by selecting the item attributes with the highest influence. In machine learning, feature selection is used as a data pre-processing method, but this work extends its use to achieve two objectives: removing redundant, uninformative features, and selecting formative, relevant features based on the response variable. The latter objective identifies the frequent, shared features most preferred by online marketplace users as they express their preferences. A synthetic dataset was used for experimentation, and the experiments were run in Jupyter Notebook using Python. Results showed that, among the formative features, some selected features had a high influence on the response variable. Different feature selection methods yielded different feature scores, and the intrinsic method had the best overall results with 85% model accuracy. The selected features were taken as the frequently preferred attributes that influence users' preferences.
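
    As a hedged illustration of the kind of pipeline described (the paper's exact methods and dataset are not reproduced here), the sketch below scores item attributes against a preference response with a filter method (mutual information) and an intrinsic method (random-forest importances), on stand-in synthetic data:

```python
# Filter vs. intrinsic feature scoring on stand-in synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

# Stand-in data: 500 users x 12 item attributes, binary "liked" response.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=0)

# Filter method: rank attributes by mutual information with the response.
filter_scores = mutual_info_classif(X, y, random_state=0)

# Intrinsic method: importances learned while fitting the model itself.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
intrinsic_scores = forest.feature_importances_

# Different methods produce different rankings, as the abstract reports.
print("filter ranking:   ", np.argsort(filter_scores)[::-1][:4])
print("intrinsic ranking:", np.argsort(intrinsic_scores)[::-1][:4])
```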

    Supervised Methods for Biomarker Detection from Microarray Experiments

    Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations both from the biological and computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling, and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.
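
    As one generic example of the supervised screening step such methodology typically includes (a textbook procedure, not the chapter's specific pipeline), the sketch below runs a per-gene two-sample t-test on simulated microarray-style data and controls the false discovery rate with Benjamini-Hochberg:

```python
# Per-gene differential-expression screening with FDR control; the data
# here are simulated, not from a real microarray experiment.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_genes, n_per_group = 2000, 20
# Expression matrix: genes x samples, two conditions; spike 50 true markers.
control = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
disease = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
disease[:50] += 1.5  # differentially expressed genes

# Per-gene Welch t-test between the two groups.
_, pvals = stats.ttest_ind(control, disease, axis=1, equal_var=False)

# Control the false discovery rate across all genes.
rejected, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"candidate biomarkers at 5% FDR: {rejected.sum()}")
```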

    A model-free feature selection technique of feature screening and random forest based recursive feature elimination

    In this paper, we propose a model-free feature selection method for ultra-high dimensional data with massive numbers of features. It is a two-phase procedure that combines the fused Kolmogorov filter with random-forest-based RFE to remove model limitations and reduce computational complexity. The method is fully nonparametric and can work with various types of datasets. It has several appealing characteristics, i.e., accuracy, model-freeness, and computational efficiency, and can be widely used in practical problems such as multiclass classification, nonparametric regression, and Poisson regression, among others. We show that the proposed method is selection consistent and L_2 consistent under weak regularity conditions. We further demonstrate the superior performance of the proposed method over other existing methods through simulations and real data examples.
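
    The sketch below illustrates the two-phase idea under simplifying assumptions: a Kolmogorov-Smirnov screening statistic written here for a binary response only (the paper's fused Kolmogorov filter is more general), followed by random-forest-based RFE on the surviving features:

```python
# Phase 1: nonparametric screening. Phase 2: RF-based RFE refinement.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=1000, n_informative=8,
                           random_state=0)

# Phase 1 (screening): KS distance between the conditional distributions
# of each feature given y; keep the top d features.
d = 50
ks = np.array([ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
               for j in range(X.shape[1])])
keep = np.argsort(ks)[::-1][:d]

# Phase 2 (refinement): recursive feature elimination with a random forest.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=8).fit(X[:, keep], y)
selected = keep[rfe.support_]
print("selected features:", np.sort(selected))
```

    Screening first on a cheap, model-free statistic keeps the expensive RFE loop away from the full 1000-feature space, which is where the computational savings for ultra-high dimensional data come from.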

    A framework for feature selection through boosting

    As the dimensionality of datasets in predictive modelling continues to grow, feature selection becomes increasingly important. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel framework for feature selection that relies on boosting, or sample re-weighting, to select sets of informative features in classification problems. The method uses as its basis the feature rankings derived from fast and scalable tree-boosting models such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracies with fewer features on most of the tested datasets, and that the selected features have lower redundancy.
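
    The abstract specifies the framework only at a high level, so the following is a hedged sketch of one plausible reading: repeatedly fit a tree-boosting model, harvest its top-ranked features, and up-weight misclassified samples so later rounds surface features informative for the hard cases. sklearn's GradientBoostingClassifier stands in for XGBoost, and the round count and per-round top-k are illustrative parameters.

```python
# Boosting-style feature selection via iterative sample re-weighting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=60, n_informative=10,
                           random_state=0)

weights = np.full(len(y), 1.0 / len(y))
selected = set()
for _ in range(3):  # illustrative number of rounds
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X, y, sample_weight=weights)
    # Harvest this round's top-ranked features.
    selected |= set(np.argsort(model.feature_importances_)[::-1][:5])
    # Re-weight: emphasize samples the current model gets wrong, so the
    # next round ranks features that help on those cases.
    wrong = model.predict(X) != y
    weights[wrong] *= 2.0
    weights /= weights.sum()

print("selected feature set:", sorted(selected))
```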

    The FEDHC Bayesian network learning algorithm

    The paper proposes a new hybrid Bayesian network learning algorithm, termed Forward Early Dropping Hill Climbing (FEDHC), devised to work with either continuous or categorical variables. Specifically for the case of continuous data, a robust-to-outliers version of FEDHC, which can be adopted by other BN learning algorithms, is proposed. Further, the paper shows that the only implementation of MMHC in the statistical software R is prohibitively expensive, and a new implementation is offered. FEDHC is tested via Monte Carlo simulations that clearly show it is computationally efficient and produces Bayesian networks of similar or higher accuracy than MMHC and PCHC. Finally, an application of the FEDHC, PCHC and MMHC algorithms to real data from the field of economics is demonstrated using the statistical software R.
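
    As a minimal sketch of the hill-climbing half of such hybrid algorithms (the forward early-dropping phase that restricts the search space is omitted, and this is not the paper's code), the following performs additions-only, BIC-scored hill climbing for Gaussian Bayesian networks:

```python
# Score-based hill climbing over DAGs with per-node BIC scoring.
import numpy as np
from itertools import permutations

def node_bic(X, child, parents):
    """BIC-style score of regressing `child` on `parents` (higher is better)."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in parents])
    resid = X[:, child] - Z @ np.linalg.lstsq(Z, X[:, child], rcond=None)[0]
    sigma2 = max(resid @ resid / n, 1e-12)
    return -0.5 * n * np.log(sigma2) - 0.5 * np.log(n) * (len(parents) + 1)

def is_ancestor(parents, a, b):
    """True if `a` is an ancestor of `b` under the current parent sets."""
    stack, seen = list(parents[b]), set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(X, max_parents=3):
    """Greedily add the single best-scoring edge until no addition helps."""
    p = X.shape[1]
    parents = {j: set() for j in range(p)}
    score = {j: node_bic(X, j, parents[j]) for j in range(p)}
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in permutations(range(p), 2):     # candidate edge u -> v
            if u in parents[v] or len(parents[v]) >= max_parents:
                continue
            if is_ancestor(parents, v, u):         # u -> v would close a cycle
                continue
            gain = node_bic(X, v, parents[v] | {u}) - score[v]
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return parents                         # local optimum reached
        u, v = best_edge
        parents[v].add(u)
        score[v] = node_bic(X, v, parents[v])
```

    Hybrid learners such as FEDHC first run a skeleton-identification phase (forward selection with Early Dropping) so that the hill climb only considers edges the skeleton permits, which is where the computational savings come from.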