67 research outputs found

Benchmark study of feature selection strategies for multi-omics data

Authors: Du Shangming; Hornung Roman; Li Yingxia; Mansmann Ulrich
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

BACKGROUND: In the last few years, multi-omics data, that is, datasets containing different types of high-dimensional molecular variables for the same samples, have become increasingly available. To date, several comparison studies focused on feature selection methods for omics data, but to our knowledge, none compared these methods for the special case of multi-omics data. Given that these data have specific structures that differentiate them from single-omics data, it is unclear whether different feature selection strategies may be optimal for such data. In this paper, using 15 cancer multi-omics datasets we compared four filter methods, two embedded methods, and two wrapper methods with respect to their performance in the prediction of a binary outcome in several situations that may affect the prediction results. As classifiers, we used support vector machines and random forests. The methods were compared using repeated fivefold cross-validation. The accuracy, the AUC, and the Brier score served as performance metrics. RESULTS: The results suggested that, first, the chosen number of selected features affects the predictive performance for many feature selection methods but not all. Second, whether the features were selected by data type or from all data types concurrently did not considerably affect the predictive performance, but for some methods, concurrent selection took more time. Third, regardless of which performance measure was considered, the feature selection methods mRMR, the permutation importance of random forests, and the Lasso tended to outperform the other considered methods. Here, mRMR and the permutation importance of random forests already delivered strong predictive performance when considering only a few selected features. Finally, the wrapper methods were computationally much more expensive than the filter and embedded methods. 
CONCLUSIONS: We recommend the permutation importance of random forests and the filter method mRMR for feature selection using multi-omics data, where, however, mRMR is considerably more computationally costly. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04962-x
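
The three performance metrics named in the abstract above (accuracy, AUC, and Brier score) can be sketched in plain Python as follows. This is a minimal illustration and not the authors' benchmark code; the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for the example.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

print(accuracy(y_true, y_prob), auc(y_true, y_prob), brier_score(y_true, y_prob))
```

Accuracy depends on the chosen threshold, whereas the AUC and the Brier score use the predicted probabilities directly, which is one reason a study might report all three.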

Open Access LMU

PubMed Central

Hybrid feature selection of breast cancer gene expression microarray data based on metaheuristic methods: a comprehensive review

Authors: Ab. Aziz Nor Azlina; Besar Rosli; Mohd Ali Nursabillilah
Publication venue: 'MDPI AG'
Publication date: 20/09/2022

Breast cancer (BC) remains the most dominant cancer among women worldwide. Numerous BC gene expression microarray-based studies have been employed in cancer classification and prognosis. The availability of gene expression microarray data together with advanced classification methods has enabled accurate and precise classification. Nevertheless, the microarray datasets suffer from a large number of gene expression levels, limited sample size, and irrelevant features. Additionally, datasets are often asymmetrical, where the number of samples from different classes is not balanced. These limitations make it difficult to determine which features in the gene expression profiles actually contribute to cancer classification. Various accurate feature selection methods exist, and they are being widely applied. The objective of feature selection is to search for a relevant, discriminant feature subset from the basic feature space. In this review, we aim to compile and review the latest hybrid feature selection methods based on bio-inspired metaheuristic methods and wrapper methods for the classification of BC and other types of cancer

Universiti Teknikal Malaysia Melaka (UTeM) Repository

An interpretable multi-stage forecasting framework for energy consumption and CO2 emissions for the transportation sector

Author: Hamid Eskandari
Publication venue: Elsevier BV

The transportation sector is deemed one of the primary sources of energy consumption and greenhouse gases throughout the world. To realise and design sustainable transport, it is imperative to comprehend relationships and evaluate interactions among a set of variables, which may influence transport energy consumption and CO2 emissions. Unlike recent published papers, this study strives to achieve a balance between machine learning (ML) model accuracy and model interpretability using the Shapley additive explanation (SHAP) method for forecasting the energy consumption and CO2 emissions in the UK's transportation sector. To this end, this paper proposes an interpretable multi-stage forecasting framework to simultaneously maximise the ML model accuracy and determine the relationship between the predictions and the influential variables by revealing the contribution of each variable to the predictions. For the UK's transportation sector, the experimental results indicate that road carbon intensity is found to be the most contributing variable to both energy consumption and CO2 emissions predictions. Unlike other studies, population and GDP per capita are found to be uninfluential variables. The proposed multi-stage forecasting framework may assist policymakers in making more informed energy decisions and establishing more accurate investment
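
The idea behind SHAP, attributing a prediction to the contribution of each input variable, can be illustrated with an exact Shapley value computation. The sketch below is not the SHAP library and not the paper's framework; it enumerates every feature coalition, which is only feasible for a handful of features. The function `shapley_values`, the toy model, and the all-zero baseline are hypothetical choices made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a small number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: a linear part plus one interaction term.
def toy_model(v):
    return 2.0 * v[0] + 1.0 * v[1] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features involved in it.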

Cronfa at Swansea University

Feature selection algorithms for Malaysian dengue outbreak detection model

Authors: Azuraliza Abu Bakar; Husam I.S. Abuhamad; Mazura Sahani; Suhaila Zainudin; Zainudin Mohd Ali
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/02/2017

Dengue fever is considered one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection can be very useful in practical efforts to curb the rapid spread of the disease by providing the knowledge to predict the next outbreak. Many studies have been conducted to model and predict dengue outbreaks using different data mining techniques. This research aimed to identify the best features that lead to better predictive accuracy of dengue outbreaks using three different feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that predictive accuracy was improved by applying feature selection before predictive modeling. The study also identified the set of features that represents dengue outbreak detection for Malaysian health agencies

UKM Journal Article Repository

R : A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset

Authors: Al-Rajab Murad; Arasaradnam Ramesh; Joy Mike; Kentour Mohamed; Lu Joan; Sawsa Ahlam; Shuweikeh Emad; Xu Qiang
Publication venue: Public Library of Science
Publication date: 01/01/2023

Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM–Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance
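
As a rough illustration of the Information Gain (IG) filter mentioned in the abstract above, the sketch below ranks discrete features by how much they reduce label entropy. It covers only a generic IG filter step; the GA, mRMR, and PSO stages of HMLFSM are not shown, and the gene names, discretised values, and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_by_information_gain(features, labels):
    """Return feature names sorted from most to least informative, plus scores."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical discretised expression levels for three made-up genes.
labels = [1, 1, 0, 0]
features = {
    "gene_a": ["high", "high", "low", "low"],   # separates the two classes
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
    "gene_c": ["high", "high", "high", "low"],  # partially informative
}
ranking, scores = rank_by_information_gain(features, labels)
print(ranking)
```

In a two-phase scheme of the kind the abstract describes, a filter like this would typically shrink the gene pool first, so that the more expensive search stage works on fewer candidates.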

Directory of Open Access Journals

Warwick Research Archives Portal Repository

Huddersfield Research Portal

A Probabilistic Multi-Objective Artificial Bee Colony Algorithm for Gene Selection

Authors: Banu Diri; Bulent Bolat; Zeynep Ozger
Publication venue: 'Verlag der Technischen Universitat Graz'
Publication date: 01/01/2019

Microarray technology is widely used to report gene expression data. The inclusion of many features and few samples is one of the characteristic features of this platform. In order to define significant genes for a particular disease, the problem of high-dimensionality microarray data should be overcome. The Artificial Bee Colony (ABC) Algorithm is a successful meta-heuristic algorithm that solves optimization problems effectively. In this paper, we propose a hybrid gene selection method for discriminatively selecting genes. We propose a new probabilistic binary Artificial Bee Colony Algorithm, namely PrBABC, that is hybridized with three different filter methods. The proposed method is applied to nine microarray datasets in order to detect distinctive genes for classifying cancer data. Results are compared with other well-known meta-heuristic algorithms: Binary Differential Evolution Algorithm (BinDE), Binary Particle Swarm Optimization Algorithm (BinPSO), and Genetic Algorithm (GA), as well as with other methods in the literature. Experimental results show that the probabilistic self-adaptive learning strategy integrated into the employed-bee phase can boost classification accuracy with a minimal number of genes

ZENODO

Directory of Open Access Journals

ARPHA OAI-PMH Endpoint

ARPHA Preprints

Swarm Intelligence Based Feature Selection for High Dimensional Classification: A Literature Survey

Authors: Hnin Myint Phyu; Saw Thinzar
Publication venue: 'International Journal of Computer Engineering and Applications'
Publication date: 16/05/2019

Feature selection is an important and challenging task in machine learning and data mining, used to avoid the curse of dimensionality and maximize classification accuracy. Moreover, feature selection helps to reduce the computational complexity of the learning algorithm, improve prediction performance, improve understanding of the data, and reduce data storage space. A swarm intelligence based feature selection approach makes it possible to find an optimal feature subset from an extremely high-dimensional feature space for building the most accurate classifier model. Some lines of research in this area have not yet been pursued in data mining. This paper surveys the use of swarm intelligence algorithms for feature selection in high-dimensional data, focusing on medical data classification. The results show that the swarm intelligence algorithms reviewed from the state-of-the-art literature have promising capabilities for feature selection techniques. The significance of this work is that it presents a comparison of swarm algorithms, and various alternatives among them, for feature selection in high-dimensional classification

International Journal of Computer (IJC - Global Society of Scientific Research and Researchers, GSSRR)

Supervised Methods for Biomarker Detection from Microarray Experiments

Authors: Cattelani Luca; Fortino Vittorio; Fratello Michele; Greco Dario; Kinaret Pia Anneli Sofia; Serra Angela
Publication venue: Springer, UK
Publication date: 01/01/2022

Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations both from the biological and computational points of view. In this chapter, we describe the main methodology used in biomarkers discovery and predictive modeling and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.

Non peer reviewed

Helsingin yliopiston digitaalinen arkisto

Trepo - Institutional Repository of Tampere University

Experienced grey wolf optimizer through reinforcement learning and neural networks

Authors: Emary E; Grosan C; Zawbaa HM
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/01/2017

In this paper, a variant of Grey Wolf Optimizer (GWO) that uses reinforcement learning principles combined with neural networks to enhance the performance is proposed. The aim is to overcome, by reinforcement learning, the common challenges of setting the right parameters for the algorithm. In GWO, a single parameter is used to control the exploration/exploitation rate which influences the performance of the algorithm. Rather than using a global way to change this parameter for all the agents, we use reinforcement learning to set it on an individual basis. The adaptation of the exploration rate for each agent depends on the agent’s own experience and the current terrain of the search space. In order to achieve this, an experience repository is built based on the neural network to map a set of agents’ states to a set of corresponding actions that specifically influence the exploration rate. The experience repository is updated by all the search agents to reflect experience and to enhance the future actions continuously. The resulting algorithm is called Experienced Grey Wolf Optimizer (EGWO) and its performance is assessed on solving feature selection problems and on finding optimal weights for neural networks. We use a set of performance indicators to evaluate the efficiency of the method. Results over various datasets demonstrate an advance of the EGWO over the original GWO and other meta-heuristics such as genetic algorithms and particle swarm optimization.

Funding: IPROCOM Marie Curie initial training network; 10.13039/501100004963-People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/; Romanian National Authority for Scientific Research, CNDI-UEFISCDI
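
For context on the single exploration parameter that the abstract refers to, here is a minimal sketch of the standard GWO update, in which one value `a` shrinks linearly for all wolves. This shows the original global schedule only. The per-agent, reinforcement-learning-driven setting that EGWO proposes is not implemented here. The function name, the sphere objective, the population size, and the seed are assumptions chosen for illustration.

```python
import random


def grey_wolf_optimizer(objective, dim, bounds, n_wolves=10, n_iter=50, seed=0):
    """Minimise objective with the standard (global-schedule) GWO update."""
    rng = random.Random(seed)
    low, high = bounds
    wolves = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)[:]
    history = [objective(best)]

    for t in range(n_iter):
        # Exploration parameter: shrinks linearly from 2 toward 0 for all wolves.
        a = 2.0 * (1.0 - t / n_iter)
        # Copy the three best wolves so in-place moves do not alter the leaders.
        alpha, beta, delta = [w[:] for w in sorted(wolves, key=objective)[:3]]
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                # New coordinate: mean of the three leader-guided estimates,
                # clipped to the search bounds.
                wolf[d] = min(high, max(low, estimate / 3.0))
        candidate = min(wolves, key=objective)
        if objective(candidate) < objective(best):
            best = candidate[:]
        history.append(objective(best))
    return best, history


def sphere(v):
    """Simple test objective with its minimum at the origin."""
    return sum(c * c for c in v)


best, history = grey_wolf_optimizer(sphere, dim=3, bounds=(-5.0, 5.0))
print(history[0], history[-1])
```

Large values of `a` allow `|A| > 1`, which pushes wolves away from the leaders (exploration), while small values pull them in (exploitation). A per-agent variant would replace the single `a` with one value per wolf.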

Brunel University Research Archive