6,686 research outputs found

    A Novel Memetic Feature Selection Algorithm

    Feature selection is the problem of finding an efficient subset of all available features, such that the final feature set improves accuracy and reduces complexity. The search strategy is a key aspect of any feature selection algorithm. Because feature selection is NP-hard, heuristic algorithms have been studied to solve it. In this paper, we propose a method based on a memetic algorithm to find an efficient feature subset for a classification problem. It incorporates a filter method into the genetic algorithm to improve classification performance and accelerate the search for core feature subsets. In particular, the method adds or deletes a feature from a candidate feature subset based on multivariate feature information. An empirical study on commonly used data sets from the University of California, Irvine shows that the proposed method outperforms existing methods.
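
    As a rough illustration of the idea in this abstract, the sketch below combines a small genetic algorithm with a filter-guided local move that adds or deletes one feature. It is not the authors' implementation: the per-feature `filter_scores`, the `toy_fitness` function (a stand-in for classifier accuracy), and all parameter values are assumptions, and the local move uses a univariate score where the paper uses multivariate feature information.

```python
import random


def local_refine(mask, filter_scores, rng):
    """Filter-guided local move: add the best-scored missing feature
    or drop the worst-scored selected feature."""
    mask = list(mask)
    absent = [j for j, bit in enumerate(mask) if not bit]
    present = [j for j, bit in enumerate(mask) if bit]
    if absent and (not present or rng.random() < 0.5):
        mask[max(absent, key=lambda j: filter_scores[j])] = 1
    elif present:
        mask[min(present, key=lambda j: filter_scores[j])] = 0
    return mask


def memetic_search(n_features, fitness, filter_scores,
                   pop_size=10, generations=20, seed=0):
    """Genetic search over 0/1 feature masks with a local refinement step."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1   # flip one random bit
            refined = local_refine(child, filter_scores, rng)
            # Keep whichever of the raw and refined child scores higher.
            children.append(max(child, refined, key=fitness))
        # Elitist survivor selection over parents and children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]


# Hypothetical filter scores (e.g. relevance to the class label).
filter_scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevance, penalize size."""
    gain = sum(s for s, bit in zip(filter_scores, mask) if bit)
    return gain - 0.3 * sum(mask)


best_mask = memetic_search(6, toy_fitness, filter_scores)
```

    In a real wrapper setting, `toy_fitness` would be replaced by cross-validated classifier accuracy on the features the mask selects.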

    An Online Sparse Streaming Feature Selection Algorithm

    Online streaming feature selection (OSFS), which conducts feature selection in an online manner, plays an important role in dealing with high-dimensional data. In many real applications, such as intelligent healthcare platforms, streaming features often have missing data. This raises a crucial challenge for OSFS: how to establish the uncertain relationship between sparse streaming features and labels. Existing OSFS algorithms do not consider this uncertain relationship. To fill this gap, in this paper we propose an online sparse streaming feature selection with uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent factor analysis is used to pre-estimate the missing data in sparse streaming features before conducting feature selection, and 2) fuzzy logic and neighborhood rough sets are employed to alleviate the uncertainty between estimated streaming features and labels during feature selection. In the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms on six real datasets. The results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.

    Enhanced feature selection algorithm for pneumonia detection

    Pneumonia is a type of lung disease that can be detected using X-ray images. The analysis of chest X-ray images is an active research area in medical image analysis and computer-aided radiology. This research aims to improve the accuracy and efficiency of radiologists' work by providing a technique for identifying and categorizing diseases. More attention should be given to applying machine learning approaches to develop a robust chest X-ray image classification method. The typical method for detecting pneumonia is through chest X-ray images, but analyzing these images can be complex and requires the expertise of a radiographer. This paper demonstrates the feasibility of detecting the disease using chest X-ray images as datasets and a Support Vector Machine combined with a Naive Bayesian classifier, with PCA and GA as feature selection methods. The selected features are essential for training many classifiers. The proposed system achieved an accuracy of 92.26%, using 91% of the principal component. The study's result suggests that using PCA and GA for feature selection in chest X-ray image classification can achieve a good accuracy of 97.44%. Further research is needed to explore the use of other data mining models and care components to improve the accuracy and effectiveness of the system.

    A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data

    DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given the methylation intensities of all these probes, it is necessary to determine which of them are most representative of the gene-centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that uses different classification methods to compute gene-centric DNA methylation from probe-level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms, and ReliefF. We evaluated all methods based on the predictive power of the selected probes on mRNA expression levels and found that K-Nearest Neighbors classification with the sequential forward selection algorithm performed better than the other algorithms on all metrics. We also observed that the transcriptional activities of certain genes were more sensitive to DNA methylation changes than those of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
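
    A minimal sketch of sequential forward selection wrapped around a k-nearest-neighbors classifier, the combination this abstract reports as performing best. The toy data, the leave-one-out scoring, the value `k=3`, and the stopping rule are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest rows, using only `features`."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)


def sequential_forward_selection(X, y, max_features, k=3):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [j], k), j) for j in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves the current subset
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Hypothetical probe-level values: column 1 separates the classes,
# columns 0 and 2 are noise.
X = [
    [0.5, 0.10, 0.3], [0.1, 0.20, 0.9], [0.9, 0.15, 0.6], [0.3, 0.05, 0.1],
    [0.4, 0.90, 0.2], [0.2, 0.80, 0.8], [0.8, 0.85, 0.7], [0.6, 0.95, 0.4],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical low/high expression labels

selected, score = sequential_forward_selection(X, y, max_features=2)
```

    On this toy data the search selects only column 1 and stops, because adding either noise column cannot improve on perfect leave-one-out accuracy.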

    Forecasting day-ahead electricity prices in Europe: the importance of considering market integration

    Motivated by the increasing integration among electricity markets, in this paper we propose two different methods to incorporate market integration in electricity price forecasting and to improve predictive performance. First, we propose a deep neural network that considers features from connected markets to improve the predictive accuracy in a local market. To measure the importance of these features, we propose a novel feature selection algorithm that uses Bayesian optimization and functional analysis of variance to evaluate the effect of the features on the algorithm's performance. In addition, using market integration, we propose a second model that, by simultaneously predicting prices from two markets, improves the forecasting accuracy even further. As a case study, we consider the electricity market in Belgium and the improvements in forecasting accuracy when using various French electricity features. We show that the two proposed models lead to improvements that are statistically significant. In particular, due to market integration, the predictive accuracy is improved from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage error). In addition, we show that the proposed feature selection algorithm is able to perform a correct assessment, i.e., to discard the irrelevant features.
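
    For reference, the sMAPE metric quoted in this abstract can be computed as in the sketch below. Several variants of sMAPE exist; this one divides by the mean of the absolute actual and forecast values, which is an assumption and may differ from the paper's definition. The price and forecast values are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Uses the mean of |actual| and |forecast| as the denominator;
    other variants of the metric exist.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Treat a 0/0 term as zero error to avoid division by zero.
        total += 0.0 if denom == 0 else abs(a - f) / denom
    return 100.0 * total / len(actual)


def relative_reduction(before, after):
    """Error reduction as a percentage of the original error."""
    return 100.0 * (before - after) / before


prices = [40.0, 50.0, 60.0]    # hypothetical day-ahead prices
forecast = [44.0, 45.0, 60.0]  # hypothetical forecasts
example_smape = smape(prices, forecast)

# The abstract's figures: 15.7% sMAPE reduced to 12.5% sMAPE.
reduction = relative_reduction(15.7, 12.5)
```

    With the abstract's figures, going from 15.7% to 12.5% sMAPE is a relative error reduction of roughly 20%.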

    Research of Feature Selection Algorithm Based on Sparse Representation

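    In pattern recognition, feature selection is an important direction that has become a research hotspot in recent years. The results of scientific research have penetrated many industries and found practical application there. In both research and real-world applications, huge amounts of data must be handled. Such data often contain few samples but have very high dimensionality and many redundant features, which poses a great challenge to computing resources and real-time processing, so solving the "curse of dimensionality" is very important. Feature selection, as an important step in data processing, therefore plays a key role. Because of the excessive dimensionality, regression on high-dimensional data is a considerable challenge, and feature selection is one effective solution. Linear regression based on sparse representation has been shown to be very effective for high-dimensional data. Traditional sparse-representation linear regression algorithms include the Lasso algorithm...
    Degree: Master of Engineering. Department and major: School of Information Science and Technology, Master of Engineering (Computer Technology). Student ID: 2302014115319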
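
    To illustrate how sparse linear regression doubles as feature selection, the sketch below fits the standard Lasso by cyclic coordinate descent with soft-thresholding. It shows the traditional Lasso named in the abstract, not the method the thesis goes on to propose. The objective scaling (1/(2n) times the squared error plus `lam` times the L1 norm), the toy data, and the value of `lam` are assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j.
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Toy data: the target depends only on feature 0 (y = 2 * x0);
# feature 1 is irrelevant and orthogonal to feature 0.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]

weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

    The irrelevant feature receives a weight of exactly zero, so the nonzero entries of `weights` serve as the selected feature set, while the relevant weight is shrunk from 2 to 1.5 by the penalty.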