14,232 research outputs found
A new multi-objective wrapper method for feature selection – Accuracy and stability analysis for BCI
Feature selection is an important step in building classifiers for high-dimensional data problems, such as EEG classification for BCI applications. This paper proposes a new wrapper method for feature selection, based on a multi-objective evolutionary algorithm, where the representation of the individuals (potential solutions), the breeding operators, and the objective functions have been carefully designed to select a small subset of features with good generalization capability, avoiding the over-fitting problems that wrapper methods usually suffer from. A novel feature ranking procedure is also proposed in order to analyze the stability of the proposed wrapper method.
Four different classification schemes have been applied within the proposed wrapper method in order to evaluate its accuracy and stability for feature selection on a real motor imagery dataset. Experimental results show that the wrapper method presented in this paper is able to obtain very small subsets of features, which are quite stable and also achieve high classification accuracy, regardless of the classifiers used. Project TIN2015-67020-P (Spanish “Ministerio de Economía y Competitividad”); European Regional Development Funds (ERDF).
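As a rough illustration of the multi-objective idea in the abstract above, the sketch below filters candidate feature subsets down to a Pareto front. The two objectives (validation error and subset size, both minimized), the function names, and the example values are all assumptions made for illustration. The abstract does not state the paper's actual objective functions, and this is not the authors' algorithm.

```python
from typing import List, Tuple

# Assumed objectives: (validation_error, n_features); both are minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (error, subset size) pairs, not results from the paper.
example = [(0.20, 12), (0.22, 5), (0.20, 8), (0.35, 2), (0.30, 6)]
front = pareto_front(example)
print(front)
```

A multi-objective evolutionary algorithm would typically keep such a non-dominated set across generations, leaving the final trade-off between accuracy and subset size to the user.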
Machine learning for automatic prediction of the quality of electrophysiological recordings
The quality of electrophysiological recordings varies considerably due to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions by a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data, classified by the same or by another expert. We observe that the learning machine can be as consistent in its judgements as individual experts are with each other, if not more so. Best performance is achieved for a limited number of informative features, with the optimal feature set differing from one data set to another. With 80–90% of correct judgements, the performance of the system is very promising within the data sets of each expert, but judgements are less reliable when it is used across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided parameters are adjusted to different types of experiments and to individual experimenters.
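To make the notion of a wrapper method concrete, here is a minimal sketch of greedy forward selection, in which each candidate feature subset is scored by the accuracy of a classifier trained on it. The nearest-centroid classifier, the leave-one-out scoring, the function names, and the toy data are all assumptions chosen to keep the example self-contained. The abstract does not say which search strategy or classifier the authors used.

```python
from typing import Dict, List, Sequence


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                 feats: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(feats))
            for k, f in enumerate(feats):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label: int) -> float:
            # Squared distance from the held-out sample to a class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(feats))

        pred = min(sorted(sums), key=dist)
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X: Sequence[Sequence[float]], y: Sequence[int],
                   max_features: int) -> List[int]:
    """Add one feature at a time while the wrapped classifier keeps improving."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < max_features:
        remaining = [f for f in range(len(X[0])) if f not in selected]
        if not remaining:
            break
        scores = {f: loo_accuracy(X, y, selected + [f]) for f in remaining}
        f_best = max(remaining, key=lambda f: scores[f])
        if scores[f_best] <= best:
            break  # no candidate improves the score, so stop
        selected.append(f_best)
        best = scores[f_best]
    return selected


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [1.0, 4.0], [0.9, 2.0], [1.1, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y, max_features=2))
```

Because the score comes from the classifier itself, the selected subset depends on both the data and the classifier, which is consistent with the abstract's observation that the optimal feature set differs between data sets.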
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Satellite imagery and remote sensing provide explanatory variables at
relatively high resolutions for modeling geospatial phenomena, yet regional
summaries are often desirable for analysis and actionable insight. In this
paper, we propose a novel method of inducing spatial aggregations as a
component of the machine learning process, yielding regional model features
whose construction is driven by model prediction performance rather than prior
assumptions. Our results demonstrate that Genetic Programming is particularly
well suited to this type of feature construction because it can automatically
synthesize appropriate aggregations, as well as better incorporate them into
predictive models compared to other regression methods we tested. In our
experiments we consider a specific problem instance and real-world dataset
relevant to predicting snow properties in high-mountain Asia
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Prediction of progression in idiopathic pulmonary fibrosis using CT scans at baseline: A quantum particle swarm optimization - Random forest approach
Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease characterized by an unpredictable progressive decline in lung function. Natural history of IPF is unknown and the prediction of disease progression at the time of diagnosis is notoriously difficult. High resolution computed tomography (HRCT) has been used for the diagnosis of IPF, but not generally for monitoring purpose. The objective of this work is to develop a novel predictive model for the radiological progression pattern at voxel-wise level using only baseline HRCT scans. Mainly, there are two challenges: (a) obtaining a data set of features for region of interest (ROI) on baseline HRCT scans and their follow-up status; and (b) simultaneously selecting important features from high-dimensional space, and optimizing the prediction performance. We resolved the first challenge by implementing a study design and having an expert radiologist contour ROIs at baseline scans, depending on its progression status in follow-up visits. For the second challenge, we integrated the feature selection with prediction by developing an algorithm using a wrapper method that combines quantum particle swarm optimization to select a small number of features with random forest to classify early patterns of progression. We applied our proposed algorithm to analyze anonymized HRCT images from 50 IPF subjects from a multi-center clinical trial. We showed that it yields a parsimonious model with 81.8% sensitivity, 82.2% specificity and an overall accuracy rate of 82.1% at the ROI level. These results are superior to other popular feature selections and classification methods, in that our method produces higher accuracy in prediction of progression and more balanced sensitivity and specificity with a smaller number of selected features. Our work is the first approach to show that it is possible to use only baseline HRCT scans to predict progressive ROIs at 6 months to 1 year follow-ups using artificial intelligence.
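For readers less familiar with the three figures quoted in the abstract above, the sketch below shows how sensitivity, specificity, and accuracy follow from a confusion matrix. The function name and the counts are hypothetical and chosen only for easy arithmetic. They are not the study's data.

```python
from typing import Tuple


def roi_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of progressive ROIs correctly flagged
    specificity = tn / (tn + fp)  # share of stable ROIs correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only, not taken from the paper.
sens, spec, acc = roi_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unequal in size, since accuracy alone can hide poor performance on the smaller class.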
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. Methods used in this area include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
- …