1,628 research outputs found

    Classification of high dimensional data using LASSO ensembles

    Urda, D., Franco, L. and Jerez, J.M. (2017). Classification of high dimensional data using LASSO ensembles. Proceedings IEEE SSCI'17, Symposium Series on Computational Intelligence, Honolulu, Hawaii, U.S.A. ISBN: 978-1-5386-2726-6. The estimation of multivariable predictors with good performance in high dimensional settings is a crucial task in biomedical contexts. Usually, solutions based on a single machine learning model are provided, while ensemble methods are often overlooked in this area despite their well-known benefits in terms of predictive performance. In this paper, four ensemble approaches using LASSO base learners are described to predict the vital status of a patient from RNA-Seq gene expression data. The results of the analysis carried out on a public breast invasive cancer (BRCA) dataset show that the ensemble approaches outperform the standard LASSO model, taken as the baseline, by a statistically significant margin. We also analyse the computational costs of each approach and provide usage recommendations according to the available computational power. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    Interaction Forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects

    Although interaction effects can be exploited to improve predictions and allow for valuable insights into covariate interplay, they are given little attention in analysis. We introduce interaction forests, which are a variant of random forests for categorical, continuous, and survival outcomes, explicitly considering quantitative and qualitative interaction effects in bivariable splits performed by the trees constituting the forests. The new effect importance measure (EIM) associated with interaction forests allows ranking of the covariate pairs with respect to their interaction effects' importance for prediction. Using EIM, separate importance value lists for univariable effects, quantitative interaction effects, and qualitative interaction effects are obtained. In the spirit of interpretable machine learning, the bivariable split types of interaction forests target well interpretable interaction effects that are easy to communicate. To learn about the nature of the interplay between identified interacting covariate pairs it is convenient to visualise their estimated bivariable influence. We provide functions that perform this task in the R package diversityForest that implements interaction forests. In a large-scale empirical study using 220 data sets, interaction forests tended to deliver better predictions than conventional random forests and competing random forest variants that use multivariable splitting. In a simulation study, EIM delivered considerably better rankings for the relevant quantitative and qualitative interaction effects than competing approaches. These results indicate that interaction forests are suitable tools for the challenging task of identifying and making use of well interpretable interaction effects in predictive modelling

    Enhancement of Random Forests Using Trees with Oblique Splits

    This work presents an enhancement to the classification tree algorithm that forms the basis of Random Forests. Unlike classical tree-based methods, which use one variable at a time to separate the observations, the new algorithm searches for the best split in two-dimensional space using a linear combination of variables. Besides classification, the method can be used to detect variable interactions and to perform feature extraction. Theoretical investigations and numerical simulations were used to analyze the properties and performance of the new approach. It was compared with other popular classification methods on simulated and real data examples. The algorithm was implemented as an extension package for the statistical computing environment R and is available for free download under the GNU General Public License
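To make the idea of a split on a linear combination of two variables concrete, here is a minimal sketch. It is not the algorithm from the paper or its R package: it simply assumes a grid of candidate directions in the plane and, for each direction, picks the threshold that minimizes the weighted Gini impurity. The names `gini` and `best_oblique_split`, and the toy data, are hypothetical.

```python
import math

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_oblique_split(points, labels, n_angles=36):
    """Search splits of the form cos(t)*x1 + sin(t)*x2 <= threshold.

    Assumption: directions are taken from an evenly spaced grid of
    n_angles angles in [0, pi). Returns (impurity, angle, threshold).
    """
    n = len(points)
    best = (float("inf"), None, None)
    for k in range(n_angles):
        t = math.pi * k / n_angles
        w1, w2 = math.cos(t), math.sin(t)
        # Project every point onto the candidate direction and sort.
        proj = sorted(
            ((w1 * x1 + w2 * x2, y) for (x1, x2), y in zip(points, labels)),
            key=lambda pair: pair[0],
        )
        for i in range(1, n):
            if proj[i - 1][0] == proj[i][0]:
                continue  # no threshold fits between equal projections
            left = [y for _, y in proj[:i]]
            right = [y for _, y in proj[i:]]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[0]:
                threshold = (proj[i - 1][0] + proj[i][0]) / 2
                best = (impurity, t, threshold)
    return best

# Toy data: the classes are separated by the diagonal x1 + x2 = 1,
# so no single-variable threshold separates them cleanly.
points = [(0.0, 0.0), (0.2, 0.5), (0.5, 0.2), (0.1, 0.8),
          (1.0, 1.0), (0.8, 0.5), (0.5, 0.8), (0.9, 0.2)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
impurity, angle, threshold = best_oblique_split(points, labels)
print(impurity, angle, threshold)
```

On this toy data an axis-aligned split on either variable leaves mixed groups, while a tilted direction separates the two classes completely, which is the motivation the abstract gives for oblique splits.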

    FINE SCALE MAPPING OF LAURENTIAN MIXED FOREST NATURAL HABITAT COMMUNITIES USING MULTISPECTRAL NAIP AND UAV DATASETS COMBINED WITH MACHINE LEARNING METHODS

    Natural habitat communities are an important element of any forest ecosystem. Mapping and monitoring Laurentian Mixed Forest natural communities using high spatial resolution imagery is vital for management and conservation purposes. This study developed integrated spatial, spectral and Machine Learning (ML) approaches for mapping complex vegetation communities. The study utilized ultra-high and high spatial resolution National Agriculture Imagery Program (NAIP) and Unmanned Aerial Vehicle (UAV) datasets, and a Digital Elevation Model (DEM). The study area comprised complex natural vegetation community habitats in the Laurentian Mixed Forest of the Upper Midwest. A detailed workflow is presented to effectively process UAV imagery in a dense forest environment where the acquisition of ground control points (GCPs) is extremely difficult. Statistical feature selection methods, namely Joint Mutual Information Maximization (JMIM), which is not widely used in the natural resource field, and variable importance (varImp), were used to discriminate spectrally similar habitat communities. A comprehensive approach to training set delineation was implemented, including the use of Principal Components Analysis (PCA), Independent Components Analysis (ICA), soils data, and expert image interpretation. The developed approach resulted in robust training sets to delineate and accurately map natural community habitats. Three ML algorithms were implemented: Random Forest (RF), Support Vector Machine (SVM), and Averaged Neural Network (avNNet). RF outperformed SVM and avNNet. Overall RF accuracies across the three study sites ranged from 79.45-87.74% for NAIP and 87.31-93.74% for the UAV datasets. 
Different ancillary datasets including spectral enhancement and image transformation techniques (PCA and ICA), GLCM-Texture, spectral indices, and topography features (elevation, slope, and aspect) were evaluated using the JMIM and varImp feature selection methods, overall accuracy assessment, and kappa calculations. The robustness of the workflow was evaluated with three study sites which are geomorphologically unique and contain different natural habitat communities. This integrated approach is recommended for accurate natural habitat community classification in ecologically complex landscapes
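For readers unfamiliar with the two summary measures named above, the following sketch computes overall accuracy and Cohen's kappa from a confusion matrix. The function names and the 2x2 counts are made up for illustration and are not taken from the study.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohens_kappa(cm):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement if reference and predicted labels were independent.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical counts: rows are reference classes, columns are predictions.
cm = [[45, 5],
      [10, 40]]
print(overall_accuracy(cm))  # 85 of 100 samples are on the diagonal
print(cohens_kappa(cm))      # about 0.7 for these counts
```

Kappa is lower than overall accuracy here because half of the observed agreement would be expected by chance given the row and column totals.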

    Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest

    Alzheimer’s disease (AD) is an irreversible neurodegenerative disease that affects a large population worldwide. Cognitive scores at multiple time points can be reliably used to evaluate the progression of the disease clinically. In recent studies, machine learning techniques have shown promising results in predicting AD clinical scores. However, current models have several limitations, such as the assumption of linearity and the exclusion of subjects with missing data. Here, we present a nonlinear supervised sparse regression–based random forest (RF) framework to predict a variety of longitudinal AD clinical scores. Furthermore, we propose a soft-split technique that assigns probabilistic paths to a test sample in RF for more accurate predictions. To benefit from the longitudinal scores in the study, unlike previous studies that often removed subjects with missing scores, we first estimate those missing scores with our proposed soft-split sparse regression–based RF and then use the estimated longitudinal scores at all previous time points to predict the scores at the next time point. The experimental results demonstrate that our proposed method is superior to the traditional RF and outperforms other state-of-the-art regression models. Our method can also be extended into a general regression framework to predict other disease scores
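The abstract does not say how the probabilistic paths are computed. As one possible reading, the sketch below assumes that each internal node sends a sample right with a probability given by a sigmoid of its distance from the threshold, and that the prediction is the probability-weighted average of the leaf values. The tree encoding, the `sharpness` parameter, and the function names are all hypothetical, not the authors' formulation.

```python
import math

def soft_predict(node, x, sharpness=5.0):
    """Probability-weighted prediction of a single regression tree.

    A node is either a number (leaf value) or a tuple
    (feature_index, threshold, left_subtree, right_subtree).
    """
    if not isinstance(node, tuple):
        return float(node)
    feature, threshold, left, right = node
    # Assumed routing rule: the probability of going right rises smoothly
    # as x[feature] moves above the threshold.
    p_right = 1.0 / (1.0 + math.exp(-sharpness * (x[feature] - threshold)))
    return ((1.0 - p_right) * soft_predict(left, x, sharpness)
            + p_right * soft_predict(right, x, sharpness))

def hard_predict(node, x):
    """Conventional prediction: follow exactly one path to a leaf."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return float(node)

# Hypothetical one-split tree: leaf 10.0 on the left, leaf 20.0 on the right.
tree = (0, 0.5, 10.0, 20.0)
print(hard_predict(tree, [0.5]), soft_predict(tree, [0.5]))
print(hard_predict(tree, [0.6]), soft_predict(tree, [0.6]))
```

A sample sitting exactly on the threshold gets the midpoint of the two leaves under the soft rule, whereas the hard rule commits fully to one side; far from the threshold the two predictions nearly coincide.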

    OBJECT-BASED CLASSIFICATION OF EARTHQUAKE DAMAGE FROM HIGH-RESOLUTION OPTICAL IMAGERY USING MACHINE LEARNING

    Object-based approaches to the segmentation and supervised classification of remotely-sensed images yield more promising results than traditional pixel-based approaches. However, developing an object-based approach presents challenges in terms of algorithm selection and parameter tuning. Subjective methods and trial and error are often used, but they are time-consuming and yield less-than-optimal results. Objective methods are warranted, especially for rapid deployment in time-sensitive applications such as earthquake-induced damage assessment. Our research takes a systematic approach to evaluating object-based image segmentation and machine learning algorithms for the classification of earthquake damage in remotely-sensed imagery using Trimble’s eCognition software. We tested a variety of algorithms and parameters on post-event aerial imagery of the 2011 earthquake in Christchurch, New Zealand. Parameters and methods are adjusted, and the results are compared against manually selected test cases representing the different classes used. In doing so, we can evaluate the effectiveness of the segmentation and classification of buildings, earthquake damage, vegetation, vehicles and paved areas, and compare different levels of multi-step image segmentation. Specific methods and parameters explored include classification hierarchies, object selection strategies, and multilevel segmentation strategies. This systematic approach to object-based image classification is used to develop a classifier that is then compared against current pixel-based classification methods for post-event imagery of earthquake damage. Our results show a measurable improvement over established pixel-based methods as well as object-based methods for classifying earthquake damage in high-resolution, post-event imagery