5 research outputs found

    A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing

    The past years have witnessed many dedicated open-source projects that build and maintain implementations of Support Vector Machines (SVM), parallelized for GPUs, multi-core CPUs and distributed systems. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high-impact applications, including genetics, neuroscience and systems biology. The first contribution of this paper is theoretical in nature. We establish a tight link between two seemingly different algorithms and prove that Elastic Net regression can be reduced to SVM classification with squared hinge loss. Our second contribution is to derive a practical algorithm based on this reduction. The reduction enables us to utilize prior efforts in speeding up and parallelizing SVMs to obtain a highly optimized and parallel solver for the Elastic Net and Lasso. With a simple wrapper, consisting of only 11 lines of MATLAB code, we obtain an Elastic Net implementation that naturally utilizes GPUs and multi-core CPUs. We demonstrate on twelve real-world data sets that our algorithm yields results identical to those of the popular (and highly optimized) glmnet implementation but is one or several orders of magnitude faster. Comment: 10 pages
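The abstract above does not write out the Elastic Net objective. As a point of reference only, the following is a minimal sketch assuming the glmnet-style parameterization, (1/(2n))·||y − Xβ||² + λ·(α·||β||₁ + (1 − α)/2·||β||²), solved by plain cyclic coordinate descent. It is not the SVM reduction that the paper derives, and the function names and the parameters `lam` and `alpha` are hypothetical choices made for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the (assumed) glmnet-style objective.

    X is a list of rows, y a list of targets. lam is the overall penalty
    strength and alpha mixes the L1 (alpha=1) and L2 (alpha=0) penalties.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                # Prediction from every feature except j.
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                norm += X[i][j] ** 2
            # The L1 part thresholds, the L2 part rescales the update.
            beta[j] = soft_threshold(rho / n, lam * alpha) / (norm / n + lam * (1 - alpha))
    return beta


# Toy data with orthogonal columns: the second feature is only weakly
# related to y, so the L1 part of the penalty sets its weight to zero.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
beta = elastic_net_cd(X, y, lam=0.2, alpha=0.5)
print(beta)
```

The sketch favours readability over speed; optimized solvers such as glmnet, or the SVM-based solver the abstract describes, are built for data sets where this naive loop would be far too slow.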

    Multimodal neuroimaging signatures of early cART-treated paediatric HIV - Distinguishing perinatally HIV-infected 7-year-old children from uninfected controls

    Introduction: HIV-related brain alterations can be identified using neuroimaging modalities such as proton magnetic resonance spectroscopy (1H-MRS), structural magnetic resonance imaging (sMRI), diffusion tensor imaging (DTI), and functional MRI (fMRI). However, few studies have combined multiple MRI measures/features to identify a multivariate neuroimaging signature that typifies HIV infection. Elastic net (EN) regularisation uses penalised regression to perform variable selection, shrinking the weights of unimportant variables to zero. We chose to use the embedded feature selection of EN logistic regression to identify a set of neuroimaging features characteristic of paediatric HIV infection. We aimed to determine 1) the most useful features across MRI modalities to separate HIV+ children from HIV- controls and 2) whether better classification performance is obtained by combining multimodal MRI features rather than using features from a single modality. Methods: The study sample comprised 72 HIV+ 7-year-old children from the Children with HIV Early Antiretroviral Therapy (CHER) trial in Cape Town, who initiated combination antiretroviral therapy (cART) in infancy and had their viral loads suppressed from a young age, and 55 HIV- control children. Neuroimaging features were extracted to generate 7 MRI-derived sets. For sMRI, 42 regional brain volumes (1st set) and mean cortical thickness and gyrification in 68 brain regions (2nd and 3rd sets) were used. For DTI, radial (RD), axial (AD) and mean (MD) diffusivities and fractional anisotropy (FA) in each of 20 atlas regions were extracted, for a total of 80 DTI features (4th set). For 1H-MRS, concentrations of 14 metabolites and their ratios to creatine in the basal ganglia, peritrigonal white matter, and midfrontal gray matter voxels (5th, 6th and 7th sets) were considered. A logistic EN regression model with repeated 10-fold cross-validation (CV) was implemented in R, initially on each feature set separately. Sex, age and total intracranial volume (TIV) were included as confounders with no shrinkage penalty. For each model, the classification performance for HIV+ vs HIV- was assessed by computing accuracy, specificity, sensitivity, and mean area under the receiver operating characteristic curve (AUC) across 10 CV folds and 100 iterations. To combine feature sets, the best performing set was concatenated with each of the other sets and further EN regressions were run. The combination giving the largest AUC was combined with each of the remaining sets until there was no further increase in AUC. Two concatenation techniques were explored: nested and non-nested modelling. All models were assessed for their goodness of fit using χ² likelihood ratio tests for non-nested models and Akaike information criterion (AIC) for nested models. To identify features most useful in distinguishing HIV infection, the EN model was retrained on all the data, to find features with non-zero weights. Finally, multivariate imputation using chained equations (MICE) was explored to investigate the effect of increased sample size on classification and feature selection. Results: The best performing modality in the single modality analysis was sMRI volume
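The repeated 10-fold cross-validation described in the Methods (10 folds, 100 iterations, 72 + 55 = 127 children) can be sketched as follows. This is only an illustration of the resampling scheme: the study was implemented in R, and the abstract does not say how folds were drawn, so the unstratified shuffling, the seed, and the function name `repeated_kfold` are assumptions.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices, then deals them into
    n_folds held-out sets of nearly equal size.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for fold in range(n_folds):
            test_idx = idx[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield rep, fold, train_idx, test_idx


# 72 HIV+ children and 55 controls, as stated in the abstract.
n_children = 72 + 55
splits = list(repeated_kfold(n_children))
print(len(splits))  # number of train/test splits: 10 folds x 100 repeats
```

In a full analysis, each split would fit the EN model on `train_idx` and score it on `test_idx`, and the AUC would then be averaged over all splits.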