59 research outputs found

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods, with a wide range of scientific applications, the SVM does not include automatic feature selection, and a number of feature selection procedures have therefore been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way, using penalty functions such as LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties that overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which, in comparison to a fixed grid search, finds a global optimal solution more rapidly and more precisely.
    Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers than Elastic Net SVM in terms of the median number of features selected, and often predicted better than Elastic Net SVM in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations.
    Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
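The abstract describes Elastic SCAD as a combination of SCAD and ridge penalties but does not give a formula. The following is a minimal sketch, assuming the standard SCAD penalty of Fan and Li with the customary shape parameter a = 3.7 and a plain squared (ridge) term added per coefficient. The function names, the parameter names `lam1` and `lam2`, and the scaling of the ridge term are illustrative assumptions, not taken from the paper or from the 'penalizedSVM' package.

```python
def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient w (assumed form)."""
    x = abs(w)
    if x <= lam:
        # Small coefficients: behaves like the LASSO (L1) penalty.
        return lam * x
    if x <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return -(x * x - 2 * a * lam * x + lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return (a + 1) * lam * lam / 2


def elastic_scad_penalty(weights, lam1, lam2, a=3.7):
    """Hypothetical Elastic SCAD: SCAD term plus a ridge term per weight."""
    return sum(scad_penalty(w, lam1, a) + lam2 * w * w for w in weights)


if __name__ == "__main__":
    weights = [0.0, 0.5, 2.0, 10.0]
    print(elastic_scad_penalty(weights, lam1=1.0, lam2=0.1))
```

In this sketch the SCAD term stops growing once a coefficient exceeds a·lam1, while the ridge term keeps penalizing large coefficients, which is one way to read the abstract's claim that the combination avoids the limitations of each penalty alone.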

    An organelle-specific protein landscape identifies novel diseases and molecular mechanisms

    Cellular organelles provide opportunities to relate biological mechanisms to disease. Here we use affinity proteomics, genetics and cell biology to interrogate cilia: poorly understood organelles, where defects cause genetic diseases. Two hundred and seventeen tagged human ciliary proteins create a final landscape of 1,319 proteins, 4,905 interactions and 52 complexes. Reverse tagging, repetition of purifications and statistical analyses produce a high-resolution network that reveals organelle-specific interactions and complexes not apparent in larger studies, and links vesicle transport, the cytoskeleton, signalling and ubiquitination to ciliary signalling and proteostasis. We observe sub-complexes in exocyst and intraflagellar transport complexes, which we validate biochemically, and by probing structurally predicted, disruptive, genetic variants from ciliary disease patients. The landscape suggests other genetic diseases could be ciliary, including 3M syndrome. We show that 3M genes are involved in ciliogenesis, and that patient fibroblasts lack cilia. Overall, this organelle-specific targeting strategy shows considerable promise for Systems Medicine.

    Demographic variation in incidence of adult glioma by subtype, United States, 1992-2007

    Background: We hypothesized that race/ethnic group, sex, age, and/or calendar period variation in adult glioma incidence differs between the two broad subtypes of glioblastoma (GBM) and non-GBM. Primary GBM, which constitute 90-95% of GBM, differ from non-GBM with respect to a number of molecular characteristics, providing a molecular rationale for these two broad glioma subtypes.
    Methods: We utilized data from the Surveillance, Epidemiology, and End Results Program for 1992-2007, ages 30-69 years. We compared 15,088 GBM cases with 9,252 non-GBM cases. We used Poisson regression to calculate adjusted rate ratios and 95% confidence intervals.
    Results: The GBM incidence rate increased proportionally with the 4th power of age, whereas the non-GBM rate increased proportionally with the square root of age. For each subtype, compared to non-Hispanic Whites, the incidence rate among Blacks, Asians/Pacific Islanders, and American Indians/Alaskan Natives was substantially lower (one-fourth to one-half for GBM; about two-fifths for non-GBM). Secondary to this primary effect, race/ethnic group variation in incidence was significantly less for non-GBM than for GBM. For each subtype, the incidence rate was higher for males than for females, with the male/female rate ratio being significantly higher for GBM (1.6) than for non-GBM (1.4). We observed significant calendar period trends of increasing incidence for GBM and decreasing incidence for non-GBM. For the two subtypes combined, we observed a 3% decrease in incidence between 1992-1995 and 2004-2007.
    Conclusions: The substantial difference in age effect between GBM and non-GBM suggests a fundamental difference in the genesis of primary GBM (the driver of GBM incidence) versus non-GBM. However, the commonalities between GBM and non-GBM with respect to race/ethnic group and sex variation, more notable than the somewhat subtle, albeit statistically significant, differences, suggest that within the context of a fundamental difference, some aspects of the complex process of gliomagenesis are shared by these subtypes as well. The increasing calendar period trend of GBM incidence coupled with the decreasing trend of non-GBM incidence may at least partly be due to a secular trend in diagnostic fashion, as opposed to real changes in incidence of these subtypes.
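To make the reported age effects concrete, the sketch below compares how much the incidence rate would change between two ages if the rate were exactly proportional to a power of age, as the abstract states (4th power for GBM, square root for non-GBM). The function name and the example ages are illustrative assumptions; the abstract gives no proportionality constants, so only rate ratios within the studied 30-69 age range are computed, and the adjusted Poisson model in the paper may behave differently.

```python
def relative_rate(age, reference_age, exponent):
    """Rate ratio between two ages, assuming rate is proportional to age**exponent."""
    return (age / reference_age) ** exponent


if __name__ == "__main__":
    # Hypothetical comparison: age 60 versus age 30 (both inside the 30-69 range).
    gbm_ratio = relative_rate(60, 30, 4)        # 4th-power law for GBM
    non_gbm_ratio = relative_rate(60, 30, 0.5)  # square-root law for non-GBM
    print(f"GBM rate ratio, age 60 vs 30: {gbm_ratio:.2f}")
    print(f"non-GBM rate ratio, age 60 vs 30: {non_gbm_ratio:.2f}")
```

Under these assumptions, doubling age multiplies the GBM rate by 16 but the non-GBM rate by only about 1.4, which illustrates the "substantial difference in age effect" noted in the conclusions.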

    Site-specific protein modification to identify the MutL interface of MutH
