34 research outputs found

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which finds a global optimal solution more rapidly and more precisely than a fixed grid search. Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers than Elastic Net SVM in terms of the median number of features selected, and often predicted better than Elastic Net SVM in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
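    The abstract describes Elastic SCAD as a combination of SCAD and ridge penalties but gives no formula. The following is a minimal Python sketch of one way such a penalty could be evaluated. It assumes the standard SCAD definition with shape parameter a = 3.7 and assumes the two terms are simply added. The function names and the parameters `lam1` and `lam2` are illustrative and are not taken from the 'penalizedSVM' package.

```python
def scad_penalty(w, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient w (assumes lam > 0, a > 2)."""
    x = abs(w)
    if x <= lam:
        # Small coefficients: behaves like the LASSO (L1) penalty.
        return lam * x
    if x <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return -(x * x - 2 * a * lam * x + lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return (a + 1) * lam * lam / 2


def elastic_scad_penalty(weights, lam1, lam2, a=3.7):
    """Assumed form: sum of SCAD terms (lam1) plus a ridge term (lam2)."""
    scad_part = sum(scad_penalty(w, lam1, a) for w in weights)
    ridge_part = lam2 * sum(w * w for w in weights)
    return scad_part + ridge_part


if __name__ == "__main__":
    print(elastic_scad_penalty([0.5, 10.0], lam1=1.0, lam2=0.1))
```

    In this sketch the ridge term keeps penalizing large coefficients that SCAD alone would leave untouched, which is one plausible reading of how the combination avoids the sparsity limitations mentioned in the abstract.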

    An organelle-specific protein landscape identifies novel diseases and molecular mechanisms

    Cellular organelles provide opportunities to relate biological mechanisms to disease. Here we use affinity proteomics, genetics and cell biology to interrogate cilia: poorly understood organelles, where defects cause genetic diseases. Two hundred and seventeen tagged human ciliary proteins create a final landscape of 1,319 proteins, 4,905 interactions and 52 complexes. Reverse tagging, repetition of purifications and statistical analyses produce a high-resolution network that reveals organelle-specific interactions and complexes not apparent in larger studies, and links vesicle transport, the cytoskeleton, signalling and ubiquitination to ciliary signalling and proteostasis. We observe sub-complexes in exocyst and intraflagellar transport complexes, which we validate biochemically, and by probing structurally predicted, disruptive, genetic variants from ciliary disease patients. The landscape suggests other genetic diseases could be ciliary, including 3M syndrome. We show that 3M genes are involved in ciliogenesis, and that patient fibroblasts lack cilia. Overall, this organelle-specific targeting strategy shows considerable promise for Systems Medicine.

    Nos2 Inactivation Promotes the Development of Medulloblastoma in Ptch1+/− Mice by Deregulation of Gap43–Dependent Granule Cell Precursor Migration

    Medulloblastoma is the most common malignant brain tumor in children. A subset of medulloblastoma originates from granule cell precursors (GCPs) of the developing cerebellum and demonstrates aberrant hedgehog signaling, typically due to inactivating mutations in the receptor PTCH1, a pathomechanism recapitulated in Ptch1+/− mice. As nitric oxide may regulate GCP proliferation and differentiation, we crossed Ptch1+/− mice with mice lacking inducible nitric oxide synthase (Nos2) to investigate a possible influence on tumorigenesis. We observed a two-fold higher medulloblastoma rate in Ptch1+/− Nos2−/− mice compared to Ptch1+/− Nos2+/+ mice. To identify the molecular mechanisms underlying this finding, we performed gene expression profiling of medulloblastomas from both genotypes, as well as normal cerebellar tissue samples of different developmental stages and genotypes. Downregulation of hedgehog target genes was observed in postnatal cerebellum from Ptch1+/+ Nos2−/− mice but not from Ptch1+/− Nos2−/− mice. The most consistent effect of Nos2 deficiency was downregulation of growth-associated protein 43 (Gap43). Functional studies in neuronal progenitor cells demonstrated nitric oxide dependence of Gap43 expression and impaired migration upon Gap43 knock-down. Both effects were confirmed in situ by immunofluorescence analyses on tissue sections of the developing cerebellum. Finally, the number of proliferating GCPs at the cerebellar periphery was decreased in Ptch1+/+ Nos2−/− mice but increased in Ptch1+/− Nos2−/− mice relative to Ptch1+/− Nos2+/+ mice. Taken together, these results indicate that Nos2 deficiency promotes medulloblastoma development in Ptch1+/− mice through retention of proliferating GCPs in the external granular layer due to reduced Gap43 expression. This study illustrates a new role of nitric oxide signaling in cerebellar development and demonstrates that the localization of pre-neoplastic cells during morphogenesis is crucial for their malignant progression.

    Outcome prediction in pediatric medulloblastoma based on DNA copy-number aberrations of chromosomes 6q and 17q and the MYC and MYCN loci

    PURPOSE: Medulloblastoma is the most common malignant brain tumor in children. Current treatment decisions are based on clinical variables. Novel tumor-derived biomarkers may improve the risk stratification of medulloblastoma patients. PATIENTS AND METHODS: A model for the molecular risk stratification was proposed from an array-based comparative genomic hybridization (array-CGH) screen (n = 80). Fluorescence in situ hybridization (FISH) analyses for chromosome arms 6q, 17p, and 17q and the MYC and MYCN loci were performed in an independent validation set (n = 260). Copy number aberrations were correlated with clinical, histologic, and survival data. RESULTS: Gain of 6q and 17q and genomic amplification of MYC or MYCN were each associated with poor outcome in the array-CGH study (n = 80). In contrast, all patients with 6q-deleted tumors survived. Given these findings, the following hierarchical molecular staging system was defined: (1) MYC/MYCN amplification, (2) 6q gain, (3) 17q gain, (4) 6q and 17q balanced, and (5) 6q deletion. The prognostic value of this staging system was investigated by FISH analysis (n = 260). The addition of molecular markers to clinical risk factors resulted in the identification of a large proportion of patients (72 of 260 patients; 30%) at high risk for relapse and death who would be considered standard risk by application of clinical variables alone. CONCLUSION: Genomic aberrations in medulloblastoma are powerful independent markers of disease progression and survival. By adding genomic markers to established clinical and histologic variables, outcome prediction can be substantially improved. Because the analyses can be conducted on routine paraffin-embedded material, it will be especially feasible to use this novel molecular staging system in large multicenter clinical trials.
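    The hierarchical staging system listed in the abstract can be read as an ordered rule list. The Python sketch below assumes that the categories are checked in the listed order and that the first matching category wins; the abstract does not spell out how combined aberrations are resolved, so this ordering is an assumption. The function name, the argument names and the status strings are hypothetical.

```python
def molecular_stage(myc_or_mycn_amplified, status_6q, status_17q):
    """Return the staging category (1-5), or None if no category applies.

    status_6q and status_17q are illustrative strings such as
    "gain", "balanced" or "deletion". Rules are applied in the order
    given in the abstract (assumption: first match wins).
    """
    if myc_or_mycn_amplified:
        return 1  # (1) MYC/MYCN amplification
    if status_6q == "gain":
        return 2  # (2) 6q gain
    if status_17q == "gain":
        return 3  # (3) 17q gain
    if status_6q == "balanced" and status_17q == "balanced":
        return 4  # (4) 6q and 17q balanced
    if status_6q == "deletion":
        return 5  # (5) 6q deletion
    return None  # combination not covered by the five listed categories


if __name__ == "__main__":
    print(molecular_stage(False, "balanced", "gain"))
```

    Returning None for uncovered combinations is a design choice of this sketch; it avoids assigning a category the abstract does not define.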

    ELM--the database of eukaryotic linear motifs.

    Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovery, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource at http://elm.eu.org provides the biological community with a comprehensive database of known experimentally validated motifs, and an exploratory tool to discover putative linear motifs in user-submitted protein sequences. The current update of the ELM database comprises 1800 annotated motif instances representing 170 distinct functional classes, including approximately 500 novel instances and 24 novel classes. Several older motif class entries have also been revisited, improving annotation and adding novel instances. Furthermore, the addition of full-text search capabilities, an enhanced interface and simplified batch download has improved the overall accessibility of the ELM data. The motif discovery portion of the ELM resource has added conservation, and structural attributes have been incorporated to help users discriminate biologically relevant motifs from stochastically occurring non-functional instances.
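    To illustrate what scanning a user-submitted sequence for putative linear motifs can look like, here is a minimal Python sketch that matches regular-expression patterns against a protein sequence. It is not the ELM implementation: the function name, the example pattern and the example sequence are hypothetical and chosen only for illustration, and the sketch applies none of the conservation or structural filtering mentioned in the abstract.

```python
import re

# Hypothetical example pattern, for illustration only
# (not copied from the ELM database).
EXAMPLE_PATTERNS = {
    "example_KEN_like": r".KEN.",
}


def find_putative_motifs(sequence, patterns=EXAMPLE_PATTERNS):
    """Return (name, start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for name, pattern in patterns.items():
        for m in re.finditer(r"(?=(" + pattern + r"))", sequence):
            matched = m.group(1)
            start = m.start() + 1            # convert to 1-based position
            end = m.start() + len(matched)   # inclusive end position
            hits.append((name, start, end, matched))
    return hits


if __name__ == "__main__":
    for hit in find_putative_motifs("MSTKENAAA"):
        print(hit)
```

    Matches found this way are only candidates; as the abstract notes, many short pattern matches occur by chance and are non-functional.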