907 research outputs found

    MissForest - nonparametric missing value imputation for mixed-type data

    Modern data acquisition based on high-throughput technology often faces the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete data set. Missing value imputation offers a solution to this problem. However, the majority of available imputation methods are restricted to one type of variable only: continuous or categorical. For mixed-type data the different types are usually handled separately. Therefore, these methods ignore possible relations between variable types. We propose a nonparametric method which can cope with different types of variables simultaneously. We compare several state-of-the-art methods for the imputation of missing values. We propose and evaluate an iterative imputation method (missForest) based on a random forest. By averaging over many unpruned classification or regression trees, random forest intrinsically constitutes a multiple imputation scheme. Using the built-in out-of-bag error estimates of random forest, we are able to estimate the imputation error without the need of a test set. Evaluation is performed on multiple data sets coming from a diverse selection of biological fields with artificially introduced missing values ranging from 10% to 30%. We show that missForest can successfully handle missing values, particularly in data sets including different types of variables. In our comparative study, missForest outperforms other methods of imputation, especially in data settings where complex interactions and nonlinear relations are suspected. The out-of-bag imputation error estimates of missForest prove to be adequate in all settings. Additionally, missForest exhibits attractive computational efficiency and can cope with high-dimensional data. Comment: Submitted to Oxford Journal's Bioinformatics on 3rd of May 201
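The iterative scheme described in this abstract can be illustrated with a short sketch. This is not the missForest implementation: it handles continuous variables only, it swaps the random forest for a small k-nearest-neighbour averager so that it runs on the standard library alone, and the function names `knn_predict` and `iterative_impute` are hypothetical. The stopping rule (stop once the change between successive imputations no longer decreases) is also an assumption, since the abstract does not state one. The sketch further assumes every column has at least one observed value.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k nearest training rows (stand-in learner)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)


def iterative_impute(rows, max_iter=10, k=3):
    """Fill None entries of a numeric table by cycling over its columns."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[value is None for value in row] for row in rows]
    data = [list(row) for row in rows]

    # Initial guess: the mean of the observed values in each column.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][j]:
                data[i][j] = mean_j

    # Visit columns from the fewest to the most missing entries.
    order = sorted(range(n_cols), key=lambda j: sum(m[j] for m in missing))

    prev_change = math.inf
    for _ in range(max_iter):
        previous = [list(row) for row in data]
        for j in order:
            obs = [i for i in range(n_rows) if not missing[i][j]]
            mis = [i for i in range(n_rows) if missing[i][j]]
            if not mis:
                continue
            # Predict column j from all other columns, training on the rows
            # where column j was actually observed.
            train_X = [[data[i][c] for c in range(n_cols) if c != j] for i in obs]
            train_y = [data[i][j] for i in obs]
            for i in mis:
                x = [data[i][c] for c in range(n_cols) if c != j]
                data[i][j] = knn_predict(train_X, train_y, x, k)

        # Squared change of the imputed cells between successive passes.
        change = sum(
            (data[i][j] - previous[i][j]) ** 2
            for i in range(n_rows)
            for j in range(n_cols)
            if missing[i][j]
        )
        if change >= prev_change:
            return previous  # assumed stopping rule: keep the earlier pass
        prev_change = change
    return data


table = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0], [None, 10.0]]
completed = iterative_impute(table, k=2)
```

Observed cells are never overwritten; only the entries that were `None` are updated on each pass. For mixed-type data, as in the abstract, a learner that supports both regression and classification would take the place of `knn_predict`.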

    gap: Genetic Analysis Package

    A preliminary attempt at collecting tools and utilities for genetic data as an R package called gap is described. Genomewide association is then described as a specific example, linking the work of Risch and Merikangas (1996) and Long and Langley (1997) for family-based and population-based studies, and the counterpart for case-cohort design established by Cai and Zeng (2004). Analysis of staged design as outlined by Skol et al. (2006) and associated methods are discussed. The package is flexible, customizable, and should prove useful to researchers, especially in its application to genomewide association studies.

    Evaluation of the relevance and impact of kinase dysfunction in neurological disorders through proteomics and phosphoproteomics bioinformatics

    Phosphorylation is an important post-translational modification that is involved in various biological processes and its dysregulation has in particular been linked to diseases of the central nervous system including neurological disorders. The present thesis characterizes alterations in the phosphoproteome and protein abundance associated with schizophrenia and Parkinson's disease, with the goal of uncovering the underlying disease mechanisms. To support this goal, I eventually created an automated analysis pipeline in R to streamline the analysis process of proteomics and phosphoproteomics data. Mass spectrometry (MS) technology is utilized to generate proteomics and phosphoproteomics data. Study I of the thesis demonstrates an automated R pipeline, PhosPiR, created to perform multi-level functional analyses of MS data after the identification and quantification of the raw spectral data. The pipeline does not require coding knowledge to run. It supports 18 different organisms, and provides analyses of MS intensity data from preprocessing, normalization and imputation, through to figure overviews, statistical analysis, enrichment analysis, PTM-SEA, kinase prediction and activity analysis, network analysis, hub analysis, annotation mining, and homolog alignment. The LRRK2-G2019S mutation, a frequent genetic cause of late onset Parkinson's disease, was investigated in Study II and III. One study investigated the mechanism of LRRK2-G2019S function in brain, and the other identified proteins with significantly altered overall translation patterns in sporadic and LRRK2-G2019S patient samples. Specifically, study II identified that LRRK2 is localized to the small 40S ribosomal subunit and that LRRK2 activity suppresses RNA translation, as validated in cell and animal models of Parkinson's disease and in patient cells. 
    Study III utilized bio-orthogonal non-canonical amino acid tagging to label newly translated proteins in order to identify which proteins were affected by repressed translation in patient samples, using mass spectrometry analysis. The analysis revealed 33 and 30 nascent proteins with reduced synthesis in sporadic and LRRK2-G2019S Parkinson’s cases, respectively. The biological process "cytosolic signal recognition particle (SRP)-dependent co-translational protein targeting to membrane" was significantly affected in both sporadic and LRRK2-G2019S Parkinson's, while "Tubulin/FTsz C-terminal domain superfamily network" was significantly enriched only in LRRK2-G2019S Parkinson’s cases. The findings were validated by targeted proteomics and immunoblotting. Study IV investigated the role of JNK1 in schizophrenia. Wild type and Jnk1-/- mice were used to analyze the phosphorylation profile using LC-MS/MS analysis. 126 proteins associated with schizophrenia were found to overlap with the significantly differentially phosphorylated proteins in Jnk1-/- mouse brain. The NMDAR trafficking pathway was found to be highly enriched, and surface staining of NMDAR subunits in neurons showed that surface expression of both subunits in Jnk1-/- neurons was significantly decreased. Further behavioral tests conducted with MK801 treatment have associated the Jnk1-/- molecular and behavioral phenotype with schizophrenia and neuropsychiatric disease.

    Parkinson's disease subtypes in the Oxford Parkinson disease centre (OPDC) discovery cohort

    Background: Within Parkinson’s there is a spectrum of clinical features at presentation which may represent sub-types of the disease. However, there is no widely accepted consensus on how best to group patients. Objective: Use a data-driven approach to unravel any heterogeneity in the Parkinson’s phenotype in a well-characterised, population-based incidence cohort. Methods: 769 consecutive patients, with mean disease duration of 1.3 years, were assessed using a broad range of motor, cognitive and non-motor metrics. Multiple imputation was carried out using the chained equations approach to deal with missing data. We used an exploratory and then a confirmatory factor analysis to determine suitable domains to include within our cluster analysis. K-means cluster analysis of the factor scores and all the variables not loading into a factor was used to determine phenotypic subgroups. Results: Our factor analysis found three important factors that were characterised by: psychological well-being features; non-tremor motor features, such as posture and rigidity; and cognitive features. Our subsequent five cluster model identified groups characterised by (1) mild motor and non-motor disease (25.4%), (2) poor posture and cognition (23.3%), (3) severe tremor (20.8%), (4) poor psychological well-being, RBD and sleep (18.9%), and (5) severe motor and non-motor disease with poor psychological well-being (11.7%). Conclusion: Our approach identified several Parkinson’s phenotypic sub-groups driven by largely dopaminergic-resistant features (RBD, impaired cognition and posture, poor psychological well-being) that, in addition to dopaminergic-responsive motor features, may be important for studying the aetiology, progression, and medication response of early Parkinson’s.

    Genomewide association study for onset age in Parkinson disease

    Background: Age at onset in Parkinson disease (PD) is a highly heritable quantitative trait for which a significant genetic influence is supported by multiple segregation analyses. Because genes associated with onset age may represent invaluable therapeutic targets to delay the disease, we sought to identify such genetic modifiers using a genomewide association study in familial PD. There have been previous genomewide association studies (GWAS) to identify genes influencing PD susceptibility, but this is the first to identify genes contributing to the variation in onset age. Methods: Initial analyses were performed using genotypes generated with the Illumina HumanCNV370Duo array in a sample of 857 unrelated, familial PD cases. Subsequently, a meta-analysis of imputed SNPs was performed combining the familial PD data with that from a previous GWAS of 440 idiopathic PD cases. The SNPs from the meta-analysis with the lowest p-values and consistency in the direction of effect for onset age were then genotyped in a replication sample of 747 idiopathic PD cases from the Parkinson Institute Biobank of Milan, Italy. Results: Meta-analysis across the three studies detected consistent association (p < 1 × 10⁻⁵) with five SNPs, none of which reached genomewide significance. On chromosome 11, the SNP with the lowest p-value (rs10767971; p = 5.4 × 10⁻⁷) lies between the genes QSER1 and PRRG4. Near the PARK3 linkage region on chromosome 2p13, association was observed with a SNP (rs7577851; p = 8.7 × 10⁻⁶) which lies in an intron of the AAK1 gene. This gene is closely related to GAK, identified as a possible PD susceptibility gene in the GWAS of the familial PD cases. Conclusion: Taken together, these results suggest an influence of genes involved in endocytosis and lysosomal sorting in PD pathogenesis.
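The statement that none of the SNPs reached genomewide significance can be checked with a few lines of arithmetic. The sketch below assumes the conventional threshold of 5 × 10⁻⁸; the abstract does not say which cutoff the authors applied, and the name `classify` is hypothetical.

```python
# Conventional genome-wide significance threshold (an assumption here; the
# abstract does not state which cutoff the authors applied).
GENOME_WIDE_ALPHA = 5e-8
# Reporting threshold quoted in the abstract for the five SNPs.
SUGGESTIVE_ALPHA = 1e-5

# p-values quoted in the abstract for the two named SNPs.
reported = {"rs10767971": 5.4e-7, "rs7577851": 8.7e-6}


def classify(p_value):
    """Label a p-value against the two thresholds above."""
    if p_value < GENOME_WIDE_ALPHA:
        return "genome-wide significant"
    if p_value < SUGGESTIVE_ALPHA:
        return "suggestive"
    return "not significant"


labels = {snp: classify(p) for snp, p in reported.items()}
```

Under that assumed cutoff, both named SNPs fall below the 1 × 10⁻⁵ reporting threshold but remain above 5 × 10⁻⁸, which is consistent with the wording of the Results.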

    Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies

    Background: Genome-wide association studies (GWAS) in Parkinson's disease have increased the scope of biological knowledge about the disease over the past decade. We aimed to use the largest aggregate of GWAS data to identify novel risk loci and gain further insight into the causes of Parkinson's disease. / Methods: We did a meta-analysis of 17 datasets from Parkinson's disease GWAS available from European ancestry samples to nominate novel loci for disease risk. These datasets incorporated all available data. We then used these data to estimate heritable risk and develop predictive models of this heritability. We also used large gene expression and methylation resources to examine possible functional consequences as well as tissue, cell type, and biological pathway enrichments for the identified risk factors. Additionally, we examined shared genetic risk between Parkinson's disease and other phenotypes of interest via genetic correlations followed by Mendelian randomisation. / Findings: Between Oct 1, 2017, and Aug 9, 2018, we analysed 7·8 million single nucleotide polymorphisms in 37 688 cases, 18 618 UK Biobank proxy-cases (ie, individuals who do not have Parkinson's disease but have a first degree relative that does), and 1·4 million controls. We identified 90 independent genome-wide significant risk signals across 78 genomic regions, including 38 novel independent risk signals in 37 loci. These 90 variants explained 16–36% of the heritable risk of Parkinson's disease depending on prevalence. Integrating methylation and expression data within a Mendelian randomisation framework identified putatively associated genes at 70 risk signals underlying GWAS loci for follow-up functional studies. Tissue-specific expression enrichment analyses suggested Parkinson's disease loci were heavily brain-enriched, with specific neuronal cell types being implicated from single cell data. 
    We found significant genetic correlations with brain volumes (false discovery rate-adjusted p=0·0035 for intracranial volume, p=0·024 for putamen volume), smoking status (p=0·024), and educational attainment (p=0·038). Mendelian randomisation between cognitive performance and Parkinson's disease risk showed a robust association (p=8·00 × 10⁻⁷). / Interpretation: These data provide the most comprehensive survey of genetic risk within Parkinson's disease to date, to the best of our knowledge, by revealing many additional Parkinson's disease risk loci, providing a biological context for these risk factors, and showing that a considerable genetic component of this disease remains unidentified. These associations derived from European ancestry datasets will need to be followed up with more diverse data. / Funding: The National Institute on Aging at the National Institutes of Health (USA), The Michael J Fox Foundation, and The Parkinson's Foundation (see appendix for full list of funding sources).

    Oral health-related quality of life in patients with Parkinson’s disease

    Background: Parkinson's disease (PD) is a neurodegenerative condition affecting the quality of life. Due to a worsening of oral health in PD patients with the progression of the disease, oral health-related quality of life (OHRQoL) could be impaired as well. Objectives: To assess whether PD patients in The Netherlands experience worse OHRQoL than historical controls, and to investigate which factors are associated with OHRQoL in PD patients. Materials & Methods: In total, 341 PD patients (65.5 ± 8.4 years) and 411 historical controls (62.6 ± 5.3 years) participated. Both groups completed a questionnaire. The PD patients were asked questions regarding demographics, PD, oral health, and OHRQoL. The historical controls filled in demographic information and questions regarding OHRQoL. The latter construct was assessed using the Dutch 14-item version of the Oral Health Impact Profile (OHIP-14). Data were analysed using independent samples t-tests and univariate and multivariate linear regression analysis. Results: The mean OHIP-14 score was higher in PD patients (19.1 ± 6.7) than in historical controls (16.5 ± 4.4) (t(239) = 6.5; p < .001). OHRQoL in PD patients was statistically significantly associated with motor aspects of experiences of daily living (B = 0.31; t(315) = 7.03; p < .001), worsening of the oral environment during disease course (B = 3.39; t(315) = 4.21; p < .001), being dentate (B = −5.60; t(315) = −4.5; p < .001), tooth wear (B = 2.25; t(315) = 3.29; p = .001), and possible burning mouth syndrome (B = 5.87; t(315) = 2.87; p = .004). Conclusion: PD patients had a lower OHRQoL than historical controls. In addition, PD-related variables and oral health-related variables were associated with OHRQoL.