20 research outputs found

    Reliable Single Chip Genotyping with Semi-Parametric Log-Concave Mixtures

    The common approach to SNP genotyping is to use (model-based) clustering per individual SNP, on a set of arrays. Genotyping all SNPs on a single array is much more attractive, in terms of flexibility, stability and applicability, when developing new chips. A new semi-parametric method, named SCALA, is proposed. It is based on a mixture model using semi-parametric log-concave densities. Instead of using the raw data, the mixture is fitted on a two-dimensional histogram, thereby making computation time almost independent of the number of SNPs. Furthermore, the algorithm is effective in low-MAF situations. Comparisons between SCALA and CRLMM on HapMap genotypes show very reliable calling of single arrays. Some heterozygous genotypes from HapMap are called homozygous by SCALA and, to a lesser extent, by CRLMM too. Furthermore, HapMap's NoCalls (NN) could be genotyped by SCALA, mostly with high probability. The software is available as R scripts from the website www.math.leidenuniv.nl/~rrippe

    Modelling trends in digit preference patterns

    Digit preference is the habit of reporting certain end digits more often than others. If such a misreporting pattern is a concern, then measures to reduce digit preference can be taken and monitoring changes in digit preference becomes important. We propose a two-dimensional penalized composite link model to estimate the true distributions unaffected by misreporting, the digit preference pattern and a trend in the preference pattern simultaneously. A transfer pattern is superimposed on a series of smooth latent distributions and is modulated along a second dimension. Smoothness of the latent distributions is enforced by a roughness penalty. Ridge regression with an L1-penalty is used to extract the misreporting pattern, and an additional weighted least squares regression estimates the modulating trend vector. Smoothing parameters are selected by the Akaike information criterion. We present a simulation study and apply the model to data on birth weight and on self-reported weight of adults
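
    The composite link idea in the abstract above can be illustrated with a small sketch: the expected observed counts are a linear composition `mu = C @ gamma` of a smooth latent distribution `gamma`. The function name `composition_matrix`, the neighbour-only transfer and the single transfer fraction `p` are assumptions made for illustration; the paper estimates the transfer pattern and its trend with penalties, which is not shown here.

```python
import numpy as np

def composition_matrix(n, preferred, p):
    """Build C so that mu = C @ gamma moves a fraction p of each
    immediate neighbour of a preferred value onto that value.

    n: number of distinct values; preferred: indices of preferred values;
    p: hypothetical transfer fraction (assumed constant here).
    """
    C = np.eye(n)
    for j in preferred:
        for nb in (j - 1, j + 1):
            if 0 <= nb < n:
                C[nb, nb] -= p   # the neighbour loses a fraction p
                C[j, nb] += p    # the preferred value gains it
    return C

# Smooth latent distribution on 21 values (illustrative shape only).
x = np.arange(21)
gamma = np.exp(-0.5 * ((x - 10) / 4.0) ** 2)
gamma /= gamma.sum()

# Preferred values at 0, 10 and 20; 30% of each neighbour is misreported.
C = composition_matrix(21, preferred=[0, 10, 20], p=0.3)
mu = C @ gamma
```

    Every column of `C` sums to one, so misreporting only redistributes probability mass and leaves the total unchanged.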

    Bilinear modulation models for seasonal tables of counts

    We propose generalized linear models for time or age-time tables of seasonal counts, with the goal of better understanding seasonal patterns in the data. The linear predictor contains a smooth component for the trend and the product of a smooth component (the modulation) and a periodic time series of arbitrary shape (the carrier wave). To model rates, a population offset is added. Two-dimensional trends and modulation are estimated using a tensor product B-spline basis of moderate dimension. Further smoothness is ensured using difference penalties on the rows and columns of the tensor product coefficients. The optimal penalty tuning parameters are chosen based on minimization of a quasi-information criterion. Computationally efficient estimation is achieved using array regression techniques, avoiding excessively large matrices. The model is applied to female death rate in the US due to cerebrovascular diseases and respiratory diseases
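
    The difference penalties on rows and columns of the tensor product coefficients can be sketched as follows. This is a minimal illustration of the general P-spline construction, assuming a coefficient matrix `A` that is vectorized column by column; the function names and the tuning parameters `lam_rows` and `lam_cols` are hypothetical, and the array regression techniques mentioned in the abstract are not shown.

```python
import numpy as np

def diff_matrix(n, order=2):
    """Difference matrix of the given order, shape (n - order, n)."""
    return np.diff(np.eye(n), n=order, axis=0)

def tensor_penalty(n_rows, n_cols, lam_rows, lam_cols, order=2):
    """Penalty matrix P for vec(A), with A of shape (n_rows, n_cols)
    stacked column by column (Fortran order), such that

        vec(A)' P vec(A) = lam_rows * ||D_r A||^2 + lam_cols * ||A D_c'||^2
    """
    D_r = diff_matrix(n_rows, order)
    D_c = diff_matrix(n_cols, order)
    # Differences down each column of A (along the row index).
    P_rows = np.kron(np.eye(n_cols), D_r.T @ D_r)
    # Differences along each row of A (along the column index).
    P_cols = np.kron(D_c.T @ D_c, np.eye(n_rows))
    return lam_rows * P_rows + lam_cols * P_cols
```

    Coefficients that vary linearly along both directions have zero second-order differences, so with `order=2` they are left unpenalized.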

    GWAS on your notebook: Fast semi-parallel linear and logistic regression for genome-wide association studies

    Background: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code. Results: We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Our approach can handle an essentially unlimited number of covariates efficiently, using projections. Data files in GWAS are vast and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way allowing for easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures. Conclusions: We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access
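
    The projection idea for the linear case can be sketched as below. This is a minimal NumPy illustration, not the authors' R code: covariates are projected out of the phenotype and of a whole block of SNPs at once, after which every SNP slope is a simple ratio (the Frisch-Waugh-Lovell result). The function name `semi_parallel_linear` and the argument layout are assumptions.

```python
import numpy as np

def semi_parallel_linear(y, X, S):
    """Slopes and t-statistics of y ~ X + s_j for every column s_j of S.

    y: (n,) phenotype; X: (n, k) covariates including an intercept column;
    S: (n, m) block of SNP genotypes.
    """
    n, k = X.shape
    # Orthonormal basis of the covariate space, used to project it out.
    Q, _ = np.linalg.qr(X)
    y_res = y - Q @ (Q.T @ y)
    S_res = S - Q @ (Q.T @ S)
    # After projection each SNP effect is a simple regression slope.
    ss = np.sum(S_res ** 2, axis=0)
    beta = (S_res.T @ y_res) / ss
    # Residual sum of squares per SNP, with n - k - 1 degrees of freedom.
    rss = np.sum(y_res ** 2) - beta ** 2 * ss
    se = np.sqrt(rss / (n - k - 1) / ss)
    return beta, beta / se
```

    All per-SNP quantities come from a handful of matrix products, so no model is refitted inside a loop over SNPs.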

    Risk factors and outcomes associated with first-trimester fetal growth restriction

    Context: Adverse environmental exposures lead to developmental adaptations in fetal life. The influences of maternal physical characteristics and lifestyle habits on first-trimester fetal adaptations and the postnatal consequences are not known. Objective: To determine the risk factors and outcomes associated with first-trimester growth restriction. Design, Setting, and Participants: Prospective evaluation of the associations of maternal physical characteristics and lifestyle habits with first-trimester fetal crown to rump length in 1631 mothers with a known and reliable first day of their last menstrual period and a regular menstrual cycle. Subsequently, we assessed the associations of first-trimester fetal growth restriction with the risks of adverse birth outcomes and postnatal growth acceleration until the age of 2 years. The study was based in Rotterdam, the Netherlands. Mothers were enrolled between 2001 and 2005. Main Outcome Measures: First-trimester fetal growth was measured as fetal crown to rump length by ultrasound between the gestational age of 10 weeks 0 days and 13 weeks 6 days. Main birth outcomes were preterm birth (gestational age <37 weeks), low birth weight (<2500 g), and small size for gestational age (lowest fifth birth centile). Postnatal growth was measured until the age of 2 years. Results: In the multivariate analysis, maternal age was positively associated with first-trimester fetal crown to rump length (difference per maternal year of age, 0.79 mm; 95% confidence interval [CI], 0.41 to 1.18 per standard deviation score increase). Higher diastolic blood pressure and higher hematocrit levels were associated with a shorter crown to rump length (differences, -0.40 mm; 95% CI, -0.74 to -0.06 and -0.52 mm; 95% CI, -0.90 to -0.14 per standard deviation increase, respectively). Compared with mothers who were nonsmokers and optimal users of folic acid supplements, those who both smoked and did not use folic acid supplements had shorter fetal crown to rump lengths (difference, -3.84 mm; 95% CI, -5.71 to -1.98). Compared with normal first-trimester fetal growth, first-trimester growth restriction was associated with increased risks of preterm birth (4.0% vs 7.2%; adjusted odds ratio [OR], 2.12; 95% CI, 1.24 to 3.61), low birth weight (3.5% vs 7.5%; adjusted OR, 2.42; 95% CI, 1.41 to 4.16), and small size for gestational age at birth (4.0% vs 10.6%; adjusted OR, 2.64; 95% CI, 1.64 to 4.25). Each standard deviation decrease in first-trimester fetal crown to rump length was associated with a postnatal growth acceleration until the age of 2 years (standard deviation score increase, 0.139 per 2 years; 95% CI, 0.097 to 0.181). Conclusions: Maternal physical characteristics and lifestyle habits were independently associated with early fetal growth. First-trimester fetal growth restriction was associated with an increased risk of adverse birth outcomes and growth acceleration in early childhood

    Genome-wide Analysis of Large-scale Longitudinal Outcomes using Penalization - GALLOP algorithm

    Genome-wide association studies (GWAS) with longitudinal phenotypes provide opportunities to identify genetic variations associated with changes in human traits over time. Mixed models are used to correct for the correlated nature of longitudinal data. GWA studies are notorious for their computational challenges, which are considerable when mixed models for thousands of individuals are fitted to millions of SNPs. We present a new algorithm that speeds up a genome-wide analysis of longitudinal data by several orders of magnitude. It solves the equivalent penalized least squares problem efficiently, computing variances in an initial step. Factorizations and transformations are used to avoid inversion of large matrices. Because the system of equations is bordered, we can re-use components, which can be precomputed for the mixed model without a SNP. Two SNP effects (main and its interaction with time) are obtained. Our method completes the analysis a thousand times faster than the R package lme4, providing an almost identical solution for the coefficients and p-values. We provide an R implementation of our algorithm

    Gene expression profiles of gliomas in formalin-fixed paraffin-embedded material

    Background: We have recently demonstrated that expression profiling is a more accurate and objective method to classify gliomas than histology. Similar to most expression profiling studies, our experiments were performed using fresh frozen (FF) glioma samples whereas most archival samples are fixed in formalin and embedded in paraffin (FFPE). Identification of the same, expression-based intrinsic subtypes in FFPE-stored samples would enable validation of the prognostic value of these subtypes on these archival samples. In this study, we have therefore determined whether the intrinsic subtypes identified using FF material can be reproduced in FFPE-stored samples. Methods: We have performed expression profiling on 55 paired FF-FFPE glioma samples using HU133 plus 2.0 arrays (FF) and Exon 1.0 ST arrays (FFPE). The median time in paraffin of the FFPE samples was 14.1 years (range 6.6-26.4 years). Results: In general, the correlation between FF and FFPE expression in a single sample was poor. We then selected the most variable probe sets per gene (n = 17 583), and of these, the 5000 most variable probe sets on FFPE expre

    Bayesian hierarchical modeling of longitudinal glaucomatous visual fields using a two-stage approach

    The Bayesian approach has become increasingly popular because it allows quite complex models to be fitted to data via Markov chain Monte Carlo sampling. However, it is also recognized nowadays that Markov chain Monte Carlo sampling can become computationally prohibitive when applied to a large data set. We encountered serious computational difficulties when fitting a hierarchical model to longitudinal glaucoma data of patients who participate in an ongoing Dutch study. To overcome this problem, we applied and extended a recently proposed two-stage approach to model these data. Glaucoma is one of the leading causes of blindness in the world. In order to detect deterioration at an early stage, a model for predicting visual fields (VFs) in time is needed. Hence, the true underlying VF progression can be determined, and treatment strategies can then be optimized to prevent further VF loss. Because we were unable to fit these data with the classical one-stage approach upon which the current popular Bayesian software is based, we made use of the two-stage Bayesian approach. The considered hierarchical longitudinal model involves estimating a large number of random effects and deals with censoring and high measurement variability. In addition, we extended the approach with tools for model evaluation.

    MLPAinter for MLPA interpretation: An integrated approach for the analysis, visualisation and data management of Multiplex Ligation-dependent Probe Amplification

    Background: Multiplex Ligation-Dependent Probe Amplification (MLPA) is an application that can be used for the detection of multiple chromosomal aberrations in a single experiment. In one reaction, up to 50 different genomic sequences can be analysed. For a reliable work-flow, tools are needed for administrative support, data management, normalisation, visualisation, reporting and interpretation. Results: Here, we developed a data management system, MLPAinter for MLPA interpretation, that is a Windows executable and has a stand-alone database for monitoring and interpreting the MLPA data stream that is generated from the experimental setup to analysis, quality control and visualisation. A statistical approach is applied for the normalisation and analysis of large series of MLPA traces, making use of multiple control samples and internal controls. Conclusions: MLPAinter visualises MLPA data in plots with information about sample replicates, normalisation settings, and sample characteristics. This integrated approach helps in the automated handling of large series of MLPA data and guarantees a quick and streamlined dataflow from the beginning of an experiment to an authorised report

    Epigenetic profiles in children with a neural tube defect; a case-control study in two populations

    Folate deficiency is implicated in the causation of neural tube defects (NTDs). The preventive effect of periconceptional folic acid supplement use is partially explained by the treatment of a deranged folate-dependent one-carbon metabolism, which provides methyl groups for DNA-methylation as an epigenetic mechanism. Here, we hypothesize that variations in DNA-methylation of genes implicated in the development of NTDs and embryonic growth are part of the underlying mechanism. In 48 children with a neural tube defect and 62 controls from a Dutch case-control study and 34 children with a neural tube defect and 78 controls from a Texan case-control study, we measured the DNA-methylation levels of imprinted candidate genes (IGF2-DMR, H19, KCNQ1OT1) and non-imprinted genes (the LEKR/CCNL gene region associated with birth weight, and MTHFR and VANGL1 associated with NTD). We used the MassARRAY EpiTYPER assay from Sequenom for the assessment of DNA-methylation. Linear mixed model analysis was used to estimate associations between DNA-methylation levels of the genes and a neural tube defect. In the Dutch study group, but not in the Texan study group, we found a significant association between the risk of having an NTD and DNA methylation levels of MTHFR (absolute decrease in methylation of -0.33% in cases, P-value = 0.001), and LEKR/CCNL (absolute increase in methylation: 1.36% in cases, P-value = 0.048), and a borderline significant association for VANGL1 (absolute increase in methylation: 0.17% in cases, P-value = 0.063). Only the association between MTHFR and NTD-risk remained significant after multiple testing correction. The associations in the Dutch study were not replicated in the Texan study. We conclude that the associations between NTDs and the methylation of the MTHFR gene, and maybe VANGL1 and LEKR/CCNL, are in line with previous studies showing polymorphisms in the same genes in association with NTDs and embryonic development, respectively