87 research outputs found

    Does pathway analysis make it easier for common variants to tag rare ones?

    Analyzing sequencing data is difficult because of the low frequency of rare variants, which may result in low power to detect associations. We consider pathway analysis to detect multiple common and rare variants jointly and to investigate whether analysis at the pathway level provides an alternative strategy for identifying susceptibility genes. Available pathway analysis methods for data from genome-wide association studies might not be efficient because these methods are designed to detect common variants. Here, we investigate the performance of several existing pathway analysis methods for sequencing data. In particular, we consider the global test, which does not consider linkage disequilibrium between the variants in a gene. We improve the performance of the global test by assigning larger weights to rare variants, as proposed in the weighted-sum approach. Our conclusion is that straightforward application of pathway analysis is not satisfactory; hence, when common and rare variants are jointly analyzed, larger weights should be assigned to rare variants.
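
    The idea of giving rare variants larger weights can be illustrated with a minimal sketch. It assumes a weight proportional to 1 / sqrt(q(1 - q)), where q is the minor allele frequency; this functional form, the function names, and the example frequencies are illustrative assumptions, and details of the published weighted-sum method (such as how allele frequencies are estimated) are not reproduced here.

```python
import math


def rare_variant_weight(maf):
    """Assumed weight 1 / sqrt(q * (1 - q)): larger for rarer variants."""
    if not 0.0 < maf < 1.0:
        raise ValueError("minor allele frequency must lie strictly between 0 and 1")
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def weighted_burden_score(genotypes, mafs):
    """Sum of minor-allele counts (0, 1 or 2 per variant), each scaled by its weight."""
    return sum(g * rare_variant_weight(q) for g, q in zip(genotypes, mafs))


# Hypothetical pathway with one common variant (MAF 0.5) and one rare variant (MAF 0.01).
mafs = [0.5, 0.01]
carrier_of_common = weighted_burden_score([1, 0], mafs)
carrier_of_rare = weighted_burden_score([0, 1], mafs)
```

    Under this assumed weighting, a single copy of the rare allele contributes more to the score than a single copy of the common allele, which is the behaviour the abstract argues for.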

    Survival analysis with delayed entry in selected families with application to human longevity

    In the field of aging research, family-based sampling study designs are commonly used to study the lifespans of long-lived family members. However, the specific sampling procedure should be carefully taken into account in order to avoid biases. This work is motivated by the Leiden Longevity Study, a family-based cohort of long-lived siblings. Families were invited to participate in the study if at least two siblings were ‘long-lived’, where ‘long-lived’ meant being older than 89 years for men or older than 91 years for women. As a result, more than 400 families were included in the study and followed for around 10 years. For estimation of marker-specific survival probabilities and correlations among lifetimes of family members, delayed entry due to outcome-dependent sampling mechanisms has to be taken into account. We consider shared frailty models to model left-truncated correlated survival data. The treatment of left truncation in shared frailty models is still an open issue and the literature on this topic is scarce. We show that the current approaches provide, in general, biased estimates and we propose a new method to tackle this selection problem by applying a correction on the likelihood estimation by means of inverse probability weighting at the family level.
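
    To make the notion of delayed entry concrete, the sketch below shows the standard individual-level adjustment in the simplest possible setting: a constant hazard with no frailty term, where each person contributes time at risk only from the age at study entry. It is not the family-level inverse probability weighting proposed in the abstract, and the ages and function names are hypothetical.

```python
def exponential_rate_mle(entry_ages, exit_ages, died):
    """Constant-hazard MLE under left truncation: deaths / time observed at risk."""
    time_at_risk = sum(x - e for e, x in zip(entry_ages, exit_ages))
    return sum(died) / time_at_risk


def naive_rate(exit_ages, died):
    """Ignores delayed entry by counting time at risk from age 0."""
    return sum(died) / sum(exit_ages)


# Hypothetical ages in years, not data from the Leiden Longevity Study.
entry = [90.0, 92.0, 91.0]
exit_ = [95.0, 99.0, 93.0]
died = [1, 1, 0]  # 1 = death observed, 0 = censored

adjusted = exponential_rate_mle(entry, exit_, died)  # 2 deaths / 14 person-years
naive = naive_rate(exit_, died)                      # 2 deaths / 287 person-years
```

    The naive estimate credits each participant with decades of survival that were a condition of being sampled, so it understates the hazard. The abstract's point is that even the adjusted form can be biased once frailty is shared within selected families.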

    Sequential double cross-validation for assessment of added predictive ability in high-dimensional omic applications

    Enriching existing predictive models with new biomolecular markers is an important task in the new multi-omic era. Clinical studies increasingly include new sets of omic measurements which may prove their added value in terms of predictive performance. We introduce a two-step approach for the assessment of the added predictive ability of omic predictors, based on sequential double cross-validation and regularized regression models. We propose several performance indices to summarize the two-stage prediction procedure and a permutation test to formally assess the added predictive value of a second omic set of predictors over a primary omic source. The performance of the test is investigated through simulations. We illustrate the new method through the systematic assessment and comparison of the performance of transcriptomics and metabolomics sources in the prediction of body mass index (BMI) using longitudinal data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.

    Estimating Constraints for Protection Factors from HDX-MS Data

    Hydrogen/deuterium exchange monitored by mass spectrometry is a promising technique for rapidly fingerprinting structural and dynamical properties of proteins. The time-dependent change in the mass of any fragment of the polypeptide chain depends uniquely on the rate of exchange of its amide hydrogens, but determining the latter from the former is generally not possible. Here, we show that, if time-resolved measurements are available for a number of overlapping peptides that cover the whole sequence, rate constants for each amide hydrogen exchange (or equivalently, their protection factors) may be extracted, with the uniqueness of the solutions obtained depending on the degree of peptide overlap. However, in most cases, the solution is not unique, and multiple alternatives must be considered. We provide a statistical method that clusters the solutions to further reduce their number. Such analysis always provides meaningful constraints on protection factors and can be used in situations in which obtaining more refined experimental data is impractical. It also provides a systematic way to improve data collection strategies to obtain unambiguous information at single-residue level (e.g., for assessing protein structure predictions at atomistic level).
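
    The non-uniqueness described above can be seen in a small sketch. It assumes each amide exchanges with simple first-order kinetics, so a peptide's expected uptake at time t is the sum of 1 - exp(-k t) over its amides, and it ignores back-exchange and incomplete labelling. The rate values and names are hypothetical.

```python
import math


def peptide_uptake(rates, t):
    """Expected deuterium uptake at time t: sum over amides of 1 - exp(-k * t)."""
    return sum(1.0 - math.exp(-k * t) for k in rates)


# Hypothetical exchange rates (per minute) for a three-amide peptide.
assignment_a = [0.01, 1.0, 50.0]
assignment_b = [50.0, 0.01, 1.0]  # same rates, placed on different residues

times = [0.5, 5.0, 60.0]  # labelling times in minutes
curve_a = [peptide_uptake(assignment_a, t) for t in times]
curve_b = [peptide_uptake(assignment_b, t) for t in times]
```

    Both assignments give the same uptake curve, so a single peptide fixes the set of rates but not which residue carries which rate. Overlapping peptides that share only some residues are what can break this symmetry.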

    The mixed model for the analysis of a repeated‐measurement multivariate count data

    Clustered overdispersed multivariate count data are challenging to model due to the presence of correlation within and between samples. Typically, the first source of correlation needs to be addressed but its quantification is of less interest. Here, we focus on the correlation between time points. In addition, the effects of covariates on the multivariate counts distribution need to be assessed. To fulfill these requirements, a regression model based on the Dirichlet‐multinomial distribution for association between covariates and the categorical counts is extended by using random effects to deal with the additional clustering. This model is the Dirichlet‐multinomial mixed regression model. Alternatively, a negative binomial regression mixed model can be deployed where the corresponding likelihood is conditioned on the total count. It appears that these two approaches are equivalent when the total count is fixed and independent of the random effects. We consider both subject‐specific and categorical‐specific random effects. However, the latter has a larger computational burden when the number of categories increases. Our work is motivated by microbiome data sets obtained by sequencing of the amplicon of the bacterial 16S rRNA gene. These data have a compositional structure and are typically overdispersed. The microbiome data set is from an epidemiological study carried out in a helminth‐endemic area in Indonesia. The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant effect on the microbiome composition only in infected subjects who remained infected.
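
    As background for the model above, the sketch below evaluates the log probability of a count vector under a plain Dirichlet-multinomial distribution with fixed concentration parameters. It contains no covariates and no random effects, so it is only the building block and not the mixed regression model of the abstract; the function name and example values are illustrative.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log probability of category counts given Dirichlet concentrations alpha."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # Normalising term: log( n! * Gamma(a) / Gamma(n + a) )
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-category term: log( Gamma(x + alpha) / (x! * Gamma(alpha)) )
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(x + 1) - math.lgamma(al)
    return log_p


# Two categories, total count 2, alpha = (1, 1): every split is equally likely.
probs = [
    math.exp(dirichlet_multinomial_logpmf([x, 2 - x], [1.0, 1.0]))
    for x in range(3)
]
```

    Smaller concentration parameters spread probability toward extreme splits, which is how this distribution accommodates the overdispersion typical of microbiome counts.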

    Genetic, household and spatial clustering of leprosy on an island in Indonesia: a population-based study

    BACKGROUND: It is generally accepted that genetic factors play a role in susceptibility to both leprosy per se and leprosy type, but only a few studies have attempted to quantify this. Estimating the contribution of genetic factors to clustering of leprosy within families is difficult since these persons often share the same environment. The first aim of this study was to test which correlation structure (genetic, household or spatial) gives the best explanation for the distribution of leprosy patients and seropositive persons, and the second was to quantify the role of genetic factors in the occurrence of leprosy and seropositivity. METHODS: The three correlation structures were proposed for population data (n = 560), collected on a geographically isolated island highly endemic for leprosy, to explain the distribution of leprosy per se, leprosy type and persons harbouring Mycobacterium leprae-specific antibodies. Heritability estimates and risk ratios for siblings were calculated to quantify the genetic effect. Leprosy was clinically diagnosed and specific anti-M. leprae antibodies were measured using ELISA. RESULTS: For leprosy per se in the total population the genetic correlation structure fitted best. In the population with relatively stable household status (persons under 21 years and above 39 years) all structures were significant. For multibacillary leprosy (MB) genetic factors seemed more important than for paucibacillary leprosy. Seropositivity could be explained best by the spatial model, but the genetic model was also significant. Heritability was 57% for leprosy per se and 31% for seropositivity. CONCLUSION: Genetic factors seem to play an important role in the clustering of patients with a more advanced form of leprosy, and they could explain more than half of the total phenotypic variance.