
    Sparse reduced-rank regression for imaging genetics studies: models and applications

    We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step, prior to the imaging genetics study. We also present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. Finally, we illustrate the application of the proposed statistical models to three real imaging genetics datasets and highlight some potential associations

    An epigenetic human cytomegalovirus infection score predicts viremia risk in seropositive lung transplant recipients.

    Cytomegalovirus (CMV) infection and reactivation in solid organ transplant (SOT) recipients increases the risk of viremia, graft failure and death. Clinical studies of CMV serostatus indicate that donor positive recipient negative (D+/R-) patients have greater viremia risk than D-/R-. The majority of patients are R+, with intermediate serologic risk. To characterize the long-term impact of CMV infection and assess viremia risk, we sought to measure the effects of CMV on the recipient immune epigenome. Specifically, we profiled DNA methylation in 156 individuals before lung or kidney transplant. We found that the methylome of CMV positive SOT recipients is hyper-methylated at loci associated with neural development and Polycomb group (PcG) protein binding, and hypo-methylated at regions critical for the maturation of lymphocytes. In addition, we developed a machine learning-based model to predict the recipient CMV serostatus after correcting for cell type composition and ancestry. This CMV episcore, measured at baseline in R+ individuals, stratifies viremia risk accurately in the lung transplant cohort; along with serostatus, the CMV episcore could be a potential biomarker for identifying R+ patients at high viremia risk

    Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model

    The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such models. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results for regularized EM algorithms often rely on dividing the sample into many independent batches and employing a fresh batch of samples in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting, and can be extended to multivariate response cases. The proposed methods also show encouraging performance in numerical studies

    Comparative Study of the Distribution of Repetitive DNA in Model Organisms

    Repetitive DNA elements are abundant in the genome of a wide range of organisms. In mammals, repetitive elements comprise about 40-50% of the total genomes. However, their biological functions remain largely unknown. Analysis of their abundance and distribution may shed some light on how they affect genome structure, function, and evolution. We conducted a detailed comparative analysis of repetitive DNA elements across ten different eukaryotic organisms, including chicken (G. gallus), zebrafish (D. rerio), Fugu (T. rubripes), fruit fly (D. melanogaster), and nematode worm (C. elegans), along with five mammalian organisms: human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat (R. norvegicus), and rhesus (M. mulatta). Our results show that repetitive DNA content varies widely, from 7.3% in the Fugu genome to 52% in the zebrafish, based on RepeatMasker data. The most frequently observed transposable elements (TEs) in mammals are SINEs (Short Interspersed Nuclear Elements), followed by LINEs (Long Interspersed Nuclear Elements). In contrast, LINEs, DNA transposons, simple repeats, and low complexity repeats are the most frequently observed repeat classes in the chicken, zebrafish, fruit fly, and nematode worm genomes, respectively. LTRs (Long Terminal Repeats) have significant genomic coverage and diversity, which may make them suitable for regulatory roles. With the exception of the nematode worm and fruit fly, the frequency of the repetitive elements follows a log-normal distribution, characterized by a few highly prevalent repeats in each organism. In mammals, SINEs are enriched near genic regions, and LINEs are often found away from genes. We also identified many LTRs that are specifically enriched in promoter regions, some with a strong bias towards the same strand as the nearby gene. This raises the possibility that the LTRs may play a regulatory role. 
Surprisingly, most intronic repeats, with the exception of DNA transposons, have a strong tendency to be on the opposite DNA strand to the host gene. One possible explanation is that intronic RNAs which result from splicing may contribute to retrotransposition to the original intronic loci. Moreover, our observations of repetitive DNA element enrichment near genic regions and, specifically, the promoter region of genes, raise the question as to whether repetitive DNA elements have a significant impact on gene expression in both human and mouse genomes. In order to investigate the impact of these repeats on gene expression, we calculated the total number of base pairs (bp) for these repeats in two different locations upstream from the genes, namely the 2kbp and 20kbp promoter regions. In addition, we quantified the gene expression levels in both human and mouse tissues using RNAseq analysis. Then, we used different statistical modeling approaches to investigate the association between repetitive DNA elements and gene expression in the two promoter regions. Although most transposable elements are primarily involved in reduced gene expression, our model's results showed that Alu elements in both human and mouse are significantly associated with higher average expression in the promoter region. Furthermore, we found that the B2 in both mouse 2kbp and 20kbp and hAT.Charlie elements in the human 20kbp are also significantly associated with up-regulated gene expression in the 2kbp promoter. In addition to Alu and B2 in 2kbp, we found that ERV1 elements have a significant association with higher average expression in the 20kbp promoter in mouse tissues. We also found that L1 and Simple_repeat elements are significantly associated with lower average expression in both human and mouse tissues. Furthermore, in the human, we found that the MIR is also associated with lower average expression. 
The effects of Alu elements in both human and mouse are stronger at 2kbp than at 20kbp. In contrast, the L1 effect at 20kbp is stronger than at 2kbp. Our results indicate that comparative studies of repetitive DNA elements in multiple organisms can provide insights into their evolution and expansion, and lead to the elucidation of their potential functions. The non-random distribution of repeats across multiple organisms adds to the existing evidence that some repetitive DNA elements are drivers of genome evolution, rather than just “junk” DNA

    The construction of a partial least squares biplot

    Includes bibliographical references. In multivariate analysis, data matrices are often very large, which sometimes makes it difficult to describe their structure and to make a visual inspection of the relationship between their respective rows (samples) and columns (variables). For this reason, biplots, the joint graphical display of the rows and columns of a data matrix, can be useful tools for analysis. Since they were first introduced, biplots have been employed in a number of multivariate methods, such as Correspondence Analysis (CA), Principal Component Analysis (PCA), Canonical Variate Analysis (CVA) and Discriminant Analysis (DA), as a form of graphical display of data. Another possible employment is in Partial Least Squares (PLS). First introduced as a regression method, PLS is more flexible than multivariate regression, but better suited than Principal Component Regression (PCR) for the prediction of a set of response variables from a large set of predictor variables. Employing the biplot in PLS gave rise to the PLS biplot, a new addition to the biplot family. In the current study, this biplot was successfully applied to sensory data to investigate the relationships between the sensory panel characteristics and the chemical quality measurements of sixteen olive oils. It was also applied to a large set of mineral sorting production data to investigate the relationships between the output variables and the process factors used to produce a final product. Furthermore, the PLS biplot was applied to Binomial-distributed data concerning the diabetes testing of Indian women and to Poisson-distributed data showing the diversity of arboreal marsupials (possum) in the Montane ash forest. After these applications, it is proposed that the PLS biplot is a useful graphical tool for displaying results from the (univariate) Partial Least Squares-Generalized Linear Model (PLS-GLM) analysis of a data set. 
With Partial Least Squares Regression (PLSR) being a valuable method for modelling high-dimensional data, especially in chemometrics, the PLS biplot was successfully applied to a cereal evaluation containing one hundred and forty-five infrared spectra and six chemical properties, and to a gene expression data set with two thousand genes

    Fitness is positively associated with hippocampal formation subfield volumes in schizophrenia: a multiparametric magnetic resonance imaging study

    Hippocampal formation (HF) volume loss is a well-established finding in schizophrenia, with select subfields, such as the cornu ammonis and dentate gyrus, being particularly vulnerable. These morphologic alterations are related to functional abnormalities and cognitive deficits, which are at the core of the insufficient recovery frequently seen in this illness. To counteract HF volume decline, exercise to improve aerobic fitness is considered a promising intervention. However, the effects of aerobic fitness levels on HF subfields are not yet established in individuals with schizophrenia. Therefore, our study investigated potential associations between aerobic fitness and HF subfield structure, functional connectivity, and related cognitive impact in a multiparametric research design. In this cross-sectional study, 53 participants diagnosed with schizophrenia (33 men, 20 women; mean [SD] age, 37.4 [11.8] years) underwent brain structural and functional magnetic resonance imaging and assessments of aerobic fitness and verbal memory. Multivariate multiple linear regressions were performed to determine whether aerobic fitness was associated with HF subfield volumes and functional connections. In addition, we explored whether identified associations mediated verbal memory functioning. Significant positive associations between aerobic fitness levels and volumes were demonstrated for most HF subfields, with the strongest associations for the cornu ammonis, dentate gyrus, and subiculum. No significant associations were found for HF functional connectivity or mediation effects on verbal memory. Aerobic fitness may mitigate HF volume loss, especially in the subfields most affected in schizophrenia. This finding should be further investigated in longitudinal studies

    Self-Care, Anticipated Stigma, and Personal Therapy in Mental Health Professional Trainees

    Self-care has increasingly been encouraged as a means of maintaining well-being for mental health professionals; yet, there exists an unsettling lack of research and guidance on this topic for those within the field (Callan et al., 2021; Colman et al., 2016; Norcross & VandenBos, 2018). This has led to calls for change and reform to recognize the importance of self-care as an ethical imperative and to incorporate it within the education and training of mental health professionals (Barnett et al., 2007; Barnett & Cooper, 2009; Wise & Reuman, 2019; Zahniser et al., 2017). These calls for reform and the increased importance of self-care have only grown given the realities of the strains included within the work that mental health professionals do and the increased stress placed on the field by the worldwide COVID-19 pandemic (El-Ghoroury et al., 2012; Posluns & Gall, 2020; Sciberras & Pilkington, 2018). Given the need for research on self-care and ways to implement it, combined with the lack of prior research, the current research set out to contribute quantitative research on areas related to self-care for mental health professional trainees. The first purpose was to determine how much of the variation in the five factors of self-care was explained by anticipated stigma and attendance in personal therapy. The second purpose was to determine the contribution of both anticipated stigma and personal therapy separately on the variation within self-care. The third purpose was to determine if there was a difference in self-care between mental health professional trainee groups who had experienced personal therapy. In the current study, the Self-Care Assessment for Psychologists was used (Dorociak, Rupert, Bryant, et al., 2017). 
The other variables of interest, anticipated stigma and attendance in personal therapy, were measured by the Anticipated Stigma Scale (Quinn & Chaudoir, 2009; Quinn et al., 2014) and by having participants detail their therapy experience similarly to what prior researchers had done (Bike et al., 2009; Byrne & Ost, 2016; Byrne & Shufelt, 2014; Geller et al., 2005; Kalkbrenner & Neukrug, 2019; Kalkbrenner et al., 2019; Norcross, 2005; Norcross et al., 2008; Orlinsky et al., 2011; Ziede & Norcross, 2020). A multivariate multiple linear regression was used to analyze the data of 100 participants (Keith, 2019; Remler & Van Ryzin, 2015; Rencher & Christensen, 2012). The results did not provide any evidence that anticipated stigma and personal therapy explained a significant amount of the variation within self-care for mental health professional trainees; no evidence was found for either of the variables separately, nor was there evidence found for a difference between groups of those who did and did not attend therapy. Theoretical, research, and clinical implications are discussed, suggesting how further inquiry might be conducted to better understand self-care for the mental health trainee population