19,531 research outputs found

    Gene expression in large pedigrees: analytic approaches.

    Get PDF
    BackgroundWe currently have the ability to quantify transcript abundance of messenger RNA (mRNA), genome-wide, using microarray technologies. Analyzing genotype, phenotype and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers, tackling some timely and important problems and questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed newer statistical tools, and developed and applied extensions to these tools.MethodsTo study gene expression correlations in the pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied expression quantitative trait locus allelic heterogeneity through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of applying allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.Results and discussionRegarding the analysis of pedigree data, we found that gene expression is familial, indicating that at least 1 factor for pedigree membership or multiple factors for the degree of relationship should be included in analyses, and we developed a method to adjust for familiality prior to conducting weighted co-expression gene network analysis. For SNP association and conditional analyses, we found FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines -Major Gene Analysis) have similar type 1 and type 2 errors and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, eQTL (expression quantitative trait loci) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Including both genotype and phenotype data together with measurements of gene expression was found to be generally advantageous in terms of generating improved levels of significance and in providing more interpretable biological models.ConclusionsPedigrees can be used to conduct analyses of and enhance gene expression studies

    TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription

    Get PDF
    Motivation: Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional and synergistic interactions that emerge when collectively analysing genes subject to different regulatory mechanisms. This limitation reduces overall predictive power and thus the reliability of downstream biological inference. Results: We introduce an analytical modelling framework (TREEOME: tree of models of expression) that integrates epigenetic and transcriptomic data by separating genes into putative regulatory classes. Current predictive modelling approaches have found both DNA methylation and histone modification epigenetic data to provide little or no improvement in accuracy of prediction of transcript abundance despite, for example, distinct anti-correlation between mRNA levels and promoter-localised DNA methylation. To improve on this, in TREEOME we evaluate four possible methods of formulating gene-level DNA methylation metrics, which provide a foundation for identifying gene-level methylation events and subsequent differential analysis, whereas most previous techniques operate at the level of individual CpG dinucleotides. We demonstrate TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript abundance (RNA-seq) for H1-hESC and GM12878 cell lines. Availability: TREEOME is implemented using open-source software and made available as a pre-configured bootable reference environment. All scripts and data presented in this study are available online at http://sourceforge.net/projects/budden2015treeome/.Comment: 14 pages, 6 figure

    Grey-matter texture abnormalities and reduced hippocampal volume are distinguishing features of schizophrenia

    Get PDF
    Neurodevelopmental processes are widely believed to underlie schizophrenia. Analysis of brain texture from conventional magnetic resonance imaging (MRI) can detect disturbance in brain cytoarchitecture. We tested the hypothesis that patients with schizophrenia manifest quantitative differences in brain texture that, alongside discrete volumetric changes, may serve as an endophenotypic biomarker. Texture analysis (TA) of grey matter distribution and voxel-based morphometry (VBM) of regional brain volumes were applied to MRI scans of 27 patients with schizophrenia and 24 controls. Texture parameters (uniformity and entropy) were also used as covariates in VBM analyses to test for correspondence with regional brain volume. Linear discriminant analysis tested if texture and volumetric data predicted diagnostic group membership (schizophrenia or control). We found that uniformity and entropy of grey matter differed significantly between individuals with schizophrenia and controls at the fine spatial scale (filter width below 2 mm). Within the schizophrenia group, these texture parameters correlated with volumes of the left hippocampus, right amygdala and cerebellum. The best predictor of diagnostic group membership was the combination of fine texture heterogeneity and left hippocampal size. This study highlights the presence of distributed grey-matter abnormalities in schizophrenia, and their relation to focal structural abnormality of the hippocampus. The conjunction of these features has potential as a neuroimaging endophenotype of schizophrenia

    Combined population dynamics and entropy modelling supports patient stratification in chronic myeloid leukemia

    Get PDF
    Modelling the parameters of multistep carcinogenesis is key for a better understanding of cancer progression, biomarker identification and the design of individualized therapies. Using chronic myeloid leukemia (CML) as a paradigm for hierarchical disease evolution we show that combined population dynamic modelling and CML patient biopsy genomic analysis enables patient stratification at unprecedented resolution. Linking CD34+ similarity as a disease progression marker to patientderived gene expression entropy separated established CML progression stages and uncovered additional heterogeneity within disease stages. Importantly, our patient data informed model enables quantitative approximation of individual patients’ disease history within chronic phase (CP) and significantly separates “early” from “late” CP. Our findings provide a novel rationale for personalized and genome-informed disease progression risk assessment that is independent and complementary to conventional measures of CML disease burden and prognosis

    Automated Discrimination of Pathological Regions in Tissue Images: Unsupervised Clustering vs Supervised SVM Classification

    Get PDF
    Recognizing and isolating cancerous cells from non pathological tissue areas (e.g. connective stroma) is crucial for fast and objective immunohistochemical analysis of tissue images. This operation allows the further application of fully-automated techniques for quantitative evaluation of protein activity, since it avoids the necessity of a preventive manual selection of the representative pathological areas in the image, as well as of taking pictures only in the pure-cancerous portions of the tissue. In this paper we present a fully-automated method based on unsupervised clustering that performs tissue segmentations highly comparable with those provided by a skilled operator, achieving on average an accuracy of 90%. Experimental results on a heterogeneous dataset of immunohistochemical lung cancer tissue images demonstrate that our proposed unsupervised approach overcomes the accuracy of a theoretically superior supervised method such as Support Vector Machine (SVM) by 8%

    MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification

    Full text link
    Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and unrobust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples by allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line
    • 

    corecore