134 research outputs found
Applications and extensions of Random Forests in genetic and environmental studies
Transcriptional regulation refers to the molecular systems that control the concentration of mRNA species within the cell. Variation in these controlling systems is not only responsible for many diseases, but also contributes to the vast phenotypic diversity in the biological world. There are powerful experimental approaches to probe these regulatory systems, and the focus of my doctoral research has been to develop and apply effective computational methods that exploit these rich data sets more completely. First, I present a method for mapping genetic regulators of gene expression (expression quantitative trait loci, or eQTL) using Random Forests. This approach allows for flexible modeling and feature selection, and results in eQTL that are more biologically supportable than those mapped with competing methods. Next, I present a method that finds interactions between genes that in turn regulate the expression of other genes. This is accomplished by finding recurring decision motifs in the forest structure that represent dependencies between genetic loci. Third, I present a method to use distributional differences in eQTL data to establish the regulatory roles of genes relative to other disease-associated genes. Using this method, we found that genes that are master regulators of other disease genes are more likely to be consistently associated with the disease in genetic association studies. Finally, I present a novel application of Random Forests to determine the mode of regulation of toxin-perturbed genes, using time-resolved gene expression. The results demonstrate a novel approach to supervised weighted clustering of gene expression data.
Data-driven assessment of eQTL mapping methods
Background: The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets often go underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. Often these methods are inappropriate for complex traits such as gene expression, particularly in the case of epistasis. Results: Here we compare legacy QTL mapping methods with several modern multi-locus methods and evaluate their ability to produce eQTL that agree with independent external data in a systematic way. We found that the modern multi-locus methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multi-locus methods. Conclusions: Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multi-locus and legacy eQTL mapping methods.
p53-mediated neurodegeneration in the absence of the nuclear protein Akirin2.
Proper gene regulation is critical for both neuronal development and maintenance as the brain matures. We previously demonstrated that Akirin2, an essential nuclear protein that interacts with transcription factors and chromatin remodeling complexes, is required for the embryonic formation of the cerebral cortex. Here we show that Akirin2 plays a mechanistically distinct role in maintaining healthy neurons during cortical maturation. Restricting Akirin2 loss to excitatory cortical neurons resulted in progressive neurodegeneration via necroptosis and severe cortical atrophy with age. Comparing transcriptomes from Akirin2-null postnatal neurons and cortical progenitors revealed that targets of the tumor suppressor p53, a regulator of both proliferation and cell death encoded b
Identification of Stage‐Specific Genes Associated With Lupus Nephritis and Response to Remission Induction in (NZB × NZW)F1 and NZM2410 Mice
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/108024/1/art38679.pd
Integrative Analysis of Low- and High-Resolution eQTL
The study of expression quantitative trait loci (eQTL) is a powerful way of detecting transcriptional regulators at a genomic scale and for elucidating how natural genetic variation impacts gene expression. Power and genetic resolution are heavily affected by the study population: whereas recombinant inbred (RI) strains yield greater statistical power with low genetic resolution, using diverse inbred or outbred strains improves genetic resolution at the cost of lower power. In order to overcome the limitations of both individual approaches, we combine data from RI strains with genetically more diverse strains and analyze hippocampus eQTL data obtained from mouse RI strains (BXD) and from a panel of diverse inbred strains (Mouse Diversity Panel, MDP). We perform a systematic analysis of the consistency of eQTL independently obtained from these two populations and demonstrate that a significant fraction of eQTL can be replicated. Based on existing knowledge from pathway databases we assess different approaches for using the high-resolution MDP data for fine mapping BXD eQTL. Finally, we apply this framework to an eQTL hotspot on chromosome 1 (Qrr1), which has been implicated in a range of neurological traits. Here we present the first systematic examination of the consistency between eQTL obtained independently from the BXD and MDP populations. Our analysis of fine-mapping approaches is based on ‘real life’ data as opposed to simulated data and it allows us to propose a strategy for using MDP data to fine map BXD eQTL. Application of this framework to Qrr1 reveals that this eQTL hotspot is not caused by just one (or few) ‘master regulators’, but actually by a set of polymorphic genes specific to the central nervous system.
Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism
Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contribution from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases.
Language and reading impairments are associated with increased prevalence of non-right handedness
Funding: Royal Society - UF150663, RGF\EA\180141; Wellcome Trust - 217065/Z/19/Z; H2020 European Research Council - 694189; NWO - 451-15-017; National Health and Medical Research Council - 1173896; Canadian Institute for Health Research - MOP-133440. Handedness has been studied for association with language-related disorders because of its link with language hemispheric dominance. No clear pattern has emerged, possibly because of small samples, publication bias, and heterogeneous criteria across studies. Non-right-handedness (NRH) frequency was assessed in N = 2503 cases with reading and/or language impairment and N = 4316 sex-matched controls identified from 10 distinct cohorts (age range 6–19 years old; European ethnicity) using a priori set criteria. A meta-analysis (Ncases = 1994) showed elevated NRH % in individuals with language/reading impairment compared with controls (OR = 1.21, CI = 1.06–1.39, p = .01). The association between reading/language impairments and NRH could result from shared pathways underlying brain lateralization, handedness, and cognitive functions. Publisher PDF. Peer reviewed.
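As a rough illustration of the kind of statistic reported in this abstract, the sketch below computes an odds ratio and a Wald confidence interval from a single 2×2 table of handedness by case status. The function name `odds_ratio_ci` and all counts are hypothetical and are not taken from the study, which pooled several cohorts in a meta-analysis rather than analysing one table.

```python
import math


def odds_ratio_ci(case_nrh, case_rh, ctrl_nrh, ctrl_rh, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald interval on the log odds ratio; z=1.96 gives an
    approximate 95% interval. All four cell counts must be positive.
    """
    if min(case_nrh, case_rh, ctrl_nrh, ctrl_rh) <= 0:
        raise ValueError("all cell counts must be positive")
    # Odds of non-right-handedness in cases divided by the odds in controls.
    odds_ratio = (case_nrh * ctrl_rh) / (case_rh * ctrl_nrh)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(1 / case_nrh + 1 / case_rh + 1 / ctrl_nrh + 1 / ctrl_rh)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(120, 880, 100, 900)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound lies above 1 would indicate elevated odds in cases at the chosen confidence level, which is the pattern the abstract reports for its pooled estimate.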
Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people
The use of spoken and written language is a fundamental human capacity. Individual differences in reading- and language-related skills are influenced by genetic variation, with twin-based heritability estimates of 30 to 80% depending on the trait. The genetic architecture is complex, heterogeneous, and multifactorial, but investigations of contributions of single-nucleotide polymorphisms (SNPs) were thus far underpowered. We present a multicohort genome-wide association study (GWAS) of five traits assessed individually using psychometric measures (word reading, nonword reading, spelling, phoneme awareness, and nonword repetition) in samples of 13,633 to 33,959 participants aged 5 to 26 y. We identified genome-wide significant association with word reading (rs11208009, P = 1.098 × 10^-8) at a locus that has not been associated with intelligence or educational attainment. All five reading-/language-related traits showed robust SNP heritability, accounting for 13 to 26% of trait variability. Genomic structural equation modeling revealed a shared genetic factor explaining most of the variation in word/nonword reading, spelling, and phoneme awareness, which only partially overlapped with genetic variation contributing to nonword repetition, intelligence, and educational attainment. A multivariate GWAS of word/nonword reading, spelling, and phoneme awareness maximized power for follow-up investigation. Genetic correlation analysis with neuroimaging traits identified an association with the surface area of the banks of the left superior temporal sulcus, a brain region linked to the processing of spoken and written language. Heritability was enriched for genomic elements regulating gene expression in the fetal brain and in chromosomal regions that are depleted of Neanderthal variants. Together, these results provide avenues for deciphering the biological underpinnings of uniquely human traits. Peer reviewed.
- …