34 research outputs found

    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in human, chicken and A. thaliana by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. Comment: 44 pages, 9 figures

    Intra-Organ Variation in Age-Related Mutation Accumulation in the Mouse

    Using a transgenic mouse model harboring chromosomally integrated lacZ mutational target genes, we previously demonstrated that mutations accumulate with age much more rapidly in the small intestine than in the brain. Here it is shown that in the small intestine point mutations preferentially accumulate in epithelial cells of the mucosa scraped off the underlying serosa. The mucosal cells are the differentiated villus cells that have undergone multiple cell divisions. A smaller age-related increase, also involving genome rearrangements, was observed in the serosa, which consists mainly of the remaining crypts and non-dividing smooth muscle cells. In the brain we observed an accumulation of point mutations only, and only in the hypothalamus and hippocampus. To directly test for cell division as the determining factor in the generation of point mutations, we compared mutation induction between mitotically active and quiescent embryonic fibroblasts from the same lacZ mice, treated with either UV (a point mutagen) or hydrogen peroxide (a clastogen). The results indicate that while point mutations are highly replication-dependent, genome rearrangements are as easily induced in non-dividing cells as in mitotically active ones. This strongly suggests that the point mutations found to have accumulated in the mucosal part of the small intestine are the consequence of replication errors. The same is likely true for point mutations accumulating in the hippocampus and hypothalamus of the brain, since neurogenesis in these two areas continues throughout life. The observed intra-organ variation in mutation susceptibility, as well as the variation in replication dependency of different types of mutations, indicates the need not only to extend observations made on whole organs to their sub-structures but also to take the type of mutations and the mitotic activity of the cells into consideration. This should help elucidate the impact of genome instability and its consequences on aging and disease.

    Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

    Background: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). Results: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. Conclusion: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.

    Advanced backcross-QTL analysis in spring barley (H. vulgare ssp. spontaneum) comparing a REML versus a Bayesian model in multi-environmental field trials

    A common difficulty in mapping quantitative trait loci (QTLs) is that QTL effects may show environment specificity and thus differ across environments. Furthermore, quantitative traits are likely to be influenced by multiple QTLs or genes having different effect sizes. There is currently a need for efficient mapping strategies to account for both multiple QTLs and marker-by-environment interactions. Thus, the objective of our study was to develop a Bayesian multi-locus multi-environmental method of QTL analysis. This strategy is compared to (1) Bayesian multi-locus mapping, where each environment is analysed separately, (2) a Restricted Maximum Likelihood (REML) single-locus method using a mixed hierarchical model, and (3) REML forward selection applying a mixed hierarchical model. For this study, we used data on multi-environmental field trials of 301 BC2DH lines derived from a cross between the spring barley elite cultivar Scarlett and the wild donor ISR42-8 from Israel. The lines were genotyped by 98 SSR markers and measured for the agronomic traits “ears per m²,” “days until heading,” “plant height,” “thousand grain weight,” and “grain yield”. Additionally, a simulation study was performed to verify the QTL results obtained in the spring barley population. In general, the results of Bayesian QTL mapping are in accordance with REML methods. In this study, Bayesian multi-locus multi-environmental analysis is a valuable method that is particularly suitable if lines are cultivated in multi-environmental field trials.

    Path and Ridge Regression Analysis of Seed Yield and Seed Yield Components of Russian Wildrye (Psathyrostachys juncea Nevski) under Field Conditions

    The correlations among seed yield components, and their direct and indirect effects on the seed yield (Z) of Russian wildrye (Psathyrostachys juncea Nevski), were investigated. The seed yield components fertile tillers per m² (Y1), spikelets per fertile tiller (Y2), florets per spikelet (Y3), seed number per spikelet (Y4) and seed weight (Y5) were counted, and Z was determined, in field experiments from 2003 to 2006 using large samples. Y1 was the most important seed yield component describing Z and Y2 was the least important. The total direct effects of Y1, Y3 and Y5 on Z were positive, while those of Y4 and Y2 were weakly negative. In the path analyses, the total effects (direct plus indirect) of the components contributed positively to Z. Across the 4 years combined, the seed yield components Y1, Y2, Y4 and Y5 were significantly (P<0.001) correlated with Z, while in the individual years Y2 was not significantly correlated with Y3, Y4 and Y5 in Pearson correlation analyses of the five components. Therefore, selection for high seed yield through direct selection for large Y1, Y2 and Y3 would be effective for breeding programs in grasses. Most importantly, ridge regression was used to establish a stable model relating Z to the five yield components, which can closely estimate seed yield from the components.
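For readers unfamiliar with ridge regression, the following is a minimal sketch of the closed-form estimate, which solves (XᵀX + λI)b = Xᵀy. It is not the authors' model: the function name `ridge_fit`, the penalty value and the toy data are illustrative assumptions, and the sketch omits the intercept and the predictor standardization that a real analysis would normally include.

```python
def ridge_fit(X, y, lam):
    """Return ridge coefficients b solving (X^T X + lam * I) b = X^T y.

    X is a list of rows (one list of predictors per observation), y a list
    of responses, lam the ridge penalty. Hypothetical helper for illustration.
    """
    p = len(X[0])
    # Build the penalized normal equations A b = rhs.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (rhs[i] - tail) / A[i][i]
    return coef


# Toy data (made up): y = 2*x1 + 3*x2 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
unpenalized = ridge_fit(X, y, 0.0)   # ordinary least squares
penalized = ridge_fit(X, y, 10.0)    # coefficients shrunk toward zero
```

With a penalty of zero the sketch reduces to ordinary least squares; increasing the penalty shrinks the coefficients, which is what stabilizes the estimates when predictors such as yield components are correlated with one another.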

    Brain classification reveals the right cerebellum as the best biomarker of dyslexia

    Background: Developmental dyslexia is a specific cognitive disorder in reading acquisition that has genetic and neurological origins. Despite histological evidence for brain differences in dyslexia, we recently demonstrated that in a large cohort of subjects, no differences between control and dyslexic readers can be found at the macroscopic level (MRI voxel), because of large variances in local brain volumes. In the present study, we aimed at finding the brain areas that best discriminate dyslexic from control (normal) readers despite the large variance across subjects. After segmenting brain grey matter, normalizing brain size and shape and modulating the voxels' content, normal readers' brains were used to build a 'typical' brain via bootstrapped confidence intervals. Each dyslexic reader's brain was then classified independently at each voxel as being within or outside the normal range. We used this simple strategy to build a brain map showing regional percentages of differences between groups. The significance of this map was then assessed using a randomization technique. Results: The right cerebellar declive and the right lentiform nucleus were the two areas that differed most significantly between groups, with 100% of the dyslexic subjects (N = 38) falling outside the 95% confidence interval boundaries of the control group (N = 39). The clinical relevance of this result was assessed by examining cognitive brain-based differences among dyslexic brain subgroups in comparison to normal readers' performances. The strongest difference between dyslexic subgroups was observed between subjects with lower cerebellar declive (LCD) grey matter volumes than controls and subjects with higher cerebellar declive (HCD) grey matter volumes than controls. Dyslexic subjects with LCD volumes performed worse than subjects with HCD volumes in phonologically and lexicon related tasks. Furthermore, cerebellar and lentiform grey matter volumes interacted in dyslexic subjects, so that lower and higher lentiform grey matter volumes compared to controls differently modulated the phonological and lexical performances. Best performances (observed in controls) corresponded to an optimal value of grey matter, and they dropped for higher or lower volumes. Conclusion: These results provide evidence for the existence of various subtypes of dyslexia characterized by different brain phenotypes. In addition, behavioural analyses suggest that these brain phenotypes relate to different deficits of automatization of language-based processes such as grapheme/phoneme correspondence and/or rapid access to lexicon entries. Article available here: http://www.biomedcentral.com/1471-2202/10/6
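The classification strategy described in this abstract (build a normal range from controls via a bootstrapped confidence interval, then flag each dyslexic subject as inside or outside it) can be sketched for a single voxel as follows. This is only an illustration under stated assumptions: the abstract does not say which statistic was bootstrapped, so the sketch assumes a percentile bootstrap of the control mean, and the function names and grey matter values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def outside_normal_range(value, ci):
    """True if a subject's value at this voxel falls outside the control CI."""
    lower, upper = ci
    return value < lower or value > upper


def percent_outside(subject_values, ci):
    """Percentage of subjects whose value lies outside the control CI."""
    flags = [outside_normal_range(v, ci) for v in subject_values]
    return 100.0 * sum(flags) / len(flags)


# Made-up grey matter values at one voxel (arbitrary units).
controls = [0.9, 1.0, 1.1, 0.95, 1.05, 1.0, 0.98, 1.02]
dyslexic = [5.0, 1.0, -3.0, 1.0]

ci = bootstrap_ci(controls)
share = percent_outside(dyslexic, ci)
```

Repeating this at every voxel would give a map of regional percentages like the one the abstract describes; the randomization test used to assess the map's significance is not covered by this sketch.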

    Characterization of the macrophage transcriptome in glomerulonephritis-susceptible and -resistant rat strains

    Crescentic glomerulonephritis (CRGN) is a major cause of rapidly progressive renal failure for which the underlying genetic basis is unknown. WKY rats show marked susceptibility to CRGN, while Lewis rats are resistant. Glomerular injury and crescent formation are macrophage-dependent and mainly explained by seven quantitative trait loci (Crgn1-7). Here, we used microarray analysis in basal and lipopolysaccharide (LPS)-stimulated macrophages to identify genes that reside on pathways predisposing WKY rats to CRGN. We detected 97 novel positional candidates for the uncharacterised Crgn3-7. We identified 10 additional secondary effector genes with profound differences in expression between the two strains (>5-fold change, <1% False Discovery Rate) for basal and LPS-stimulated macrophages. Moreover, we identified 8 genes with differentially expressed alternatively spliced isoforms, using an in-depth probe-level analysis that allowed us to discard false positives due to polymorphisms between the two rat strains. Pathway analysis identified several common linked pathways, enriched for differentially expressed genes, which affect macrophage activation. In summary, our results identify distinct macrophage transcriptome profiles between two rat strains that differ in susceptibility to glomerulonephritis, provide novel positional candidates for Crgn3-7, and define groups of genes that play a significant role in differential regulation of macrophage activity.