49 research outputs found

    Comparative Analysis of RNA Families Reveals Distinct Repertoires for Each Domain of Life

    The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically-encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer, and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner.
    Comment: 47 pages, 4 main figures, 3 supplementary figures, 4 supplementary tables. Submitted to PLOS Computational Biology.

    Assembly and analysis of the genome sequence of the yeast Brettanomyces naardenensis CBS 7540

    Brettanomyces naardenensis is a spoilage yeast with potential biotechnological applications in the production of innovative beverages with low alcohol content and a high degree of attenuation. Here, we present the first annotated genome of B. naardenensis CBS 7540. The genome was assembled into 76 contigs totaling 11,283,072 nucleotides, and 5168 protein-coding sequences were annotated. The study provides functional genome annotation and phylogenetic analysis, and discusses the genetic determinants behind the notable stress tolerance and biotechnological potential of B. naardenensis.

    Reference Genomes from Distantly Related Species Can Be Used for Discovery of Single Nucleotide Polymorphisms to Inform Conservation Management

    Threatened species recovery programmes benefit from incorporating genomic data into conservation management strategies to enhance species recovery. However, a lack of readily available genomic resources, including conspecific reference genomes, often limits the inclusion of genomic data. Here, we investigate the utility of closely related high-quality reference genomes for single nucleotide polymorphism (SNP) discovery using the critically endangered kakī/black stilt (Himantopus novaezelandiae) and four Charadriiform reference genomes as proof of concept. We compare diversity estimates (i.e., nucleotide diversity, individual heterozygosity, and relatedness) based on kakī SNPs discovered from genotyping-by-sequencing and whole genome resequencing reads mapped to conordinal (killdeer, Charadrius vociferus), confamilial (pied avocet, Recurvirostra avosetta), congeneric (pied stilt, Himantopus himantopus) and conspecific reference genomes. Results indicate that diversity estimates calculated from SNPs discovered using closely related reference genomes correlate significantly with estimates calculated from SNPs discovered using a conspecific genome. Congeneric and confamilial references provide higher correlations and more similar measures of nucleotide diversity, individual heterozygosity, and relatedness. While conspecific genomes may be necessary to address other questions in conservation, SNP discovery using high-quality reference genomes of closely related species is a cost-effective approach for estimating diversity measures in threatened species.

    An integrated molecular risk score early in life for subsequent childhood asthma risk.

    BACKGROUND Numerous children present with early wheeze symptoms, yet only a subgroup develops childhood asthma. Early identification of children at risk is key for clinical monitoring, timely patient-tailored treatment, and preventing chronic, severe sequelae. For early prediction of childhood asthma, we aimed to define an integrated risk score combining established risk factors with genome-wide molecular markers at birth, complemented by subsequent clinical symptoms/diagnoses (wheezing, atopic dermatitis, food allergy). METHODS Three longitudinal birth cohorts (PAULINA/PAULCHEN, n = 190 + 93 = 283; PASTURE, n = 1133) were used to predict childhood asthma (age 5-11) from epidemiological characteristics and molecular markers: genotype, DNA methylation and mRNA expression (RNASeq/NanoString). Apparent (ap) and optimism-corrected (oc) performance (AUC/R2) was assessed by leveraging evidence from independent studies (Naïve-Bayes approach) combined with high-dimensional logistic regression models (LASSO). RESULTS Asthma prediction with epidemiological characteristics at birth (maternal asthma, sex, farm environment) yielded an ocAUC = 0.65. Including molecular markers as predictors improved apparent prediction performance; however, only a moderate increase was observed in optimism-corrected performance (up to ocAUC = 0.68). The greatest discriminative power was reached by adding the first symptoms/diagnoses (up to ocAUC = 0.76; increase of 0.08, p = .002). Longitudinal analysis of selected mRNA expression in PASTURE (cord blood, 1, 4.5, 6 years) showed that expression at age six had the strongest association with asthma, with the correlation between genes increasing over time (r = .59, p < .001, 4.5-6 years). CONCLUSION Epidemiological predictors alone showed moderate predictive ability. Molecular markers from birth modestly improved prediction. Allergic symptoms/diagnoses enhanced the power of prediction, which is important for clinical practice and for the design of future studies with molecular markers.

    An Improved Canine Genome and a Comprehensive Catalogue of Coding Genes and Non-Coding Transcripts

    The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 MB of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that those were previously not annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein coding genes; ∼7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. Functional analysis of two novel transcripts with shRNA in a mouse kidney cell line altered cell morphology and motility. All in all, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.

    Evolutionarily Stable Association of Intronic snoRNAs and microRNAs with Their Host Genes

    Small nucleolar RNAs (snoRNAs) and microRNAs (miRNAs) are integral to a range of processes, including ribosome biogenesis and gene regulation. Some are intron encoded, and this organization may facilitate coordinated coexpression of host gene and RNA. However, snoRNAs and miRNAs are known to be mobile, so intron-RNA associations may not be evolutionarily stable. We have used genome alignments across 11 mammals plus chicken to examine positional orthology of snoRNAs and miRNAs and report that 21% of annotated snoRNAs and 11% of miRNAs are positionally conserved across mammals. Among RNAs traceable to the bird–mammal common ancestor, 98% of snoRNAs and 76% of miRNAs are intronic. Comparison of the most evolutionarily stable mammalian intronic snoRNAs with those positionally conserved among primates reveals that the former are more overrepresented among host genes involved in translation or ribosome biogenesis and are more broadly and highly expressed. This stability is likely attributable to a requirement for overlap between host gene and intronic snoRNA expression profiles, consistent with an ancestral role in ribosome biogenesis. In contrast, whereas miRNA positional conservation is comparable to that observed for snoRNAs, intronic miRNAs show no obvious association with host genes of a particular functional category, and no statistically significant differences in host gene expression are found between those traceable to mammalian or primate ancestors. Our results indicate evolutionarily stable associations of numerous intronic snoRNAs and miRNAs and their host genes, with probable continued diversification of snoRNA function from an ancestral role in ribosome biogenesis.

    Large-scale sequencing identifies multiple genes and rare variants associated with Crohn’s disease susceptibility

    Peer reviewed

    Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility

    BACKGROUND Small nucleolar (sno)RNAs are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to indicate an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes. RESULTS We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal, however, that the intronic location of LECA snoRNAs is not ancestral, suggesting the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly-expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates. CONCLUSIONS Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. While many are intronic, this association is not evolutionarily stable across the eukaryote tree; ongoing intragenomic mobility has erased the signal of their ancestral gene organization, and neither introns-first nor evolved co-expression adequately explains our results. We therefore present a third model, constrained drift, whereby individual snoRNAs are intragenomically mobile and may occupy any genomic location from which expression satisfies phenotype.