Comparative Analysis of RNA Families Reveals Distinct Repertoires for Each Domain of Life
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes
and genetically encoded protein catalysts, has been central to models for the
early evolution of life on Earth. A key part of such models is continuity
between the earliest stages in the evolution of life and the RNA repertoires of
extant lineages. Some assessments seem consistent with a diverse RNA world, yet
direct continuity between modern RNAs and an RNA world has not been
demonstrated for the majority of RNA families, and, anecdotally, many RNA
functions appear restricted in their distribution. Despite much discussion of
the possible antiquity of RNA families, no systematic analyses of RNA family
distribution have been performed. To chart the broad evolutionary history of
known RNA families, we performed comparative genomic analysis of over 3 million
RNA annotations spanning 1446 families from the Rfam 10 database. We report
that 99% of known RNA families are restricted to a single domain of life,
revealing discrete repertoires for each domain. For the 1% of RNA
families/clans present in more than one domain, over half show evidence of
horizontal gene transfer, and the rest show a vertical trace, indicating the
presence of a complex protein synthesis machinery in the Last Universal Common
Ancestor (LUCA) and consistent with the evolutionary history of the most
ancient protein-coding genes. However, with limited interdomain transfer and
few RNA families exhibiting demonstrable antiquity as predicted under RNA world
continuity, our results indicate that the majority of modern cellular RNA
repertoires have primarily evolved in a domain-specific manner.
Comment: 47 pages, 4 main figures, 3 supplementary figures, 4 supplementary
tables. Submitted to PLOS Computational Biology.
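The central tally in this abstract, the share of RNA families confined to a single domain of life, can be sketched as follows. This is a minimal illustration assuming annotations arrive as (family, domain) pairs; the family names and data are hypothetical toy values, not Rfam records, and the sketch ignores the clan grouping and horizontal-transfer analysis the study also describes.

```python
from collections import defaultdict

def domains_per_family(annotations):
    """Collect the set of domains of life in which each RNA family is annotated."""
    result = defaultdict(set)
    for family, domain in annotations:
        result[family].add(domain)
    return dict(result)

def single_domain_fraction(annotations):
    """Fraction of families whose annotations all fall within one domain."""
    per_family = domains_per_family(annotations)
    restricted = sum(1 for domains in per_family.values() if len(domains) == 1)
    return restricted / len(per_family)

# Hypothetical toy annotations (family, domain); not real Rfam data.
toy_annotations = [
    ("famA", "Bacteria"), ("famA", "Archaea"), ("famA", "Eukarya"),
    ("famB", "Bacteria"), ("famB", "Bacteria"),
    ("famC", "Eukarya"),
    ("famD", "Archaea"),
]

print(single_domain_fraction(toy_annotations))  # 0.75: famB, famC, famD are single-domain
```

Repeated annotations of the same family in one domain (famB above) collapse into one set entry, so the fraction counts families rather than annotation records.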
Assembly and analysis of the genome sequence of the yeast Brettanomyces naardenensis CBS 7540
Brettanomyces naardenensis is a spoilage yeast with biotechnological potential for the production of innovative beverages with low alcohol content and a high degree of attenuation. Here, we present the first annotated genome of B. naardenensis CBS 7540. The genome was assembled into 76 contigs, totaling 11,283,072 nucleotides, and 5168 protein-coding sequences were annotated. The study provides functional genome annotation and phylogenetic analysis, and discusses the genetic determinants behind the notable stress tolerance and biotechnological potential of B. naardenensis.
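Headline assembly figures such as the contig count and total length reported above are simple tallies over an assembly's FASTA file. A minimal sketch, assuming a plain FASTA file in which each contig has one `>` header line; the contents here are hypothetical, and counting every sequence character (including any N placeholders) as a nucleotide is a simplifying assumption.

```python
import io

def assembly_summary(fasta_lines):
    """Return (number of contigs, total nucleotides) from FASTA-formatted lines."""
    n_contigs = 0
    total_nt = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_contigs += 1  # each header line starts a new contig
        else:
            total_nt += len(line)  # sequence lines may be wrapped across several lines
    return n_contigs, total_nt

# Hypothetical two-contig assembly; a real run would open the assembly file instead.
toy = io.StringIO(">contig_1\nACGTACGT\nACG\n>contig_2\nTTGCA\n")
print(assembly_summary(toy))  # (2, 16)
```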
Reference Genomes from Distantly Related Species Can Be Used for Discovery of Single Nucleotide Polymorphisms to Inform Conservation Management
Threatened species recovery programmes benefit from incorporating genomic data into conservation management strategies to enhance species recovery. However, a lack of readily available genomic resources, including conspecific reference genomes, often limits the inclusion of genomic data. Here, we investigate the utility of closely related high-quality reference genomes for single nucleotide polymorphism (SNP) discovery using the critically endangered kakī/black stilt (Himantopus novaezelandiae) and four Charadriiform reference genomes as proof of concept. We compare diversity estimates (i.e., nucleotide diversity, individual heterozygosity, and relatedness) based on kakī SNPs discovered from genotyping-by-sequencing and whole-genome resequencing reads mapped to conordinal (killdeer, Charadrius vociferus), confamilial (pied avocet, Recurvirostra avosetta), congeneric (pied stilt, Himantopus himantopus) and conspecific reference genomes. Results indicate that diversity estimates calculated from SNPs discovered using closely related reference genomes correlate significantly with estimates calculated from SNPs discovered using a conspecific genome. Congeneric and confamilial references provide higher correlations and more similar measures of nucleotide diversity, individual heterozygosity, and relatedness. While conspecific genomes may be necessary to address other questions in conservation, SNP discovery using high-quality reference genomes of closely related species is a cost-effective approach for estimating diversity measures in threatened species.
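The comparison above rests on computing the same diversity statistic from SNPs called against each reference, then correlating the per-individual values. A minimal sketch for one of the three measures, individual heterozygosity, assuming genotypes are already called and coded 0/1/2; the genotype matrices are invented for illustration, and the study's SNP-calling pipeline and significance testing are not reproduced.

```python
def observed_heterozygosity(genotypes):
    """Share of called SNPs at which an individual is heterozygous.

    Assumed coding: 0 = homozygous reference, 1 = heterozygous,
    2 = homozygous alternate, None = missing call (excluded).
    """
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical genotypes for four birds, called against two different references.
conspecific = [[1, 0, 1, 2, 1], [0, 0, 1, 2, 0], [1, 1, 1, 0, None], [0, 2, 0, 2, 1]]
congeneric = [[1, 0, 1, 2], [0, 0, 0, 2], [1, 1, 1, 0], [0, 2, 1, 2]]

het_con = [observed_heterozygosity(g) for g in conspecific]
het_gen = [observed_heterozygosity(g) for g in congeneric]
print(het_con, het_gen, round(pearson_r(het_con, het_gen), 2))
```

The two references need not yield the same number of SNPs per bird; only the per-individual estimates are compared, which mirrors the logic of correlating estimates across references.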
An integrated molecular risk score early in life for subsequent childhood asthma risk.
BACKGROUND
Numerous children present with early wheeze symptoms, yet only a subgroup develops childhood asthma. Early identification of children at risk is key for clinical monitoring, timely patient-tailored treatment, and preventing chronic, severe sequelae. For early prediction of childhood asthma, we aimed to define an integrated risk score combining established risk factors with genome-wide molecular markers at birth, complemented by subsequent clinical symptoms/diagnoses (wheezing, atopic dermatitis, food allergy).
METHODS
Three longitudinal birth cohorts (PAULINA/PAULCHEN, n = 190 + 93 = 283; PASTURE, n = 1133) were used to predict childhood asthma (age 5-11) from epidemiological characteristics and molecular markers: genotype, DNA methylation and mRNA expression (RNASeq/NanoString). Apparent (ap) and optimism-corrected (oc) performance (AUC/R²) were assessed, leveraging evidence from independent studies (Naïve-Bayes approach) combined with high-dimensional logistic regression models (LASSO).
RESULTS
Asthma prediction with epidemiological characteristics at birth (maternal asthma, sex, farm environment) yielded an ocAUC of 0.65. Including molecular markers as predictors improved apparent prediction performance; however, optimism-corrected performance increased only moderately (up to ocAUC = 0.68). The greatest discriminative power was reached by adding the first symptoms/diagnoses (up to ocAUC = 0.76; increase of 0.08, p = .002). Longitudinal analysis of selected mRNA expression in PASTURE (cord blood, 1, 4.5, 6 years) showed that expression at age six had the strongest association with asthma, and that the correlation of gene expression between time points increased over time (r = .59, p < .001, 4.5-6 years).
CONCLUSION
Applying epidemiological predictors alone showed moderate predictive ability. Molecular markers from birth modestly improved prediction. Allergic symptoms/diagnoses enhanced predictive power, which is important for clinical practice and for the design of future studies with molecular markers.
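The gap between apparent and optimism-corrected AUC reported above reflects how much a model flatters itself on the data it was fitted to. The abstract does not say exactly how the correction was computed; one widely used option is bootstrap optimism correction, sketched below under that assumption. AUC is computed as the Mann-Whitney probability, a toy mean-difference score (hypothetical `mean_difference_fit`) stands in for the study's LASSO and Naïve-Bayes models, and the data are simulated, so the printed numbers say nothing about asthma.

```python
import random

def auc(scores, labels):
    """ROC AUC: probability that a random case scores above a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_difference_fit(X, y):
    """Toy stand-in for LASSO: weight each predictor by its case-minus-control mean."""
    cases = [x for x, t in zip(X, y) if t == 1]
    controls = [x for x, t in zip(X, y) if t == 0]
    w = [sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in controls) / len(controls)
         for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def optimism_corrected_auc(X, y, fit, n_boot=100, seed=0):
    """Apparent AUC minus the mean optimism (bootstrap-sample AUC - AUC on original data)."""
    rng = random.Random(seed)
    model = fit(X, y)
    apparent = auc([model(x) for x in X], y)
    optimism = []
    n = len(y)
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC needs both cases and controls in the resample
        Xb = [X[i] for i in idx]
        mb = fit(Xb, yb)  # refit the whole procedure on the bootstrap sample
        optimism.append(auc([mb(x) for x in Xb], yb) - auc([mb(x) for x in X], y))
    return apparent, apparent - sum(optimism) / len(optimism)

# Simulated data: one weakly informative predictor plus four pure-noise predictors.
rng = random.Random(1)
y = [1 if i % 3 == 0 else 0 for i in range(80)]
X = [[t * 0.8 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(4)] for t in y]
ap, oc = optimism_corrected_auc(X, y, mean_difference_fit)
print(f"apparent AUC = {ap:.2f}, optimism-corrected AUC = {oc:.2f}")
```

Refitting the entire modeling procedure inside each bootstrap replicate is what lets the correction capture overfitting from predictor selection, which is why high-dimensional molecular predictors can look strong on apparent AUC yet gain little after correction.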
An Improved Canine Genome and a Comprehensive Catalogue of Coding Genes and Non-Coding Transcripts
The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, negatively impacting the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 Mb of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues to catalog ∼175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected by our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ∼3,000 loci that are characterized as protein-coding in human and were also expressed in the dog, suggesting that they had not previously been annotated in the EnsEMBL canine gene set. In addition to ∼20,700 high-confidence protein-coding loci, we found ∼4,600 antisense transcripts overlapping exons of protein-coding genes, ∼7,200 intergenic multi-exon transcripts without coding potential (likely candidates for long intergenic non-coding RNAs, lincRNAs), and ∼11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. In functional analysis, shRNA knockdown of two novel transcripts in a mouse kidney cell line altered cell morphology and motility. All in all, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.
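The transcript categories listed above (protein-coding, antisense to coding exons, intergenic multi-exon non-coding, and a residual class) can be expressed as a small decision procedure. A minimal sketch under simplifying assumptions: coding potential is taken as a precomputed flag, coordinates are half-open, and the category names and rules are illustrative rather than the authors' actual pipeline, which is not described in detail here.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    chrom: str
    strand: str   # "+" or "-"
    exons: list   # sorted, half-open (start, end) intervals
    coding: bool  # assumed output of a separate coding-potential assessment

def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def span(tx):
    """Genomic extent of a transcript, from first exon start to last exon end."""
    return (tx.exons[0][0], tx.exons[-1][1])

def classify(tx, coding_genes):
    """Assign a transcript to one of the broad categories in the text (simplified rules)."""
    if tx.coding:
        return "protein_coding"
    nearby = [g for g in coding_genes if g.chrom == tx.chrom]
    # Antisense: overlaps an exon of a coding gene on the opposite strand.
    if any(g.strand != tx.strand and any(overlaps(e, ge) for e in tx.exons for ge in g.exons)
           for g in nearby):
        return "antisense"
    # lincRNA candidate: multi-exon, non-coding, and clear of every coding locus.
    if len(tx.exons) >= 2 and not any(overlaps(span(tx), span(g)) for g in nearby):
        return "lincRNA_candidate"
    return "other"

# Hypothetical coding gene and three non-coding query transcripts.
gene = Transcript("chr1", "+", [(100, 200), (300, 400)], True)
queries = [
    Transcript("chr1", "-", [(150, 180)], False),
    Transcript("chr1", "+", [(1000, 1100), (1200, 1300)], False),
    Transcript("chr1", "+", [(1000, 1100)], False),
]
print([classify(q, [gene]) for q in queries])  # ['antisense', 'lincRNA_candidate', 'other']
```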
A Similar but Distinctive Pattern of Impaired Cortical Excitability in First-Episode Schizophrenia and ADHD
Evolutionarily Stable Association of Intronic snoRNAs and microRNAs with Their Host Genes
Small nucleolar RNAs (snoRNAs) and microRNAs (miRNAs) are integral to a range of processes, including ribosome biogenesis and gene regulation. Some are intron-encoded, and this organization may facilitate coordinated coexpression of host gene and RNA. However, snoRNAs and miRNAs are known to be mobile, so intron-RNA associations may not be evolutionarily stable. We used genome alignments across 11 mammals plus chicken to examine positional orthology of snoRNAs and miRNAs, and report that 21% of annotated snoRNAs and 11% of miRNAs are positionally conserved across mammals. Among RNAs traceable to the bird–mammal common ancestor, 98% of snoRNAs and 76% of miRNAs are intronic. Comparison of the most evolutionarily stable mammalian intronic snoRNAs with those positionally conserved among primates reveals that the former are more strongly overrepresented among host genes involved in translation or ribosome biogenesis and are more broadly and highly expressed. This stability is likely attributable to a requirement for overlap between host gene and intronic snoRNA expression profiles, consistent with an ancestral role in ribosome biogenesis. In contrast, although miRNA positional conservation is comparable to that observed for snoRNAs, intronic miRNAs show no obvious association with host genes of a particular functional category, and no statistically significant differences in host gene expression are found between those traceable to mammalian or primate ancestors. Our results indicate evolutionarily stable associations of numerous intronic snoRNAs and miRNAs and their host genes, with probable continued diversification of snoRNA function from an ancestral role in ribosome biogenesis.
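Calling a snoRNA or miRNA "intronic", as in the percentages above, amounts to checking whether its locus falls inside an intron of an annotated gene. A minimal sketch, assuming exon coordinates are known and half-open, and assuming for simplicity that a host gene must lie on the same strand; the genes and loci are hypothetical, and the study's cross-species positional orthology from genome alignments is not modeled.

```python
def introns(exons):
    """Intron intervals between consecutive sorted, half-open exon intervals."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])
            if left_end < right_start]

def host_intron(rna, genes):
    """Return (host gene, intron) if the RNA lies wholly within a same-strand intron, else None."""
    chrom, strand, start, end = rna
    for name, (g_chrom, g_strand, exons) in genes.items():
        if (g_chrom, g_strand) != (chrom, strand):
            continue
        for i_start, i_end in introns(exons):
            if i_start <= start and end <= i_end:
                return name, (i_start, i_end)
    return None

# Hypothetical host gene with three exons, and four small-RNA loci (chrom, strand, start, end).
genes = {"hostA": ("chr1", "+", [(0, 100), (200, 300), (400, 500)])}
rnas = [("chr1", "+", 120, 190), ("chr1", "+", 250, 280),
        ("chr1", "-", 320, 380), ("chr1", "+", 310, 390)]
hits = [host_intron(r, genes) for r in rnas]
print(sum(h is not None for h in hits) / len(rnas))  # 0.5
```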
Linked genetic variants on chromosome 10 control ear morphology and body mass among dog breeds
Large-scale sequencing identifies multiple genes and rare variants associated with Crohn’s disease susceptibility
Peer reviewed
Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility
Background
Small nucleolar (sno)RNAs are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to indicate an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes.
Results
We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal, however, that the intronic location of LECA snoRNAs is not ancestral, suggesting that the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates.
Conclusions
Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. While many are intronic, this association is not evolutionarily stable across the eukaryote tree; ongoing intragenomic mobility has erased the signal of their ancestral gene organization, and neither introns-first nor evolved co-expression adequately explains our results. We therefore present a third model, constrained drift, whereby individual snoRNAs are intragenomically mobile and may occupy any genomic location from which expression satisfies phenotype.
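Tracing a family to the LECA, as described above, can be pictured as placing its origin at the lowest common ancestor (LCA) of all genomes that carry it: if that ancestor is the root of the eukaryote tree, the family is a LECA candidate. A minimal sketch under that assumption, using a hypothetical toy tree and hypothetical species names; it treats presence as vertical inheritance and ignores horizontal transfer, annotation gaps and root uncertainty, so it is not the study's actual criterion.

```python
def lineage(node, parent):
    """Nodes from `node` up to the root of a tree given as a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def origin_node(species, parent):
    """Deepest node shared by the lineages of all species carrying a family (their LCA)."""
    species = list(species)
    shared = lineage(species[0], parent)  # ordered from the species up to the root
    for sp in species[1:]:
        ancestors = set(lineage(sp, parent))
        shared = [n for n in shared if n in ancestors]  # keeps order, so shared[0] is deepest
    return shared[0]

# Toy tree, not a real eukaryote phylogeny: the root splits into cladeA and cladeB.
parent = {"sp1": "cladeA", "sp2": "cladeA", "sp3": "cladeB", "sp4": "cladeB",
          "cladeA": "root", "cladeB": "root"}
print(origin_node({"sp1", "sp3"}, parent))  # root
print(origin_node({"sp1", "sp2"}, parent))  # cladeA
```

Because the result depends only on which species carry the family, the same function can be rerun on each host gene's presence pattern to ask, as the abstract does, whether a snoRNA and its host are independently traceable to the root.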