28 research outputs found
Participation of Multifunctional RNA in Replication, Recombination and Regulation of Endogenous Plant Pararetroviruses (EPRVs)
Pararetroviruses, taxon Caulimoviridae, are typical of retroelements with reverse transcriptase and share a common origin with retroviruses and LTR retrotransposons, presumably dating back 1.6 billion years and illustrating the transition from an RNA to a DNA world. After transcription of the viral genome in the host nucleus, viral DNA synthesis occurs in the cytoplasm on the generated terminally redundant RNA including inter- and intra-molecule recombination steps rather than relying on nuclear DNA replication. RNA recombination events between an ancestral genomic retroelement and exogenous RNA viruses were seminal in pararetrovirus evolution resulting in horizontal transmission and episomal replication. Instead of active integration, pararetroviruses use the host DNA repair machinery to prevail in genomes of angiosperms, gymnosperms and ferns. Pararetrovirus integration – leading to Endogenous ParaRetroViruses, EPRVs – by illegitimate recombination can happen if their sequences instead of homologous host genomic sequences on the sister chromatid (during mitosis) or homologous chromosome (during meiosis) are used as template. Multiple layers of RNA interference exist regulating episomal and chromosomal forms of the pararetrovirus. Pararetroviruses have evolved suppressors against this plant defense in the arms race during co-evolution which can result in deregulation of plant genes. Small RNAs serve as signaling molecules for Transcriptional and Post-Transcriptional Gene Silencing (TGS, PTGS) pathways. Different populations of small RNAs comprising 21–24 nt and 18–30 nt in length have been reported for Citrus, Fritillaria, Musa, Petunia, Solanum and Beta. Recombination and RNA interference are driving forces for evolution and regulation of EPRVs.
A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes
Structural variations (SVs) are a major contributor to genetic diversity and phenotypic variations, but their prevalence and functions in domestic animals are largely unexplored. Here we generated high-quality genome assemblies for 15 individuals from genetically diverse sheep breeds using Pacific Biosciences (PacBio) high-fidelity sequencing, discovering 130.3 Mb nonreference sequences, from which 588 genes were annotated. A total of 149,158 biallelic insertions/deletions, 6531 divergent alleles, and 14,707 multiallelic variations with precise breakpoints were discovered. The SV spectrum is characterized by an excess of derived insertions compared to deletions (94,422 vs. 33,571), suggesting recent active LINE expansions in sheep. Nearly half of the SVs display low to moderate linkage disequilibrium with surrounding single-nucleotide polymorphisms (SNPs) and most SVs cannot be tagged by SNP probes from the widely used ovine 50K SNP chip. We identified 865 population-stratified SVs including 122 SVs possibly derived in the domestication process among 690 individuals from sheep breeds worldwide. A novel 168-bp insertion in the 5' untranslated region (5' UTR) of HOXB13 is found at high frequency in long-tailed sheep. Further genome-wide association study and gene expression analyses suggest that this mutation is causative for the long-tail trait. In summary, we have developed a panel of high-quality de novo assemblies and present a catalog of structural variations in sheep. Our data capture abundant candidate functional variations that were previously unexplored and provide a fundamental resource for understanding trait biology in sheep
Repetitive DNA in eukaryotic genomes
Repetitive DNA (sequence motifs repeated hundreds or thousands of times in the genome) makes up the major proportion of all the nuclear DNA in most eukaryotic genomes. However, the significance of repetitive DNA in the genome is not completely understood, and it has been considered to have both structural and functional roles, or perhaps even no essential role. High-throughput DNA sequencing reveals huge numbers of repetitive sequences. Most bioinformatic studies focus on low-copy DNA including genes, and hence, the analyses collapse repeats in assemblies presenting only one or a few copies, often masking out and ignoring them in both DNA and RNA read data. Chromosomal studies are proving vital to examine the distribution and evolution of sequences because of the challenges of analysis of sequence data. Many questions are open about the origin, evolutionary mode and functions that repetitive sequences might have in the genome. Some, the satellite DNAs, are present in long arrays of similar motifs at a small number of sites, while others, particularly the transposable elements (DNA transposons and retrotransposons), are dispersed over regions of the genome; in both cases, sequence motifs may be located at relatively specific chromosome domains such as centromeres or subtelomeric regions. Here, we overview a range of works involving detailed characterization of the nature of all types of repetitive sequences, in particular their organization, abundance, chromosome localization, variation in sequence within and between chromosomes, and, importantly, the investigation of their transcription or expression activity. Comparison of the nature and locations of sequences between more, and less, related species is providing extensive information about their evolution and amplification. Some repetitive sequences are extremely well conserved between species, while others are among the most variable, defining differences between even closely related species.
These data suggest contrasting modes of evolution of repetitive DNA of different types, including selfish sequences that propagate themselves and may even be transferred horizontally between species rather than by descent, through to sequences that have a tendency to amplification because of their sequence motifs, to those that have structural significance because of their bulk rather than precise sequence. Functional consequences of repeats include generation of variability by movement and insertion in the genome (giving useful genetic markers), the definition of centromeres, expression under stress conditions and regulation of gene expression via RNA moieties. Molecular cytogenetics and bioinformatic studies in a comparative context are now enabling understanding of the nature and behaviour of this major genomic component
Characterization and Diversity of Novel PIF/Harbinger DNA Transposons in Brassica Genomes
Among DNA transposons, PIF/Harbinger is the most recently identified superfamily, characterized by 3 bp target site duplications (TSDs), flanked by 14-45 bp terminal inverted repeats (TIRs) and a transposase displaying a DDD or DDE domain. Their autonomous elements contain two open reading frames, ORF1 and ORF2, encoding a superfamily-specific transposase and a DNA-binding domain. Harbinger DNA transposons have recently been identified in a few plants. In the present study, computational and molecular approaches were used for the identification of 8 Harbinger transposons, of which only 2 were complete with a putative transposase, while the remaining 6 lack a transposase and are considered defective or non-autonomous elements. They ranged in size from 0.5-4 kb with 3 bp TSDs, 15-42 bp TIRs and internal AT-rich regions. The PCR amplification of Brassica Harbinger transposase revealed the diversity and ancient nature of these elements. The amplification polymorphism of some non-autonomous Harbingers showed species-specific distribution. Phylogenetic analyses of transposase clustered them into two clades (monocot and dicot) and five sub-clades. The Brassica, Arabidopsis and Malus transposases clustered into genus-specific sub-clades, although considerable homology in transposase was observed. The multiple sequence alignment of Brassica and related transposases showed homology in five conserved blocks. The DD₃₅E triad and sequences showed similarity to the already known Pong-like or Arabidopsis ATISI12 Harbinger transposase, in contrast to other transposases having DD₄₇E or DD₄₈E motifs. The present study will be helpful in the characterization of Harbingers, their structural diversity in related genera and Harbinger-based molecular markers for variety/line identification.
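The structural hallmarks named in this abstract (3 bp TSDs flanking the element, and TIRs at its two ends) can be illustrated with a minimal Python sketch. It is not the authors' identification pipeline: the function names, the perfect-match rule for TIRs, and the toy sequences are all assumptions made for illustration.

```python
# Illustrative sketch only: function names, the exact-match rule and the toy
# sequences are hypothetical, not taken from the study.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(element, max_len=45):
    """Length of the perfect terminal inverted repeat of `element`.

    Walks inward from both ends and counts positions where the left base
    equals the complement of the mirrored right base.
    """
    seq = element.upper()
    length = 0
    for i in range(min(max_len, len(seq) // 2)):
        if seq[i] != seq[-1 - i].translate(_COMPLEMENT):
            break
        length += 1
    return length


def has_tsd(upstream, downstream, tsd_len=3):
    """True if the last `tsd_len` bases before the element equal the first
    `tsd_len` bases after it (a target site duplication)."""
    return upstream[-tsd_len:].upper() == downstream[:tsd_len].upper()


# Toy element: a 10 bp TIR on each side of a short A-rich internal region.
left_tir = "GGCCAGTCAC"
toy_element = left_tir + "AAAAAAAA" + reverse_complement(left_tir)

print(tir_length(toy_element))
print(has_tsd("CCGTTA", "TTAGGC"))
```

Real elements usually carry a few mismatches in their TIRs, so a practical search would tolerate imperfect matches rather than stop at the first difference.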
The repetitive DNA landscape in sheep
Repetitive DNA sequences, representing the majority of most mammalian genomes, can be broadly divided into tandemly repeated or satellite sequences (mostly located in the heterochromatin) and transposable elements (TEs) dispersed over the genome. Some repetitive DNA sequences are highly conserved but other sequences show substantial diversification in copy number, sequence and organization between individuals, breeds, and related species. Here, we report the repetitive DNA landscape of sheep (Ovis aries) based on de novo analysis of >6 Gbp of sequence from each of five individuals. Major classes of repetitive DNA sequences were identified and quantified by network analysis (using the program RepeatExplorer), frequency analysis of short motifs (k-mers), and alignment to reference genome assemblies. The genomic organization of the major repetitive motifs was characterized by in situ hybridization to chromosomes. The well-known c. 816 bp long centromere-associated satellite SatI represented 4 to 6% of the genome while SatII (c. 600 bp long) was 1 to 2% of the genome. Notably, these satellites showed contrasting behaviour at meiotic prophase: SatI sequences cover a larger area, indicating a looser chromatin loop organization, while SatII sequences are tightly organized and are attached to the synaptonemal complex (SC) at a more distal position than SatI sequences at the end of SCs of acrocentric chromosomes. The repetitive sequence analysis identified other much less abundant satellite sequences and simple repeats, some with novel genomic distributions. Families of non-LTR retrotransposons including LINEs (L1 and RTE) and derived SINEs represented more than 25% of the genome. Non-LTR families showed characteristic distributions on chromosomes with some showing greater abundance on metacentric autosomes or on sex chromosomes. Endogenous retrovirus classes grouped into clusters with some families showing centromeric and others more dispersed distributions. Rapidly evolving repetitive sequences allow us to study processes of chromosome or genome evolution and diversification in sheep, and more broadly across the Bovidae.
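The "frequency analysis of short motifs (k-mers)" mentioned in this abstract can be illustrated with a minimal Python sketch. It does not reproduce RepeatExplorer or the authors' workflow: the function names, the choice of k and the toy reads are assumptions made for illustration.

```python
# Illustrative sketch only: function names, k and the toy reads are
# hypothetical, not taken from the study.
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across an iterable of read strings,
    skipping k-mers that contain an ambiguous base (N)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_fraction(counts, kmer):
    """Share of all counted k-mers that are `kmer` (0.0 if nothing counted)."""
    total = sum(counts.values())
    return counts[kmer] / total if total else 0.0


toy_reads = ["ACGTACGTAC", "TTACGTAA"]
counts = kmer_counts(toy_reads, 4)

# The most frequent k-mers are candidates for abundant repeat motifs.
for kmer, n in counts.most_common(3):
    print(kmer, n, round(kmer_fraction(counts, kmer), 3))
```

On real data, k-mers that are heavily over-represented relative to the sequencing depth point to high-copy repeats such as satellites, which can then be checked against the results of clustering or alignment.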