Amplification and adaptation of centromeric repeats in polyploid switchgrass species.
Centromeres in most higher eukaryotes are composed of long arrays of satellite repeats from a single satellite repeat family. Why centromeres are dominated by a single satellite repeat and how the satellite repeats originate and evolve are among the most intriguing and long-standing questions in centromere biology. We identified eight satellite repeats in the centromeres of tetraploid switchgrass (Panicum virgatum). Seven repeats showed characteristics associated with classical centromeric repeats with monomeric lengths ranging from 166 to 187 bp. Interestingly, these repeats share an 80-bp DNA motif. We demonstrate that this 80-bp motif may dictate translational and rotational phasing of the centromeric repeats with the cenH3 nucleosomes. The sequence of the last centromeric repeat, Pv156, is identical to the 5S ribosomal RNA genes. We demonstrate that a 5S ribosomal RNA gene array was recruited to be the functional centromere for one of the switchgrass chromosomes. Our findings reveal that certain types of satellite repeats, which are associated with unique sequence features and are composed of monomers in mono-nucleosomal length, are favorable for centromeres. Centromeric repeats may undergo dynamic amplification and adaptation before the centromeres in the same species become dominated by the best adapted satellite repeat.
A role for non-B DNA forming sequences in mediating microlesions causing human inherited disease
Missense/nonsense mutations and micro-deletions/micro-insertions of <21 bp together represent ~76% of all mutations causing human inherited disease. Previous studies have shown that their occurrence is influenced by sequences capable of non-B DNA formation (direct, inverted and mirror repeats; G-quartets). We found that a greater than expected proportion (~21%) of both micro-deletions and micro-insertions occur within direct repeats and are explicable by slipped misalignment. A novel mutational mechanism, non-B DNA triplex formation followed by DNA repair, is proposed to explain ~5% of micro-deletions and micro-insertions at mirror repeats. Further, G-quadruplex-forming sequences, direct and inverted repeats appear to play a prominent role in mediating missense mutations, whereas only direct and inverted repeats mediate nonsense mutations. We suggest a mutational mechanism involving slipped strand mispairing, slipped structure formation and DNA repair, to explain ~15% of missense and ~12% of nonsense mutations leading to the formation of perfect direct repeats from imperfect repeats, or to the extension of existing direct repeats. Similar proportions of missense and nonsense mutations were explicable by the mechanism of hairpin loop formation and DNA repair leading to the formation of perfect inverted repeats from imperfect repeats. The proposed mechanisms provide new insights into mutagenesis underlying pathogenic micro-lesions.
Genomic abundance is not predictive of tandem repeat localization in grass genomes.
Highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Previous research has investigated variation in centromere repeats across eukaryotes, positing that the highest abundance tandem repeat in a genome is often the centromeric repeat. To test this assumption, we used shotgun sequencing and a bioinformatic pipeline to identify common tandem repeats across a number of grass species. We find that de novo assembly and subsequent abundance ranking of repeats can successfully identify tandem repeats with homology to known tandem repeats. Fluorescent in-situ hybridization shows that de novo assembly and ranking of repeats from non-model taxa identifies chromosome domains rich in tandem repeats both near pericentromeres and elsewhere in the genome
Inverted and mirror repeats in model nucleotide sequences
We analytically and numerically study the probabilistic properties of
inverted and mirror repeats in model sequences of nucleic acids. We consider
both perfect and non-perfect repeats, i.e. repeats with mismatches and gaps.
The considered sequence models are independent identically distributed (i.i.d.)
sequences, Markov processes and long range sequences. We show that the number
of repeats in correlated sequences is significantly larger than in i.i.d.
sequences and that this discrepancy increases exponentially with the repeat
length for long range sequences.
Comment: 12 pages, 6 figures
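As a rough illustration of the quantities this abstract refers to, the sketch below counts perfect mirror and inverted repeats in a uniform i.i.d. DNA sequence. It is only a toy under stated assumptions: the function names, the fixed arm length, and the optional spacer are hypothetical choices, and the paper's own analysis (which also covers mismatches, gaps, Markov and long range models) is not reproduced here.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_repeats(seq, arm, spacer=0):
    """Count perfect mirror and inverted repeats in seq.

    A window holds a left arm of length `arm`, `spacer` bases, and a right arm.
    Mirror repeat:   right arm == reversed left arm.
    Inverted repeat: right arm == reverse complement of left arm.
    Every window is counted, so nested repeats are counted separately.
    """
    mirror = inverted = 0
    span = 2 * arm + spacer
    for i in range(len(seq) - span + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + spacer:i + span]
        if right == left[::-1]:
            mirror += 1
        if right == reverse_complement(left):
            inverted += 1
    return mirror, inverted


def iid_sequence(n, seed=0):
    """Uniform i.i.d. sequence over A, C, G, T (assumed null model)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    seq = iid_sequence(100_000)
    arm = 6
    expected = (len(seq) - 2 * arm + 1) * 4 ** -arm
    print(count_repeats(seq, arm), "expected about", round(expected, 1))
```

Under the uniform i.i.d. model each window matches with probability 4^(-arm), so both counts should sit near (n - 2·arm - spacer + 1)·4^(-arm); correlated sequence models would be compared against that baseline.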
Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
This paper discusses real-time alignment of audio signals of music
performance to the corresponding score (a.k.a. score following) which can
handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips)
in performances. This type of score following is particularly useful in
automatic accompaniment for practices and rehearsals, where errors and
repeats/skips are often made. Simple extensions of the algorithms previously
proposed in the literature are not applicable in these situations for scores of
practical length due to the problem of large computational complexity. To cope
with this problem, we present two hidden Markov models of monophonic
performance with errors and arbitrary repeats/skips, and derive efficient
score-following algorithms with an assumption that the prior probability
distributions of score positions before and after repeats/skips are independent
from each other. We confirmed real-time operation of the algorithms with music
scores of practical length (around 10000 notes) on a modern laptop and their
tracking ability to the input performance within 0.7 s on average after
repeats/skips in clarinet performance data. Further improvements and extension
for polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on
Audio, Speech, and Language Processing
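The independence assumption mentioned in this abstract can be illustrated with a toy forward update. If the probability of jumping from position i to position j factorizes as leave[i] * arrive[j], the jump contribution reduces to one shared sum, so a single update costs O(N) rather than O(N^2). The sketch below is a minimal illustration under that assumption; the function name, the parameters, and the stay/step structure are hypothetical and are not the paper's two models.

```python
def forward_step(alpha, emit, stay, step, leave, arrive):
    """One normalized forward update over N score positions.

    alpha[i]  : current probability of being at score position i
    emit[j]   : likelihood of the new observation at position j
    stay      : probability of remaining at the same position
    step      : probability of advancing to the next position
    leave[i]  : probability of starting a repeat/skip from position i
    arrive[j] : probability of landing on position j after a repeat/skip,
                assumed independent of the departure position
    """
    n = len(alpha)
    # Total probability mass that jumps; computed once and shared by all targets.
    jump_mass = sum(a * l for a, l in zip(alpha, leave))
    new = []
    for j in range(n):
        local = alpha[j] * stay
        if j > 0:
            local += alpha[j - 1] * step
        new.append(emit[j] * (local + jump_mass * arrive[j]))
    total = sum(new)
    return [x / total for x in new]


if __name__ == "__main__":
    alpha = [0.5, 0.3, 0.2]
    emit = [0.1, 0.7, 0.2]
    print(forward_step(alpha, emit, 0.4, 0.5, [0.1] * 3, [0.5, 0.25, 0.25]))
```

The result matches a dense update with transition matrix T[i][j] = stay·[i = j] + step·[j = i + 1] + leave[i]·arrive[j], without ever building the N × N matrix.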
Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution
Centromeres are essential for chromosome segregation, yet their DNA sequences
evolve rapidly. In most animals and plants that have been studied, centromeres
contain megabase-scale arrays of tandem repeats. Despite their importance, very
little is known about the degree to which centromere tandem repeats share
common properties between different species across different phyla. We used
bioinformatic methods to identify high-copy tandem repeats from 282 species
using publicly available genomic sequence and our own data. The assumption that
the most abundant tandem repeat is the centromere DNA was true for most species
whose centromeres have been previously characterized, suggesting this is a
general property of genomes. Our methods are compatible with all current
sequencing technologies. Long Pacific Biosciences sequence reads allowed us to
find tandem repeat monomers up to 1,419 bp. High-copy centromere tandem repeats
were found in almost all animal and plant genomes, but repeat monomers were
highly variable in sequence composition and in length. Furthermore,
phylogenetic analysis of sequence homology showed little evidence of sequence
conservation beyond ~50 million years of divergence. We find that despite an
overall lack of sequence conservation, centromere tandem repeats from diverse
species showed similar modes of evolution, including the appearance of higher
order repeat structures in which several polymorphic monomers make up a larger
repeating unit. While centromere position in most eukaryotes is epigenetically
determined, our results indicate that tandem repeats are highly prevalent at
centromeres of both animals and plants. This suggests a functional role for
such repeats, perhaps in promoting concerted evolution of centromere DNA across
chromosomes
Nucleotide repeats in mitochondrial genome determine human lifespan
Direct nucleotide repeats can facilitate deletions of segments of mitochondrial genome [1], leading to a wide range of neuromuscular disorders [1,2] as well as aging [2,3] in humans. We hypothesized that the number of the direct perfect repeats in human mitochondrial genomes influences longevity through the formation of harmful mtDNA deletions in the somatic cells. The analysis of the complete mitochondrial genomes of 762 unrelated Japanese individuals [4-6] reveals a negative correlation between the abundance of the direct perfect repeats and the expected longevity. This association is largely due to the disruption of the common repeat (8470,13447) by a point mutation 8473C which occurred at the origin of the D4a haplogroup characterized by extreme longevity in Japan [7]. Our results provide the first evidence for correlation between the number of nucleotide repeats and the lifespan at the intraspecific level.
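To make "abundance of direct perfect repeats" concrete, the sketch below indexes every k-mer of a sequence and reports those that occur more than once on the same strand. It is a minimal sketch under stated assumptions: the repeat length k, the function names, and the pair-count summary are hypothetical, since the abstract does not give the study's exact repeat definition.

```python
from collections import defaultdict


def direct_repeats(seq, k):
    """Map each k-mer occurring at two or more positions to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def repeat_pair_count(seq, k):
    """Number of position pairs that share an identical k-mer (assumed abundance measure)."""
    return sum(len(p) * (len(p) - 1) // 2 for p in direct_repeats(seq, k).values())


if __name__ == "__main__":
    toy = "ACCTGAACCTG"
    print(direct_repeats(toy, 4))
    print(repeat_pair_count(toy, 4))
```

A point mutation inside one copy of a repeat removes that pair from the count, which is the kind of disruption the abstract describes for the common repeat.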
Coplanar Repeats by Energy Minimization
This paper proposes an automated method to detect, group and rectify
arbitrarily-arranged coplanar repeated elements via energy minimization. The
proposed energy functional combines several features that model how planes with
coplanar repeats are projected into images and captures global interactions
between different coplanar repeat groups and scene planes. An inference
framework based on a recent variant of α-expansion is described and fast
convergence is demonstrated. We compare the proposed method to two widely-used
geometric multi-model fitting methods using a new dataset of annotated images
containing multiple scene planes with coplanar repeats in varied arrangements.
The evaluation shows a significant improvement in the accuracy of
rectifications computed from coplanar repeats detected with the proposed method
versus those detected with the baseline methods.
Comment: 14 pages with supplemental materials attached
Evolution of genes and repeats in the Nimrod superfamily
The recently identified Nimrod superfamily is characterized by the presence of a special type of EGF repeat, the NIM repeat, located right after a typical CCXGY/W amino acid motif. On the basis of structural features, nimrod genes can be divided into three types. The proteins encoded by Draper-type genes have an EMI domain at the N-terminal part and only one copy of the NIM motif, followed by a variable number of EGF-like repeats. The products of Nimrod B-type and Nimrod C-type genes (including the eater gene) have different kinds of N-terminal domains, and lack EGF-like repeats but contain a variable number of NIM repeats. Draper and Nimrod C-type (but not Nimrod B-type) proteins carry a transmembrane domain. Several members of the superfamily were claimed to function as receptors in phagocytosis and/or binding of bacteria, which indicates an important role in cellular immunity and the elimination of apoptotic cells. In this paper, the evolution of the Nimrod superfamily is studied with various methods on the level of genes and repeats. A hypothesis is presented in which the NIM repeat, along with the EMI domain, emerged by structural reorganizations at the end of an EGF-like repeat chain, suggesting a mechanism for the formation of novel types of repeats. The analyses revealed diverse evolutionary patterns in the sequences containing multiple NIM repeats. Although the Nimrod B and Nimrod C proteins show characteristics of independent evolution, many internal NIM repeats in Eater sequences seem to have undergone concerted evolution. An analysis of the nimrod genes has been performed using phylogenetic and other methods and an evolutionary scenario of the origin and diversification of the Nimrod superfamily is proposed. Our study presents an intriguing example of how the evolution of multigene families may contribute to the complexity of the innate immune response.