
    Periodic correlation structures in bacterial and archaeal complete genomes

    The periodic transference of nucleotide strings in bacterial and archaeal complete genomes is investigated using the metric representation and the recurrence plot method. The generated periodic correlation structures exhibit four kinds of fundamental transferring characteristics: a single increasing period, several increasing periods, an increasing quasi-period, and an almost non-increasing period. The mechanism of the periodic transference is further analyzed by determining all long periodic nucleotide strings in the bacterial and archaeal complete genomes and is explained as follows: both the repetition of basic periodic nucleotide strings and the transference of non-periodic nucleotide strings would form periodic correlation structures with approximately the same increasing periods. Comment: 23 pages, 6 figures, 2 tables
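    The idea of a repeated nucleotide string producing a characteristic period can be illustrated with a minimal sketch. This is not the metric representation or recurrence plot method of the abstract above; it is a simpler stand-in that records the distance between consecutive occurrences of each k-mer. The function name `repeat_distance_histogram`, the toy sequence, and the choice of `k` are all assumptions made for illustration.

```python
from collections import Counter


def repeat_distance_histogram(seq: str, k: int) -> Counter:
    """Count distances between consecutive occurrences of each k-mer.

    A strongly periodic sequence concentrates these distances on its period.
    """
    last_seen = {}          # k-mer -> index of its most recent occurrence
    distances = Counter()   # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances


# Hypothetical toy input: an 8-nucleotide unit repeated five times.
toy = "ACGTTGCA" * 5
hist = repeat_distance_histogram(toy, k=4)
dominant_period = hist.most_common(1)[0][0]
print(dominant_period)
```

    On real genomes the histogram would be far noisier, and a window length and a significance threshold would have to be chosen before any period could be reported.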

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing also shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrongly detected exons and significantly improve the specificity of the model.

    Genetic Stratigraphy of Key Demographic Events in Arabia

    The issue of admixture in human populations is normally addressed by genome-wide (GW) studies, and several approaches have been developed to date admixture events [1,2,3,4,5]. Admixed populations bear chromosomes with segments of DNA from all contributing source groups, the size of which decreases over successive generations until recombination renders them undetectably short. Several algorithms attempt to date admixture events by inferring the size of the nuclear ancestry segments, and these can work well when dating recent episodes in human history, such as the sub-Saharan African input into the New World [6], but they fail to detect several known episodes that took place at earlier times, such as the African input into Iberia [1] and genetic exchanges across the Red Sea [7]. Simulations with the suite of methods available in the ADMIXTOOLS package indicated that these methods could detect admixture events as early as 500 generations ago, but real data did not allow the tracing of such old events [8]. A recent improved algorithm, called GLOBETROTTER, has been used to tackle the detection of the co-occurrence of several mixture events by decomposing each chromosome into a series of haplotypic chunks and then analysing each chunk independently [3], but the problem of detecting ancient events remains. Its application to the systematic screening of worldwide admixture events was able to reveal around 100 events, but all occurring over only the past 4,000 years [3].

    Ancient properties of spider silks revealed by the complete gene sequence of the prey-wrapping silk protein (AcSp1).

    Spider silk fibers have impressive mechanical properties and are primarily composed of highly repetitive structural proteins (termed spidroins) encoded by a single gene family. Most characterized spidroin genes are incompletely known because of their extreme size (typically >9 kb) and repetitiveness, limiting understanding of the evolutionary processes that gave rise to their unusual gene architectures. The only complete spidroin genes characterized thus far form the dragline in the Western black widow, Latrodectus hesperus. Here, we describe the first complete gene sequence encoding the aciniform spidroin AcSp1, the primary component of spider prey-wrapping fibers. L. hesperus AcSp1 contains a single enormous (∼19 kb) exon. The AcSp1 repeat sequence is exceptionally conserved between two widow species (∼94% identity) and between widows and distantly related orb-weavers (∼30% identity), consistent with a history of strong purifying selection on its amino acid sequence. Furthermore, the 16 repeats (each 371-375 amino acids long) found in black widow AcSp1 are, on average, >99% identical at the nucleotide level. A combination of stabilizing selection on amino acid sequence, selection on silent sites, and intragenic recombination likely explains the extreme homogenization of AcSp1 repeats. In addition, phylogenetic analyses of spidroin paralogs support a gene duplication event occurring concomitantly with specialization of the aciniform glands and the tubuliform glands, which synthesize egg-case silk. With repeats that are dramatically different in length and amino acid composition from dragline spidroins, our L. hesperus AcSp1 expands the knowledge base for developing silk-based biomimetic technologies.

    Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density and telomere-specific effects

    This study presents the first global, 1 Mbp level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large number of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability of rates of substitution, with the variability ranging up to 2-fold among different regions. The rates of substitution of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp. Our analysis revealed that the GC content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC content of sequences that lie within 10-15 Mbp of the telomere. Comment: 35 pages, 6 figures

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

    A Window on the Genetics of Human Speech: The FOXP2 Gene

    The development of human speech seems to be a species-specific and genetically determined capacity and is considered an extremely important step in the rise of modern humans, human culture and civilisation. The multidisciplinary efforts of psychiatrists, linguists and human geneticists led to the identification of genetic elements in cohorts of patients presenting with speech and language disorders. A form of specific language impairment (SLI) has been identified in the KE family in Britain as a dominant, autosomal trait affecting family members in three generations. Molecular genetic studies revealed a mutation in the FOXP2 gene as a possible basis of SLI in these patients. The unique human variant of FOXP2 is shared with Neandertals, indicating a common ancestral population 300,000-400,000 years ago. Imprecise imitation of the tutor’s song occurs in young canaries with lowered FoxP2 expression.