10,688 research outputs found

Periodic correlation structures in bacterial and archaeal complete genomes

Author: Wu Zuo-Bing
Publication date: 28/02/2013

The periodic transference of nucleotide strings in bacterial and archaeal complete genomes is investigated using the metric representation and the recurrence plot method. The generated periodic correlation structures exhibit four kinds of fundamental transferring characteristics: a single increasing period, several increasing periods, an increasing quasi-period, and almost no increasing period. The mechanism of the periodic transference is further analyzed by determining all long periodic nucleotide strings in the bacterial and archaeal complete genomes and is explained as follows: both the repetition of basic periodic nucleotide strings and the transference of non-periodic nucleotide strings would form periodic correlation structures with approximately the same increasing periods. Comment: 23 pages, 6 figures, 2 tables

arXiv.org e-Print Archive

Institute of Mechanics, Chinese Academy of Sciences

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author: Cook Cory
Publication venue: SJSU ScholarWorks
Publication date: 14/06/2016

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance for the field of computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing also shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and significantly improve the specificity of the model.

SJSU ScholarWorks

Genetic Stratigraphy of Key Demographic Events in Arabia

Author: Alshamali Farida
Cavadas Bruno
Chaubey Gyaneshwer
Fajkošová Zuzana
Fernandes Verónica
Machado Alison
Pereira Joana B.
Pereira Luísa
Richards Martin B.
Rito Teresa
Soares Pedro
Triska Petr
Černý Viktor
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

The issue of admixture in human populations is normally addressed by genome-wide (GW) studies, and several approaches have been developed to date admixture events [1,2,3,4,5]. Admixed populations bear chromosomes with segments of DNA from all contributing source groups, the size of which decreases over successive generations until recombination renders them undetectably short. Several algorithms attempt to date admixture events by inferring the size of the nuclear ancestry segments, and these can work well when dating recent episodes in human history, such as the sub-Saharan African input into the New World [6], but they fail to detect several known episodes that took place at earlier times, such as the African input into Iberia [1] and genetic exchanges across the Red Sea [7]. Simulations with the suite of methods available in the ADMIXTOOLS package indicated that these methods could detect admixture events as early as 500 generations ago, but real data did not allow the tracing of such old events [8]. A recent improved algorithm, called GLOBETROTTER, has been used to tackle the detection of the co-occurrence of several mixture events by decomposing each chromosome into a series of haplotypic chunks and then analysing each chunk independently [3], but the problem of detecting ancient events remains. Its application to the systematic screening of worldwide admixture events was able to reveal around 100 events, but all occurring over only the past 4,000 years [3].
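The shrinking of ancestry segments over generations, described in the abstract above, can be illustrated with a rough back-of-the-envelope sketch. It assumes a deliberately simplified single-pulse model in which mean segment length is about 100/g centimorgans after g generations; this is a textbook approximation, not the method used by ADMIXTOOLS or GLOBETROTTER, and the function names are hypothetical.

```python
import math


def expected_tract_length_cm(generations: float) -> float:
    """Approximate mean ancestry segment length in centimorgans.

    ASSUMPTION: single admixture pulse, about one crossover per Morgan
    per generation, and the admixture proportion is ignored, so the mean
    length is roughly 1/g Morgans = 100/g cM.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def generations_from_tract_length(mean_length_cm: float) -> float:
    """Invert the same approximation to get a rough admixture date."""
    if mean_length_cm <= 0:
        raise ValueError("mean_length_cm must be positive")
    return 100.0 / mean_length_cm


if __name__ == "__main__":
    for g in (10, 100, 500):
        print(g, "generations ->", expected_tract_length_cm(g), "cM")
```

Under this approximation, segments from an event 500 generations old average only a fraction of a centimorgan, which is consistent with the abstract's point that old events become hard to trace.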

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Repositório Aberto da Universidade do Porto

University of Huddersfield Repository

FigShare

Huddersfield Research Portal

Ancient properties of spider silks revealed by the complete gene sequence of the prey-wrapping silk protein (AcSp1).

Author: Ayoub Nadia A
Garb Jessica E
Hayashi Cheryl Y
Kuelbs Amanda
Publication venue: eScholarship, University of California
Publication date: 15/11/2012

Spider silk fibers have impressive mechanical properties and are primarily composed of highly repetitive structural proteins (termed spidroins) encoded by a single gene family. Most characterized spidroin genes are incompletely known because of their extreme size (typically >9 kb) and repetitiveness, limiting understanding of the evolutionary processes that gave rise to their unusual gene architectures. The only complete spidroin genes characterized thus far form the dragline in the Western black widow, Latrodectus hesperus. Here, we describe the first complete gene sequence encoding the aciniform spidroin AcSp1, the primary component of spider prey-wrapping fibers. L. hesperus AcSp1 contains a single enormous (∼19 kb) exon. The AcSp1 repeat sequence is exceptionally conserved between two widow species (∼94% identity) and between widows and distantly related orb-weavers (∼30% identity), consistent with a history of strong purifying selection on its amino acid sequence. Furthermore, the 16 repeats (each 371-375 amino acids long) found in black widow AcSp1 are, on average, >99% identical at the nucleotide level. A combination of stabilizing selection on amino acid sequence, selection on silent sites, and intragenic recombination likely explains the extreme homogenization of AcSp1 repeats. In addition, phylogenetic analyses of spidroin paralogs support a gene duplication event occurring concomitantly with specialization of the aciniform glands and the tubuliform glands, which synthesize egg-case silk. With repeats that are dramatically different in length and amino acid composition from dragline spidroins, our L. hesperus AcSp1 expands the knowledge base for developing silk-based biomimetic technologies.

PubMed Central

eScholarship - University of California

Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density and telomere-specific effects

Author: Arndt Peter F
Hwa Terence
Petrov Dmitri A
Publication date: 01/01/2005

This study presents the first global, 1 Mbp level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large number of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability in rates of substitution, with the variability ranging up to 2-fold among different regions. The rates of substitution of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp. Our analysis revealed that the GC content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC content of the sequences that are within 10-15 Mbp of the telomere. Comment: 35 pages, 6 figures
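Since the abstract above names the GC content of surrounding sequence as the best predictor of substitution rate, a minimal sketch of a windowed GC-content calculation may help make that quantity concrete. The function name and the toy window size are assumptions for illustration; the study itself works at the 1 Mbp level, and this is not the authors' pipeline.

```python
def gc_content_by_window(sequence: str, window: int) -> list[float]:
    """Return the GC fraction of each non-overlapping window.

    ASSUMPTION: any trailing partial window is dropped, and every base
    other than G or C (including N) counts as non-GC.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    fractions = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        gc_count = sum(1 for base in chunk if base in "GC")
        fractions.append(gc_count / window)
    return fractions


if __name__ == "__main__":
    # Toy sequence; a real analysis would use windows of about 1,000,000 bp.
    print(gc_content_by_window("GGCCAATTGCAT", 4))
```

Non-overlapping windows keep the sketch short; a sliding window would give a smoother profile at a higher computational cost.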

arXiv.org e-Print Archive

CiteSeerX

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

A Window on the Genetics of Human Speech: The FOXP2 Gene

Author: Barabás Katalin
Mink Mátyás
Solymosi Mária Ágnes
Szűcs Edit
Publication venue: Association for the Study of Language in Prehistory
Publication date: 01/01/2007

The development of human speech seems to be a species-specific and genetically determined capacity and is considered an extremely important step in the rise of modern humans, human culture and civilisation. The multidisciplinary efforts of psychiatrists, linguists and human geneticists led to the identification of genetic elements in cohorts of patients presenting with speech and language disorders. A form of specific language impairment (SLI) has been identified in the KE family in Britain as a dominant, autosomal trait affecting family members across three generations. Molecular genetic studies revealed a mutation in the FOXP2 gene as a possible basis of SLI in these patients. The unique human variant of FOXP2 is shared with Neandertals, indicating a common ancestral population 300,000-400,000 years ago. Imprecise imitation of the tutor's song occurs in young canaries with lowered FoxP2 expression.

Repository of the Academy's Library