Superhelical Duplex Destabilization and the Recombination Position Effect
The susceptibility to recombination of a plasmid inserted into a chromosome
varies with its genomic position. This recombination position effect is known to
correlate with the average G+C content of the flanking sequences. Here we
propose that this effect could be mediated by the accompanying changes in the
susceptibility to superhelical duplex destabilization. We use standard
nonparametric statistical tests, regression analysis and principal component
analysis to identify statistically significant differences in the
destabilization profiles calculated for the plasmid in different contexts, and
correlate the results with their measured recombination rates. We show that the
flanking sequences significantly affect the free energy of denaturation at
specific sites interior to the plasmid. These changes correlate well with
experimentally measured variations of the recombination rates within the
plasmid. This correlation of recombination rate with superhelical
destabilization properties of the inserted plasmid DNA is stronger than that
with average G+C content of the flanking sequences. This model suggests a
possible mechanism by which flanking sequence base composition, which is not
itself a context-dependent attribute, can affect recombination rates at
positions within the plasmid.
Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress
BACKGROUND: In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters.
RESULTS: We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes.
CONCLUSION: In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in combination with other DNA structural and sequence properties. Although these methods cannot predict all the promoter-containing regions in a genome, they do find large sets of potential regions that have high probabilities of being true positives. This approach could be especially valuable for annotating those genomes for which experimental data are limited.
Coupling models of cattle and farms with models of badgers for predicting the dynamics of bovine tuberculosis (TB)
Bovine TB is a major problem for the agricultural industry in several
countries. TB can be contracted and spread by species other than cattle and
this can cause a problem for disease control. In the UK and Ireland, badgers
are a recognised reservoir of infection and there has been substantial
discussion about potential control strategies. We present a coupling of
individual based models of bovine TB in badgers and cattle, which aims to
capture the key details of the natural history of the disease and of both
species at approximately county scale. The model is spatially explicit: it
follows a very large number of cattle and badgers, on a different grid size for
each species, and also includes winter housing. We show that the model can
replicate the reported dynamics of both cattle and badger populations as well
as the increasing prevalence of the disease in cattle. The input parameter space
for the simulations was explored using Latin hypercube sampling, and the
sensitivity of model outputs was analysed using mixed effects models.
By exploring a large and computationally intensive parameter space we show that
of the available control strategies it is the frequency of TB testing and
whether or not winter housing is practised that have the most significant
effects on the number of infected cattle, with the effect of winter housing
becoming stronger as farm size increases. Whether or not badgers were culled
explained about 5% of the variance in the number of infected cattle, while the
accuracy of the test used to detect infected cattle explained less than 3%.
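As an illustration of the sampling step mentioned in this abstract, the following is a minimal sketch of Latin hypercube sampling in plain Python. It is not the authors' code: the function name `latin_hypercube`, the two example parameters (a testing interval and a winter-housing fraction) and their ranges are all hypothetical, chosen only to show how each parameter range is split into equal strata that are each used exactly once.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that, for every parameter, each of the
    n_samples equal-width strata of its range is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum [i/n, (i+1)/n).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose: one tuple of parameter values per simulation run.
    return list(zip(*columns))


# Hypothetical inputs: testing interval (months) and fraction of farms
# practising winter housing. These ranges are assumptions, not from the paper.
runs = latin_hypercube(5, [(6.0, 48.0), (0.0, 1.0)], seed=1)
for testing_interval, housing_fraction in runs:
    print(round(testing_interval, 1), round(housing_fraction, 2))
```

Compared with independent uniform draws, this design guarantees coverage of the full range of every parameter with few runs, which matters when each simulation is computationally expensive.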
The distribution of inverted repeat sequences in the Saccharomyces cerevisiae genome
Although a variety of possible functions have been proposed for inverted repeat sequences (IRs), it is not known which of them might occur in vivo. We investigate this question by assessing the distributions and properties of IRs in the Saccharomyces cerevisiae genome. Using the IRFinder algorithm we detect 100,514 IRs having copy length greater than 6 bp and spacer length less than 77 bp. To assess statistical significance we also determine the IR distributions in two types of randomization of the S. cerevisiae genome. We find that the S. cerevisiae genome is significantly enriched in IRs relative to the randomized genomes. The S. cerevisiae IRs are significantly longer and contain fewer imperfections than those from the randomized genomes, suggesting that processes to lengthen and/or correct errors in IRs may be operative in vivo. The S. cerevisiae IRs are highly clustered in intergenic regions, while their occurrence in coding sequences is consistent with random expectation. Clustering is stronger in the 3′ flanks of genes than in their 5′ flanks. However, the S. cerevisiae genome is not enriched in those IRs that would extrude cruciforms, suggesting that this is not a common event. Various explanations for these results are considered.
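To make the copy-length and spacer-length criteria concrete, here is a minimal brute-force sketch that finds perfect inverted repeats in a DNA string. It is not the IRFinder algorithm, which the abstract indicates also handles imperfect repeats; the function names and the default thresholds (copy length 7 bp, spacer up to 76 bp, mirroring "greater than 6 bp" and "less than 77 bp") are assumptions for illustration. It reports fixed-length seed matches only and does not extend them to maximal arms.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, copy_len=7, max_spacer=76):
    """Return (start, copy_len, spacer_len) for every position where a
    copy_len-mer is followed, after 0..max_spacer bases, by its exact
    reverse complement."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - 2 * copy_len + 1):
        target = reverse_complement(seq[start:start + copy_len])
        for spacer in range(max_spacer + 1):
            right = start + copy_len + spacer
            if right + copy_len > n:
                break  # right arm would run past the end of the sequence
            if seq[right:right + copy_len] == target:
                hits.append((start, copy_len, spacer))
    return hits


# Toy sequence: left arm AAACCCG, spacer GAGA, right arm CGGGTTT.
print(find_inverted_repeats("AAACCCGGAGACGGGTTT"))
```

The scan costs on the order of sequence length times maximum spacer, which is adequate for a toy example but would need indexing or seed hashing for a whole genome.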