375 research outputs found

    Identifying Cis-Regulatory Sequences by Word Profile Similarity

    Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz
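
The idea of comparing genomic windows to a known regulatory sequence by word composition can be illustrated with a minimal sketch. The abstract does not say how word profiles are built or compared, so everything below is an assumption made for illustration: overlapping k-mer counts as the "word profile", cosine similarity as the comparison, and the hypothetical function names `word_profile`, `cosine_similarity` and `scan_windows`. It is not the WPH-finder implementation.

```python
from collections import Counter
from math import sqrt


def word_profile(seq, k=4):
    """Count overlapping words (k-mers) in a sequence. Assumed profile definition."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(count * q[word] for word, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def scan_windows(genome, known, k=4, step=None):
    """Score each genome window (same length as `known`) against the known sequence."""
    window = len(known)
    step = step or window
    reference = word_profile(known, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = word_profile(genome[start:start + window], k)
        scores.append((start, cosine_similarity(reference, profile)))
    return scores


if __name__ == "__main__":
    # Toy data: the middle window is identical to the known sequence.
    genome = "CCCCCCCC" + "ACGTACGT" + "GGGGGGGG"
    for start, score in scan_windows(genome, "ACGTACGT"):
        print(start, round(score, 3))
```

Windows that share many words with the known sequence score close to 1, and windows with no shared words score 0. A real scan would also need a background model to judge which scores are significant.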

    An integrated computational pipeline and database to support whole-genome sequence annotation

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.

    Practical computational toolkits for dendrimers and dendrons structure design

    Dendrimers and dendrons offer an excellent platform for developing novel drug delivery systems and medicines. The rational design and further development of these repetitively branched systems are restricted by difficulties in scalable synthesis and structural determination, which can be overcome by judicious use of molecular modelling and molecular simulations. A major difficulty in using in silico studies to design dendrimers lies in the laborious generation of their structures. Current modelling tools rely on automated assembly of simpler dendrimers or on inefficient manual assembly of monomer precursors to generate more complicated dendrimer structures. Herein we describe two novel graphical user interface (GUI) toolkits written in Python that provide an improved degree of automation for rapid assembly of dendrimers and generation of their 2D and 3D structures. Our first toolkit uses the RDKit library, SMILES nomenclature of monomers and SMARTS reaction nomenclature to generate SMILES and mol files of dendrimers without 3D coordinates. These files are used for simple graphical representations and for storing the structures in databases. The second toolkit assembles complex topology dendrimers from monomers to construct 3D dendrimer structures to be used as starting points for simulation using existing and widely available software and force fields. Both tools were validated for ease of use in prototyping dendrimer structures, and the second toolkit was especially relevant for dendrimers of high complexity and size. Peer reviewed

    Computational Structural Analysis: Multiple Proteins Bound to DNA

    BACKGROUND: With increasing numbers of crystal structures of protein:DNA and protein:protein:DNA complexes publicly available, it is now possible to extract sufficient structural, physical-chemical and thermodynamic parameters to make general observations and predictions about their interactions. In particular, the properties of macromolecular assemblies of multiple proteins bound to DNA have not previously been investigated in detail. METHODOLOGY/PRINCIPAL FINDINGS: We have performed computational structural analyses on macromolecular assemblies of multiple proteins bound to DNA using a variety of different computational tools: PISA; PROMOTIF; X3DNA; ReadOut; DDNA and DCOMPLEX. Additionally, we have developed and employed an algorithm for approximate collision detection and overlapping volume estimation of two macromolecules. An implementation of this algorithm is available at http://promoterplot.fmi.ch/Collision1/. The results obtained are compared with structural, physical-chemical and thermodynamic parameters from protein:protein and single protein:DNA complexes. Many of the interface properties of multiple protein:DNA complexes were found to be very similar to those observed in binary protein:DNA and protein:protein complexes. However, the conformational change of the DNA upon protein binding is significantly higher when multiple proteins bind to it than is observed when single proteins bind. The water-mediated contacts are less important (found in smaller numbers) between the interfaces of components in ternary (protein:protein:DNA) complexes than in those of binary complexes (protein:protein and protein:DNA). The thermodynamic stability of ternary complexes is also higher than in the binary interactions. Greater specificity and affinity of multiple proteins binding to DNA in comparison with binary protein-DNA interactions were observed. However, protein-protein binding affinities are stronger in complexes without the presence of DNA. CONCLUSIONS/SIGNIFICANCE: Our results indicate that the interface properties: interface area; number of interface residues/atoms and hydrogen bonds; and the distribution of interface residues, hydrogen bonds, van der Waals contacts and secondary structure motifs are independent of whether or not a protein is in a binary or ternary complex with DNA. However, changes in the shape of the DNA reduce the off-rate of the proteins, which greatly enhances the stability and specificity of ternary complexes compared to binary ones.

    Bianchi Type-II String Cosmological Models in Normal Gauge for Lyra's Manifold with Constant Deceleration Parameter

    The present study deals with spatially homogeneous and anisotropic Bianchi-II cosmological models representing massive strings in normal gauge for Lyra's manifold by applying the variation law for the generalized Hubble's parameter that yields a constant value of the deceleration parameter. The variation law for Hubble's parameter generates two types of solutions for the average scale factor: one is of power-law type and the other is of exponential form. Using these two forms, Einstein's modified field equations are solved separately; the solutions correspond to expanding singular and non-singular models of the universe, respectively. The energy-momentum tensor for such strings as formulated by Letelier (1983) is used to construct massive string cosmological models for which we assume that the expansion ($\theta$) in the model is proportional to the component $\sigma^{1}_{~1}$ of the shear tensor $\sigma^{j}_{i}$. This condition leads to $A = (BC)^{m}$, where A, B and C are the metric coefficients and m is a proportionality constant. Our models are in an accelerating phase, which is consistent with recent observations. It has been found that the displacement vector $\beta$ behaves like the cosmological term $\Lambda$ in the normal gauge treatment and the solutions are consistent with recent observations of SNe Ia. It has been found that massive strings dominate in the decelerating universe whereas strings dominate in the accelerating universe. Some physical and geometric behaviour of these models is also discussed. Comment: 24 pages, 10 figures
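
The link between a constant deceleration parameter and the two forms of the scale factor can be made explicit with a short worked derivation. The abstract does not give the variation law itself, so the form below (a Berman-type law) and the symbols $a$, $D$, $n$, $c_1$, $c_2$ are assumptions introduced for illustration, with $a = (ABC)^{1/3}$ taken as the average scale factor.

```latex
% Assumed variation law: H = D a^{-n}, with constants D > 0 and n >= 0.
\[
H = \frac{\dot a}{a} = D\,a^{-n}
\;\Longrightarrow\;
\dot a = D\,a^{1-n},
\qquad
\ddot a = D^{2}(1-n)\,a^{1-2n},
\]
\[
q = -\frac{a\,\ddot a}{\dot a^{2}} = n - 1 \quad \text{(constant)} .
\]
% Integrating \dot a = D a^{1-n} gives the two solution families:
\[
a(t) =
\begin{cases}
(nDt + c_1)^{1/n}, & n \neq 0 \quad \text{(power law)},\\[4pt]
c_2\,e^{Dt}, & n = 0 \quad \text{(exponential)} .
\end{cases}
\]
```

Under these assumptions the model accelerates ($q < 0$) whenever $n < 1$. The power-law solution vanishes at $t = -c_1/(nD)$, giving a singular model, while the exponential solution never vanishes at finite time, giving a non-singular one.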

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms.

    Absorbing and transferring risk: assessing the impact of a statewide high-risk-pregnancy telemedical program on VLBW maternal transports

    BACKGROUND: Prior research has shown that resources have an impact on birth outcomes. In this paper we ask how combinations of telemedical and hospital-level resources impact transports of mothers expecting very low birth weight (VLBW) babies in Arkansas. METHODS: Using de-identified birth certificate data from the Arkansas Department of Health, data were gathered on transports of women carrying VLBW babies for two six-month periods: a period just before the start of ANGELS (12/02-05/03), a telemedical outreach program for high-risk pregnancies, and a period after the program had been running for six months (12/03-05/04). For each maternal transport, the following information was recorded: maternal race-ethnicity, maternal age, and the birth weight of the infant. Logistic regression was used to assess the relationship between the predictors (telemedicine, hospital level, maternal characteristics) and the probability of a transport. RESULTS: Having a telemedical site available increases the probability of a mother carrying a VLBW baby being transported to a level III facility either before or during birth. Having at least a level II nursery also increases the chance of a maternal transport. Where both level II nurseries and telemedical access are available, the odds of VLBW maternal transports are only modestly increased in comparison to the case where neither is present. At the individual level, Hispanic mothers were less likely to be transported than other mothers, and teenaged mothers were more likely to be transported than those 18 and over. A mother's being Black or being over 35 did not have an impact on the odds of being transported to a level III facility. CONCLUSION: Combinations of resources have an impact on physician decisions regarding VLBW transports and are interpretable in terms of the capacity to diagnose and absorb risk. We suggest a collegial review of transport patterns and birth outcomes from areas with different levels of resources as a vehicle for moving the entire system of care forward over time. With such an evidence-based review in place, the collegial relations among level III specialists and obstetricians from around the state can, over time, develop workable protocols for when and how level III facilities should be involved.

    Whole genome sequencing reveals high clonal diversity of Escherichia coli isolated from patients in a tertiary care hospital in Moshi, Tanzania

    Background: Limited information regarding the clonality of circulating E. coli strains in tertiary care hospitals in low and middle-income countries is available. The purpose of this study was to determine the serotypes, antimicrobial resistance and virulence genes. Further, we carried out a phylogenetic tree reconstruction to determine relatedness of E. coli isolated from patients in a tertiary care hospital in Tanzania. Methods: E. coli isolates from inpatients admitted at Kilimanjaro Christian Medical Centre between August 2013 and August 2015 were fully genome-sequenced at KCMC hospital. Sequence analysis was done for identification of resistance genes, Multi-Locus Sequence Typing, serotyping, and virulence genes. Phylogeny reconstruction using CSI Phylogeny was done to ascertain E. coli relatedness. Stata 13 (College Station, Texas 77845, USA) was used to determine Cohen's kappa coefficient of agreement between the phenotypically tested and whole genome sequence predicted antimicrobial resistance. Results: Out of 38 E. coli isolates, 21 different sequence types (ST) were observed. Eight (21.1%) isolates belonged to ST131, of which 7 (87.5%) were serotype O25:H4. Ten (18.4%) isolates belonged to ST10 clonal complex; of these, four (40.0%) were ST617 with serotype O89:H10. Twenty-eight (73.7%) isolates carried genes encoding beta-lactam resistance enzymes. On average, agreement across all drugs tested was 83.9%. Trimethoprim/sulphamethoxazole (co-trimoxazole) showed moderate agreement: 45.8%, kappa = 15% and p = 0.08. Amoxicillin-clavulanate showed strongest agreement: 87.5%, kappa = 74% and p = 0.0001. Twenty-two (57.9%) isolates carried virulence factors for host cell adherence and 25 (65.7%) carried factors that promote E. coli immune evasion by increasing survival in serum. The phylogeny analysis showed that ST131 isolates clustered closely together, whereas the ST10 clonal complex showed a very clear segregation of ST617 and a mix of the remaining STs. Conclusion: There is a high diversity of E. coli isolated from patients admitted to a tertiary care hospital in Tanzania. This underscores the necessity to routinely screen all bacterial isolates of clinical importance in tertiary health care facilities. WGS use for laboratory-based surveillance can be an effective early warning system for emerging pathogens and resistance mechanisms in LMICs.
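
Cohen's kappa, the agreement statistic named in the Methods, corrects raw percent agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two lists of binary calls. The function name `cohens_kappa`, the 1 = resistant / 0 = susceptible coding, and the toy data are assumptions for illustration and are not taken from the study, which used Stata.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of binary calls (1 or 0)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)

    # Observed agreement: fraction of isolates where both methods agree.
    p_observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n

    # Chance agreement from each method's marginal rate of positive calls.
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    if p_expected == 1:
        # Both methods gave the same single class throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical calls for four isolates (1 = resistant, 0 = susceptible).
    phenotype = [1, 1, 0, 0]
    genotype = [1, 0, 0, 0]
    print(cohens_kappa(phenotype, genotype))
```

Because kappa discounts chance agreement, it can be much lower than the raw percent agreement when one class dominates the calls.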

    Statistical significance of cis-regulatory modules

    BACKGROUND: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome scale scanning. RESULTS: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. CONCLUSION: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
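
A max-gap model is commonly understood to define a cluster as a run of sites in which no two consecutive sites are further apart than a chosen maximum gap. The sketch below shows only that grouping step, under that assumed definition. The function name `max_gap_clusters`, the `min_sites` threshold and the positions are hypothetical, and the code does not reproduce the significance calculation or the STORM/MODSTORM implementation.

```python
def max_gap_clusters(positions, max_gap, min_sites=2):
    """Group site positions into clusters whose consecutive gaps are <= max_gap."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical binding-site match positions (bp) along a genomic region.
    sites = [10, 30, 45, 200, 210, 500]
    print(max_gap_clusters(sites, max_gap=50))
```

With a maximum gap of 50 bp, the first three sites form one candidate module and the next two form another, while the isolated site at 500 is discarded. Assigning a p-value to each candidate would require the statistical model described in the abstract.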