293 research outputs found

    Precision design of stable genetic circuits carried in highly‐insulated E. coli genomic landing pads

    Genetic circuits have many applications, from guiding living therapeutics to ordering process in a bioreactor, but to be useful they have to be genetically stable and not hinder the host. Encoding circuits in the genome reduces burden, but this decreases performance and can interfere with native transcription. We have designed genomic landing pads in Escherichia coli at high‐expression sites, flanked by ultrastrong double terminators. DNA payloads >8 kb are targeted to the landing pads using phage integrases. One landing pad is dedicated to carrying a sensor array, and two are used to carry genetic circuits. NOT/NOR gates based on repressors are optimized for the genome and characterized in the landing pads. These data are used, in conjunction with design automation software (Cello 2.0), to design circuits that perform quantitatively as predicted. These circuits require fourfold less RNA polymerase than when carried on a plasmid and are stable for weeks in a recA+ strain without selection. This approach enables the design of synthetic regulatory networks to guide cells in environments or for applications where plasmid use is infeasible.

    Genome reannotation of Escherichia coli CFT073 with new insights into virulence

    BACKGROUND: The genome of the human pathogen uropathogenic Escherichia coli strain CFT073 was sequenced and published in 2002, a significant step in pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now partly outdated, because some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation by combining automated annotation tools with manual efforts to provide a comprehensive understanding of virulence for the CFT073 genome. RESULTS: The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, while more than one fifth are located in virulence-related regions, the pathogenicity islands (PAIs). Furthermore, the translational initiation sites (TISs) of 341 genes were relocated, which resulted in high-quality gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous genes (sRNAs) has been updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustments in the reannotation, subsequent analyses were conducted through both general and case studies of new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/. CONCLUSION: As a result, the reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation thus serves as an important source of new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate the exploration of novel virulence mechanisms and help guide experimental design.

    Data Compression Concepts and Algorithms and Their Applications to Bioinformatics

    Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
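
As a rough illustration of the entropy notion mentioned in the abstract above, the following minimal sketch computes the empirical Shannon entropy of the k-mer distribution of a sequence. The function name `shannon_entropy` and the parameter `k` are hypothetical and chosen for this example only; the reviewed paper may use different estimators.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Empirical Shannon entropy, in bits per k-mer, of a sequence.

    Illustrative helper (not taken from the reviewed paper): counts
    overlapping k-mers and applies H = -sum(p * log2(p)).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A uniform mix of the four bases reaches the 2-bit maximum for k=1,
    # while a homopolymer carries no information per symbol.
    print(shannon_entropy("ACGT"))
    print(shannon_entropy("AAAA"))
```

A low entropy relative to the 2-bit maximum indicates a skewed base composition, which is the kind of regularity a compressor can exploit.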

    Redundancy of the genetic code enables translational pausing

    The codon redundancy (degeneracy) found in protein-coding regions of mRNA also prescribes Translational Pausing (TP). When coupled with the appropriate interpreters, multiple meanings and functions are programmed into the same sequence of configurable switch-settings. This additional layer of Ontological Prescriptive Information (PIo) purposely slows or speeds up the translation-decoding process within the ribosome. Variable translation rates help prescribe functional folding of the nascent protein. Redundancy of the codon to amino acid mapping, therefore, is anything but superfluous or degenerate. Redundancy programming allows for simultaneous dual prescriptions of TP and amino acid assignments without cross-talk. This allows both functions to be coincident and realizable. We will demonstrate that the TP schema is a bona fide rule-based code, conforming to logical code-like properties. Second, we will demonstrate that this TP code is programmed into the supposedly degenerate redundancy of the codon table. We will show that algorithmic processes play a dominant role in the realization of this multi-dimensional code.

    Application of Bioinformatics to Protein Domain, Protein Network, and Whole Genome Studies.

    Bioinformatics primarily focuses on the study of sequence data. Analyzing both nucleotide and protein sequence data provides valuable insight into their function, evolution, and importance in organism adaptation. For this dissertation, I have applied bioinformatics to the study of sequence data on three levels of complexity: protein domain, protein network, and whole genome. In the protein domain study, I used sequence similarity searches to identify a novel FIST (F-box and intracellular signal transduction proteins) domain. The domain was found to exist in all three kingdoms of life, pointing to its functional importance. Due to its presence exclusively with transducer and output domains, it was deduced that FIST functions as an input/sensory domain involved in signal transduction. Further functional characterization revealed FIST's proximity to amino acid metabolism and transport genes. This suggested that FIST functions as a small ligand sensor. In the protein network study, I examined the evolution of the chemotaxis system within the clade of Escherichia. Our study confirmed previous results demonstrating that many urinary pathogenic Escherichia coli have lost two of their five chemotaxis receptors. However, sequence analysis demonstrates that this loss occurred as an ancestral event and was not a result of adaptive evolution. The retention of the core of the system in the vast majority of Escherichia confirms that chemotaxis is important for survival in all of Escherichia's habitats. However, analysis of the loss and gain of chemotaxis receptors suggests that the array of compounds that Escherichia needs to sense often does not require all five canonical receptors. In the genome study, I used comparative genomic analysis to examine the evolutionary history of Azospirillum, agriculturally important plant growth-promoting bacteria. Taxonomic and genomic studies have revealed that Azospirillum are very distinct from their closest relatives in both habitat and genome structure. Comparative genomic analysis revealed that Azospirillum had undergone massive horizontal gene transfer. Among the acquired genes were many of those implicated in survival in the rhizosphere and in plant growth promotion. It is proposed that these bacteria's unique genome plasticity and ability to take up large amounts of foreign DNA allowed azospirilla to transition from an aquatic to a terrestrial environment.

    Identification of possible differences in coding and non coding fragments of DNA sequences by using the method of the Recurrence Quantification Analysis

    Starting with the results of Li et al. in 1992, there has been considerable interest in finding long-range correlations in DNA sequences, since they raise questions about the role of introns and intron-containing genes. In the present paper we studied two sequences: the human T-cell receptor alpha/delta locus, GenBank name HUMTCRADCV, a noncoding chromosomal fragment of M = 97630 bases (composed of less than 10% coding regions), and Escherichia coli K12, GenBank name ECO110K, a genomic fragment of M = 111401 bases containing more than 80% coding regions. We attributed the value (+1) to the purines and the value (-1) to the pyrimidines, and to the random walk thus reconstructed we applied the method of Recurrence Quantification Analysis (RQA), introduced by Zbilut and Webber in 1994. By using dimension D=1 and embedding dimensions D=3 and D=5, we obtain some indicative results. Even by simple visual examination of the reconstructed maps, the differences between coding and noncoding regions are evident and striking: noncoding regions contain long patches of the same colour that are absent in the coding sequence. At first sight this suggests a simple explanation of the concept of "long-range" correlation. On the quantitative side, we used the %Rec., the %Det., the Ratio, the Entropy, the %Lam., and the Lmax, which, as explained in detail in the text, represent the basic variables of RQA. The significant result is that both Lmax and Laminarity take very large values in HUMTCRADCV, markedly different from those in ECO110K, where these variables assume more modest values. We therefore suggest that this is the observed difference between HUMTCRADCV and ECO110K. The higher long-range correlations of introns with respect to exons claimed by many authors may be explained by the higher values of Lmax and Laminarity found in HUMTCRADCV with respect to ECO110K.
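
The encoding step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the cumulative-sum reading of "random walk", the unit embedding delay, the maximum-norm distance and the `radius` threshold are all assumptions made for the example. Only the recurrence rate (%Rec) is computed; the other RQA variables are omitted.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def purine_pyrimidine_walk(seq):
    """Map purines to +1 and pyrimidines to -1 and return the running sum.

    Assumption: the "random walk" is the cumulative sum of the +1/-1 steps;
    symbols outside A, C, G, T (for example N) are skipped.
    """
    walk, position = [], 0
    for base in seq.upper():
        if base in PURINES:
            position += 1
        elif base in PYRIMIDINES:
            position -= 1
        else:
            continue
        walk.append(position)
    return walk


def recurrence_rate(series, dim=3, delay=1, radius=1.0):
    """Percentage of distinct point pairs closer than `radius` (%Rec).

    Points are delay-embedded vectors of length `dim`; distance is the
    maximum norm (an assumption; other norms are also used in RQA).
    The double loop is quadratic, so this is only suitable for short series.
    """
    n = len(series) - (dim - 1) * delay
    if n < 2:
        raise ValueError("series too short for this embedding")
    vectors = [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]
    recurrent, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            dist = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if dist <= radius:
                recurrent += 1
    return 100.0 * recurrent / pairs


if __name__ == "__main__":
    walk = purine_pyrimidine_walk("AGGCTTAGCA")
    print(walk)
    print(recurrence_rate(walk, dim=3, radius=1.0))
```

For fragments of the size quoted in the abstract (about 10^5 bases), a windowed or library-based RQA implementation would be needed instead of this quadratic loop.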

    Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications

    The success of new scientific areas can be assessed by their potential for contributing to new theoretical approaches and to applications to real-world problems. Complex networks have fared extremely well in both of these aspects, with their sound theoretical basis developed over the years and with a variety of applications. In this survey, we analyze the applications of complex networks to real-world problems and data, with emphasis on representation, analysis and modeling, after an introduction to the main concepts and models. A diversity of phenomena are surveyed, which may be classified into no fewer than 22 areas, providing a clear indication of the impact of the field of complex networks. Comment: 103 pages, 3 figures and 7 tables. A working manuscript; suggestions are welcome.

    Event extraction from biomedical texts using trimmed dependency graphs

    This thesis explores the automatic extraction of information from biomedical publications. Such techniques are urgently needed because the biosciences are publishing continually increasing numbers of texts. The focus of this work is on events. Information about events is currently manually curated from the literature by biocurators. Biocuration, however, is time-consuming and costly so automatic methods are needed for information extraction from the literature. This thesis is dedicated to modeling, implementing and evaluating an advanced event extraction approach based on the analysis of syntactic dependency graphs. This work presents the event extraction approach proposed and its implementation, the JReX (Jena Relation eXtraction) system. This system was used by the University of Jena (JULIE Lab) team in the "BioNLP 2009 Shared Task on Event Extraction" competition and was ranked second among 24 competing teams. Thereafter JReX was the highest scorer on the worldwide shared U-Compare event extraction server, outperforming the competing systems from the challenge. This success was made possible, among other things, by extensive research on event extraction solutions carried out during this thesis, e.g., exploring the effects of syntactic and semantic processing procedures on solving the event extraction task. The evaluations executed on standard and community-wide accepted competition data were complemented by real-life evaluation of large-scale biomedical database reconstruction. This work showed that considerable parts of manually curated databases can be automatically re-created with the help of the event extraction approach developed. Successful re-creation was possible for parts of RegulonDB, the world's largest database for E. coli. In summary, the event extraction approach justified, developed and implemented in this thesis meets the needs of a large community of human curators and thus helps in the acquisition of new knowledge in the biosciences.